Documentation
SQL API

SQL API

SQL API enables Cube to deliver data over the Postgres-compatible protocol (opens in a new tab) to certain kinds of data applications, including but not limited to the following ones:

Often, the SQL API is used to enable internal or self-serve business intelligence (opens in a new tab) use cases.

SQL API has been tested with the following tools:

  • psql CLI
  • Apache Superset
  • Tableau Cloud
  • Tableau Desktop with JDBC driver
  • Power BI
  • Metabase
  • Google Looker Studio
  • Excel through Devart plugin
  • Deepnote
  • Hex
  • Observable
  • Streamlit
  • Jupyter notebook
  • Hightouch

Please use this GitHub issue (opens in a new tab) to suggest tools of your interest and vote for already proposed ones.

See SQL API reference for the list of supported commands, functions, and operators.

Configuration

Cube Core

SQL API is disabled by default. To enable the SQL API, set CUBEJS_PG_SQL_PORT to a port number you'd like to connect to with a Postgres-compatible tool.

CredentialEnvironment variable, etc.
HostHost you're running Cube at
PortSet via CUBEJS_PG_SQL_PORT
User nameSet via CUBEJS_SQL_USER
PasswordSet via CUBEJS_SQL_PASSWORD
DatabaseAny valid string, e.g., cube

You can also use checkSqlAuth, canSwitchSqlUser, and CUBEJS_SQL_SUPER_USER to configure custom authentication.

Cube Cloud

SQL API is enabled by default. You can find credentials for the SQL API on the Overview page by clicking Connect to SQL API.

You can also customize CUBEJS_PG_SQL_PORT, CUBEJS_SQL_USER, and CUBEJS_SQL_PASSWORD environment variables by navigating to Settings → Configration.

Querying Fundamentals

Under the hood, SQL API uses Apache Datafusion (opens in a new tab) as its SQL execution engine. It's responsible for query planning and execution. As the conversion process from SQL to Cube Query can be ambiguous, an additional step of query rewriting is done before the query is executed. During this step, the query plan is rewritten such that the maximum number of Cube Queries can be detected within the given query plan. Overall, rewriting is a seamless process. There are some practical considerations that you should keep in mind while querying, though.

In the SQL API, each cube is represented as a table. Measures, dimensions, and segments in this table are columns. We call these tables cube tables.

Consider the orders cube in your data model, the following query is performing a SELECT from the orders cube.

SELECt
  city,
  SUM(amount)
FROM orders
WHERE status = 'shipped'
GROUP BY 1

The SQL API transforms SELECT query fragments from cube tables into Cube's internal query format. This process is called Cube query rewrite.

The SQL query above would be rewritten into the following Cube query:

{
  "measures": ["Orders.amount"],
  "dimensions": ["Orders.city"],
  "filters": [
    {
      "member": "Orders.status",
      "operator": "equals",
      "values": ["shipped"]
    }
  ]
}

Because of this transformation, not all functions and expressions are supported in query fragments performing SELECT from cube tables. Please refer to the reference to see whether a specific expression or function is supported. For example, the following query won't work because the SQL API can't push down the case expression to Cube for processing. It is not possible to translate case expression in measure.

-- This query won't work because of the Cube query rewrite
SELECT
  city,
  CASE
    WHEN status = 'shipped' THEN 'done'
    ELSE 'in-progress'
  END real_status,
  SUM(number)
FROM orders
  CROSS JOIN Users
GROUP BY 1;

You can leverage nested queries in cases like this. You can wrap your SELECT statement from a cube table into another SELECT statement to perform calculations with expressions like CASE. This outer select is not part of the SQL query that being rewritten and thus allows you to use more SQL functions, operators and expressions. You can rewrite the above query as follows, making sure to wrap the original SELECT statement:

--- You can also use CTEs to achieve the same result
SELECT
  city,
  CASE
    WHEN status = 'shipped' THEN 'done'
    ELSE 'in-progress'
  END real_status,
  SUM(amount) AS total
FROM (
  SELECT
    Users.city AS city,
    SUM(number) AS amount,
    orders.status
  FROM orders
    CROSS JOIN Users
  GROUP BY 1, 3
) AS inner
GROUP BY 1, 2
ORDER BY 1;

The above query works because the CASE expression is supported in SELECT queries not querying cube tables.

When querying cube tables, it is important to understand fundamentals of Cube query rewrite as well as the pushdown process. Pushdown is a process of pushing the processing of a particular part of the query down to the inner SELECT from the cube table. The following sections provide an overview of Cube query rewrite and pushdown. Please refer to the reference to see whether specific functions, operators or expressions are supported in Cube query rewrite.

Aggregated vs Non-aggregated queries

There are two types of queries supported against cube tables: aggregated and non-aggregated. Aggregated are those with GROUP BY statement, and non-aggregated are those without. Cube queries issued to your database will always be aggregated, and it doesn't matter if you provide GROUP BY in a query or not.

Whenever you use a non-aggregated query you need to provide only column names in SQL:

SELECT status, count FROM orders

The same aggregated query should always aggregate measure columns using a corresponding aggregating function or special MEASURE() function:

In cases where measure columns are not aggregated or dimensions columns aren't included in GROUP BY, Projection references non-aggregate values error will be thrown. This error means there are columns that are neither in GROUP BY nor aggregated. This is a standard SQL GROUP BY operation consistency check enforced by SQL API as well. This error can be also thrown if you forget to GROUP BY all dimensions included in the query as well.

SELECT status, SUM(count) FROM orders GROUP BY 1
SELECT status, MEASURE(count) FROM orders GROUP BY 1

Filtering

Cube supports most simple equality operators like =, <>, <, <=, >, >= as well as IN and LIKE operators. Cube tries to push down all filters into Cube query. In some cases, SQL filters aren't available in Cube and can be done in a post-processing phase. Time dimension filters will be converted to time dimension date ranges whenever it's possible.

Ordering

Cube tries to push down all ORDER BY statements into Cube Query.

If it can't be done ordering part would be done in a post-processing phase. In case there are more than 50,000 rows in the result set, incorrect results can be received in this case. Please use EXPLAIN in order to check if it's the case.

Consider the following query.

SELECT
  status,
  SUM(total_value) + 2 as transformed_amount
FROM (
  SELECT * FROM orders
) orders
GROUP BY status
ORDER BY status DESC
LIMIT 100

Because of the expression SUM(total_value) + 2 in the projection of outer query, Cube can't push down ORDER.

You can run EXPLAIN against the above query to look at the plan. As you can see below, the sorting operation is done after Cube query and projection.

+ GlobalLimitExec: skip=None, fetch=100
+- SortExec: [transformed_amount@1 DESC]
+-- ProjectionExec: expr=[status@0 as status, SUM(orders.total_value)@1 + CAST(2 AS Float64) as transformed_amount]
+--- CubeScanExecutionPlan

Because of the default limit in Cube queries (50,000 rows), there is a possibility of a wrong result if there are more than 50,000 rows. Given that queries to Cube are usually aggregated, it is rare that they may return more than 50,000 rows, but keep that limitation in mind when designing your queries.

Limit

Limit push down is supported by Cube however, a limit over 50,000 can't be overridden. In future versions, paging and streaming would be used to avoid this limitation.

Examples

Consider the following data model:

YAML
JavaScript
cubes:
  - name: orders
    sql_table: orders
 
    measures:
      - name: count
        type: count
 
    dimensions:
      - name: status
        type: string
        sql: status
 
      - name: created_at
        type: time
        sql: created_at

It would be represented as table in SQL API with count, status, created columns.

To get the count of orders grouped by status we can run the following query.

cube=> SELECT count, status FROM orders;
 count |   status
-------+------------
 15513 | completed
 14652 | processing
 13829 | shipped
(3 rows)

Cube will automatically apply the GROUP BY clause in case it is missing in the query. We can also provide the GROUP BY statement to control how results are grouped. In the following example, we group orders by created month and also by status within every month.

cube=> SELECT MEASURE(count), status, DATE_TRUNC('month', created_at) date FROM orders GROUP BY date, status ORDER BY date asc;
 measure(orders.count) |   status   |            date
-----------------------+------------+----------------------------
                    31 | shipped    | 2016-01-01 00:00:00.000000
                    28 | completed  | 2016-01-01 00:00:00.000000
                    28 | processing | 2016-01-01 00:00:00.000000
                    28 | shipped    | 2016-02-01 00:00:00.000000
                    18 | processing | 2016-02-01 00:00:00.000000
                    28 | completed  | 2016-02-01 00:00:00.000000
                    54 | processing | 2016-03-01 00:00:00.000000
                    57 | completed  | 2016-03-01 00:00:00.000000
                    56 | shipped    | 2016-03-01 00:00:00.000000
                    54 | shipped    | 2016-04-01 00:00:00.000000
                    60 | completed  | 2016-04-01 00:00:00.000000
                    43 | processing | 2016-04-01 00:00:00.000000
                    55 | shipped    | 2016-05-01 00:00:00.000000

Querying Dimensions

Querying dimensions is straightforward, simply add any required fields to the SELECT clause.

cube=> SELECT status FROM orders;
   status
------------
 completed
 processing
 shipped
(3 rows)

Querying Measures

Measures can similarly be queried through Cube SQL.

Because measures are already aggregated in Cube there is no need to apply aggregate functions to them in SQL API if you don't have a GROUP BY statement in query.

cube=> SELECT count FROM orders;
 count
-------
 43994
(1 row)

Some BI systems or SQL constraints may require you to apply aggregate functions. To support this, Cube allows aggregate functions on measures as long as they match the type of the measure.

count measure in our example is of type count, It means we can apply COUNT() aggregate function to it. The below query is similar to the above one.

cube=> SELECT COUNT(count) FROM orders;
 COUNT(orders.count)
---------------------
               43994
(1 row)

There's also universal aggregate function MEASURE() that matches any measure type.

cube=> SELECT MEASURE(count) FROM orders;
 measure(orders.count)
-----------------------
                 43994
(1 row)

Let's look at more measures types:

YAML
JavaScript
cubes:
  - name: orders
    # ...
 
    measures:
      - name: count
        type: count
 
      - name: distinct_count
        type: count_distinct
        sql: id
 
      - name: approx_distinct_count
        type: count_distinct_approx
        sql: id
 
      - name: min_value
        type: min
        sql: min_value
 
      - name: max_value
        type: max
        sql: max_value

As we can see, we have a mix of measure types in the above data model. To query them, we could use the following SQL statements:

--- Both the following statements are equivalent
SELECT count FROM orders
SELECT COUNT(*) FROM orders
 
--- Count distinct, and count distinct approx
--- Both the following statements are equivalent
SELECT distinct_count FROM orders
SELECT COUNT(DISTINCT distinct_count) FROM orders
 
--- Both the following statements are equivalent
SELECT approx_distinct_count FROM orders
SELECT COUNT(DISTINCT approx_distinct_count) FROM orders
 
--- Both the following statements are equivalent
SELECT min_value FROM orders
SELECT MIN(min_value) FROM orders
 
--- Both the following statements are equivalent
SELECT max_value FROM orders
SELECT MAX(max_value) FROM orders

Querying Segments

Any segments defined in a data model can also be used in Cube SQL queries. Looking at the data model below, we have one segment is_completed:

YAML
JavaScript
cubes:
  - name: orders
    # ...
    segments:
      - name: is_completed
        sql: status = 'completed'

Segments must be used as boolean types in Cube SQL queries:

WHERE is_completed = true