Query format in the SQL API

SQL API runs queries in the Postgres dialect that can reference those tables and columns.

By default since 1.0, the SQL API executes regular queries, queries with post-processing and queries with pushdown. This page explains their format and details if they are handled differently by the SQL API.

Query pushdown in the SQL API is available in public preview. Read more (opens in a new tab) in the blog.

Data model mapping

In the SQL API, each cube or view from the data model is represented as a table. Measures, dimensions, and segments are represented as columns in these tables.

Cubes and views

Given that you have a cube or a view called orders, you can query it as if it's a table:

SELECT * FROM orders;

Dimensions

Given that your cube or view has a dimension called status, you can reference it as a column in the SELECT clause. Note that you'll also have to add it to the GROUP BY clause:

SELECT status
FROM orders
GROUP BY 1;

Measures

Given that your cube or view has a measure called count, you can reference it by wrapping with the MEASURE aggregate function:

SELECT MEASURE(count)
FROM orders;

The SQL API allows aggregate functions on measures as long as they match measure types.

Aggregate functions

The special MEASURE function works with measures of any type. Measure columns can also be aggregated with the following aggregate functions that correspond to measure types:

Measure type in Cube	Aggregate function in an aggregated query
`avg`	`MEASURE` or `AVG`
`boolean`	`MEASURE`
`count`	`MEASURE` or `COUNT`
`count_distinct`	`MEASURE` or `COUNT(DISTINCT …)`
`count_distinct_approx`	`MEASURE` or `COUNT(DISTINCT …)`
`max`	`MEASURE` or `MAX`
`min`	`MEASURE` or `MIN`
`number`	`MEASURE` or any other function from this table
`string`	`MEASURE` or `STRING_AGG`
`sum`	`MEASURE` or `SUM`
`time`	`MEASURE` or `MAX` or `MIN`

If an aggregate function doesn't match the measure type, the following error will be thrown: Measure aggregation type doesn't match.

Segments

Segments are exposed as columns of the boolean type (opens in a new tab). Given that your cube or view has a segment called is_completed, you can reference it as a column in the WHERE clause:

SELECT *
FROM orders
WHERE is_completed IS TRUE;

Joins

Please refer to this page for details on joins.

Post-processing and pushdown

Since 1.0 by default, the SQL API executes regular queries, queries with post-processing, and queries with pushdown.

Query post-processing

The following query is performing a SELECT from the orders cube:

SELECT
  city,
  SUM(amount)
FROM orders
WHERE status = 'shipped'
GROUP BY 1

For this query, the SQL API would transform SELECT query fragments into a regular query. It can be represented as follows in the REST API query format:

{
  "dimensions": [
    "orders.city"
  ],
  "measures": [
    "orders.amount"
  ],
  "filters": [
    {
      "member": "orders.status",
      "operator": "equals",
      "values": [
        "shipped"
      ]
    }
  ]
}

Because of this transformation, not all functions and expressions are supported in query fragments performing SELECT from cube tables. Please refer to the SQL API reference to see whether a specific expression or function is supported and whether it can be used in selection (opens in a new tab) (e.g., WHERE) or projection (opens in a new tab) (e.g., SELECT) parts of SQL queries.

For example, the following query won't work because the SQL API can't push down the CASE expression to Cube for processing. It is not possible to translate CASE expressions in measures.

SELECT
  city,
  MAX(CASE
    WHEN status = 'shipped'
    THEN '2-done'
    ELSE '1-in-progress'
  END) AS real_status,
  SUM(number)
FROM orders
CROSS JOIN users
GROUP BY 1;

You can leverage nested queries in cases like this. You can wrap your SELECT statement from a cube table (inner query) into another SELECT statement (outer query) to perform calculations with expressions like CASE. This outer SELECT is not part of the SQL query that being rewritten and thus allows you to use more SQL functions, operators and expressions.

You can rewrite the above query as follows, making sure to wrap the original SELECT statement:

SELECT
  city,
  MAX(CASE
    WHEN status = 'shipped' THEN '2-done'
    ELSE '1-in-progress'
  END) real_status,
  SUM(amount) AS total
FROM (
  SELECT
    users.city AS city,
    SUM(number) AS amount,
    orders.status
  FROM orders
  CROSS JOIN users
  GROUP BY 1, 3
) AS inner
GROUP BY 1, 2
ORDER BY 1;
--- You can also use CTEs to achieve the same result

The above query works because the CASE expression is supported in SELECT queries not querying cube tables.

Query pushdown

⚠️🐣 Preview

Query pushdown is currently in public preview, and the API and behavior may change in future versions.

Query pushdown is enabled by default since 1.0 and is controlled by CUBESQL_SQL_PUSH_DOWN environment variable.

Query pushdown currently has the following limitations:

No support for joins between cubes (opens in a new tab).
No support for custom aggregations in number measures (opens in a new tab).

Query pushdown provides a safe net for queries that can't be rewritten into combination of a regular query and post-processing. Such queries' SQL would be transpiled to target database query leveraging all target database capabilities for data processing.

During the rewrite process, Cube validates that the target database would support transpired SQL queries. If direct conversion is not possible, different SQL transformation rewrite rules can be applied to achieve successful translation. Please refer to the SQL API reference for the list of supported SQL functions and clauses. Support varies based on the target database.

Top-down and bottom-up evaluation

Fundamentally, every SQL operation results in a tabular data set. This is usually referred to as SQL operational closure or bottom-up SQL evaluation. However, for OLAP queries, most of the time, top-down evaluation is required. Top-down evaluation is whenever the outermost sub-query operation decides on how measures would be actually evaluated as opposed to innermost sub-query in case of standard SQL behavior.

To balance between SQL guarantees and OLAP requirements, Cube

uses top-down evaluation from the innermost aggregation operation down to all ungrouped sub-queries,
uses bottom-up evaluation from the innermost aggregation tabular result set up to the outermost sub-query.

This behavior is enabled whenever query pushdown is enabled. It's enabled by default since 1.0.

To drill-down on how this works, let's consider following example date model

cubes:
  - name: orders
    sql_table: ECOM.ORDERS
 
    dimensions:
      - name: id
        sql: ID
        type: number
        primary_key: true
 
      - name: status
        sql: STATUS
        type: string
        description: The status of the order (completed etc)
 
      - name: created_at
        sql: "{CUBE}.CREATED_AT"
        type: time
 
    measures:
      - name: count
        type: count
 
      - name: completed_count
        type: count
        filters:
          - sql: "{CUBE}.STATUS = 'completed'"
 
      - name: completed_percentage
        type: number
        sql: "(100.0 * {CUBE.completed_count} / NULLIF({CUBE.count}, 0))"
        format: percent

And the query to the SQL API:

SELECT id, status, created_at, completed_percentage FROM orders

Such a query is considered an ungrouped query and would result in the following result set:

id	status	created_at	completed_percentage
1	shipped	2024-01-01	0.0
2	completed	2024-01-01	100.0
3	completed	2024-01-02	100.0

On the other hand, a typical query that various BI tools generate:

SELECT date_trunc('day', created_at), MEASURE(completed_percentage)
FROM (
  SELECT id, status, created_at, completed_percentage FROM orders
) inner_query
GROUP BY 1

...would still yield correct results

date_trunc('day', created_at)	completed_percentage
2024-01-01	50.0
2024-01-02	100.0

For this particular query, inner_query won't be evaluated as a table. Instead, Cube would postpone its execution until wrapping GROUP BY and would use only date_trunc('day', created_at) as a dimension to evaluate completed_percentage measure instead of full set of inner_query columns id,status and created_at. To make it possible, Cube keeps track of ungrouped queries and evaluates them only on the first occurrence of a GROUP BY query in case there's one.

Aggregated and non-aggregated queries

SQL API supports two types of queries against cube tables: aggregated (those with GROUP BY statement) and non-aggregated (those without).

Without query pushdown, queries that Cube runs against your database will always be aggregated, regardless of whether you use aggregated (with GROUP BY) or non-aggregated queries with the SQL API. Whenever you enable query pushdown, queries which do not contain GROUP BY clause will be executed as ungrouped queries.

A non-aggregated query would only include bare column names in SQL:

SELECT
  status,  -- dimension
  count    -- measure
FROM orders

With query pushdown disabled, Cube will still use GROUP BY to execute such a query.

Automatic use of GROUP BY is disabled by default.

Whenever query pushdown is enabled, such query would run as ungrouped query. As with REST API such queries do not use GROUP BY and render measures as if those would be grouped by primary key of a cube.

If query pushdown is enabled, calculated number, string or time measures queried by SQL API can't use aggregation function definitions with it's sql paremeter. Such measures can still reference other aggregate type measures though.

Aggregated query must aggregate all measure columns and group by all dimension columns. You can use the special MEASURE aggregate function for measures of any type. This is quite convenient, especially in case you're manually writing ad-hoc queries:

SELECT
  status,         -- dimension
  MEASURE(count)  -- measure
FROM orders
GROUP BY 1

If any measure columns are not aggregated or any dimension columns aren't included in GROUP BY, the following error will be thrown: Projection references non-aggregate values. This is a standard SQL consistency check for the GROUP BY operation, and it's enforced by the SQL API as well.

Filtering

Without query pushdown, Cube supports most simple equality operators like =, <>, <, <=, >, >= as well as IN and LIKE operators. Cube tries to push down all filters into a regular query. In some cases, filtering can only be done during post-processing. Time dimension filters will be converted to time dimension date ranges whenever it's possible.

Ordering

Without query pushdown, Cube tries to push down all ORDER BY statements into a regular query.

Row limit edge case

If the ORDER BY statement can't be pushed down, ordering would be performed during post-processing. If there are more than 50,000 rows in the result set, this can yield incorrect results.

However, given that queries to Cube are usually aggregated, this is a very rare case; anyway, please keep this limitation in mind when designing your queries.

Consider the following query. Because of the SUM(total_value) + 2 expression in the projection of the outer query, thr SQL API can't push down ORDER BY:

SELECT
  status,
  SUM(total_value) + 2 AS transformed_amount
FROM (
  SELECT * FROM orders
) AS orders
GROUP BY status
ORDER BY status DESC
LIMIT 100;

You can use EXPLAIN against the above query to look at the query plan. As you can see, the sorting operation is done after the regular query and the projection:

+ GlobalLimitExec: skip=None, fetch=100
+- SortExec: [transformed_amount@1 DESC]
+-- ProjectionExec: expr=[status@0 as status, SUM(orders.total_value)@1 + CAST(2 AS Float64) as transformed_amount]
+--- CubeScanExecutionPlan

SQL API Joins