Documentation
Queries

Querying data APIs

After creating a data model, you would like to ask questions to it, i.e., run queries against this data model. This page describes the common concepts of querying Cube through its data APIs.

All queries share the same query defaults and each query fall into one of query types.

Query defaults

The following defaults apply to all queries run by Cube.

Order

By default, Cube will apply ordering to any query that does not set it explicitly. The default heuristics are to sort the result set by a time dimension, if any; otherwise, sort it by the first measure, if any; otherwise, sort it by the first dimension, if any. This will provide a result set that works great for most data visualization purposes.

SQL API queries do not get a default ordering if they are ungrouped. You can opt out from default ordering for all SQL API queries by setting CUBESQL_SQL_NO_IMPLICIT_ORDER to true.

For REST API and GraphQL API queries, you can opt out from default ordering by passing an empty array ([]) object as order.

Row limit

By default, any query that does not explicitly specify the row limit, will return no more than 10,000 rows in the result set. You can use the CUBEJS_DB_QUERY_DEFAULT_LIMIT environment variable to override it. This serves as a safeguard against data scraping and denial-of-service (DoS) attacks if Cube is exposed to untrusted environments.

The maximum allowed limit is 50,000 rows. You can use the CUBEJS_DB_QUERY_LIMIT environment variable to override it. You can also implement pagination to fetch more rows than the maximum limit. Finally, if you're using the SQL API, you can enable the streaming mode to ignore the maximum row limit.

Time zone

All time-based calculations performed by Cube are time zone-aware.

By default, Cube assumes that time values in your queries (e.g., in date range filters) are in the UTC time zone (opens in a new tab). Similarly, it will use the same time zone for time dimension values in result sets.

You can use the timezone option with REST API or GraphQL API to specify the time zone for a query. Also, you can use the SQL_UTILS context variable to apply the time zone conversion to dimensions that are not used as time dimensions in a query.

Additionally, note that time zones have impact on pre-aggregation matching.

Query types

Most commonly, you will run regular queries. See the table and the sections below for details on each query type.

Query typeSupported by APIsSupported in Playground
Regular querySQL API, REST API, GraphQL API✅ Yes
Query with post-processingSQL API only❌ No
Query with pushdownSQL API only❌ No
Compare date range queryREST API only❌ No
Total queryREST API only❌ No
Ungrouped querySQL API, REST API, GraphQL API❌ No

Regular query

This is the most common type of queries. Regular queries include:

  • Lists of dimensions and measures that you'd like to see in the result set.
  • Optionally, filters to apply before returning the result set.
  • Optionally, a row limit and an offset for the result set.

For regular queries, Cube generates the SQL for the upstream data sources that always includes all dimensions in the GROUP BY statement. See ungrouped queries if you'd like to override this behavior.

Regular queries immensely benefit from in-memory cache and pre-aggregations. They can also be modified before execution with query_rewrite.

Example

See an example of a regular query using the SQL API syntax:

SELECT
  users.state,
  users.city,
  orders.status,
  MEASURE(orders.count)
FROM orders
CROSS JOIN users
WHERE
  users.state != 'us-wa'
  AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;

The same query using the REST API syntax looks as follows:

{
  "dimensions": [
    "users.state",
    "users.city",
    "orders.status"
  ],
  "measures": [
    "orders.count"
  ],
  "filters": [
    {
      "member": "users.state",
      "operator": "notEquals",
      "values": ["us-wa"]
    }
  ],
  "timeDimensions": [
    {
      "dimension": "orders.created_at",
      "dateRange": ["2020-01-01", "2021-01-01"]
    }
  ],
  "limit": 10
}

Query with post-processing

Queries with post-processing are specific to the SQL API. They are structured in such a way that a regular query is part of a FROM clause or a common table expression (CTE):

-- Regular query in FROM
SELECT ...
FROM (
  -- Regular query
) AS regular_query_in_from
GROUP BY ...
ORDER BY ...
 
-- Regular query in CTE
WITH regular_query_in_cte AS (
  -- Regular query
)
SELECT ...
FROM regular_query_in_cte
GROUP BY ...
ORDER BY ...

To execute the regular query part, Cube will generate the SQL for an upstream data source. Then, Cube will do post-processing by executing the rest of the query using an internal SQL execution engine. Learn more in the SQL API documentation.

Queries with post-processing, since they include regular queries, benefit from in-memory cache and pre-aggregations. They can also be modified before execution with query_rewrite.

However, the trade-off is that queries with post-processing support only a limited set of SQL functions and operators.

Example

See an example of a query with post-processing. In this query, we derive new dimensions, post-aggregate measures, and perform additional filtering:

SELECT
  UPPER(SUBSTRING(state FROM 4)) AS state,
  CHAR_LENGTH(city) AS city_name_length,
  CASE WHEN status = 'completed' THEN 'done' ELSE 'pending' END AS status,
  SUM(count) AS total_count
FROM (
  SELECT
    users.state,
    users.city,
    orders.status,
    MEASURE(orders.count) AS count
  FROM orders
  CROSS JOIN users
  WHERE
    users.state != 'us-wa'
    AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
  GROUP BY 1, 2, 3
  LIMIT 10
) AS regular_query
GROUP BY 1, 2, 3
HAVING city_name_length > 10;

Let's assume that we run the query above and the following pre-aggregation is defined:

pre_aggregations:
  - name: main
    measures:
      - orders.count
    dimensions:
      - users.state
      - users.city
      - orders.status
    time_dimension: orders.created_at
    granularity: day

Cube will use this pre-aggregation, run the following SQL in Cube Store for the regular query, and then do post-processing to produce the result set:

SELECT
  `users__state`,
  `users__city`,
  `orders__status`,
  sum(`orders__count`) `orders__count`
FROM
  prod_pre_aggregations.orders_main_s3ebmau3_mw1bchy_1j4umn8
WHERE
  (
    `users__state` <> 'us-wa'
    OR `users__state` IS NULL
  )
  AND (
    `orders__created_at_day` >= to_timestamp('2020-01-01T00:00:00.000')
    AND `orders__created_at_day` <= to_timestamp('2021-01-01T23:59:59.999')
  )
GROUP BY 1, 2, 3
ORDER BY 4 DESC
LIMIT 10;

Query with pushdown

Queries with pushdown are specific to the SQL API. Unlike queries with post-processing, they can have arbitrary structure and contain pretty much any SQL functions and operators.

Thus, queries with pushdown are not reducible to regular queries and post-processing on top of them. To execute a query with pushdown, Cube will need to transform it and generate the SQL for an upstream data source. Learn more in the SQL API documentation.

Query pushdown in the SQL API is available in public preview. Read more (opens in a new tab) in the blog.

Queries with pushdown, since they don't include regular queries, can not utilize pre-aggregations; however, they still benefit from in-memory cache. Queries with pushdown can also be modified before execution with query_rewrite; however, only applying additional filters is supported.

Example

See an example of a query with pushdown. In this query, the filter in WHERE references a subquery. So, it can't be executed as a query with post-processing. It will be executed as a query with pushdown instead:

SELECT
  users.state,
  users.city
FROM users
WHERE
  users.orders_made = (
    SELECT MAX(orders_made)
    FROM users
  )
GROUP BY 1, 2;

Compare date range query

Compare date range queries are special cases of regular queries. Similarly to regular queries, they include lists of dimensions and measures, filters, etc. and return a result set.

However, unlike regular queries, they provide a convenient way to retrieve measure values for more than one date range for a time dimension.

You can make a compare date range query by using the compareDateRange option with the REST API. Note that you also need to set the queryType parameter of the /v1/load endpoint to multi.

Example

{
  "dimensions": ["orders.city"],
  "measures": ["orders.amount"],
  "filters": [
    {
      "member": "orders.status",
      "operator": "equals",
      "values": ["shipped"]
    }
  ],
  "timeDimensions": [{
    "dimension": "orders.created_at",
    "compareDateRange": [
      ["2024-01-01", "2024-12-31"],
      ["2023-01-01", "2023-12-31"],
      ["2022-01-01", "2022-12-31"],
    ],
    "granularity": "year",
  }],
  "limit": 100
}

See this blog post (opens in a new tab) for more details and examples.

For the SQL API, you can write an equivalent query using the UNION ALL statement.

Total query

Total queries are special cases of regular queries. Similarly to regular queries, they include lists of dimensions and measures, filters, etc. and return a result set.

In addition to that, they provide a convenient way to retrieve the total number of rows in the result set as if no row limit or offset are set in the query. This is useful for creating user interfaces with pagination.

You can make a total query by using the total option with the REST API. For the SQL API, you can write an equivalent query using the UNION ALL statement.

Ungrouped query

Similarly to regular queries, ungrouped queries include lists of dimensions and measures, filters, etc. and return a result set.

However, unlike for regular queries, Cube will not add the GROUP BY statement when generating the SQL for an upstream data source. Instead, raw results after filtering and joining will be returned without any grouping. Measures will be rendered as their sql without any aggregation. Time dimensions will be truncated by granularity as usual, however, not grouped by.

You can make a regular query ungrouped by using the ungrouped option with REST API or GraphQL API. For the SQL API, you can omit the GROUP BY statement from the SQL API query.

By default, for security purposes, ungrouped queries require primary keys of all cubes involved in a query to be added as dimensions. You can use the allow_ungrouped_without_primary_key configration option to override this.

Additionally, note that ungrouped queries have additional requirements for pre-aggregation matching.