Querying data APIs

After creating a data model, you would like to ask questions to it, i.e., run queries against this data model. This page describes the common concepts of querying Cube through its data APIs.

All queries share the same query defaults and each query fall into one of query types.

Query defaults

The following defaults apply to all queries run by Cube.

Order

By default, for REST API and GraphQL API queries, Cube will apply ordering to any query that does not set it explicitly. The default heuristics are to sort the result set by a time dimension, if any; otherwise, sort it by the first measure, if any; otherwise, sort it by the first dimension, if any. These heuristics provide a result set that works great for most data visualization purposes. You can opt out from default ordering by passing an empty array ([]) object as order.

SQL API queries do not get a default ordering in case there's no ORDER BY clause in the query. You can opt in for default ordering for SQL API queries (except for ungrouped ones) by setting CUBESQL_SQL_NO_IMPLICIT_ORDER to false.

Row limit

By default, any query that does not explicitly specify the row limit, will return no more than 10,000 rows in the result set. You can use the CUBEJS_DB_QUERY_DEFAULT_LIMIT environment variable to override it. This serves as a safeguard against data scraping and denial-of-service (DoS) attacks if Cube is exposed to untrusted environments.

The maximum allowed limit is 50,000 rows. You can use the CUBEJS_DB_QUERY_LIMIT environment variable to override it. You can also implement pagination to fetch more rows than the maximum limit. Finally, if you're using the SQL API, you can enable the streaming mode to ignore the maximum row limit.

Time zone

All time-based calculations performed by Cube are time zone-aware.

By default, Cube assumes that time values in your queries (e.g., in date range filters) are in the UTC time zone (opens in a new tab). Similarly, it will use the same time zone for time dimension values in result sets.

You can use the timezone option with REST API or GraphQL API to specify the time zone for a query. Also, you can use the SQL_UTILS context variable to apply the time zone conversion to dimensions that are not used as time dimensions in a query.

Additionally, note that time zones have impact on pre-aggregation matching.

Query types

Most commonly, you will run regular queries. See the table and the sections below for details on each query type.

Query type	Supported by APIs	Supported in Playground
Regular query	SQL API, REST API, GraphQL API	✅ Yes
Query with post-processing	SQL API only	❌ No
Query with pushdown	SQL API only	❌ No
Compare date range query	REST API only	❌ No
Total query	REST API only	❌ No
Ungrouped query	SQL API, REST API, GraphQL API	❌ No

Regular query

This is the most common type of queries. Regular queries include:

Lists of dimensions and measures that you'd like to see in the result set.
Optionally, filters to apply before returning the result set.
Optionally, a row limit and an offset for the result set.

For regular queries, Cube generates the SQL for the upstream data sources that always includes all dimensions in the GROUP BY statement. See ungrouped queries if you'd like to override this behavior.

Regular queries immensely benefit from in-memory cache and pre-aggregations. They can also be modified before execution with query_rewrite.

Example

See an example of a regular query using the SQL API syntax:

SELECT
  users.state,
  users.city,
  orders.status,
  MEASURE(orders.count)
FROM orders
CROSS JOIN users
WHERE
  users.state != 'us-wa'
  AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
GROUP BY 1, 2, 3
LIMIT 10;

The same query using the REST API syntax looks as follows:

{
  "dimensions": [
    "users.state",
    "users.city",
    "orders.status"
  ],
  "measures": [
    "orders.count"
  ],
  "filters": [
    {
      "member": "users.state",
      "operator": "notEquals",
      "values": ["us-wa"]
    }
  ],
  "timeDimensions": [
    {
      "dimension": "orders.created_at",
      "dateRange": ["2020-01-01", "2021-01-01"]
    }
  ],
  "limit": 10
}

Query with post-processing

Queries with post-processing are specific to the SQL API. Generally, they are structured in such a way that a regular query is part of a FROM clause or a common table expression (CTE):

-- Regular query in FROM
SELECT ...
FROM (
  -- Regular query
) AS regular_query_in_from
GROUP BY ...
ORDER BY ...
 
-- Regular query in CTE
WITH regular_query_in_cte AS (
  -- Regular query
)
SELECT ...
FROM regular_query_in_cte
GROUP BY ...
ORDER BY ...

To execute the regular query part, Cube will generate the SQL for an upstream data source. Then, Cube will do post-processing by executing the rest of the query using an internal SQL execution engine. Learn more in the SQL API documentation.

Queries with post-processing, since they include regular queries, benefit from in-memory cache and pre-aggregations. They can also be modified before execution with query_rewrite.

However, the trade-off is that queries with post-processing support only a limited set of SQL functions and operators.

Example

The simplest example of a query with post-processing:

SELECT VERSION();

This query invokes a function that is implemented by the SQL API and executed without querying the upstream data source.

Now, see a more complex example of a query with post-processing. In this query, we derive new dimensions, post-aggregate measures, and perform additional filtering:

SELECT
  UPPER(SUBSTRING(state FROM 4)) AS state,
  CHAR_LENGTH(city) AS city_name_length,
  CASE WHEN status = 'completed' THEN 'done' ELSE 'pending' END AS status,
  SUM(count) AS total_count
FROM (
  SELECT
    users.state,
    users.city,
    orders.status,
    MEASURE(orders.count) AS count
  FROM orders
  CROSS JOIN users
  WHERE
    users.state != 'us-wa'
    AND orders.created_at BETWEEN '2020-01-01' AND '2021-01-01'
  GROUP BY 1, 2, 3
  LIMIT 10
) AS regular_query
GROUP BY 1, 2, 3
HAVING city_name_length > 10;

Let's assume that we run the query above and the following pre-aggregation is defined:

pre_aggregations:
  - name: main
    measures:
      - orders.count
    dimensions:
      - users.state
      - users.city
      - orders.status
    time_dimension: orders.created_at
    granularity: day

Cube will use this pre-aggregation, run the following SQL in Cube Store for the regular query, and then do post-processing to produce the result set:

SELECT
  `users__state`,
  `users__city`,
  `orders__status`,
  sum(`orders__count`) `orders__count`
FROM
  prod_pre_aggregations.orders_main_s3ebmau3_mw1bchy_1j4umn8
WHERE
  (
    `users__state` <> 'us-wa'
    OR `users__state` IS NULL
  )
  AND (
    `orders__created_at_day` >= to_timestamp('2020-01-01T00:00:00.000')
    AND `orders__created_at_day` <= to_timestamp('2021-01-01T23:59:59.999')
  )
GROUP BY 1, 2, 3
ORDER BY 4 DESC
LIMIT 10;

Query with pushdown

Queries with pushdown are specific to the SQL API. Unlike queries with post-processing, they can have arbitrary structure and contain pretty much any SQL functions and operators.

Thus, queries with pushdown are not reducible to regular queries and post-processing on top of them. To execute a query with pushdown, Cube will need to transform it and generate the SQL for an upstream data source. Learn more in the SQL API documentation.

Query pushdown in the SQL API is available in public preview. Read more (opens in a new tab) in the blog.

Queries with pushdown, since they don't include regular queries, can not utilize pre-aggregations; however, they still benefit from in-memory cache. Queries with pushdown can also be modified before execution with query_rewrite; however, only applying additional filters is supported.

Example

See an example of a query with pushdown. In this query, the filter in WHERE references a subquery. So, it can't be executed as a query with post-processing. It will be executed as a query with pushdown instead:

SELECT
  users.state,
  users.city
FROM users
WHERE
  users.orders_made = (
    SELECT MAX(orders_made)
    FROM users
  )
GROUP BY 1, 2;

Compare date range query

Compare date range queries are special cases of regular queries. Similarly to regular queries, they include lists of dimensions and measures, filters, etc. and return a result set.

However, unlike regular queries, they provide a convenient way to retrieve measure values for more than one date range for a time dimension.

You can make a compare date range query by using the compareDateRange option with the REST API. Note that you also need to set the queryType parameter of the /v1/load endpoint to multi.

Example

{
  "dimensions": ["orders.city"],
  "measures": ["orders.amount"],
  "filters": [
    {
      "member": "orders.status",
      "operator": "equals",
      "values": ["shipped"]
    }
  ],
  "timeDimensions": [{
    "dimension": "orders.created_at",
    "compareDateRange": [
      ["2024-01-01", "2024-12-31"],
      ["2023-01-01", "2023-12-31"],
      ["2022-01-01", "2022-12-31"],
    ],
    "granularity": "year",
  }],
  "limit": 100
}

See this blog post (opens in a new tab) for more details and examples.

For the SQL API, you can write an equivalent query using the UNION ALL statement.

Total query

Total queries are special cases of regular queries. Similarly to regular queries, they include lists of dimensions and measures, filters, etc. and return a result set.

In addition to that, they provide a convenient way to retrieve the total number of rows in the result set as if no row limit or offset are set in the query. This is useful for creating user interfaces with pagination.

You can make a total query by using the total option with the REST API. For the SQL API, you can write an equivalent query using the UNION ALL statement.

Ungrouped query

Similarly to regular queries, ungrouped queries include lists of dimensions and measures, filters, etc. and return a result set.

However, unlike for regular queries, Cube will not add the GROUP BY statement when generating the SQL for an upstream data source. Instead, raw results after filtering and joining will be returned without any grouping. Measures will be rendered as their sql without any aggregation. Time dimensions will be truncated by granularity as usual, however, not grouped by.

You can make a regular query ungrouped by using the ungrouped option with REST API or GraphQL API. For the SQL API, you can omit the GROUP BY statement from the SQL API query.

By default, for security purposes, ungrouped queries require primary keys of all cubes involved in a query to be added as dimensions. You can use the allow_ungrouped_without_primary_key configration option to override this.

Additionally, note that ungrouped queries have additional requirements for pre-aggregation matching.

APIs & integrations DAX API