Documentation
Data modeling
Subquery dimensions

Subquery dimensions

You can use subquery dimensions to reference measures from other cubes inside a dimension. Under the hood, it behaves as a correlated subquery (opens in a new tab), but is implemented via joins for optimal performance and portability.

You cannot use subquery dimensions to reference measures from the same cube.

Consider the following tables, where we have deals and sales_managers. deals belong to sales_managers and have the amount dimension. What we want is to calculate the amount of deals for sales_managers:

Subquery Example with Deals and SalesManager cubes

To calculate the deals amount for sales managers in pure SQL, we can use a correlated subquery, which will look like this:

SELECT
  id,
  (
    SELECT SUM(amount)
    FROM deals
    WHERE deals.sales_manager_id = sales_managers.id
  ) AS deals_amount
FROM sales_managers
GROUP BY 1

Cube makes subqueries easy and efficient. Subqueries are defined as regular dimensions with the parameter sub_query set to true.

YAML
JavaScript
cubes:
  - name: deals
    sql_table: deals
 
    measures:
      - name: amount
        sql: amount
        type: sum
 
    dimensions:
      - name: sales_manager_id
        sql: sales_manager_id
        type: string
 
  - name: sales_managers
    sql_table: sales_managers
 
    joins:
      - name: deals
        relationship: one_to_many
        sql: "{sales_managers}.id = {deals.sales_manager_id}"
 
    measures:
      - name: average_deal_amount
        sql: "{deals_amount}"
        type: avg
 
    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
 
      - name: deals_amount
        sql: "{deals.amount}"
        type: number
        sub_query: true

A subquery requires referencing at least one measure in its definition. Generally speaking, all the columns used to define a subquery dimension should first be defined as measures on their respective cubes and then referenced from a subquery dimension over a join. For example the following data model will not work:

YAML
JavaScript
cubes:
  - name: deals
    sql_table: deals
 
    measures:
      - name: count
        type: count
 
  - name: sales_managers
    # ...
 
    dimensions:
      # ...
 
      - name: deals_amount
        sql: "SUM({deals}.amount)" # Doesn't work, because `amount` is not a measure on `deals`
        type: number
        sub_query: true

You can reference subquery dimensions in measures as usual dimensions. The example below shows the definition of an average deal amount per sales manager:

YAML
JavaScript
cubes:
  - name: sales_managers
 
    measures:
      - name: average_deals_amount
        sql: "{deals_amount}"
        type: avg
 
    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
 
      - name: deals_amount
        sql: "{deals.amount}"
        type: number
        sub_query: true

Under the hood

Based on the subquery dimension definition, Cube will create a query that will include the primary key dimension of the main cube and all measures and dimensions included in the SQL definition of the subquery dimension.

This query will be joined as a LEFT JOIN onto the main SQL query. For example, when using the sales_managers.deals_amount subquery dimension, the following query will be generated:

{
  "measures": ["sales_managers.deals_amount"],
  "dimensions": ["sales_managers.id"]
}

If a query includes the sales_managers.average_deal_amount measure, the following SQL will be generated:

SELECT
  AVG(sales_managers__average_deal_amount)
FROM sales_managers
LEFT JOIN (
  SELECT
    sales_managers.id sales_managers__id,
    SUM(deals.amount) sales_managers__average_deal_amount
  FROM sales_managers
  LEFT JOIN deals
    ON sales_managers.id = deals.sales_manager_id
  GROUP BY 1
) sales_managers__average_deal_amount_subquery
  ON sales_managers__average_deal_amount_subquery.sales_managers__id = sales_managers.id