Documentation
Data Modeling
Fundamentals
Additional Concepts

Additional Concepts

Drilldowns

Drilldowns are a powerful feature to facilitate data exploration. It allows building an interface to let users dive deeper into visualizations and data tables. See ResultSet.drillDown() on how to use this feature on the client side.

A drilldown is defined on the measure level in your data model. It’s defined as a list of dimensions called drill members. Once defined, these drill members will always be used to show underlying data when drilling into that measure.

Let’s consider the following example of our imaginary e-commerce store. We have the orders cube, which describes orders in our store. It’s connected to users and products.

YAML
JavaScript
cubes:
  - name: orders
    sql_table: orders
 
    joins:
      - name: users
        relationship: many_to_one
        sql: "{CUBE}.user_id = {users.id}"
 
      - name: products
        relationship: many_to_one
        sql: "{CUBE}.product_id = {products.id}"
 
    measures:
      - name: count
        type: count
        # Here we define all possible properties we might want
        # to "drill down" on from our front-end
        drill_members:
          - id
          - status
          - products.name
          - users.city
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
        public: true
 
      - name: status
        sql: status
        type: string

You can follow this tutorial (opens in a new tab) to learn more about building a UI for drilldowns.

Subquery

You can use subquery dimensions to reference measures from other cubes inside a dimension. Under the hood, it behaves as a correlated subquery (opens in a new tab), but is implemented via joins for optimal performance and portability.

You cannot use subquery dimensions to reference measures from the same cube.

Consider the following tables, where we have deals and sales_managers. deals belong to sales_managers and have the amount dimension. What we want is to calculate the amount of deals for sales_managers:

Subquery Example with Deals and SalesManager cubes

To calculate the deals amount for sales managers in pure SQL, we can use a correlated subquery, which will look like this:

SELECT
  id,
  (
    SELECT SUM(amount)
    FROM deals
    WHERE deals.sales_manager_id = sales_managers.id
  ) AS deals_amount
FROM sales_managers
GROUP BY 1

Cube makes subqueries easy and efficient. Subqueries are defined as regular dimensions with the parameter sub_query set to true.

YAML
JavaScript
cubes:
  - name: deals
    sql_table: deals
 
    measures:
      - name: amount
        sql: amount
        type: sum
 
    dimensions:
      - name: sales_manager_id
        sql: sales_manager_id
        type: string
 
  - name: sales_managers
    sql_table: sales_managers
 
    joins:
      - name: deals
        relationship: one_to_many
        sql: "{sales_managers}.id = {deals.sales_manager_id}"
 
    measures:
      - name: average_deal_amount
        sql: "{deals_amount}"
        type: avg
 
    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
 
      - name: deals_amount
        sql: "{deals.amount}"
        type: number
        sub_query: true

A subquery requires referencing at least one measure in its definition. Generally speaking, all the columns used to define a subquery dimension should first be defined as measures on their respective cubes and then referenced from a subquery dimension over a join. For example the following data model will not work:

YAML
JavaScript
cubes:
  - name: deals
    sql_table: deals
 
    measures:
      - name: count
        type: count
 
  - name: sales_managers
    # ...
 
    dimensions:
      # ...
 
      - name: deals_amount
        sql: "SUM({deals}.amount)" # Doesn't work, because `amount` is not a measure on `deals`
        type: number
        sub_query: true

You can reference subquery dimensions in measures as usual dimensions. The example below shows the definition of an average deal amount per sales manager:

YAML
JavaScript
cubes:
  - name: sales_managers
 
    measures:
      - name: average_deals_amount
        sql: "{deals_amount}"
        type: avg
 
    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
 
      - name: deals_amount
        sql: "{deals.amount}"
        type: number
        sub_query: true

Under the hood

Based on the subquery dimension definition, Cube will create a query that will include the primary key dimension of the main cube and all measures and dimensions included in the SQL definition of the subquery dimension.

This query will be joined as a LEFT JOIN onto the main SQL query. For example, when using the sales_managers.deals_amount subquery dimension, the following query will be generated:

{
  "measures": ["sales_managers.deals_amount"],
  "dimensions": ["sales_managers.id"]
}

If a query includes the sales_managers.average_deal_amount measure, the following SQL will be generated:

SELECT
  AVG(sales_managers__average_deal_amount)
FROM sales_managers
LEFT JOIN (
  SELECT
    sales_managers.id sales_managers__id,
    SUM(deals.amount) sales_managers__average_deal_amount
  FROM sales_managers
  LEFT JOIN deals
    ON sales_managers.id = deals.sales_manager_id
  GROUP BY 1
) sales_managers__average_deal_amount_subquery
  ON sales_managers__average_deal_amount_subquery.sales_managers__id = sales_managers.id

String time dimensions

Cube always expects a timestamp with timezone (or compatible type) as an input to the time dimension. However, there are a lot of cases when the underlying table's datetime information is stored as a string. Most SQL databases support datetime parsing which allows converting strings to timestamps. Let's consider an example cube for BigQuery:

YAML
JavaScript
cubes:
  - name: events
    sql_table: schema.events
 
    dimensions:
      - name: date
        sql: PARSE_TIMESTAMP('%Y-%m-%d', date)
        type: time

In this particular cube, the date column will be parsed using the %Y-%m-%d format. Please note that as we do not pass timezone parameter to PARSE_TIMESTAMP (opens in a new tab), it will set UTC as the timezone by default. You should always set timezone appropriately for parsed timestamps as Cube always does timezone conversions according to user settings.

Although query performance of big data backends like BigQuery or Presto won't likely suffer from date parsing, performance of RDBMS backends like Postgres most likely will. Adding timestamp columns with indexes should strongly be considered in this case.