
Additional concepts

Drilldowns

Drilldowns are a powerful feature that facilitates data exploration. They let you build an interface for users to dive deeper into visualizations and data tables. See ResultSet.drillDown() for how to use this feature on the client side.
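
As a hedged sketch of the client side (assuming the @cubejs-client/core package and a chart click handler that supplies the clicked point's xValues and yValues; the apiUrl and token are placeholders), drilling into a measure could look like this:

JS
import cube from "@cubejs-client/core";

const cubeApi = cube("CUBE_API_TOKEN", {
  apiUrl: "http://localhost:4000/cubejs-api/v1",
});

// A minimal sketch: build and run a drilldown query from a clicked point
async function onPointClick(resultSet, xValues, yValues) {
  // drillDown() returns a query built from the measure's drill members,
  // or null if the clicked member has no drill members defined
  const drillDownQuery = resultSet.drillDown({ xValues, yValues });

  if (drillDownQuery) {
    // Load the underlying rows and show them, e.g., in a data table
    const drillDownResultSet = await cubeApi.load(drillDownQuery);
    console.table(drillDownResultSet.tablePivot());
  }
}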

A drilldown is defined on the measure level in your data model. It’s defined as a list of dimensions called drill members. Once defined, these drill members will always be used to show underlying data when drilling into that measure.

Let’s consider the following example of our imaginary e-commerce store. We have the orders cube, which describes orders in our store. It’s connected to users and products.

YAML
cubes:
  - name: orders
    sql_table: orders

    joins:
      - name: users
        relationship: many_to_one
        sql: "{CUBE}.user_id = {users.id}"

      - name: products
        relationship: many_to_one
        sql: "{CUBE}.product_id = {products.id}"

    measures:
      - name: count
        type: count
        # Here we define all possible properties we might want
        # to "drill down" on from our front-end
        drill_members:
          - id
          - status
          - products.name
          - users.city

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
        public: true

      - name: status
        sql: status
        type: string

You can follow this tutorial to learn more about building a UI for drilldowns.
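
For instance, given the orders cube above, drilling into the count measure produces a query over its drill members. Here is a sketch of such a query; the filter is an assumption standing in for the clicked point (a bar for completed orders):

{
  "dimensions": ["orders.id", "orders.status", "products.name", "users.city"],
  "filters": [
    {
      "member": "orders.status",
      "operator": "equals",
      "values": ["completed"]
    }
  ]
}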

Subquery dimensions

You can use subquery dimensions to reference measures from other cubes inside a dimension. Under the hood, this behaves like a correlated subquery but is implemented via joins for optimal performance and portability.

You cannot use subquery dimensions to reference measures from the same cube.
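
As a hypothetical illustration of this restriction, the following dimension would not work, because it references a measure defined on the same cube:

YAML
cubes:
  - name: deals
    sql_table: deals

    measures:
      - name: amount
        sql: amount
        type: sum

    dimensions:
      - name: total_amount
        sql: "{amount}" # Doesn't work: `amount` is a measure on this same cube
        type: number
        sub_query: true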

Consider the following tables, where we have deals and sales_managers. deals belong to sales_managers and have an amount column. What we want is to calculate the total amount of deals for each sales manager:

[Diagram: subquery example with the deals and sales_managers cubes]

To calculate the deals amount for sales managers in pure SQL, we can use a correlated subquery, which will look like this:

SELECT
  id,
  (
    SELECT SUM(amount)
    FROM deals
    WHERE deals.sales_manager_id = sales_managers.id
  ) AS deals_amount
FROM sales_managers
GROUP BY 1

Cube makes subqueries easy and efficient. Subqueries are defined as regular dimensions with the parameter sub_query set to true.

YAML
cubes:
  - name: deals
    sql_table: deals

    measures:
      - name: amount
        sql: amount
        type: sum

    dimensions:
      - name: sales_manager_id
        sql: sales_manager_id
        type: string

  - name: sales_managers
    sql_table: sales_managers

    joins:
      - name: deals
        relationship: one_to_many
        sql: "{sales_managers}.id = {deals.sales_manager_id}"

    measures:
      - name: average_deal_amount
        sql: "{deals_amount}"
        type: avg

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true

      - name: deals_amount
        sql: "{deals.amount}"
        type: number
        sub_query: true

A subquery requires referencing at least one measure in its definition. Generally speaking, any column used to define a subquery dimension should first be defined as a measure on its respective cube and then referenced from the subquery dimension over a join. For example, the following data model will not work:

YAML
cubes:
  - name: deals
    sql_table: deals

    measures:
      - name: count
        type: count

  - name: sales_managers
    # ...

    dimensions:
      # ...

      - name: deals_amount
        sql: "SUM({deals}.amount)" # Doesn't work, because `amount` is not a measure on `deals`
        type: number
        sub_query: true

You can reference subquery dimensions in measures just like regular dimensions. The example below shows the definition of an average deal amount per sales manager:

YAML
cubes:
  - name: sales_managers

    measures:
      - name: average_deal_amount
        sql: "{deals_amount}"
        type: avg

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true

      - name: deals_amount
        sql: "{deals.amount}"
        type: number
        sub_query: true

Based on the subquery dimension definition, Cube will generate a query that includes the primary key dimension of the main cube as well as all measures and dimensions referenced in the SQL definition of the subquery dimension.

This query will be joined as a LEFT JOIN onto the main SQL query. For example, when using the sales_managers.deals_amount subquery dimension, the following query will be generated:

{
  "measures": ["sales_managers.deals_amount"],
  "dimensions": ["sales_managers.id"]
}

If a query includes the sales_managers.average_deal_amount measure, the following SQL will be generated:

SELECT
  AVG(sales_managers__average_deal_amount)
FROM sales_managers
LEFT JOIN (
  SELECT
    sales_managers.id sales_managers__id,
    SUM(deals.amount) sales_managers__average_deal_amount
  FROM sales_managers
  LEFT JOIN deals
    ON sales_managers.id = deals.sales_manager_id
  GROUP BY 1
) sales_managers__average_deal_amount_subquery
  ON sales_managers__average_deal_amount_subquery.sales_managers__id = sales_managers.id

String time dimensions

Cube always expects a timestamp with timezone (or a compatible type) as input to a time dimension. However, in many cases the underlying table stores datetime information as a string. Most SQL databases support datetime parsing, which allows converting strings to timestamps. Let's consider an example cube for BigQuery:

YAML
cubes:
  - name: events
    sql_table: schema.events

    dimensions:
      - name: date
        sql: PARSE_TIMESTAMP('%Y-%m-%d', date)
        type: time

In this particular cube, the date column will be parsed using the %Y-%m-%d format. Note that because we do not pass a timezone parameter to PARSE_TIMESTAMP, it defaults to UTC. You should always set the timezone appropriately for parsed timestamps, as Cube always performs timezone conversions according to user settings.
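
For example, here is a sketch of the same dimension with an explicit timezone. BigQuery's PARSE_TIMESTAMP accepts an optional time zone as its third argument; America/Los_Angeles is used purely for illustration:

YAML
cubes:
  - name: events
    sql_table: schema.events

    dimensions:
      - name: date
        # Parse the string and interpret it in an explicit time zone
        sql: PARSE_TIMESTAMP('%Y-%m-%d', date, 'America/Los_Angeles')
        type: time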

Although the query performance of big data backends like BigQuery or Presto is unlikely to suffer from date parsing, the performance of RDBMS backends like Postgres most likely will. In such cases, you should strongly consider adding indexed timestamp columns.
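
As a hypothetical sketch for Postgres (the events table and column names are assumptions), you could materialize the parsed timestamp once and index it, then point the time dimension at the new column:

-- Materialize the parsed timestamp instead of parsing it on every query
ALTER TABLE events ADD COLUMN date_ts timestamptz;
UPDATE events SET date_ts = date::timestamptz;

-- Index the new column so time filters and date ranges can use it
CREATE INDEX events_date_ts_idx ON events (date_ts);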
