Additional Concepts
Drilldowns
Drilldowns are a powerful feature to facilitate data exploration. It allows
building an interface to let users dive deeper into visualizations and data
tables. See ResultSet.drillDown()
on how to use this feature on the client side.
A drilldown is defined on the measure level in your data model. It’s defined as a list of dimensions called drill members. Once defined, these drill members will always be used to show underlying data when drilling into that measure.
Let’s consider the following example of our imaginary e-commerce store. We have
the orders
cube, which describes orders in our store. It’s connected to
users
and products
.
cubes:
- name: orders
sql_table: orders
joins:
- name: users
relationship: many_to_one
sql: "{CUBE}.user_id = {users.id}"
- name: products
relationship: many_to_one
sql: "{CUBE}.product_id = {products.id}"
measures:
- name: count
type: count
# Here we define all possible properties we might want
# to "drill down" on from our front-end
drill_members:
- id
- status
- products.name
- users.city
dimensions:
- name: id
sql: id
type: number
primary_key: true
public: true
- name: status
sql: status
type: string
You can follow this tutorial (opens in a new tab) to learn more about building a UI for drilldowns.
Subquery
You can use subquery dimensions to reference measures from other cubes inside a dimension. Under the hood, it behaves as a correlated subquery (opens in a new tab), but is implemented via joins for optimal performance and portability.
You cannot use subquery dimensions to reference measures from the same cube.
Consider the following tables, where we have deals
and sales_managers
.
deals
belong to sales_managers
and have the amount
dimension. What we want
is to calculate the amount of deals for sales_managers
:

To calculate the deals amount for sales managers in pure SQL, we can use a correlated subquery, which will look like this:
SELECT
id,
(
SELECT SUM(amount)
FROM deals
WHERE deals.sales_manager_id = sales_managers.id
) AS deals_amount
FROM sales_managers
GROUP BY 1
Cube makes subqueries easy and efficient. Subqueries are defined as regular
dimensions with the parameter sub_query
set to true.
cubes:
- name: deals
sql_table: deals
measures:
- name: amount
sql: amount
type: sum
dimensions:
- name: sales_manager_id
sql: sales_manager_id
type: string
- name: sales_managers
sql_table: sales_managers
joins:
- name: deals
relationship: one_to_many
sql: "{sales_managers}.id = {deals.sales_manager_id}"
measures:
- name: average_deal_amount
sql: "{deals_amount}"
type: avg
dimensions:
- name: id
sql: id
type: string
primary_key: true
- name: deals_amount
sql: "{deals.amount}"
type: number
sub_query: true
A subquery requires referencing at least one measure in its definition. Generally speaking, all the columns used to define a subquery dimension should first be defined as measures on their respective cubes and then referenced from a subquery dimension over a join. For example the following data model will not work:
cubes:
- name: deals
sql_table: deals
measures:
- name: count
type: count
- name: sales_managers
# ...
dimensions:
# ...
- name: deals_amount
sql: "SUM({deals}.amount)" # Doesn't work, because `amount` is not a measure on `deals`
type: number
sub_query: true
You can reference subquery dimensions in measures as usual dimensions. The example below shows the definition of an average deal amount per sales manager:
cubes:
- name: sales_managers
measures:
- name: average_deals_amount
sql: "{deals_amount}"
type: avg
dimensions:
- name: id
sql: id
type: string
primary_key: true
- name: deals_amount
sql: "{deals.amount}"
type: number
sub_query: true
Under the hood
Based on the subquery dimension definition, Cube will create a query that will include the primary key dimension of the main cube and all measures and dimensions included in the SQL definition of the subquery dimension.
This query will be joined as a LEFT JOIN
onto the main SQL query. For example,
when using the sales_managers.deals_amount
subquery dimension, the following
query will be generated:
{
"measures": ["sales_managers.deals_amount"],
"dimensions": ["sales_managers.id"]
}
If a query includes the sales_managers.average_deal_amount
measure, the
following SQL will be generated:
SELECT
AVG(sales_managers__average_deal_amount)
FROM sales_managers
LEFT JOIN (
SELECT
sales_managers.id sales_managers__id,
SUM(deals.amount) sales_managers__average_deal_amount
FROM sales_managers
LEFT JOIN deals
ON sales_managers.id = deals.sales_manager_id
GROUP BY 1
) sales_managers__average_deal_amount_subquery
ON sales_managers__average_deal_amount_subquery.sales_managers__id = sales_managers.id
String time dimensions
Cube always expects a timestamp with timezone (or compatible type) as an input to the time dimension. However, there are a lot of cases when the underlying table's datetime information is stored as a string. Most SQL databases support datetime parsing which allows converting strings to timestamps. Let's consider an example cube for BigQuery:
cubes:
- name: events
sql_table: schema.events
dimensions:
- name: date
sql: PARSE_TIMESTAMP('%Y-%m-%d', date)
type: time
In this particular cube, the date
column will be parsed using the %Y-%m-%d
format. Please note that as we do not pass timezone parameter to
PARSE_TIMESTAMP
(opens in a new tab), it will set UTC
as the timezone by
default. You should always set timezone appropriately for parsed timestamps as
Cube always does timezone conversions according to user settings.
Although query performance of big data backends like BigQuery or Presto won't likely suffer from date parsing, performance of RDBMS backends like Postgres most likely will. Adding timestamp columns with indexes should strongly be considered in this case.