Cubes
A cube
represents a table of data in Cube. Cubes are typically declared in
separate files with one cube per file. Within each cube are definitions of
measures, dimensions, and joins between cubes.
cubes:
- name: users
sql_table: users
joins:
- name: organizations
relationship: many_to_one
sql: "{CUBE.organization_id} = {organizations.id}"
measures:
- name: count
type: count
sql: id
dimensions:
- name: organization_id
sql: organization_id
type: number
primary_key: true
- name: created_at
sql: created_at
type: time
- name: country
sql: country
type: string
Members and referencing
Cubes have three types of members: measures, dimensions and segments. Each
member can be referenced either by its fully-qualified name
(<cube_name>.<member_name>
) or by its short version (<member_name>
) if
member exists within the same cube. There is also a handy CUBE
context
variable which references the current cube. Important difference between same
cube references is CUBE.<member-Name>
references are resolved runtime as
opposed to compile time <member_name>
references.
Using the CUBE
referencing causes its alias to be used in SQL queries. For
example, to avoid ambiguity in the name
dimension in the following example, we
use the CUBE
context variable to explicitly reference the name
property from
the users
cube:
cubes:
- name: users
sql_table: users
joins:
- name: contacts
sql: "{CUBE.contact_id} = {contacts.id}"
relationship: one_to_one
dimensions:
# Primary key
- name: name
sql: "COALESCE({CUBE.name}, {contacts.name})"
type: string
- name: contacts
sql_table: contacts
dimensions:
- name: id
sql: id
type: number
primary_key: true
- name: name
sql: name
type: string
Referencing a foreign cube in the sql
parameter instructs Cube to build an
implicit join to this cube. Using the data model above, we'll use a query as an
example:
{
"dimensions": ["users.name"]
}
The resulting SQL output from /v1/sql
would look like:
SELECT COALESCE("users".name, "contacts".name) "users__name"
FROM users "users"
LEFT JOIN contacts "contacts"
ON "users".contact_id = "contacts".id
Parameters
name
The name
parameter serves as the identifier of a cube. It must be unique among
all cubes and views within a deployment and follow the naming
conventions.
cubes:
- name: orders
sql_table: orders
data_source
Each cube can have its own data_source
name to support scenarios where data
should be fetched from multiple databases. The value of the data_source
parameter will be passed to the driverFactory()
function as part of the context
parameter. By default, each cube has a
default
value for its data_source
; to override it you can use:
cubes:
- name: order_facts
data_source: prod_db
sql_table: orders
description
Use a description in your cubes to allow your team to better understand what this cube is about. It is a very simple and yet useful tool that gives a hint to everyone and makes sure data is interpreted correctly by users.
cubes:
- name: orders
sql_table: orders
title: Product Orders
description: All orders related information
extends
You can extend cubes in order to reuse all declared members of a cube. In the
example below, extended_order_facts
will reuse the sql
and count
measures
from order_facts
:
cubes:
- name: order_facts
sql_table: orders
measures:
- name: count
type: count
sql: id
- name: extended_order_facts
extends: order_facts
measures:
- name: double_count
type: number
sql: "{count} * 2"
You can also omit the cube name while defining a cube in JavaScript. This way,
Cube doesn't register this cube globally; instead it returns a reference which
you can use while combining cubes. It makes sense to use it for dynamic data
model generation and reusing with extends
. Previous example without defining
order_facts
cube globally:
const order_facts = cube({
sql: `orders`,
measures: {
count: {
type: `count`,
sql: `id`,
},
},
});
cube(`extended_order_facts`, {
extends: order_facts,
measures: {
double_count: {
type: `number`,
sql: `${count} * 2`,
},
},
});
public
Prior to v0.33, this property was called shown
.
The public
property is used to manage the visibility of a cube. Valid values
for public
are true
and false
. When set to false
, this cube cannot
be queried through the API. Defaults to true
.
cubes:
- name: orders
sql_table: public.orders
public: false
To learn more about using public
to control visibility based on security
context, read the Controlling access to cubes and views
recipe.
refresh_key
Cube's caching layer uses refresh_key
queries to get the current version of
content for a specific cube. If a query result changes, Cube will invalidate all
queries that rely on that cube.
The default values for refresh_key
are
every: '2 minute'
for BigQuery, Athena, Snowflake, and Presto.every: '10 second'
for all other databases.
Refresh key of a query is a concatenation of all cubes refresh keys involved in query. For rollup queries pre-aggregation table name is used as a refresh key.
You can set up a custom refresh check SQL by changing refresh_key
property.
Often, a MAX(updated_at_timestamp)
for OLTP data is a viable option, or
examining a metadata table for whatever system is managing the data to see when
it last ran. timestamp in that case.
cubes:
- name: order_facts
sql_table: orders
refresh_key:
sql: SELECT MAX(updated_at_timestamp) FROM orders
You can use interval-based refresh_key
. For example:
cubes:
- name: order_facts
sql_table: orders
refresh_key:
every: 1 hour
every
- can be set as an interval with granularities second
, minute
,
hour
, day
, and week
or accept CRON string with some limitations. If you
set every
as CRON string, you can use the timezone
property.
For example:
cubes:
- name: order_facts
sql_table: orders
refresh_key:
every: 30 5 * * 5
timezone: America/Los_Angeles
every
can accept only equal time intervals - so "Day of month" and "month"
intervals in CRON expressions are not supported.
Cube supports two different formats of CRON expressions: standard and advanced with support for seconds.
Such refresh_key
is just a syntactic sugar over refresh_key
SQL. It's
guaranteed that refresh_key
change it's value at least once during every
interval. It will be converted to appropriate SQL select which value will change
over time based on interval value. Values of interval based refresh_key
are
tried to be checked ten times within defined interval but not more than once per
1 second
and not less than once per 5 minute
. For example if interval is
10 minute
it's refreshKeyRenewalThreshold
will be 60 seconds and generated
refresh_key
SQL (Postgres) would be:
SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 600)
For 5 second
interval refreshKeyRenewalThreshold
will be just 1 second and
SQL will be:
SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 5)
Supported cron formats
- Standard cron syntax
* * * * *
┬ ┬ ┬ ┬ ┬
│ │ │ │ |
│ │ │ │ └ day of week (0 - 7) (0 or 7 is Sun)
│ │ │ └───── month (1 - 12)
│ │ └────────── day of month (1 - 31, L)
│ └─────────────── hour (0 - 23)
└──────────────────── minute (0 - 59)
- Advanced cron format with support for seconds
* * * * * *
┬ ┬ ┬ ┬ ┬ ┬
│ │ │ │ │ |
│ │ │ │ │ └ day of week (0 - 7) (0 or 7 is Sun)
│ │ │ │ └───── month (1 - 12)
│ │ │ └────────── day of month (1 - 31, L)
│ │ └─────────────── hour (0 - 23)
│ └──────────────────── minute (0 - 59)
└───────────────────────── second (0 - 59, optional)
sql
The sql
parameter specifies the SQL that will be used to generate a table that
will be queried by a cube. It can be any valid SQL query, but usually it takes
the form of a SELECT * FROM my_table
query. Please note that you don't need to
use GROUP BY
in a SQL query on the cube level. This query should return a
plain table, without aggregations.
cubes:
- name: orders
sql: SELECT * FROM orders
With JavaScript models, you can also reference other cubes' SQL statements for code reuse:
cube(`companies`, {
sql: `
SELECT users.company_name, users.company_id
FROM ${users.sql()} AS users
`,
});
It is recommended to prefer the sql_table
property
over the sql
property for all cubes that are supposed to use queries like
this: SELECT * FROM table
.
sql_table
The sql_table
property is used as a concise way for defining a cube that uses
a query like this: SELECT * FROM table
. Instead of using the
sql
property, use sql_table
with the table name that this
cube will query.
cubes:
- name: orders
sql_table: public.orders
sql_alias
Use sql_alias
when auto-generated cube alias prefix is too long and truncated
by DB such as Postgres:
cubes:
- name: order_facts
sql_table: orders
sql_alias: ofacts
It'll generate aliases for members such as ofacts__count
. sql_alias
affects
all member names including pre-aggregation table names.
title
Use title
to change the display name of the cube. By default, Cube will
humanize the cube's name, so for instance, users_orders
would become
Users Orders
. If default humanizing doesn't work in your case, please use the
title parameter. It is highly recommended to give human readable names to your
cubes. It will help everyone on a team better understand the data structure and
will help maintain a consistent set of definitions across an organization.
cubes:
- name: orders
sql_table: orders
title: Product Orders
Context Variables
Filter Params
FILTER_PARAMS
allows you to use filter values during SQL generation. You can
add it for any valid SQL expression as in the case of dimensions. This is useful
if you want to hint your database optimizer to use a specific index or filter
out partitions or shards in your cloud data warehouse so you won't be billed for
scanning those.
It has the following structure:
FILTER_PARAMS.<CUBE_NAME>.<FILTER_NAME>.filter(expression)
Heavy usage of FILTER_PARAMS is considered a bad practice. It usually leads to hard to maintain data model.
Good rule of thumb is to use FILTER_PARAMS only for predicate pushdown performance optimizations.
If you find yourself relying a lot on FILTER_PARAMS, it might mean you'll need to
- Rethink your data modeling and potentially move some transformations upstream
- Reconsider choice of underlying database
The filter()
function accepts the expression, which could be either String
or Function
. See the examples below.
cubes:
- name: order_facts
sql: >
SELECT * FROM orders WHERE {FILTER_PARAMS.order_facts.date.filter('date')}
measures:
- name: count
type: count
dimensions:
- name: date
sql: date
type: time
This will generate the following SQL:
SELECT *
FROM orders
WHERE
date >= '2018-01-01 00:00:00' AND
date <= '2018-12-31 23:59:59'
for the ['2018-01-01', '2018-12-31']
date range passed for the
order_facts.date
dimension as in following query:
{
"measures": ["order_facts.count"],
"time_dimensions": [
{
"dimension": "order_facts.date",
"granularity": "day",
"dateRange": ["2018-01-01", "2018-12-31"]
}
]
}
You can also pass a function instead of an SQL expression as a filter()
argument. This way you can add BigQuery sharding filtering for events, which
will reduce your billing cost.
NOTE: When you're passing function to the
filter()
function, params are passed as string parameters from driver and it's your responsibility to handle type conversions in this case.
cube(`events`, {
sql: `
SELECT *
FROM schema.\`events*\`
WHERE ${FILTER_PARAMS.events.date.filter(
(from, to) => `
_TABLE_SUFFIX >= FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP(${from})) AND
_TABLE_SUFFIX <= FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP(${to}))
`
)}
`,
dimensions: {
date: {
sql: `date`,
type: `time`,
},
},
});
Security Context
SECURITY_CONTEXT
is deprecated and may be removed in the upcoming versions.
Use queryRewrite
instead.
SECURITY_CONTEXT
is a user security object that is passed by the Cube Client.
Please read the Security Context page for more information on how
to set SECURITY_CONTEXT
.
Security context is suitable for the row level security implementation. For
example, if you have an orders
table that contains an email
field you can
restrict all queries to render results that belong only to the current user as
follows:
cubes:
- name: orders
sql: >
SELECT * FROM orders WHERE {SECURITY_CONTEXT.email.filter('email')}
dimensions:
- name: date
sql: date
type: time
To ensure filter value presents for all requests requiredFilter
can be used:
cubes:
- name: orders
sql: >
SELECT * FROM orders WHERE
{SECURITY_CONTEXT.email.requiredFilter('email')}
dimensions:
- name: date
sql: date
type: time
You can access values of context variables directly in JavaScript in order to use it during your SQL generation. For example:
Use of this feature entails SQL injection security risk. Use it with caution.
cube(`orders`, {
sql: `
SELECT *
FROM ${
SECURITY_CONTEXT.type.unsafeValue() === "employee" ? "employee" : "public"
}.orders
`,
dimensions: {
date: {
sql: `date`,
type: `time`,
},
},
});
SQL Utils
convertTz
In case you need to convert your timestamp to user request timezone in cube or
member SQL you can use SQL_UTILS.convertTz()
method. Note that Cube will
automatically convert timezones for timeDimensions
fields in
queries.
Dimensions that use SQL_UTILS.convertTz()
should not be used as
timeDimensions
in queries. Doing so will apply the conversion multiple times
and yield wrong results.
In case the same database field needs to be queried in dimensions
and
timeDimensions
, create dedicated dimensions in the cube definition for the
respective use:
cubes:
- name: visitors
# ...
dimensions:
# Do not use in timeDimensions query property
- name: created_at_converted
sql: "{SQL_UTILS.convertTz(`created_at`)}"
type: time
# Use in timeDimensions query property
- name: created_at
sql: created_at
type: time
Compile context
There's a global COMPILE_CONTEXT
that captured as
RequestContext
at the time of data model compilation. It
contains securityContext
and any other variables provided by
extendContext
.
While securityContext
defined in COMPILE_CONTEXT
it doesn't change it's
value for different users. It may change however for different tenants.
cubes:
- name: users
sql_table: "user_{COMPILE_CONTEXT.security_context.deployment_id}.users"