Cubes
A cube
represents a table of data in Cube. Cubes are typically declared in
separate files with one cube per file. Within each cube are definitions of
measures, dimensions, and joins between cubes. A cube should always be declared
with a capital letter.
cube(`Users`, {
sql: `SELECT * FROM users`,
joins: {
Organizations: {
relationship: `belongsTo`,
sql: `${Users.organization_id} = ${Organizations.id}`,
},
},
measures: {
count: {
type: `count`,
sql: `id`,
},
},
dimensions: {
organization_id: {
sql: `organization_id`,
type: `number`,
primaryKey: true,
},
created_at: {
type: `time`,
sql: `created_at`,
},
country: {
type: `string`,
sql: `country`,
},
},
});
There are certain rules to follow for a cube and cube member names:
- Be unique within the project
- Start with a letter
- Consist of numbers, letters and
_
As a convention, cube names start with upper case letters and member names with lower case letters. As in JavaScript, camel case is used for multi-word cube and member names.
Cubes have three types of members: measures, dimensions and segments. Each
member can be referenced either by its fully-qualified name
(<CubeName>.<memberName>
) or by its short version (<memberName>
) if member
exists within the same cube. There is also a handy CUBE
context variable which
references the current cube. Important difference between same cube references
is CUBE.<memberName>
references are resolved runtime as opposed to compile
time <memberName>
references.
Using the CUBE
referencing causes its alias to be used in SQL queries. For
example, to avoid ambiguity in the name
dimension in the following example, we
use the CUBE
context variable to explicitly reference the name
property from
the Users
cube:
cube(`Users`, {
sql: `SELECT * FROM users`,
joins: {
Contacts: {
sql: `${CUBE}.contact_id = ${Contacts.id}`,
relationship: `hasOne`
}
},
dimensions: {
// primary key,
name: {
sql: `COALESCE(${CUBE}.name, ${Contacts.name})`,
type: `string`
}
}
});
cube(`Contacts`, {
sql: `select * from contacts`,
dimensions: {
id: {
sql: `id`,
type: `number`,
primaryKey: true
},
name: {
sql: `name`,
type: `string`
}
}
});
Referencing a foreign cube in the sql
parameter instructs Cube to build an
implicit join to this cube. Using the schema above, we'll use a query as an
example:
{
"dimensions": ["Users.name"]
}
The resulting SQL output from /v1/sql
would look like:
select COALESCE("users".name, "contacts".name) "users__name"
FROM users "users"
LEFT JOIN contacts "contacts" ON "users".contact_id = "contacts".id
This feature is only available for Cube models authored in JavaScript.
Abstract cubes can be defined by simply omitting the first parameter to the
cube()
function. Cubes defined in this way can still be extended, but will be
"hidden" from the Developer Playground and calls to the
/meta
API endpoint.
const Users = cube({
sql: `select * from users`,
dimensions: {
// primary key,
name: {
sql: `${CUBE}.name`,
type: `string`,
},
},
});
cube(`Contacts`, {
extends: Users,
});
Each cube in schema can have its own dataSource
name to support scenarios
where data should be fetched from multiple databases. The value of the
dataSource
parameter will be passed to the
driverFactory()
function as part of the context
parameter. By default, each cube has a default
value for its dataSource
; to
override it you can use:
cube(`OrderFacts`, {
sql: `SELECT * FROM orders`,
dataSource: `prod_db`,
});
Use a description in your cubes to allow your team to better understand what this cube is about. It is a very simple and yet useful tool that gives a hint to everyone and makes sure data is interpreted correctly by users.
cube(`Orders`, {
sql: `SELECT * FROM orders`,
title: `Product Orders`,
description: `All orders related information`,
});
You can extend cubes in order to reuse all declared members of a cube. In the
example below, ExtendedOrderFacts
will reuse the sql
and count
measures
from OrderFacts
:
cube(`OrderFacts`, {
sql: `SELECT * FROM orders`,
measures: {
count: {
type: `count`,
sql: `id`,
},
},
});
cube(`ExtendedOrderFacts`, {
extends: OrderFacts,
measures: {
double_count: {
type: `number`,
sql: `${count} * 2`,
},
},
});
You can also omit the cube name while defining a cube in JavaScript. This way, Cube doesn't
register this cube globally; instead it returns a reference which you can
use while combining cubes. It makes sense to use it for dynamic schema
generation and reusing with extends
. Previous example without defining
OrderFacts
cube globally:
const OrderFacts = cube({
sql: `SELECT * FROM orders`,
measures: {
count: {
type: `count`,
sql: `id`,
},
},
});
cube(`ExtendedOrderFacts`, {
extends: OrderFacts,
measures: {
double_count: {
type: `number`,
sql: `${count} * 2`,
},
},
});
Cube's caching layer uses refreshKey
queries to get the current version of
content for a specific cube. If a query result changes, Cube will invalidate all
queries that rely on that cube.
The default values for refreshKey
are
every: '2 minute'
for BigQuery, Athena, Snowflake, and Presto.every: '10 second'
for all other databases.
Refresh key of a query is a concatenation of all cubes refresh keys involved in query. For rollup queries pre-aggregation table name is used as a refresh key.
You can set up a custom refresh check SQL by changing refreshKey
property.
Often, a MAX(updated_at_timestamp)
for OLTP data is a viable option, or
examining a metadata table for whatever system is managing the data to see when
it last ran. timestamp in that case.
cube(`OrderFacts`, {
sql: `SELECT * FROM orders`,
// With this refreshKey Cube will only refresh the data if
// the value of previous MAX(updated_at_timestamp) changed
// By default Cube will check this refreshKey every 10 seconds
refreshKey: {
sql: `SELECT MAX(updated_at_timestamp) FROM orders`,
},
});
You can use interval based refreshKey
. For example:
cube(`OrderFacts`, {
sql: `SELECT * FROM orders`,
refreshKey: {
every: `1 hour`,
},
});
every
- can be set as an interval with granularities second
, minute
,
hour
, day
, and week
or accept CRON string with some limitations. If you
set every
as CRON string, you can use the timezone
property.
For example:
cube(`OrderFacts`, {
sql: `SELECT * FROM orders`,
refreshKey: {
every: '30 5 * * 5',
timezone: 'America/Los_Angeles',
},
});
every
can accept only equal time intervals - so "Day of month" and "month"
intervals in CRON expressions are not supported.
Cube supports two different formats of CRON expressions: standard and advanced with support for seconds.
Such refreshKey
is just a syntactic sugar over refreshKey
SQL. It's
guaranteed that refreshKey
change it's value at least once during every
interval. It will be converted to appropriate SQL select which value will change
over time based on interval value. Values of interval based refreshKey
are
tried to be checked ten times within defined interval but not more than once per
1 second
and not less than once per 5 minute
. For example if interval is
10 minute
it's refreshKeyRenewalThreshold
will be 60 seconds and generated
refreshKey
SQL (Postgres) would be:
SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 600)
For 5 second
interval refreshKeyRenewalThreshold
will be just 1 second and
SQL will be:
SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 5)
Supported cron formats
- Standard cron syntax
* * * * *
┬ ┬ ┬ ┬ ┬
│ │ │ │ |
│ │ │ │ └ day of week (0 - 7) (0 or 7 is Sun)
│ │ │ └───── month (1 - 12)
│ │ └────────── day of month (1 - 31, L)
│ └─────────────── hour (0 - 23)
└──────────────────── minute (0 - 59)
- Advanced cron format with support for seconds
* * * * * *
┬ ┬ ┬ ┬ ┬ ┬
│ │ │ │ │ |
│ │ │ │ │ └ day of week (0 - 7) (0 or 7 is Sun)
│ │ │ │ └───── month (1 - 12)
│ │ │ └────────── day of month (1 - 31, L)
│ │ └─────────────── hour (0 - 23)
│ └──────────────────── minute (0 - 59)
└───────────────────────── second (0 - 59, optional)
Set this flag to true if you want Cube to rewrite your queries after final SQL has been generated. This may be helpful to apply filter pushdown optimizations or reduce unnecessary query nesting. For example:
cube(`Tickets`, {
rewriteQueries: true,
});
The shown
property is used to manage the visibility of a cube. Valid values
for shown
are true
and false
.
cube(`Orders`, {
sql: `SELECT * FROM orders`,
shown: false,
});
To learn more about using shown
to control visibility based on security
context, read the Controlling access to cubes and views
recipe.
The sql
parameter specifies the SQL that will be used to generate a table that
will be queried by a cube. It can be any valid SQL query, but usually it takes
the form of a select * from my_table
query. Please note that you don't need to
use GROUP BY
in a SQL query on the cube level. This query should return a
plain table, without aggregations.
cube(`Orders`, {
sql: `SELECT * FROM orders`,
});
With JavaScript models, you can also reference other cubes' SQL statements for code reuse:
cube(`Companies`, {
sql: `SELECT users.company_name, users.company_id FROM ${Users.sql()} AS users`,
});
Use sqlAlias
when auto-generated cube alias prefix is too long and truncated
by DB such as Postgres:
cube(`OrderFacts`, {
sql: `SELECT * FROM orders`,
sqlAlias: `ofacts`,
// ...
});
It'll generate aliases for members such as ofacts__count
. sqlAlias
affects
all member names including pre-aggregation table names.
Use title
to change the display name of the cube. By default, Cube will
humanize the cube's name, so for instance, UsersOrder
would become
Users Orders
. If default humanizing doesn't work in your case, please use the
title parameter. It is highly recommended to give human readable names to your
cubes. It will help everyone on a team better understand the data structure and
will help maintain a consistent set of definitions across an organization.
cube(`Orders`, {
sql: `SELECT * FROM orders`,
title: `Product Orders`,
});
FILTER_PARAMS
allows you to use filter values during SQL generation. You can
add it for any valid SQL expression as in the case of dimensions. This is useful
if you want to hint your database optimizer to use a specific index or filter
out partitions or shards in your cloud data warehouse so you won't be billed for
scanning those.
It has the following structure:
FILTER_PARAMS.<CUBE_NAME>.<FILTER_NAME>.filter(expression)
The filter()
function accepts the expression, which could be either String
or
Function
. See the examples below.
cube(`OrderFacts`, {
sql: `SELECT * FROM orders WHERE ${FILTER_PARAMS.OrderFacts.date.filter(
'date'
)}`,
measures: {
count: {
type: `count`,
},
},
dimensions: {
date: {
sql: `date`,
type: `time`,
},
},
});
This will generate the following SQL:
SELECT * FROM orders WHERE date >= '2018-01-01 00:00:00' and date <= '2018-12-31 23:59:59'
for the ['2018-01-01', '2018-12-31']
date range passed for the
OrderFacts.date
dimension as in following query:
{
"measures": ["OrderFacts.count"],
"timeDimensions": [{
"dimension": "OrderFacts.date",
"granularity": "day",
"dateRange": ["2018-01-01", "2018-12-31"]
}]
}
You can also pass a function instead of an SQL expression as a filter()
argument. This way you can add BigQuery sharding filtering for events, which
will reduce your billing cost.
NOTE: When you're passing function to the
filter()
function, params are passed as string parameters from driver and it's your responsibility to handle type conversions in this case.
cube(`Events`, {
sql: `
SELECT * FROM schema.\`events*\`
WHERE ${FILTER_PARAMS.Events.date.filter(
(from, to) =>
`_TABLE_SUFFIX >= FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP(${from})) AND _TABLE_SUFFIX <= FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP(${to}))`
)}
`,
dimensions: {
date: {
sql: `date`,
type: `time`,
},
},
});
SECURITY_CONTEXT
is a user security object that is passed by the Cube Client.
Please read the Security Context page for more information on how
to set SECURITY_CONTEXT
.
As a general rule, we strongly recommend using
queryRewrite
instead of SECURITY_CONTEXT
wherever
possible.
Security context is suitable for the row level security implementation. For
example, if you have an orders
table that contains an email
field you can
restrict all queries to render results that belong only to the current user as
follows:
cube(`Orders`, {
sql: `SELECT * FROM orders WHERE ${SECURITY_CONTEXT.email.filter('email')}`,
dimensions: {
date: {
sql: `date`,
type: `time`,
},
},
});
To ensure filter value presents for all requests requiredFilter
can be used:
cube(`Orders`, {
sql: `SELECT * FROM orders WHERE ${SECURITY_CONTEXT.email.requiredFilter(
'email'
)}`,
dimensions: {
date: {
sql: `date`,
type: `time`,
},
},
});
Use of this feature entails SQL injection security risk. Use it with caution.
You can access values of context variables directly in javascript in order to use it during your SQL generation. For example:
cube(`Orders`, {
sql: `SELECT * FROM ${
SECURITY_CONTEXT.type.unsafeValue() === 'employee' ? 'employee' : 'public'
}.orders`,
dimensions: {
date: {
sql: `date`,
type: `time`,
},
},
});
convertTz
In case you need to convert your timestamp to user request timezone in cube or
member SQL you can use SQL_UTILS.convertTz()
method. Note that Cube will
automatically convert timezones for timeDimensions
fields in
queries.
Dimensions that use SQL_UTILS.convertTz()
should not be used as
timeDimensions
in queries. Doing so will apply the conversion multiple times
and yield wrong results.
In case the same database field needs to be queried in dimensions
and
timeDimensions
, create dedicated dimensions in the cube definition for the
respective use:
cube(`visitors`, {
// ...
dimensions: {
created_at_converted: {
// do not use in timeDimensions query property
type: 'time',
sql: SQL_UTILS.convertTz(`created_at`),
},
created_at: {
// use in timeDimensions query property
type: 'time',
sql: `created_at`,
},
},
});
There's a global COMPILE_CONTEXT
that captured as
RequestContext
at the time of schema compilation. It
contains securityContext
and any other variables provided by
extendContext
.
While securityContext
defined in COMPILE_CONTEXT
it doesn't change it's
value for different users. It may change however for different tenants.
const {
securityContext: { deploymentId },
} = COMPILE_CONTEXT;
const schemaName = `user_${deploymentId}`;
cube(`Users`, {
sql: `SELECT * FROM ${schemaName}.users`,
});
Did you find this page useful?