Cube Logo0.33.23

Cubes

A cube represents a table of data in Cube. Cubes are typically declared in separate files with one cube per file. Within each cube are definitions of measures, dimensions, and joins between cubes.

YAML
JS
cubes:
  - name: users
    sql_table: users

    joins:
      - name: organizations
        relationship: many_to_one
        sql: "{CUBE.organization_id} = {organizations.id}"

    measures:
      - name: count
        type: count
        sql: id

    dimensions:
      - name: organization_id
        sql: organization_id
        type: number
        primary_key: true

      - name: created_at
        sql: created_at
        type: time

      - name: country
        sql: country
        type: string

Cubes have three types of members: measures, dimensions and segments. Each member can be referenced either by its fully-qualified name (<cube_name>.<member_name>) or by its short version (<member_name>) if member exists within the same cube. There is also a handy CUBE context variable which references the current cube. Important difference between same cube references is CUBE.<member-Name> references are resolved runtime as opposed to compile time <member_name> references.

Using the CUBE referencing causes its alias to be used in SQL queries. For example, to avoid ambiguity in the name dimension in the following example, we use the CUBE context variable to explicitly reference the name property from the users cube:

YAML
JS
cubes:
  - name: users
    sql_table: users

    joins:
      - name: contacts
        sql: "{CUBE.contact_id} = {contacts.id}"
        relationship: one_to_one

    dimensions:
      # Primary key

      - name: name
        sql: "COALESCE({CUBE.name}, {contacts.name})"
        type: string

  - name: contacts
    sql_table: contacts

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: name
        sql: name
        type: string

Referencing a foreign cube in the sql parameter instructs Cube to build an implicit join to this cube. Using the data model above, we'll use a query as an example:

{
  "dimensions": ["users.name"]
}

The resulting SQL output from /v1/sql would look like:

SELECT COALESCE("users".name, "contacts".name) "users__name"
FROM users "users"
LEFT JOIN contacts "contacts"
  ON "users".contact_id = "contacts".id

The name parameter serves as the identifier of a cube. It must be unique among all cubes and views within a deployment and follow the naming conventions.

YAML
JS
cubes:
  - name: orders
    sql_table: orders

Each cube can have its own data_source name to support scenarios where data should be fetched from multiple databases. The value of the data_source parameter will be passed to the driverFactory() function as part of the context parameter. By default, each cube has a default value for its data_source; to override it you can use:

YAML
JS
cubes:
  - name: order_facts
    data_source: prod_db
    sql_table: orders

Use a description in your cubes to allow your team to better understand what this cube is about. It is a very simple and yet useful tool that gives a hint to everyone and makes sure data is interpreted correctly by users.

YAML
JS
cubes:
  - name: orders
    sql_table: orders
    title: Product Orders
    description: All orders related information

You can extend cubes in order to reuse all declared members of a cube. In the example below, extended_order_facts will reuse the sql and count measures from order_facts:

YAML
JS
cubes:
  - name: order_facts
    sql_table: orders

    measures:
      - name: count
        type: count
        sql: id

  - name: extended_order_facts
    extends: order_facts

    measures:
      - name: double_count
        type: number
        sql: "{count} * 2"

You can also omit the cube name while defining a cube in JavaScript. This way, Cube doesn't register this cube globally; instead it returns a reference which you can use while combining cubes. It makes sense to use it for dynamic data model generation and reusing with extends. Previous example without defining order_facts cube globally:

const order_facts = cube({
  sql: `orders`,

  measures: {
    count: {
      type: `count`,
      sql: `id`,
    },
  },
});

cube(`extended_order_facts`, {
  extends: order_facts,

  measures: {
    double_count: {
      type: `number`,
      sql: `${count} * 2`,
    },
  },
});

Prior to v0.33, this property was called shown.

The public property is used to manage the visibility of a cube. Valid values for public are true and false. When set to false, this cube cannot be queried through the API. Defaults to true.

YAML
JS
cubes:
  - name: orders
    sql_table: public.orders
    public: false

To learn more about using public to control visibility based on security context, read the Controlling access to cubes and views recipe.

Cube's caching layer uses refresh_key queries to get the current version of content for a specific cube. If a query result changes, Cube will invalidate all queries that rely on that cube.

The default values for refresh_key are

  • every: '2 minute' for BigQuery, Athena, Snowflake, and Presto.
  • every: '10 second' for all other databases.

Refresh key of a query is a concatenation of all cubes refresh keys involved in query. For rollup queries pre-aggregation table name is used as a refresh key.

You can set up a custom refresh check SQL by changing refresh_key property. Often, a MAX(updated_at_timestamp) for OLTP data is a viable option, or examining a metadata table for whatever system is managing the data to see when it last ran. timestamp in that case.

YAML
JS
cubes:
  - name: order_facts
    sql_table: orders
    refresh_key:
      sql: SELECT MAX(updated_at_timestamp) FROM orders

You can use interval-based refresh_key. For example:

YAML
JS
cubes:
  - name: order_facts
    sql_table: orders
    refresh_key:
      every: 1 hour

every - can be set as an interval with granularities second, minute, hour, day, and week or accept CRON string with some limitations. If you set every as CRON string, you can use the timezone property.

For example:

YAML
JS
cubes:
  - name: order_facts
    sql_table: orders

    refresh_key:
      every: 30 5 * * 5
      timezone: America/Los_Angeles

every can accept only equal time intervals - so "Day of month" and "month" intervals in CRON expressions are not supported.

Cube supports two different formats of CRON expressions: standard and advanced with support for seconds.

Such refresh_key is just a syntactic sugar over refresh_key SQL. It's guaranteed that refresh_key change it's value at least once during every interval. It will be converted to appropriate SQL select which value will change over time based on interval value. Values of interval based refresh_key are tried to be checked ten times within defined interval but not more than once per 1 second and not less than once per 5 minute. For example if interval is 10 minute it's refreshKeyRenewalThreshold will be 60 seconds and generated refresh_key SQL (Postgres) would be:

SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 600)

For 5 second interval refreshKeyRenewalThreshold will be just 1 second and SQL will be:

SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 5)

Supported cron formats

  • Standard cron syntax
*    *    *    *    *
┬    ┬    ┬    ┬    ┬
│    │    │    │    |
│    │    │    │    └ day of week (0 - 7) (0 or 7 is Sun)
│    │    │    └───── month (1 - 12)
│    │    └────────── day of month (1 - 31, L)
│    └─────────────── hour (0 - 23)
└──────────────────── minute (0 - 59)
  • Advanced cron format with support for seconds
*    *    *    *    *    *
┬    ┬    ┬    ┬    ┬    ┬
│    │    │    │    │    |
│    │    │    │    │    └ day of week (0 - 7) (0 or 7 is Sun)
│    │    │    │    └───── month (1 - 12)
│    │    │    └────────── day of month (1 - 31, L)
│    │    └─────────────── hour (0 - 23)
│    └──────────────────── minute (0 - 59)
└───────────────────────── second (0 - 59, optional)

Set this flag to true if you want Cube to rewrite your queries after final SQL has been generated. This may be helpful to apply filter pushdown optimizations or reduce unnecessary query nesting. For example:

YAML
JS
cubes:
  - name: tickets
    # ...

    rewrite_queries: true

The sql parameter specifies the SQL that will be used to generate a table that will be queried by a cube. It can be any valid SQL query, but usually it takes the form of a SELECT * FROM my_table query. Please note that you don't need to use GROUP BY in a SQL query on the cube level. This query should return a plain table, without aggregations.

YAML
JS
cubes:
  - name: orders
    sql: SELECT * FROM orders

With JavaScript models, you can also reference other cubes' SQL statements for code reuse:

cube(`companies`, {
  sql: `
    SELECT users.company_name, users.company_id
    FROM ${users.sql()} AS users
  `,
});

It is recommended to prefer the sql_table property over the sql property for all cubes that are supposed to use queries like this: SELECT * FROM table.

The sql_table property is used as a concise way for defining a cube that uses a query like this: SELECT * FROM table. Instead of using the sql property, use sql_table with the table name that this cube will query.

YAML
JS
cubes:
  - name: orders
    sql_table: public.orders

Use sql_alias when auto-generated cube alias prefix is too long and truncated by DB such as Postgres:

YAML
JS
cubes:
  - name: order_facts
    sql_table: orders
    sql_alias: ofacts

It'll generate aliases for members such as ofacts__count. sql_alias affects all member names including pre-aggregation table names.

Use title to change the display name of the cube. By default, Cube will humanize the cube's name, so for instance, users_orders would become Users Orders. If default humanizing doesn't work in your case, please use the title parameter. It is highly recommended to give human readable names to your cubes. It will help everyone on a team better understand the data structure and will help maintain a consistent set of definitions across an organization.

YAML
JS
cubes:
  - name: orders
    sql_table: orders
    title: Product Orders

FILTER_PARAMS allows you to use filter values during SQL generation. You can add it for any valid SQL expression as in the case of dimensions. This is useful if you want to hint your database optimizer to use a specific index or filter out partitions or shards in your cloud data warehouse so you won't be billed for scanning those.

It has the following structure:

FILTER_PARAMS.<CUBE_NAME>.<FILTER_NAME>.filter(expression)

Heavy usage of FILTER_PARAMS is considered a bad practice. It usually leads to hard to maintain data model.

Good rule of thumb is to use FILTER_PARAMS only for predicate pushdown performance optimizations.

If you find yourself relying a lot on FILTER_PARAMS, it might mean you'll need to

  1. Rethink your data modeling and potentially move some transformations upstream
  2. Reconsider choice of underlying database

The filter() function accepts the expression, which could be either String or Function. See the examples below.

YAML
JS
cubes:
  - name: order_facts
    sql: >
      SELECT *
      FROM orders
      WHERE {FILTER_PARAMS.order_facts.date.filter('date')}

    measures:
      - name: count
        type: count

    dimensions:
      - name: date
        sql: date
        type: time

This will generate the following SQL:

SELECT *
FROM orders
WHERE
  date >= '2018-01-01 00:00:00' AND
  date <= '2018-12-31 23:59:59'

for the ['2018-01-01', '2018-12-31'] date range passed for the order_facts.date dimension as in following query:

{
  "measures": ["order_facts.count"],
  "time_dimensions": [{
    "dimension": "order_facts.date",
    "granularity": "day",
    "dateRange": ["2018-01-01", "2018-12-31"]
  }]
}

You can also pass a function instead of an SQL expression as a filter() argument. This way you can add BigQuery sharding filtering for events, which will reduce your billing cost.

NOTE: When you're passing function to the filter() function, params are passed as string parameters from driver and it's your responsibility to handle type conversions in this case.

cube(`events`, {
  sql: `
    SELECT *
    FROM schema.\`events*\`
    WHERE ${FILTER_PARAMS.events.date.filter((from, to) => `
      _TABLE_SUFFIX >= FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP(${from})) AND
      _TABLE_SUFFIX <= FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP(${to}))
    `)}
  `,

  dimensions: {
    date: {
      sql: `date`,
      type: `time`,
    },
  },
});

SECURITY_CONTEXT is deprecated and may be removed in the upcoming versions. Use queryRewrite instead.

SECURITY_CONTEXT is a user security object that is passed by the Cube Client. Please read the Security Context page for more information on how to set SECURITY_CONTEXT.

Security context is suitable for the row level security implementation. For example, if you have an orders table that contains an email field you can restrict all queries to render results that belong only to the current user as follows:

YAML
JS
cubes:
  - name: orders
    sql: >
      SELECT *
      FROM orders
      WHERE {SECURITY_CONTEXT.email.filter('email')}

    dimensions:
      - name: date
        sql: date
        type: time

To ensure filter value presents for all requests requiredFilter can be used:

YAML
JS
cubes:
  - name: orders
    sql: >
      SELECT *
      FROM orders
      WHERE {SECURITY_CONTEXT.email.requiredFilter('email')}

    dimensions:
      - name: date
        sql: date
        type: time

You can access values of context variables directly in JavaScript in order to use it during your SQL generation. For example:

Use of this feature entails SQL injection security risk. Use it with caution.

cube(`orders`, {
  sql: `
    SELECT *
    FROM ${SECURITY_CONTEXT.type.unsafeValue() === "employee" ? "employee" : "public"}.orders
  `,

  dimensions: {
    date: {
      sql: `date`,
      type: `time`,
    },
  },
});

convertTz

In case you need to convert your timestamp to user request timezone in cube or member SQL you can use SQL_UTILS.convertTz() method. Note that Cube will automatically convert timezones for timeDimensions fields in queries.

Dimensions that use SQL_UTILS.convertTz() should not be used as timeDimensions in queries. Doing so will apply the conversion multiple times and yield wrong results.

In case the same database field needs to be queried in dimensions and timeDimensions, create dedicated dimensions in the cube definition for the respective use:

YAML
JS
cubes:
  - name: visitors
    # ...

    dimensions:
      # Do not use in timeDimensions query property
      - name: created_at_converted
        sql: "{SQL_UTILS.convertTz(`created_at`)}"
        type: time

      # Use in timeDimensions query property
      - name: created_at
        sql: created_at
        type: time

There's a global COMPILE_CONTEXT that captured as RequestContext at the time of data model compilation. It contains securityContext and any other variables provided by extendContext.

While securityContext defined in COMPILE_CONTEXT it doesn't change it's value for different users. It may change however for different tenants.

YAML
JS
cubes:
  - name: users
    sql_table: "user_{COMPILE_CONTEXT.security_context.deployment_id}.users"

Did you find this page useful?