Cubes

A cube represents a table of data in Cube.

Cubes are typically declared in separate files with one cube per file. Within each cube are definitions of measures, dimensions, hierarchies, segments, joins between cubes, pre-aggregations, and access policies.


cube(`users`, {
  sql_table: `users`,
 
  joins: {
    organizations: {
      relationship: `many_to_one`,
      sql: `${users.organization_id} = ${organizations.id}`
    }
  },
 
  measures: {
    count: {
      type: `count`,
      sql: `id`
    }
  },
 
  dimensions: {
    organization_id: {
      sql: `organization_id`,
      type: `number`,
      primary_key: true
    },
 
    created_at: {
      sql: `created_at`,
      type: `time`
    },
 
    country: {
      sql: `country`,
      type: `string`
    }
  }
})


cubes:
  - name: users
    sql_table: users
 
    joins:
      - name: organizations
        relationship: many_to_one
        sql: "{CUBE.organization_id} = {organizations.id}"
 
    measures:
      - name: count
        type: count
        sql: id
 
    dimensions:
      - name: organization_id
        sql: organization_id
        type: number
        primary_key: true
 
      - name: created_at
        sql: created_at
        type: time
 
      - name: country
        sql: country
        type: string

Parameters

`name`

The name parameter serves as the identifier of a cube. It must be unique among all cubes and views within a deployment and follow the naming conventions.


cube(`orders`, {
  sql_table: orders
})


cubes:
  - name: orders
    sql_table: orders

`sql_alias`

Use sql_alias when auto-generated cube alias prefix is too long and truncated by databases such as Postgres:


cube(`order_facts_about_literally_everything_in_the_world`, {
  sql_table: `orders`,
  sql_alias: `order_facts`
})


cubes:
  - name: order_facts_about_literally_everything_in_the_world
    sql_table: orders
    sql_alias: order_facts

It’ll generate aliases for members such as order_facts__count. sql_alias affects all member names including pre-aggregation table names.

`extends`

You can use the extends parameter to extend cubes in order to reuse all declared members of a cube.

In the example below, extended_order_facts will reuse the sql and count measures from order_facts:


cube(`order_facts`, {
  sql_table: `orders`,
 
  measures: {
    count: {
      type: `count`,
      sql: `id`
    }
  }
})
 
cube(`extended_order_facts`, {
  extends: order_facts,
 
  measures: {
    double_count: {
      type: `number`,
      sql: `${count} * 2`
    }
  }
})


cubes:
  - name: order_facts
    sql_table: orders
 
    measures:
      - name: count
        type: count
        sql: id
 
  - name: extended_order_facts
    extends: order_facts
 
    measures:
      - name: double_count
        type: number
        sql: "{count} * 2"

You can also omit the cube name while defining a cube in JavaScript. This way, Cube doesn’t register this cube globally; instead it returns a reference which you can use while combining cubes. It makes sense to use it for dynamic data model generation and reusing with extends. Previous example without defining order_facts cube globally:


const order_facts = cube({
  sql: `orders`,
 
  measures: {
    count: {
      type: `count`,
      sql: `id`
    }
  }
})
 
cube(`extended_order_facts`, {
  extends: order_facts,
 
  measures: {
    double_count: {
      type: `number`,
      sql: `${count} * 2`
    }
  }
})

`data_source`

Each cube can have its own data_source name to support scenarios where data should be fetched from multiple databases. The value of the data_source parameter will be passed to the driverFactory() function as part of the context parameter. By default, each cube has a default value for its data_source; to override it you can use:


cube(`order_facts`, {
  data_source: `prod_db`,
  sql_table: `orders`
})


cubes:
  - name: order_facts
    data_source: prod_db
    sql_table: orders

`sql`

The sql parameter specifies the SQL that will be used to generate a table that will be queried by a cube. It can be any valid SQL query, but usually it takes the form of a SELECT * FROM my_table query. Please note that you don’t need to use GROUP BY in a SQL query on the cube level. This query should return a plain table, without aggregations.


cube(`orders`, {
  sql: `SELECT * FROM orders`
})


cubes:
  - name: orders
    sql: SELECT * FROM orders

You can also reference other cubes’ SQL statements for code reuse using the {cube.sql()} function:


cube(`companies`, {
  sql: `
    SELECT
      users.company_name,
      users.company_id
    FROM ${users.sql()} AS users
  `
})


cubes:
  - name: companies
    sql:
      SELECT
        users.company_name,
        users.company_id
      FROM {users.sql()} AS users

It is recommended to prefer the sql_table parameter over the sql parameter for all cubes that are supposed to use queries like this: SELECT * FROM table.

`sql_table`

The sql_table parameter is used as a concise way for defining a cube that uses a query like this: SELECT * FROM table. Instead of using the sql parameter, use sql_table with the table name that this cube will query.


cube(`orders`, {
  sql_table: `public.orders`
})


cubes:
  - name: orders
    sql_table: public.orders

`title`

Use the title parameter to change the display name of the cube.

By default, Cube will humanize the cube’s name, so for instance, users_orders would become Users Orders. If default humanizing doesn’t work in your case, please use the title parameter. It is highly recommended to give human readable names to your cubes. It will help everyone on a team better understand the data structure and will help maintain a consistent set of definitions across an organization.


cube(`orders`, {
  sql_table: `orders`,
  title: `Product Orders`
})


cubes:
  - name: orders
    sql_table: orders
    title: Product Orders

`description`

This parameter provides a human-readable description of a cube. When applicable, it will be displayed in Playground and exposed to data consumers via APIs and integrations.

A description can give a hint both to your team and end users, making sure they interpret the data correctly.


cube(`orders`, {
  sql_table: `orders`,
  title: `Product Orders`,
  description: `All orders-related information`
})


cubes:
  - name: orders
    sql_table: orders
    title: Product Orders
    description: All orders-related information

`public`

The public parameter is used to manage the visibility of a cube. Valid values for public are true and false. When set to false, this cube cannot be queried through the API. Defaults to true.


cube(`orders`, {
  sql_table: `public.orders`,
  public: false
})


cubes:
  - name: orders
    sql_table: public.orders
    public: false

To learn more about using public to control visibility based on security context, read the Controlling access to cubes and views recipe.

`refresh_key`

Cube’s caching layer uses refresh_key queries to get the current version of content for a specific cube. If a query result changes, Cube will invalidate all queries that rely on that cube.

The default values for refresh_key are

every: '2 minute' for BigQuery, Athena, Snowflake, Presto, and Firebolt.
every: '10 second' for all other databases.

Refresh key of a query is a concatenation of all cubes refresh keys involved in query. For rollup queries pre-aggregation table name is used as a refresh key.

You can set up a custom refresh check SQL by changing the refresh_key parameter. Often, a MAX(updated_at_timestamp) for OLTP data is a viable option, or examining a metadata table for whatever system is managing the data to see when it last ran. timestamp in that case.


cube(`order_facts`, {
  sql_table: `orders`,
 
  // With this refresh_key Cube will only refresh the data if
  // the value of previous MAX(updated_at_timestamp) changed.
  // By default Cube will check this refreshKey every 10 seconds
  refresh_key: {
    sql: `SELECT MAX(updated_at_timestamp) FROM orders`
  }
})


cubes:
  - name: order_facts
    sql_table: orders
    refresh_key:
      sql: SELECT MAX(updated_at_timestamp) FROM orders

You can use interval-based refresh_key. For example:


cube(`order_facts`, {
  sql_table: `orders`,
 
  refresh_key: {
    every: `1 hour`
  }
})


cubes:
  - name: order_facts
    sql_table: orders
    refresh_key:
      every: 1 hour

The every parameter guarantees that the refresh key will be executed at least once during the specified interval. However, it does not guarantee that it will be executed at most once. The refresh key may be checked more frequently.

When using every with sql, the purpose is to reduce the load from running the refresh key query itself, not to minimize the number of cache refreshes. The refresh key SQL should be designed to return consistent results and only change when the underlying data needs to be updated, ensuring minimal refreshes even if the query is executed multiple times.

every - can be set as an interval with granularities second, minute, hour, day, and week or accept CRON string with some limitations. If you set every as CRON string, you can use the timezone parameter. It takes precedence over the query timezone.

For example:


cube(`order_facts`, {
  sql_table: `orders`,
 
  refresh_key: {
    every: "30 5 * * 5",
    timezone: "America/Los_Angeles"
  }
})


cubes:
  - name: order_facts
    sql_table: orders
 
    refresh_key:
      every: 30 5 * * 5
      timezone: America/Los_Angeles

every can accept only equal time intervals - so “Day of month” and “month” intervals in CRON expressions are not supported.

Cube supports two different formats of CRON expressions: standard and advanced with support for seconds.

Such refresh_key is just a syntactic sugar over refresh_key SQL. It’s guaranteed that refresh_key change it’s value at least once during every interval. It will be converted to appropriate SQL select which value will change over time based on interval value. Values of interval based refresh_key are tried to be checked ten times within defined interval but not more than once per 1 second and not less than once per 5 minute. For example if interval is 10 minute it’s refreshKeyRenewalThreshold will be 60 seconds and generated refresh_key SQL (Postgres) would be:


SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 600)

For 5 second interval refreshKeyRenewalThreshold will be just 1 second and SQL will be:


SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) / 5)

Supported cron formats

Standard cron syntax


*    *    *    *    *
┬    ┬    ┬    ┬    ┬
│    │    │    │    |
│    │    │    │    └ day of week (0 - 7) (0 or 7 is Sun)
│    │    │    └───── month (1 - 12)
│    │    └────────── day of month (1 - 31, L)
│    └─────────────── hour (0 - 23)
└──────────────────── minute (0 - 59)

Advanced cron format with support for seconds


*    *    *    *    *    *
┬    ┬    ┬    ┬    ┬    ┬
│    │    │    │    │    |
│    │    │    │    │    └ day of week (0 - 7) (0 or 7 is Sun)
│    │    │    │    └───── month (1 - 12)
│    │    │    └────────── day of month (1 - 31, L)
│    │    └─────────────── hour (0 - 23)
│    └──────────────────── minute (0 - 59)
└───────────────────────── second (0 - 59, optional)

`meta`

Custom metadata. Can be used to pass any information to the frontend.


cube(`orders`, {
  sql_table: `orders`,
  title: `Product Orders`,
  meta: {
    any: `value`
  }
})


cubes:
  - name: orders
    sql_table: orders
    title: Product Orders
    meta:
      any: value

`calendar`

The calendar parameter is used to mark calendar cubes. It’s set to false by default.

When set to true, Cube will treat this cube as a calendar cube and allow to override time-shifts and granularities on its time dimensions. This can be useful for time-shift calculations with a custom calendar.

`pre_aggregations`

The pre_aggregations parameter is used to configure pre-aggregations.

`joins`

The joins parameter is used to configure joins.

`dimensions`

The dimensions parameter is used to configure dimensions.

`hierarchies`

The hierarchies parameter is used to configure hierarchies.

`segments`

The segments parameter is used to configure segments.

`measures`

The measures parameter is used to configure measures.

`access_policy`

The access_policy parameter is used to configure access policies.

Was this page useful?