Reference
Data modeling
Joins

Joins

The joins parameter declares a block to define relationships between cubes. It allows users to access and compare fields from two or more cubes at the same time.

YAML
JavaScript
cubes:
  - name: my_cube
    # ...
 
    joins:
      - name: target_cube
        relationship: one_to_one || one_to_many || many_to_one
        sql: SQL ON clause

All joins are generated as LEFT JOIN. The cube which defines the join serves as a main table, and any cubes referenced inside the joins property are used in the LEFT JOIN clause. Learn more about direction of joins here.

The semantics of INNER JOIN can be achieved with additional filtering. For example, a simple check of whether the column value IS NOT NULL by using set filter satisfies this requirement.

There's also no way to define FULL OUTER JOIN and RIGHT OUTER JOIN for the sake of join modeling simplicity. To get RIGHT OUTER JOIN semantics just define join from other side of relationship. The FULL OUTER JOIN can be built inside cube sql parameter. Quite frequently, FULL OUTER JOIN is used to solve Data Blending or similar problems. In that case, it's best practice to have a separate cube for such an operation.

Parameters

name

The name must match the name of the joined cube and, thus, follow the naming conventions.

For example, when the products cube is joined on to the orders cube, we would define the join as follows:

YAML
JavaScript
cubes:
  - name: orders
    # ...
 
    joins:
      - name: products
        relationship: many_to_one
        sql: "{CUBE.id} = {products.order_id}"

relationship

The relationship property is used to describe the type of the relationship between joined cubes. It’s important to properly define the type of relationship so Cube can accurately calculate measures.

The cube that declares the join is considered left in terms of the left join (opens in a new tab) semantics, and the joined cube is considered right. It means that all rows of the left cube are selected, while only those rows of the right cube that match the condition are selected as well. For more information and specific examples, please see join directions.

The join does not need to be defined on both cubes, but the definition can affect the join direction.

You can use the following types of relationships:

The types of relationships listed above were introduced in v0.32.19 for clarity as they are commonly used in the data space. The following aliases were used before and are still valid, so there's no need to update existing data models:

  • one_to_one was known as has_one or hasOne
  • one_to_many was known as has_many or hasMany
  • many_to_one was known as belongs_to or belongsTo

One-to-one

The one_to_one type indicates a one-to-one (opens in a new tab) relationship between the declaring cube and the joined cube. It means that one row in the declaring cube can match only one row in the joined cube.

For example, in a data model containing users and their profiles, the users cube would declare the following join:

YAML
JavaScript
cubes:
  - name: users
    # ...
 
    joins:
      - name: profiles
        relationship: one_to_one
        sql: "{users}.id = {profiles.user_id}"

One-to-many

The one_to_many type indicates a one-to-many (opens in a new tab) relationship between the declaring cube and the joined cube. It means that one row in the declaring cube can match many rows in the joined cube.

For example, in a data model containing authors and the books they have written, the authors cube would declare the following join:

YAML
JavaScript
cubes:
  - name: authors
    # ...
 
    joins:
      - name: books
        relationship: one_to_many
        sql: "{authors}.id = {books.author_id}"

Many-to-one

The many_to_one type indicates the many-to-one relationship between the declaring cube and the joined cube. You’ll often find this type of relationship on the opposite side of the one-to-many (opens in a new tab) relationship. It means that one row in the declaring cube matches a single row in the joined cube, while a row in the joined cube can match many rows in the declaring cube.

For example, in a data model containing orders and customers who made them, the orders cube would have the following join:

YAML
JavaScript
cubes:
  - name: orders
    # ...
 
    joins:
      - name: customers
        relationship: many_to_one
        sql: "{orders}.customer_id = {customers.id}"

sql

sql is necessary to indicate a related column between cubes. It is important to properly specify a matching column when creating joins. Take a look at the example below:

YAML
JavaScript
cubes:
  - name: orders
    # ...
 
    joins:
      - name: customers
        relationship: many_to_one
        sql: "{orders}.customer_id = {customers.id}"

Setting a primary key

In order for a join to work, it is necessary to define a primary_key as specified below. It is a requirement when a join is defined so that Cube can handle row multiplication issues such as chasm and fan traps.

Let's imagine you want to calculate Order Amount by Order Item Product Name. In this case, Order rows will be multiplied by the Order Item join due to the one_to_many relationship. In order to produce correct results, Cube will select distinct primary keys from Order first and then will join these primary keys with Order to get the correct Order Amount sum result. Please note that primary_key should be defined in the dimensions section.

YAML
JavaScript
cubes:
  - name: orders
    # ...
 
    dimensions:
      - name: customer_id
        sql: id
        type: number
        primary_key: true

Setting primary_key to true will change the default value of the public parameter to false. If you still want public to be true — set it manually.

YAML
JavaScript
cubes:
  - name: orders
    # ...
 
    dimensions:
      - name: customer_id
        sql: id
        type: number
        primary_key: true
        public: true

If you don't have a single column in a cube's table that can act as a primary key, you can create a composite primary key as shown below.

The example uses Postgres string concatenation; note that SQL may be different depending on your database.

YAML
JavaScript
cubes:
  - name: users
    # ...
 
    dimensions:
      - name: id
        sql:
          "{CUBE}.user_id || '-' || {CUBE}.signup_week || '-' ||
          {CUBE}.activity_week"
        type: string
        primary_key: true

Chasm and fan traps

Cube automatically detects chasm and fan traps based on the many_to_one and one_to_many relationships defined in join. When detected, Cube generates a deduplication query that evaluates all distinct primary keys within the multiplied measure's cube and then joins distinct primary keys to this cube on itself to calculate the aggregation result. If there's more than one multiplied measure in a query, then such query is generated for every such multiplied measure, and results are joined. Cube solves for chasm and fan traps during query time. If there's pre-aggregregation that fits measure multiplication requirements it'd be leveraged to serve such a query. Such pre-aggregations and queries are always considered non-additive for the purpose of pre-aggregation matching.

Let's consider an example data model:

YAML
JavaScript
cubes:
  - name: orders
    sql_table: orders
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
      - name: city
        sql: city
        type: string
 
    joins:
      - name: customers
        relationship: many_to_one
        sql: "{orders}.customer_id = {customers.id}"
 
- name: customers
    sql_table: customers
 
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
 
    measures:
      - name: average_age
        sql: age
        type: avg
 

If we try to query customers.average_age by orders.city, the Cube detects that the average_age measure in the customers cube would be multiplied by orders to customers and would generate SQL similar to:

SELECT
  "keys"."orders__city",
  avg("customers_key__customers".age) "customers__average_age"
FROM
  (
    SELECT
      DISTINCT "customers_key__orders".city "orders__city",
      "customers_key__customers".id "customers__id"
    FROM
      orders AS "customers_key__orders"
      LEFT JOIN customers AS "customers_key__customers" ON "customers_key__orders".customer_id = "customers_key__customers".id
  ) AS "keys"
  LEFT JOIN customers AS "customers_key__customers" ON "keys"."customers__id" = "customers_key__customers".id
GROUP BY
  1

CUBE reference

When you have several joined cubes, you should accurately use columns’ names to avoid any mistakes. One way to make no mistakes is to use the CUBE reference. It allows you to specify columns’ names in cubes without any ambiguity. During the implementation of the query, this reference will be used as an alias for a basic cube. Take a look at the following example:

YAML
JavaScript
cubes:
  - name: users
    # ...
 
    dimensions:
      - name: name
        sql: "{CUBE}.name"
        type: string

Transitive joins

Join graph is directed and a → b join is different from b → a. Learn more about it here.

Cube automatically takes care of transitive joins. For example, consider the following data model:

YAML
JavaScript
cubes:
  - name: a
    # ...
 
    joins:
      - name: b
        sql: "{a}.b_id = {b.id}"
        relationship: many_to_one
 
    measures:
      - name: count
        type: count
 
  - name: b
    # ...
 
    joins:
      - name: c
        sql: "{b}.c_id = {c.id}"
        relationship: many_to_one
 
  - name: c
    # ...
 
    dimensions:
      - name: category
        sql: category
        type: string

Assume that the following query is run:

{
  "measures": ["a.count"],
  "dimensions": ["c.category"]
}

Joins a → b and b → c will be resolved automatically. Cube uses the Dijkstra algorithm (opens in a new tab) to find a join path between cubes given requested members.

In case there are multiple join paths that can be used to join the same set of cubes, Cube will collect cube names from members in the following order:

  1. Measures
  2. Dimensions
  3. Segments
  4. Time dimensions

Cube makes join trees as predictable and stable as possible, but this isn't guaranteed in case multiple join paths exist. Please use views to address join predictability and stability.