Joins

The joins parameter declares a block to define relationships between cubes. It allows users to access and compare fields from two or more cubes at the same time.

cube('MyCube', {
  joins: {
    TargetCubeName: {
      relationship: `belongsTo` || `hasMany` || `hasOne`,
      sql: `SQL ON clause`,
    },
  },
});

All joins are generated as LEFT JOIN. The cube which defines the join serves as a main table, and any cubes referenced inside the joins property are used in the LEFT JOIN clause. Learn more about direction of joins here.

The semantics of INNER JOIN can be achieved with additional filtering. For example, a simple check of whether the column value IS NOT NULL by using set filter satisfies this requirement.

There's also no way to define FULL OUTER JOIN and RIGHT OUTER JOIN for the sake of join modeling simplicity. To get RIGHT OUTER JOIN semantics just define join from other side of relationship. The FULL OUTER JOIN can be built inside cube sql parameter. Quite frequently, FULL OUTER JOIN is used to solve Data Blending or similar problems. In that case, it's best practice to have a separate cube for such an operation.

The name of a join should match the name of the external cube. For example when a Products cube is being joined on to an Orders cube, we would define the join as follows:

cube('Orders', {
  joins: {
    Products: {
      relationship: `belongsTo`,
      sql: `${CUBE.id} = Products.orderId`,
    },
  },
});

relationship enables you to describe the join relationship between joined cubes. It’s important to properly define the type of relationship in order for Cube.js to accurately calculate measures. The relationship does not need to be defined on both cubes, but the definition can affect the join direction.

It is very important to define the correct order of cubes in a join. It affects data in the result-set greatly. The basic cube represents the left entity in a join, all others would be right. That means that all rows of the left cube are selected, while rows of the right depend on the condition. For more information and specific examples, please read about join directions here.

The three possible values for a relationship are:

hasOne

A hasOne relationship indicates a one-to-one connection with another cube. This relationship indicates that the one row in the cube can match only one row in the joined cube. For example, in a model containing users and user profiles, the users cube would have the following join:

cube('Users', {
  joins: {
    Profile: {
      relationship: `hasOne`,
      sql: `${Users}.id = ${Profile}.user_id`,
    },
  },
});

hasMany

A hasMany relationship indicates a one-to-many connection with another cube. You'll often find this relationship on the "other side" of a belongsTo relationship. This relationship indicates that the one row in the cube can match many rows in the joined cube. For example, in a model containing authors and books, the authors cube would have the following join:

cube('Authors', {
  joins: {
    Books: {
      relationship: `hasMany`,
      sql: `${Authors}.id = ${Books}.author_id`,
    },
  },
});

belongsTo

A belongsTo relationship indicates a many-to-one connection with another cube. You’ll often find this relationship on the “other side” of a hasMany relationship. This relationship indicates that the one row of the declaring cube matches a row in the joined instance, while the joined instance can have many rows in the declaring cube. For example, in a model containing orders and customers, the orders cube would have the following join:

cube('Orders', {
  joins: {
    Customers: {
      relationship: `belongsTo`,
      sql: `${Orders}.customer_id = ${Customers}.id`,
    },
  },
});

sql is necessary to indicate a related column between cubes. It is important to properly specify a matching column when creating joins. Take a look at the example below:

cube('Orders', {
  joins: {
    Customers: {
      relationship: `belongsTo`,
      // The `customer_id` field on `Orders` corresponds to the
      // `id` field on `Customers`
      sql: `${Orders}.customer_id = ${Customers}.id`,
    },
  },
});

In order for a join to work, it is necessary to define a primaryKey as specified below. It is a requirement when a join is defined so that Cube.js can handle row multiplication issues.

Let's imagine you want to calculate Order Amount by Order Item Product Name. In this case, Order rows will be multiplied by the Order Item join due to the hasMany relationship. In order to produce correct results, Cube.js will select distinct primary keys from Order first and then will join these primary keys with Order to get the correct Order Amount sum result. Please note that primaryKey should be defined in the dimensions section.

cube('Orders', {
  dimensions: {
    customerId: {
      sql: `id`,
      type: `number`,
      primaryKey: true,
    },
  },
});

Setting primaryKey to true will change the default value of the shown parameter to false. If you still want shown to be true — set it manually.

cube('Orders', {
  dimensions: {
    customerId: {
      sql: `id`,
      type: `number`,
      primaryKey: true,
      shown: true,
    },
  },
});

If you don't have a single column in a cube's table that can act as a primary key, you can create a composite primary key as shown below.

The example uses Postgres string concatenation; note that SQL may be different depending on your database.

cube('Users', {
  dimensions: {
    id: {
      sql: `${CUBE}.user_id || '-' || ${CUBE}.signup_week || '-' || ${CUBE}.activity_week`,
      type: `string`,
      primaryKey: true,
    },
  },
});

When you have several joined cubes, you should accurately use columns’ names to avoid any mistakes. One way to make no mistakes is to use the ${CUBE} reference. It allows you to specify columns’ names in cubes without any ambiguity. During the implementation of the query, this reference will be used as an alias for a basic cube. Take a look at the following example:

cube('Users', {
  dimensions: {
    name: {
      sql: `${CUBE}.name`,
      type: `string`,
    },
  },
});

Join graph is directed and A-B join is different from B-A. Learn more about it here.

Cube.js automatically takes care of transitive joins. For example if you have the following schema:

cube(`A`, {
  // ...
  joins: {
    B: {
      sql: `${A}.b_id = ${B}.id`,
      relationship: `belongsTo`,
    },
  },

  measures: {
    count: {
      type: `count`,
    },
  },
});

cube(`B`, {
  // ...
  joins: {
    C: {
      sql: `${B}.c_id = ${C}.id`,
      relationship: `belongsTo`,
    },
  },
});

cube(`C`, {
  // ...

  dimensions: {
    category: {
      sql: `category`,
      type: `string`,
    },
  },
});

and the following query:

{
  "measures": ["A.count"],
  "dimensions": ["C.category"]
}

Joins A-B and B-C will be resolved automatically. Cube.js uses Dijkstra algorithm to find join path between cubes given requested members.

Did you find this page useful?