Guides
Query acceleration
Joining data from multiple data sources

Joining data from multiple data sources

Use case

Let's imagine we store information about products and their suppliers in separate databases. We want to aggregate data from these data sources while having decent performance. In the recipe below, we'll learn how to create a rollup join between two databases to achieve our goal.

Configuration

First of all, we should define our database connections with the dataSource option in the cube.js configuration file:

module.exports = {
  driverFactory: ({ dataSource }) => {
    if (dataSource === "suppliers") {
      return {
        type: "postgres",
        database: "recipes",
        host: "demo-db-recipes.cube.dev",
        user: "cube",
        password: "12345",
        port: "5432",
      };
    }
 
    if (dataSource === "products") {
      return {
        type: "postgres",
        database: "ecom",
        host: "demo-db-recipes.cube.dev",
        user: "cube",
        password: "12345",
        port: "5432",
      };
    }
 
    throw new Error("dataSource is undefined");
  },
};

Data modeling

First, we'll define rollup pre-aggregations for products and suppliers. Note that these pre-aggregations should contain the dimension on which they're joined. In this case, it's the supplier_id dimension in the products cube, and the id dimension in the suppliers cube:

YAML
JavaScript
cubes:
  - name: products
    # ...
 
    pre_aggregations:
      - name: products_rollup
        type: rollup
        dimensions:
          - name
          - supplier_id
        indexes:
          - name: category_index
            columns:
              - supplier_id
 
    joins:
      suppliers:
        sql: "{supplier_id} = ${suppliers.id}"
        relationship: many_to_one
YAML
JavaScript
cubes:
  - name: suppliers
    # ...
 
    pre_aggregations:
      - name: suppliers_rollup
        type: rollup
        dimensions:
          - id
          - company
          - email
        indexes:
          - name: category_index
            columns:
              - id

Then, we'll also define a rollup_join pre-aggregation in the products cube, which will enable aggregating data from multiple data sources:

YAML
JavaScript
cubes:
  - name: products
    # ...
 
    pre_aggregations:
      - name: combined_rollup
        type: rollup_join
        dimensions:
          - suppliers.email
          - suppliers.company
          - name
        rollups:
          - suppliers.suppliers_rollup
          - products_rollup

Query

Let's get the product names and their suppliers' info, such as company name and email, with the following query:

{
  "order": {
    "products.name": "asc"
  },
  "dimensions": ["products.name", "suppliers.company", "suppliers.email"],
  "limit": 3
}

Result

We'll get the data from two pre-aggregations joined into one rollup_join:

[
  {
    "products.name": "Awesome Cotton Sausages",
    "suppliers.company": "Justo Eu Arcu Inc.",
    "suppliers.email": "id.risus@luctuslobortisClass.net"
  },
  {
    "products.name": "Awesome Fresh Keyboard",
    "suppliers.company": "Quisque Purus Sapien Limited",
    "suppliers.email": "Cras@consectetuercursuset.co.uk"
  },
  {
    "products.name": "Awesome Rubber Soap",
    "suppliers.company": "Tortor Inc.",
    "suppliers.email": "Mauris@ac.com"
  }
]

Source code

Please feel free to check out the full source code (opens in a new tab) or run it with the docker-compose up command. You'll see the result, including queried data, in the console.