Joins

The Cube SQL API currently does not support INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL OUTER JOIN. We plan to support these types of joins in future releases.

The SQL API supports CROSS JOINs between cubes. When a CROSS JOIN is being processed by Cube, it generates the correct joining conditions for the underlying data source.

For example, the following query joins the Orders and Products tables under the hood with Orders.product_id = Products.id, exactly the same way as the REST API query does:

cube=> SELECT * FROM Orders CROSS JOIN Products LIMIT 5;
 count | avgValue | totalValue | number | value |  status   |         createdAt          |        completedAt         | __user | count |           name            |            description             |         createdAt          | __user
-------+----------+------------+--------+-------+-----------+----------------------------+----------------------------+--------+-------+---------------------------+------------------------------------+----------------------------+--------
     1 |       20 |         20 |     40 |    20 | completed | 2020-10-26 00:00:00.000000 | 2020-11-07 00:00:00.000000 |        |     1 | Incredible Fresh Chicken  | Electronics Generic Fresh Computer | 2020-06-29 00:00:00.000000 |
     1 |       20 |         20 |     14 |    20 | completed | 2021-02-07 00:00:00.000000 | 2021-03-02 00:00:00.000000 |        |     1 | Unbranded Wooden Mouse    | Outdoors Incredible Rubber Car     | 2019-07-16 00:00:00.000000 |
     1 |       20 |         20 |     23 |    20 | completed | 2022-07-23 00:00:00.000000 | 2022-08-11 00:00:00.000000 |        |     1 | Handcrafted Plastic Chair | Electronics Sleek Rubber Tuna      | 2021-02-27 00:00:00.000000 |
     1 |       20 |         20 |     86 |    20 | completed | 2023-04-19 00:00:00.000000 | 2023-04-25 00:00:00.000000 |        |     1 | Practical Metal Chicken   | Toys Awesome Frozen Chips          | 2020-07-24 00:00:00.000000 |
     1 |       20 |         20 |     27 |    20 | completed | 2019-06-27 00:00:00.000000 | 2019-07-21 00:00:00.000000 |        |     1 | Sleek Rubber Chair        | Computers Refined Cotton Shirt     | 2021-09-26 00:00:00.000000 |
(5 rows)

In the resulting query plan, you won't see any joins as you can't see those for REST API queries either:

cube=> EXPLAIN SELECT * FROM Orders CROSS JOIN Products LIMIT 5;
   plan_type   |            plan
---------------+-----------------------------
 logical_plan  | CubeScan: request={        +
               |   "measures": [            +
               |     "Orders.count",        +
               |     "Orders.avgValue",     +
               |     "Orders.totalValue",   +
               |     "Orders.number",       +
               |     "Products.count"       +
               |   ],                       +
               |   "dimensions": [          +
               |     "Orders.value",        +
               |     "Orders.status",       +
               |     "Orders.createdAt",    +
               |     "Orders.completedAt",  +
               |     "Products.name",       +
               |     "Products.description",+
               |     "Products.createdAt"   +
               |   ],                       +
               |   "segments": [],          +
               |   "limit": 5               +
               | }
 physical_plan | CubeScanExecutionPlan      +
               |
(2 rows)

This feature allows you to CROSS JOIN cubes even with transitive joins only.

Typically, in tools that allow defining custom SQL datasets, you'd use joined tables as a dataset SQL. For example:

SELECT o.count as count, p.name as product_name, p.description as product_description
FROM Orders o
CROSS JOIN Products p;

Please note we use aliasing to avoid name clashing between cube members in a resulting data set. In this case, wrapped SQL will be properly processed by Cube, pushing down all operations to Cube query:

cube=> SELECT product_name, SUM(count) FROM (
  SELECT o.count as count, p.name as product_name, p.description as product_description
  FROM Orders o CROSS
  JOIN Products p
) joined
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
       product_name       | SUM(joined.count)
--------------------------+-------------------
 Tasty Plastic Mouse      |               121
 Intelligent Cotton Ball  |               119
 Ergonomic Steel Tuna     |               116
 Intelligent Rubber Pants |               116
 Generic Wooden Gloves    |               116
(5 rows)

We can see this by introspecting the EXPLAIN plan for this query:

cube=> EXPLAIN SELECT product_name, SUM(count) FROM (
  SELECT o.count as count, p.name as product_name, p.description as product_description
  FROM Orders o
  CROSS JOIN Products p
) joined
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
   plan_type   |         plan
---------------+-----------------------
 logical_plan  | CubeScan: request={  +
               |   "measures": [      +
               |     "Orders.count"   +
               |   ],                 +
               |   "dimensions": [    +
               |     "Products.name"  +
               |   ],                 +
               |   "segments": [],    +
               |   "order": [         +
               |     [                +
               |       "Orders.count",+
               |       "desc"         +
               |     ]                +
               |   ],                 +
               |   "limit": 5         +
               | }
 physical_plan | CubeScanExecutionPlan+
               |
(2 rows)

Please note even if product_description is in the inner selection, it isn't evaluated in the final query as it isn't used in any way.

As an alternative to achieve joins it is also possible to define proxy dimension or measure inside the Cube.

cube(`Orders`, {
  sql: `SELECT * FROM public.orders`,

  joins: {
    Users: {
      relationship: `belongsTo`,
      sql: `${CUBE}.user_id = ${Users}.id`,
    },
  },

  measures: {
    count: {
      type: `count`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primaryKey: true,
    },

    // this is proxy dimension
    user_city: {
      sql: `${Users.city}`,
      type: `string`,
    },
  },
});

cube(`Users`, {
  sql: `SELECT * FROM public.users`,

  measures: {},

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primaryKey: true,
    },

    city: {
      sql: `city`,
      type: `string`,
    },
  },
});

Now, it is possible to get orders count by users city with the following query.

cube=> SELECT count, user_city FROM Orders;
 count |   user_city
-------+---------------
  9524 | New York
  9408 | San Francisco
  6360 | Mountain View
  6262 | Seattle
  4393 | Los Angeles
  3183 | Chicago
  3060 | Austin
  1804 | Palo Alto
(8 rows)

Did you find this page useful?