When Cube development was kicked off in open source more than 5 years ago, its codebase consisted of JavaScript only. Now, almost 60% of Cube’s codebase is in Rust, thanks to the SQL API and Cube Store. Over time, Rust has proven to be a fantastic language for data platforms, and we’re extremely happy to be able to use DataFusion, arrow-rs, and egg within Cube.

Today, we’re announcing an effort to bring the Next-Generation Data Modeling Engine (also known as Tesseract) to Cube. Not only this is going to introduce even more Rust code to Cube’s codebase, turning it even further to the Ship of Theseus. It’s also going to bring new capabilities, such as multi-stage calculations and substantially improve the performance of data model operations.

Yet Another Dimension to Cube

Tesseract is aimed at optimizing the way the data model in Cube is compiled, maintained, and used for SQL code generation. With the new data modeling engine, Cube will compile the data model faster, with a lesser memory footprint and with a greater flexibility in the SQL generation for upstream data sources and Cube Store.

The decision to create Tesseract originates from requirements to generate more complex SQL queries from simpler data modeling primitives. Examples would be period-over-period comparisons, percentage of total calculations, level of detail calculations, bucketing calculations, data blending, etc. To support such complex SQL generation logic, we decided to step up in our SQL planning process by redesigning it from the ground up to incorporate accumulated data modeling experience over the years and making it suitable for supporting much more complex scenarios such as multi-stage calculations.

Multi-Stage Calculations

Calculating a double aggregation over a fact table, also known as a nested aggregation, is one of the most annoying data modeling exercises yet it’s extremely useful for defining many popular metrics, such as year-to-date calculations. We’re happy to simplify that with multi-stage calculations in Cube, powered by Tesseract.

Here’s how you can perform period-to-date calculations, such as year-to-date (YTD), quarter-to-date (QTD), or month-to-date (MTD) analysis:

cubes:
- name: test
sql: >
SELECT 1 AS revenue, '2024-01-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-02-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-03-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-04-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-05-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-06-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-07-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-08-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-09-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-10-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-11-01'::TIMESTAMP AS time UNION ALL
SELECT 1 AS revenue, '2024-12-01'::TIMESTAMP AS time
dimensions:
- name: time
sql: time
type: time
measures:
- name: revenue_ytd
sql: revenue
type: sum
rolling_window:
type: to_date
granularity: year
- name: revenue_qtd
sql: revenue
type: sum
rolling_window:
type: to_date
granularity: quarter
- name: revenue_mtd
sql: revenue
type: sum
rolling_window:
type: to_date
granularity: month

Check the result:

Another example would be a percentage of the total calculation. Let’s pretend you want to calculate the percentage of the revenue by product generated by a specific order group. Previously, such calculations required post-processing, but now they can be incorporated into the data model:

cubes:
- name: line_items
# ...
dimensions:
# ...
- name: price
sql: "price::DECIMAL"
type: number
format: currency
measures:
# ...
- name: revenue
sql: "{price}"
type: sum
- name: product_revenue
multi_stage: true
sql: "{revenue}"
type: sum
group_by:
- product_id
- name: product_revenue_percentage
multi_stage: true
sql: "{revenue} / NULLIF({product_revenue}, 0)"
type: number

If you ever stumbled upon a period-over-period comparison calculation problem in Cube, those can be solved either by post-processing or by manually writing SQL and applying data blending practices. Now, you can do it as simply as this:

cubes:
- name: line_items
# ...
dimensions:
# ...
- name: price
sql: "price::DECIMAL"
type: number
format: currency
measures:
# ...
- name: revenue
sql: "{price}"
type: sum
- name: revenue_prior_year
multi_stage: true
sql: "{revenue}"
type: number
time_shift:
- time_dimension: orders.created_at
interval: 1 year
type: prior
- name: revenue_yoy_growth
multi_stage: true
sql: "{revenue} / NULLIF({revenue_prior_year}, 0)"
type: number

What’s Next?

Today, we’re launching Tesseract in preview. Multi-stage calculations are already available in Cube v1.0.0. Please feel welcome to upgrade to the latest version and give the examples above a try. This feature is still in active development, and more multi-stage calculations are coming soon. If you have any questions, you can always join our Slack community. We’ll be happy to hear your feedback there.

If you’d like to experience the rest of Tesseract, please reach out. We are very open to enroll you into a group of early adopters and ensure your success.