Lambda pre-aggregations follow the Lambda architecture design to union real-time and batch data. Cube acts as the serving layer and uses pre-aggregations as the batch layer, and source data or other (usually streaming) pre-aggregations as the speed layer. Due to this design, lambda pre-aggregations only work with data that is newer than the existing batched pre-aggregations.
Lambda pre-aggregations only work with Cube Store.
Below are the most common examples of using lambda pre-aggregations.
In this scenario, batch data comes from a pre-aggregation and real-time data comes from the data source.
First, you need to create a pre-aggregation that will contain your batch data. In the following example, we call it batch. Please note that it must have partition_granularity specified; Cube will use this property to union batch data with freshly-retrieved source data. You may also use the build_range_end property of a pre-aggregation to set a specific window for your batched data.
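As a minimal sketch, the batch layer of such a pre-aggregation might be configured as follows (the cube and member names are illustrative, and the date is a placeholder):

```yaml
# Hypothetical batch layer: partition_granularity is required,
# and build_range_end caps the window covered by batch data.
pre_aggregations:
  - name: batch
    measures:
      - users.count
    time_dimension: users.created_at
    granularity: day
    partition_granularity: day
    build_range_end:
      sql: SELECT '2022-05-30'
```

Anything newer than the build_range_end boundary would then be served by the speed layer rather than this batch pre-aggregation.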
Next, you need to create a lambda pre-aggregation. To do that, create a pre-aggregation with the rollup_lambda type, specify the rollups you would like to use via the rollups property, and finally set union_with_source_data: true to use source data as the real-time layer.
Please make sure that the lambda pre-aggregation definition comes first when defining your pre-aggregations.
```yaml
cubes:
  - name: users
    # ...

    pre_aggregations:
      - name: lambda
        type: rollup_lambda
        union_with_source_data: true
        rollups:
          - CUBE.batch

      - name: batch
        measures:
          - users.count
        dimensions:
          - users.name
        time_dimension: users.created_at
        granularity: day
        partition_granularity: day
        build_range_start:
          sql: SELECT '2020-01-01'
        build_range_end:
          sql: SELECT '2022-05-30'
```
In this scenario, batch data comes from one pre-aggregation and real-time data comes from a streaming pre-aggregation.
You can use lambda pre-aggregations to combine data from multiple pre-aggregations, where one pre-aggregation can have batch data and another streaming.
```yaml
cubes:
  - name: streaming_users
    # This cube uses a streaming SQL data source such as ksqlDB
    # ...

    pre_aggregations:
      - name: streaming
        type: rollup
        measures:
          - CUBE.count
        dimensions:
          - CUBE.name
        time_dimension: CUBE.created_at
        granularity: day
        partition_granularity: day

  - name: users
    # This cube uses a data source such as ClickHouse or BigQuery
    # ...

    pre_aggregations:
      - name: batch_streaming_lambda
        type: rollup_lambda
        rollups:
          - users.batch
          - streaming_users.streaming

      - name: batch
        type: rollup
        measures:
          - users.count
        dimensions:
          - users.name
        time_dimension: users.created_at
        granularity: day
        partition_granularity: day
        build_range_start:
          sql: SELECT '2020-01-01'
        build_range_end:
          sql: SELECT '2022-05-30'
```
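Queries do not need to reference the lambda pre-aggregation explicitly; Cube matches them against the rollup and unions the batch and speed layers behind the scenes. For illustration, a query like the following (the date range is hypothetical) would be served from batch partitions up to the build_range_end boundary and from the streaming layer beyond it:

```json
{
  "measures": ["users.count"],
  "dimensions": ["users.name"],
  "timeDimensions": [
    {
      "dimension": "users.created_at",
      "granularity": "day",
      "dateRange": ["2022-05-01", "2022-06-15"]
    }
  ]
}
```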