Lambda pre-aggregations

Lambda pre-aggregations follow the Lambda architecture design to union real-time and batch data. Cube acts as the serving layer: it uses pre-aggregations as the batch layer and either source data or other pre-aggregations (usually streaming) as the speed layer. Because of this design, lambda pre-aggregations only work with data that is newer than the existing batched pre-aggregations.

Lambda pre-aggregations only work with Cube Store.

Use cases

The sections below cover the most common use cases for lambda pre-aggregations.

Batch and source data

In this scenario, batch data comes from a pre-aggregation and real-time data comes from the data source.

Lambda pre-aggregation batch and source diagram

First, create a pre-aggregation that will contain your batch data. In the following example, it is called batch. It must have time_dimension and partition_granularity specified; Cube uses these properties to union batch data with freshly retrieved source data.

You may also control the batch part of your data with the build_range_start and build_range_end properties of a pre-aggregation to determine a specific window for your batched data.
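As an illustration of a bounded batch window, here is a minimal sketch of a batch pre-aggregation whose build range moves with the current date instead of using fixed dates. The interval values and the PostgreSQL-style date arithmetic are assumptions made for this sketch; adjust the SQL to your database's dialect. The lambda pre-aggregation itself is omitted here.

```yaml
cubes:
  - name: users
    # ...

    pre_aggregations:
      - name: batch
        measures:
          - users.count
        dimensions:
          - users.name
        time_dimension: users.created_at
        granularity: day
        partition_granularity: day
        # Assumption: PostgreSQL-style date arithmetic and
        # illustrative interval values, not recommendations.
        build_range_start:
          sql: SELECT NOW() - INTERVAL '365 day'
        build_range_end:
          sql: SELECT NOW() - INTERVAL '1 day'
```

With a window like this, the batch layer would cover roughly the last year up to yesterday, leaving the most recent data to the real-time layer.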

Next, create a lambda pre-aggregation: define a pre-aggregation with type rollup_lambda, list the rollups you would like to use in the rollups property, and set union_with_source_data: true to use source data as the real-time layer.

Please make sure that the lambda pre-aggregation definition comes first when defining your pre-aggregations.

YAML
cubes:
  - name: users
    # ...
 
    pre_aggregations:
      - name: lambda
        type: rollup_lambda
        union_with_source_data: true
        rollups:
          - CUBE.batch
 
      - name: batch
        measures:
          - users.count
        dimensions:
          - users.name
        time_dimension: users.created_at
        granularity: day
        partition_granularity: day
        build_range_start:
          sql: SELECT '2020-01-01'
        build_range_end:
          sql: SELECT '2022-05-30'

Batch and streaming data

In this scenario, batch data comes from one pre-aggregation and real-time data comes from a streaming pre-aggregation.

Lambda pre-aggregation batch and streaming diagram

You can use lambda pre-aggregations to combine data from multiple pre-aggregations, for example one that holds batch data and another that holds streaming data.

YAML
cubes:
  - name: streaming_users
    # This cube uses a streaming SQL data source such as ksqlDB
    # ...
 
    pre_aggregations:
      - name: streaming
        type: rollup
        measures:
          - CUBE.count
        dimensions:
          - CUBE.name
        time_dimension: CUBE.created_at
        granularity: day
        partition_granularity: day
 
  - name: users
    # This cube uses a data source such as ClickHouse or BigQuery
    # ...
 
    pre_aggregations:
      - name: batch_streaming_lambda
        type: rollup_lambda
        rollups:
          - users.batch
          - streaming_users.streaming
 
      - name: batch
        type: rollup
        measures:
          - users.count
        dimensions:
          - users.name
        time_dimension: users.created_at
        granularity: day
        partition_granularity: day
        build_range_start:
          sql: SELECT '2020-01-01'
        build_range_end:
          sql: SELECT '2022-05-30'