Lambda Pre-Aggregations

Lambda pre-aggregations follow the Lambda architecture design to union real-time and batch data. Cube acts as a serving layer and uses pre-aggregations as a batch layer and source data or other pre-aggregations, usually streaming, as a speed layer.

Lambda pre-aggregations only work with Cube Store.

Additionally, we’re going to remove support for external storages other than Cube Store later this year. Cube Store will replace Redis and will therefore be a required component for running Cube, even without pre-aggregations.

The sections below cover the most common ways to use lambda pre-aggregations.

Batch and source data

In this scenario, batch data comes from a pre-aggregation and real-time data comes from the data source.

Lambda pre-aggregation batch and source diagram

First, create a pre-aggregation that will contain your batch data. In the following example, it is called batch. Note that it must have a timeDimension; Cube uses it to union batch data with source data.

You control the batch part of your data with the buildRangeStart and buildRangeEnd properties of the pre-aggregation, which define the time window covered by the batch data.

Next, create the lambda pre-aggregation: define a pre-aggregation with type rollupLambda, list the rollups you would like to use in the rollups property, and set unionWithSourceData: true to use source data as the real-time layer.

Please make sure that the lambda pre-aggregation definition comes first when defining your pre-aggregations.

cube('Users', {
  ...,

  preAggregations: {
    lambda: {
      type: `rollupLambda`,
      unionWithSourceData: true,
      rollups: [Users.batch]
    },
    batch: {
      measures: [Users.count],
      dimensions: [Users.name],
      timeDimension: Users.createdAt,
      granularity: `day`,
      buildRangeStart: {
        sql: `SELECT '2020-01-01'`
      },
      buildRangeEnd: {
        sql: `SELECT '2022-05-30'`
      }
    }
  },
});
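
To make the union more concrete, here is a minimal sketch in plain JavaScript of what a lambda pre-aggregation conceptually serves. It is not Cube's implementation: the function name unionBatchAndSource, the row shape, and the sample rows are hypothetical, and it assumes that the cut-over point between the two layers is the end of the batch build range.

```javascript
// Conceptual sketch only; this is not how Cube is implemented internally.
// Assumption: rows up to and including `batchEnd` are served from the batch
// rollup, and rows after it are served from the source (real-time) data.
function unionBatchAndSource(batchRows, sourceRows, batchEnd) {
  const cutOff = new Date(batchEnd).getTime();
  const fromBatch = batchRows.filter(
    (row) => new Date(row.createdAt).getTime() <= cutOff
  );
  const fromSource = sourceRows.filter(
    (row) => new Date(row.createdAt).getTime() > cutOff
  );
  return [...fromBatch, ...fromSource];
}

// Hypothetical daily rows, shaped like the output of the `batch` rollup.
const batchRows = [
  { name: 'Alice', createdAt: '2022-05-29', count: 3 },
  { name: 'Alice', createdAt: '2022-05-30', count: 5 },
];

// Hypothetical rows computed directly from the data source.
const sourceRows = [
  { name: 'Alice', createdAt: '2022-05-30', count: 5 }, // already covered by batch
  { name: 'Alice', createdAt: '2022-05-31', count: 2 }, // newer than the batch window
];

const served = unionBatchAndSource(batchRows, sourceRows, '2022-05-30');
console.log(served.map((row) => row.createdAt));
// → [ '2022-05-29', '2022-05-30', '2022-05-31' ]
```

The time dimension is what makes this split possible, which is why the batch pre-aggregation must define one.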

Batch and streaming data

In this scenario, batch data comes from one pre-aggregation and real-time data comes from a streaming pre-aggregation.

Lambda pre-aggregation batch and streaming diagram

You can use lambda pre-aggregations to combine data from multiple pre-aggregations, where one pre-aggregation holds batch data and another holds streaming data.

// This cube uses a streaming SQL data source such as ksqlDB
cube('StreamingUsers', {
  ...,
  dataSource: 'ksql',
});

// This cube uses a data source such as ClickHouse or BigQuery
cube('Users', {
  ...,
  preAggregations: {
    batchStreamingLambda: {
      type: `rollupLambda`,
      rollups: [Users.batch, Users.streaming]
    },
    batch: {
      type: `rollup`,
      measures: [Users.count],
      dimensions: [Users.name],
      timeDimension: Users.createdAt,
      granularity: `day`,
      buildRangeStart: {
        sql: `SELECT '2020-01-01'`
      },
      buildRangeEnd: {
        sql: `SELECT '2022-05-30'`
      }
    },
    streaming: {
      type: `rollup`,
      measures: [StreamingUsers.count],
      dimensions: [StreamingUsers.name],
      timeDimension: StreamingUsers.createdAt,
      granularity: `day`
    }
  }
});
