Accelerating Non-Additive Measures

We want to run queries against pre-aggregations only to ensure our application's superior performance. Usually, accelerating a query is as simple as including its measures and dimensions to the pre-aggregation definition.

Non-additive measures (e.g., average values or distinct counts) are a special case. Pre-aggregations with such measures are less likely to be selected to accelerate a query. However, there are a few ways to work around that.

Let's explore the Users cube that contains various measures describing users' age:

  • count of unique age values (distinctAges)
  • average age (avgAge)
  • 90th percentile of age (p90Age)
    distinctAges: {
      sql: `age`,
      type: `countDistinct`,
    },

    avgAge: {
      sql: `age`,
      type: `avg`,
    },

    p90Age: {
      sql: `PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY age)`,
      type: `number`,
    },

All of these measures are non-additive. Practically speaking, it means that the pre-aggregation below would only accelerate a query that fully matches its definition:

    main: {
      measures: [
        CUBE.distinctAges,
        CUBE.avgAge,
        CUBE.p90Age
      ],
      dimensions: [
        CUBE.gender
      ]
    },

This query will match the pre-aggregation above and, thus, will be accelerated:

{
  "measures": [
    "Users.distinctAges",
    "Users.avgAge",
    "Users.p90Age"
  ],
  "dimensions": [
    "Users.gender"
  ]
}

Meanwhile, the query below won't match the same pre-aggregation because it contains non-additive measures and omits the gender dimension. It won't be accelerated:

{
  "measures": [
    "Users.distinctAges",
    "Users.avgAge",
    "Users.p90Age"
  ]
}

Let's explore some possible workarounds.

Often, non-additive countDistinct measures can be changed to have the countDistinctApprox type which will make them additive and orders of magnitude more performant. This countDistinctApprox measures can be used in pre-aggregations. However, there are two drawbacks:

  • This type is approximate, so the measures might yield slightly different results compared to their countDistinct counterparts. Please consult with your database's documentation to learn more.
  • The countDistinctApprox is not supported with all databases. Currently, Cube supports it for Athena, BigQuery, and Snowflake.

For example, the distinctAges measure can be rewritten as follows:

    distinctAges: {
      sql: `age`,
      type: `countDistinctApprox`,
    },

Non-additive avg measures can be rewritten as calculated measures that reference additive measures only. Then, this additive measures can be used in pre-aggregations.

For example, the avgAge measure can be rewritten as follows:

    avgAge: {
      sql: `${CUBE.ageSum} / ${CUBE.count}`,
      type: `number`,
    },

    ageSum: {
      sql: `age`,
      type: `sum`,
    },

    count: {
      type: `count`,
    },

If the two workarounds described above don't apply to your use case, feel free to create additional pre-aggregations with definitions that fully match your queries with non-additive measures. You will get a performance boost at the expense of a slightly increased overall pre-aggregation build time and space consumed.

Please feel free to check out the full source code or run it with the docker-compose up command. You'll see the result, including queried data, in the console.

Did you find this page useful?