Guides
Data modeling
Calculating nested aggregates

# Calculating nested aggregates

## Use case

Sometimes, there's a need to calculate a double aggregation over a fact table. For example, if you have a `line_items` table that has `store_id`, `order_id`, and `sales` columns, you might wonder what is the median of sales per product for each store.

With an ad-hoc SQL query, this double aggregation would probably be expressed as follows:

``````WITH sales_per_store_product AS (
SELECT store_id, product_id, SUM(sales) AS sales
FROM line_items
GROUP BY 1, 2
)

SELECT store_id, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sales) AS sales_median
FROM sales_per_store_product
GROUP BY 1``````

## Data modeling

In Cube, measures are used to define aggregates. However, a single measure can only contain a single aggregation, e.g., `SUM`, `APPROX_COUNT_DISTINCT`, or `PERCENTILE_CONT`.

If you'd like to define a double aggregation, e.g., a median of a sum of values, the outer aggregation would need to be defined in a separate cube and the inner aggregation (measure) would need to be brought to that cube as a subquery dimension. Also, these cubes would need to have a join definition between them.

Consider the following data model:

``````cubes:
- name: nested_agg_sales
sql: >
SELECT 1 AS id, 1 AS store_id, 1 AS product_id, 10 AS sales UNION ALL
SELECT 2 AS id, 1 AS store_id, 1 AS product_id, 20 AS sales UNION ALL
SELECT 3 AS id, 1 AS store_id, 2 AS product_id, 30 AS sales UNION ALL
SELECT 4 AS id, 1 AS store_id, 2 AS product_id, 40 AS sales UNION ALL
SELECT 5 AS id, 2 AS store_id, 1 AS product_id, 50 AS sales UNION ALL
SELECT 6 AS id, 2 AS store_id, 1 AS product_id, 60 AS sales UNION ALL
SELECT 7 AS id, 2 AS store_id, 2 AS product_id, 70 AS sales UNION ALL
SELECT 8 AS id, 2 AS store_id, 2 AS product_id, 80 AS sales

dimensions:
- name: id
sql: id
type: number
primary_key: true

- name: store_id
sql: store_id
type: number

- name: product_id
sql: product_id
type: number

- name: store_product_id
sql: "CONCAT({store_id}, '-', {product_id})"
type: string

measures:
- name: sales
sql: sales
type: sum

- name: nested_agg_stores_orders
sql: >
SELECT store_id, product_id
FROM (
SELECT 1 AS id, 1 AS store_id, 1 AS product_id, 10 AS sales UNION ALL
SELECT 2 AS id, 1 AS store_id, 1 AS product_id, 20 AS sales UNION ALL
SELECT 3 AS id, 1 AS store_id, 2 AS product_id, 30 AS sales UNION ALL
SELECT 4 AS id, 1 AS store_id, 2 AS product_id, 40 AS sales UNION ALL
SELECT 5 AS id, 2 AS store_id, 1 AS product_id, 50 AS sales UNION ALL
SELECT 6 AS id, 2 AS store_id, 1 AS product_id, 60 AS sales UNION ALL
SELECT 7 AS id, 2 AS store_id, 2 AS product_id, 70 AS sales UNION ALL
SELECT 8 AS id, 2 AS store_id, 2 AS product_id, 80 AS sales
) AS raw
GROUP BY 1, 2

joins:
- name: nested_agg_sales
sql: "{nested_agg_stores_orders.store_product_id} = {nested_agg_sales.store_product_id}"
relationship: one_to_many

dimensions:
- name: store_id
sql: store_id
type: number

- name: product_id
sql: product_id
type: number

- name: store_product_id
sql: "CONCAT({store_id}, '-', {product_id})"
type: string
primary_key: true

- name: sales_sum
sql: "{nested_agg_sales.sales}"
type: number
sub_query: true

measures:
- name: median_sales
sql: "PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY {sales_sum})"
type: number``````

As you can see, the sum of sales for per store and per product is defined in the `nested_agg_sales` cube as the `sales` measure. Then, it is brought to the `nested_agg_stores_orders` cube as `sales_sum` that is defined as a subquery dimension. Also, a join is defined between both cubes.

Then, the median of sales is defined as the `median_sales` measure in the `nested_agg_stores_orders` cube. It’s OK to reference `sales_sum` in this measure because now it's a dimension; referencing a measure from another cube here would not work.

## Result

Querying the `median_sales` measure would give the expected result:

We can verify that it's correct by adding one more dimension to the query: