Data modeling with YAML, Jinja, and Python

Cube supports authoring dynamic data models using the Jinja templating language (opens in a new tab) and Python. This allows de-duplicating common patterns in your data models as well as dynamically generating data models from a remote data source.

Jinja is supported in all YAML data model files.

YAML

It is recommended to default to YAML syntax because of its simplicity and readability.

Folded and literal strings

Sometimes you might want to use multi-line strings in YAML-based data models, e.g., in parameters such as sql or description. It is recommended to use literal (opens in a new tab) (|) string style in such cases as it preserves line breaks.

cubes:
  - name: orders
    description: |
      This cube represents customer orders.
      It includes measures for total sales and order count.
    sql: |
      -- Fetch only relevant columns
      SELECT id, created_at, total_amount
      FROM staging.orders

Jinja

Please check the Jinja documentation (opens in a new tab) for details on Jinja syntax.

Previewing YAML

You can preview the data model code after applying Jinja templates in the Data Model editor by clicking ... → Jinja Preview on files that contain Jinja templates in the sidebar.

Currently, there's no way to preview the data model code in YAML after applying Jinja templates in Cube Core. Please track this issue (opens in a new tab).

You can also view the resulting data model in Playground and Visual Model. Also, you can introspect the data model using the /v1/meta REST API endpoint.

Loops

Jinja supports looping (opens in a new tab) over lists and dictionaries. In the following example, we loop over a list of nested properties and generate a LEFT JOIN UNNEST clause for each one: for each one:

{%- set nested_properties = [
  "referrer",
  "href",
  "host",
  "pathname",
  "search"
] -%}
 
cubes:
  - name: analytics
    sql: |
      SELECT
      {%- for prop in nested_properties %}
        {{ prop }}_prop.value AS {{ prop }}
      {%- endfor %}
      FROM public.events
      {%- for prop in nested_properties %}
      LEFT JOIN UNNEST(properties) AS {{ prop }}_prop ON {{ prop }}_prop.key = '{{ prop }}'
      {%- endfor %}

Another useful pattern is to loop over a dictionary of values and generate a measure for each one, as in the following example:

{%- set metrics = {
  "mau": 30,
  "wau": 7,
  "day": 1
} %}
 
cubes:
  - name: orders
    sql_table: public.orders
 
    measures:
      {%- for name, days in metrics | items %}
      - name: {{ name | safe }}
        type: count_distinct
        sql: user_id
        rolling_window:
          trailing: {{ days }} day
          offset: start
      {% endfor %}

Macros

Cube data models also support Jinja macros, which allow you to define reusable snippets of code. You can read more about macros in the Jinja documentation (opens in a new tab).

In the following example, we define a macro called dimension() which generates a dimension definition in Cube. This macro is then invoked multiple times to generate multiple dimensions:

{# Declare the macro before using it, otherwise Jinja will throw an error. #}
{%- macro dimension(column_name, type='string', primary_key=False) -%}
      - name: {{ column_name }}
        sql: {{ column_name }}
        type: {{ type }}
        {% if primary_key -%}
        primary_key: true
        {% endif -%}
{% endmacro -%}
 
cubes:
  - name: orders
    sql_table: public.orders
 
    dimensions:
      {{ dimension('id', 'number', primary_key=True) }}
      {{ dimension('status') }}
      {{ dimension('created_at', 'time') }}
      {{ dimension('completed_at', 'time') }}

You could also use macros to generate SQL snippets for use in the sql property:

{%- macro cents_to_dollars(column_name, precision=2) -%}
  ({{ column_name }} / 100)::NUMERIC(16, {{ precision }})
{%- endmacro -%}
 
cubes:
  - name: payments
    sql: |
      SELECT
        id AS payment_id,
        {{ cents_to_dollars('amount') }} AS amount_usd
      FROM app_data.payments

Escaping unsafe strings

Auto-escaping (opens in a new tab) of unsafe string values in Jinja templates is enabled by default. It means that any strings coming from Python might get wrapped in quotes, potentially breaking YAML syntax.

You can work around that by using the safe Jinja filter (opens in a new tab) with such string values:

cubes:
  - name: my_cube
    description: {{ get_unsafe_string() | safe }}

Alternatively, you can wrap unsafe strings into instances of the following class in your Python code, effectively marking them as safe. This is particularly useful for library code, e.g., similar to the cube_dbt package.

class SafeString(str):
  is_safe: bool
 
  def __init__(self, v: str):
    self.is_safe = True

Python

Template context

You can use Python to declare functions that can be invoked and variables that can be referenced from within a Jinja template. These functions and variables must be defined in model/globals.py file and registered in the TemplateContext instance.

See the TemplateContext reference for more details.

In the following example, we declare a function called load_data that supposedly loads data from a remote API endpoint. We will then use the function to generate a data model:

from cube import TemplateContext
 
template = TemplateContext()
 
@template.function('load_data')
def load_data():
   client = MyApiClient("example.com")
   return client.load_data()
 
 
class MyApiClient:
  def __init__(self, api_url):
    self.api_url = api_url
 
  # mock API call
  def load_data(self):
    api_response = {
      "cubes": [
        {
          "name": "cube_from_api",
          "measures": [
            { "name": "count", "type": "count" },
            { "name": "total", "type": "sum", "sql": "amount" }
          ],
          "dimensions": []
        },
        {
          "name": "cube_from_api_with_dimensions",
          "measures": [
            { "name": "active_users", "type": "count_distinct", "sql": "user_id" }
          ],
          "dimensions": [
            { "name": "city", "sql": "city_column", "type": "string" }
          ]
        }
      ]
    }
    return api_response

Now that we've decorated our function with the @template.function decorator, we can call it from within a Jinja template. In the following example, we'll call the load_data() function and use the result to generate a data model.

cubes:
  {# Here we use the decorated function from earlier #}
  {%- for cube in load_data()["cubes"] %}
 
  - name: {{ cube.name }}
 
  {%- if cube.measures is not none and cube.measures|length > 0 %}
    measures:
      {%- for measure in cube.measures %}
      - name: {{ measure.name }}
        type: {{ measure.type }}
      {%- if measure.sql %}
        sql: {{ measure.sql }}
      {%- endif %}
      {%- endfor %}
  {%- endif %}
 
  {%- if cube.dimensions is not none and cube.dimensions|length > 0 %}
    dimensions:
      {%- for dimension in cube.dimensions %}
      - name: {{ dimension.name }}
        type: {{ dimension.type }}
        sql: {{ dimension.sql }}
      {%- endfor %}
  {%- endif %}
  {%- endfor %}

Imports

In the model/globals.py file (or the cube.py configuration file), you can import modules from the current directory. In the following example, we import a function from the utils module and use it to populate a variable in the template context:

model/utils.py

def answer_to_main_question() -> str:
  return "42"

model/globals.py

from cube import TemplateContext
from utils import answer_to_main_question
 
template = TemplateContext()
 
answer = answer_to_main_question()
template.add_variable('answer', answer)

Dependencies

If you need to use dependencies in your dynamic data model (or your cube.py configuration file), you can list them in the requirements.txt file in the root directory of your Cube deployment. They will be automatically installed with pip on the startup.

cube package is available out of the box, it doesn't need to be listed in requirements.txt.

If you use dbt for data transformation, you might find the cube_dbt package useful. It provides a set of utilities that simplify defining the data model in YAML based on dbt models.

If you need to use dependencies with native extensions, build a custom Docker image.

Dynamic data models JavaScript