Dynamic data models with Jinja and Python
Cube supports authoring dynamic data models using the Jinja templating language (opens in a new tab) and Python. This allows de-duplicating common patterns in your data models as well as dynamically generating data models from a remote data source.
Jinja is supported in all YAML data model files.
Jinja
Please check the Jinja documentation (opens in a new tab) for details on Jinja syntax.
Currently, there's no way to preview the data model code in YAML after applying Jinja templates. Please track this issue (opens in a new tab).
As a workaround, you can view the resulting data model in
Playground and Visual Model.
You can also introspect the data model using the /v1/meta
REST API
endpoint.
Loops
Jinja supports looping (opens in a new tab) over lists and dictionaries. In
the following example, we loop over a list of nested properties and generate a
LEFT JOIN UNNEST
clause for each one: for each one:
{%- set nested_properties = [
"referrer",
"href",
"host",
"pathname",
"search"
] -%}
cubes:
- name: analytics
sql: >
SELECT
{%- for prop in nested_properties %}
{{ prop }}_prop.value AS {{ prop }}
{%- endfor %}
FROM public.events
{%- for prop in nested_properties %}
LEFT JOIN UNNEST(properties) AS {{ prop }}_prop ON {{ prop }}_prop.key = '{{ prop }}'
{%- endfor %}
Another useful pattern is to loop over a dictionary of values and generate a measure for each one, as in the following example:
{%- set metrics = {
"mau": 30,
"wau": 7,
"day": 1
} %}
cubes:
- name: orders
sql_table: public.orders
measures:
{%- for name, days in metrics | items %}
- name: {{ name | safe }}
type: count_distinct
sql: user_id
rolling_window:
trailing: {{ days }} day
offset: start
{% endfor %}
Macros
Cube data models also support Jinja macros, which allow you to define reusable snippets of code. You can read more about macros in the Jinja documentation (opens in a new tab).
In the following example, we define a macro called dimension()
which generates
a dimension definition in Cube. This macro is then invoked multiple times to
generate multiple dimensions:
{# Declare the macro before using it, otherwise Jinja will throw an error. #}
{%- macro dimension(column_name, type='string', primary_key=False) -%}
- name: {{ column_name }}
sql: {{ column_name }}
type: {{ type }}
{% if primary_key -%}
primary_key: true
{% endif -%}
{% endmacro -%}
cubes:
- name: orders
sql_table: public.orders
dimensions:
{{ dimension('id', 'number', primary_key=True) }}
{{ dimension('status') }}
{{ dimension('created_at', 'time') }}
{{ dimension('completed_at', 'time') }}
You could also use macros to generate SQL snippets for use in the sql
property:
{%- macro cents_to_dollars(column_name, precision=2) -%}
({{ column_name }} / 100)::NUMERIC(16, {{ precision }})
{%- endmacro -%}
cubes:
- name: payments
sql: >
SELECT
id AS payment_id,
{{ cents_to_dollars('amount') }} AS amount_usd
FROM app_data.payments
Escaping unsafe strings
Auto-escaping (opens in a new tab) of unsafe string values in Jinja templates is enabled by default. It means that any strings coming from Python might get wrapped in quotes, potentially breaking YAML syntax.
You can work around that by using the safe
Jinja
filter (opens in a new tab) with such string values:
cubes:
- name: my_cube
description: {{ get_unsafe_string() | safe }}
Alternatively, you can wrap unsafe strings into instances of the following
class in your Python code, effectively marking them as safe. This is
particularly useful for library code, e.g., similar to the
cube_dbt
package.
class SafeString(str):
is_safe: bool
def __init__(self, v: str):
self.is_safe = True
Python
Template context
You can use Python to declare functions that can be invoked and variables that can be
referenced from within a Jinja template. These functions and variables must be defined
in model/globals.py
file and registered in the TemplateContext
instance.
See the TemplateContext
reference for more details.
In the following example, we declare a function called load_data
that supposedly loads
data from a remote API endpoint. We will then use the function to generate a data model:
from cube import TemplateContext
template = TemplateContext()
@template.function('load_data')
def load_data():
client = MyApiClient("example.com")
return client.load_data()
class MyApiClient:
def __init__(self, api_url):
self.api_url = api_url
# mock API call
def load_data(self):
api_response = {
"cubes": [
{
"name": "cube_from_api",
"measures": [
{ "name": "count", "type": "count" },
{ "name": "total", "type": "sum", "sql": "amount" }
],
"dimensions": []
},
{
"name": "cube_from_api_with_dimensions",
"measures": [
{ "name": "active_users", "type": "count_distinct", "sql": "user_id" }
],
"dimensions": [
{ "name": "city", "sql": "city_column", "type": "string" }
]
}
]
}
return api_response
Now that we've decorated our function with the @template.function
decorator, we can
call it from within a Jinja template. In the following example, we'll call the
load_data()
function and use the result to generate a data model.
cubes:
{# Here we use the decorated function from earlier #}
{%- for cube in load_data()["cubes"] %}
- name: {{ cube.name }}
{%- if cube.measures is not none and cube.measures|length > 0 %}
measures:
{%- for measure in cube.measures %}
- name: {{ measure.name }}
type: {{ measure.type }}
{%- if measure.sql %}
sql: {{ measure.sql }}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if cube.dimensions is not none and cube.dimensions|length > 0 %}
dimensions:
{%- for dimension in cube.dimensions %}
- name: {{ dimension.name }}
type: {{ dimension.type }}
sql: {{ dimension.sql }}
{%- endfor %}
{%- endif %}
{%- endfor %}
Imports
In the model/globals.py
file (or the cube.py
configuration file), you can
import modules from the current directory. In the following example, we import a function
from the utils
module and use it to populate a variable in the template context:
def answer_to_main_question() -> str:
return "42"
from cube import TemplateContext
from utils import answer_to_main_question
template = TemplateContext()
answer = answer_to_main_question()
template.add_variable('answer', answer)
Dependencies
If you need to use dependencies in your dynamic data model (or your cube.py
configuration file), you can list them in the requirements.txt
file in the root
directory of your Cube deployment. They will be automatically installed with pip
on
the startup.
cube
package is available out of the box, it doesn't need to be
listed in requirements.txt
.
If you use dbt for data transformation, you might find the cube_dbt
package useful. It provides a set of utilities that simplify
defining the data model in YAML based on dbt models.
If you need to use dependencies with native extensions, build a custom Docker image.