Yesterday, we announced that Python can be used for configuration in Cube. That’s not all: Python can also be used for data modeling! So, now you can have a Python-only experience with Cube, either providing programmatic configuration or authoring dynamic data models with YAML and Jinja.

Data modeling is arguably the most important part of Cube as a semantic layer. The data model encapsulates everything from metric definitions and the public-facing facade of the semantic layer to the access control and aggregation-awareness configuration. Just as a project in a data transformation tool can contain many models, a Cube deployment often has dozens or hundreds of cubes and views in its data model.

YAML syntax for data modeling

Almost a year ago, we introduced support for the YAML syntax in data modeling. It’s a great alternative to the JavaScript-based syntax because YAML is easier to write and read. We quickly saw YAML adoption rise in the Cube community.

cubes:
  - name: revenue
    sql_table: public.revenue
    public: "{{ COMPILE_CONTEXT.security_context.is_finance }}"
    dimensions:
      - name: fiscal_year
        sql: "{fiscal_periods.year_label}"
        type: string
    measures:
      - name: ltm_revenue
        description: Last twelve months (LTM) revenue
        sql: amount
        type: sum
        rolling_window:
          trailing: 12 month
    pre_aggregations: []

However, pure YAML lacks the dynamism of JavaScript, which lets you define complex and powerful programmatic data models that depend on external data (e.g., retrieved from an API endpoint) or simply remove code duplication by relying on a library of helper functions.

With this update, we put YAML data models on par with JavaScript ones and empower you to organize the data model code, make it dynamic, and use the programming techniques you prefer.

Adding Jinja and Python to the mix

We’re glad to announce that Cube now supports Jinja as a template engine and Python as a programming language to define dynamic data models in YAML.

Jinja is an expressive template engine used by many prominent Python projects, including Flask. Many data engineers are also used to writing Jinja in dbt projects. With Jinja, you can add variables, loops, macros, and other logic directly in YAML.

{%- macro dimension(column_name, type='string', primary_key=False) -%}
      - name: {{ column_name }}
        sql: {{ column_name }}
        type: {{ type }}
        {% if primary_key -%}
        primary_key: true
        {% endif -%}
{% endmacro -%}
{%- set metrics = {
  "mau": 30,
  "wau": 7,
  "day": 1
} %}
cubes:
  - name: hello_jinja
    sql_table: public.orders
    data_source: {{ 'bigquery' if env_var('ENV') == 'prod' else 'postgres' }}
    dimensions:
      {{ dimension('id', 'number', primary_key=True) }}
      {{ dimension('status') }}
      {{ dimension('created_at', 'time') }}
    measures:
      {%- for name, days in metrics | items %}
      - name: {{ name }}
        type: count_distinct
        sql: user_id
        rolling_window:
          trailing: {{ days }} day
          offset: start
      {% endfor %}
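To make the loop in the template above more concrete, here is a minimal plain-Python sketch of the measures it is meant to produce. It does not use Jinja or Cube at all; the `rolling_measures` helper is a hypothetical name introduced only for illustration, and the dictionaries simply mirror the YAML keys from the template.

```python
# Plain-Python illustration of what the Jinja `for` loop expands to.
# `rolling_measures` is a hypothetical helper, not part of Cube or Jinja.
metrics = {"mau": 30, "wau": 7, "day": 1}


def rolling_measures(metrics):
    """Build one count_distinct measure per (name, days) pair."""
    return [
        {
            "name": name,
            "type": "count_distinct",
            "sql": "user_id",
            "rolling_window": {"trailing": f"{days} day", "offset": "start"},
        }
        for name, days in metrics.items()
    ]


measures = rolling_measures(metrics)

for measure in measures:
    # Print each measure name with its trailing window
    print(measure["name"], measure["rolling_window"]["trailing"])
```

Adding a new entry to `metrics` yields one more measure without any copy-pasting, which is the same benefit the Jinja loop brings to the YAML data model.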

Even more importantly, Python functions can be called from Jinja templates, giving you an option to write some, or most, of your logic in Python and then render the result in YAML as cubes, views, or their parts. In the following example, the model/globals.py file is used to register a Python function to be callable from Jinja.

from cube import TemplateContext
from cube_dbt import Dbt

# Load the dbt manifest and keep only the models under models/marts/
manifest_url = 'https://cube-dbt-integration.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url).filter(paths=['models/marts/'])

template = TemplateContext()

# Register dbt_model so that it can be called from Jinja templates
@template.function('dbt_model')
def dbt_model(name):
    return dbt.model(name)

Needless to say, you can also use dependencies and benefit from the whole Python ecosystem. In the example above, we employ the cube_dbt package that… um, as you can guess, does something peculiar—more on that in the next blog post!
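If the decorator-based registration in model/globals.py looks unfamiliar, the following sketch shows the general pattern in plain Python. It is only an illustration under stated assumptions: `MiniTemplateContext` and `table_for` are hypothetical names, and this is not how Cube's `TemplateContext` is actually implemented, which this post does not describe.

```python
# Illustration of a decorator-based function registry.
# MiniTemplateContext is a hypothetical stand-in, NOT Cube's TemplateContext.
class MiniTemplateContext:
    def __init__(self):
        # Maps the name used in templates to the Python callable
        self.functions = {}

    def function(self, name):
        """Return a decorator that registers a callable under `name`."""
        def register(func):
            self.functions[name] = func
            return func  # leave the original function usable as-is
        return register


template = MiniTemplateContext()


@template.function('table_for')
def table_for(model_name):
    # Hypothetical helper: derive a table name from a model name
    return f"public.{model_name}"


# A template engine could look the function up by its registered name
rendered = template.functions['table_for']('orders')
print(rendered)
```

The point of the pattern is that the template side only needs to know the registered name, while the logic itself stays in ordinary, testable Python.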

We are committed to supporting both Python and JavaScript for data modeling going forward. We recommend using YAML by default and adding Jinja and Python for dynamic data models as needed.

We’ve updated the documentation on dynamic data models with Jinja and Python code examples and explained how the model/globals.py file works. Please feel free to check it out and try it yourself!

Support for Python comes with the Cube Core v0.34 release. You can get started today in Cube Cloud; please make sure to switch to the latest version channel. You can also get started with Cube Core. However, please note that macOS on M1/M2 and Windows are not supported yet; we’re actively working on adding support for these platforms very soon.

Please give Python a try and join our Slack community to share your thoughts and feedback. Also, stay tuned for further updates, including the one related to the cube_dbt package. We’ve already scheduled a webinar on October 11, 2023 to cover that, so please don’t hesitate to RSVP and come say hi!