Introducing YAML Data Modeling in Cube

You asked—we listened. Now, you can do your Cube data modeling in YAML.

We often hear from our customers that they want the ability to create data models in languages other than JavaScript—specifically YAML and Python. Today, we’re excited to release a public preview for YAML data modeling in Cube. This feature is available in both Cube Cloud and Cube Core.

Our goal is to support all the core features of Cube’s data modeling, as well as caching and multi-tenancy configurations, within YAML, and later, Python.

YAML and Python are widely adopted in the modern data stack: data engineers set up Airflow DAGs in Python and author dbt models in YAML, while data analysts explore data in Python-based notebooks. We want to bring support for these languages to Cube so that data engineers don’t have to switch between YAML and JS as they move across the tools in their stacks.

“Cube’s YAML support makes my life easier because it streamlines the compatibility between the tools in my stack and my data modeling. Not needing to toggle between different languages saves me time and effort. I’m excited about this feature—and about what Cube has in store for future updates!”

– Olivier Dupuis, Analytics Engineering Consultant, Rittman Analytics

Getting started with YAML

To get started with YAML data modeling in Cube, you’ll need to first ensure that you’re using Cube version 0.31.0 or higher.

Then, create a schema/orders.yml file with the following content:

cubes:
  - name: orders
    sql: SELECT * FROM public.orders
    measures:
      - name: count
        type: count
      - name: total_revenue
        sql: amount
        type: sum
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
      - name: status
        sql: status
        type: string

You can learn more about YAML models in our newly-released documentation.

The future of YAML (and Python) in Cube

This release is the first step in our journey to bring both YAML and Python support to Cube. As we build out that support, we’ll keep improving it, particularly around developer experience.

The following are the major areas on which we’re working.

Dynamic data models

Many of Cube’s customers leverage dynamic data models for multi-tenant scenarios. While it is possible to use COMPILE_CONTEXT already, dynamic data model generation is not yet available for YAML models.
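
As a rough illustration of using COMPILE_CONTEXT in a YAML model, here is a minimal sketch. It assumes that curly-brace interpolation is available inside the sql value and that the security context carries a tenant_id property; both the property path and the per-tenant schema layout are hypothetical, so check the documentation for the exact form supported by your Cube version.

```yaml
cubes:
  - name: orders
    # Assumption: each tenant's data lives in its own database schema,
    # and tenant_id is a hypothetical field supplied by your security context.
    sql: SELECT * FROM {COMPILE_CONTEXT.securityContext.tenant_id}.orders
    measures:
      - name: count
        type: count
```

This only varies a value inside a fixed model; generating whole cubes, measures, or dimensions per tenant is the part that is not yet available in YAML.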

We’re planning to let users write Python to dynamically generate models, the very same way they can write JavaScript to generate them today. If you’re interested in this feature, please post your ideas on how it should be implemented. We’d love to hear them.

YAML configuration

While most of the Cube configuration can be done via environment variables—the way we recommend doing it—users lean on the JS-based cube.js configuration file for advanced configuration. We’re planning to introduce YAML and Python support for advanced configuration scenarios as well.

Documentation and best practices

We’re releasing initial documentation for YAML modeling today. In the near future, we’ll update the Cube documentation to fully integrate the YAML model reference and examples. We are also updating our guides, tutorials, and best practices to showcase YAML examples alongside JS ones.

Cloud IDE support

As part of our Cube Cloud vision, we’re aiming to build a first-class development environment for building data models in Cube, which now includes YAML models. Our Cloud IDE will include data model validation, in-line tips, pre-commit testing, and many more useful features focused on developer productivity and ergonomics.

Give us feedback

You can help us prioritize the features we work on and engage with our open-source software via Slack, GitHub, or our contact page.

And if you have any questions about getting started with Cube’s YAML support (or any other questions, for that matter), find us on our Slack! As always, we’d love to hear from you.
