In the recent v0.33 release of Cube Core, we focused on improving the data modeling experience and conventions in Cube. We've been receiving consistent requests for additional conventions and a more pronounced Cube way of data modeling.

Based on feedback and observations of data modeling patterns across various use cases, we've made changes to Cube's data modeling. These revisions promote best practices to enhance consistency across different Cube projects, and establish a foundation for future conventions.

In addition, we're introducing a Cube style guide that we'll use across documentation, examples, learning materials, and customer engagements. This will help make our materials consistent and easier to maintain. We are making our style guide public and recommend that the Cube community follow it.

There are no breaking or backward-incompatible changes in the v0.33 release.

Snake case naming convention

Six months ago, we introduced YAML syntax for data modeling in Cube; its adoption has been growing rapidly. YAML syntax provides a way to write concise data model files without too many quotes, backticks, and braces.

The problem was, the camelCase naming convention—which has been used in Cube since the very beginning—didn't play nice with YAML. Considering that, we've decided to go with snake_case as the default naming convention across both JavaScript and YAML models.

We know that some of you may prefer to keep using camelCase, and that's why we're going to support all definitions in both snake_case and camelCase. However, in documentation, examples, and data model generation, we'll use snake_case moving forward.

cubes:
  - name: line_items
    sql_table: public.line_items

    joins:
      - name: orders
        sql: "{CUBE}.order_id = {orders.id}"
        relationship: many_to_one

    measures:
      - name: count
        type: count

      - name: total_amount
        sql: price
        type: sum

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: created_date
        sql: created_at
        type: time

Schema is now called data model

We've been using the terms "data model" and "schema" interchangeably for quite some time. Moving forward, we will use the term "data model" instead of "schema" to ensure clarity and consistency. The term "schema" could be confused with a database schema—hence the change in terminology.

With that change, we're also introducing a conventional folder structure for Cube projects. Views are becoming a crucial component of Cube's data modeling, serving as data marts to expose slices of your data graph to BI tools and data applications via APIs. Therefore, they need to have their own designated folder within the project structure.

model
├── cubes
│   ├── base_orders.yml
│   └── base_customers.yml
└── views
    └── orders.yml

Cube will now automatically generate a model folder with cubes and views subfolders inside. If you're using a schema folder to keep your data model, it will continue to work.

Conventional names for join relationships

In the v0.33 release, we are renaming join relationships to use widely adopted terminology: one_to_many, many_to_one, and one_to_one. As with all changes in this release, it's not a breaking change, and the previous names can still be used.

As a reminder, Cube always uses LEFT JOIN, and join relationships are used to avoid fanouts when generating SQL to query upstream data sources.
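For example, assuming hypothetical orders, customers, and line_items cubes, the renamed relationships map onto the previous names like this:

```yaml
cubes:
  - name: orders
    sql_table: public.orders

    joins:
      # Previously `belongsTo`: many orders belong to one customer
      - name: customers
        sql: "{CUBE}.customer_id = {customers.id}"
        relationship: many_to_one

      # Previously `hasMany`: one order has many line items
      - name: line_items
        sql: "{CUBE}.id = {line_items.order_id}"
        relationship: one_to_many
```

Similarly, `hasOne` is now `one_to_one`. Declaring the correct relationship lets Cube deduplicate rows on the "many" side of a join before aggregating, which is what prevents fanout.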

Syntactic sugar for cube definitions

Often, cubes are backed by entire tables in upstream data sources and defined with fairly typical SELECT * one-liners. And so, we're introducing the sql_table parameter to help define such cubes in a more concise way.

cubes:
  - name: orders
    sql: SELECT * FROM my_schema.orders

# Now this can be written as:

cubes:
  - name: orders
    sql_table: my_schema.orders

Update to views

Views are used to expose slices of your data graph and act as data marts. You control which measures and dimensions are exposed to BI or data apps and the direction of joins between exposed cubes.

In the v0.33 release, we're introducing the cubes parameter in views to include exposed cubes in bulk. You can build your view by combining multiple joined cubes together and specifying the path by which they should be joined for that particular view. You can learn more about views in the documentation.

views:
  - name: pages

    cubes:
      - join_path: web_pages
        includes: "*"
        excludes:
          - session_id

      - join_path: web_pages.web_sessions
        alias: sessions
        prefix: true
        includes:
          - start_date
          - duration
          - exit_path
          - is_first

      - join_path: web_pages.web_sessions.web_users
        alias: users
        prefix: true
        includes:
          - city
          - company
          - age
          - name: ltv
            alias: life_time_value

Unified visibility control

We've unified visibility management across cubes, views, measures, dimensions, and segments with the public parameter. The default value is true, meaning that all these elements are public, available through APIs, and visible to data consumers by default. However, you can hide some of them by setting this parameter to false.

cubes:
  - name: orders
    sql_table: schema.orders
    public: false
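The same parameter works at the member level, too. Here's a sketch, with a hypothetical internal_margin measure, of keeping a single measure out of the APIs while the rest of the cube stays public:

```yaml
cubes:
  - name: orders
    sql_table: schema.orders

    measures:
      # Hidden from APIs and data consumers
      - name: internal_margin
        sql: margin
        type: sum
        public: false

    dimensions:
      # Public by default, no parameter needed
      - name: status
        sql: status
        type: string
```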

You can also use COMPILE_CONTEXT for dynamic visibility if necessary.

views:
  - name: arr
    description: Annual Recurring Revenue
    public: COMPILE_CONTEXT.security_context.has_finance_role

    includes:
      - customers.plan
      - revenue.arr
      - revenue.date

Private cubes and their members are now available in Playground. They will have a lock icon to indicate that they are private and not accessible via APIs.

As usual, the detailed changelog is available on GitHub.

What’s next in Cube Core

You may have noticed that the Cube configuration file is still only in camelCase. In version 0.34, we will support snake_case in the configuration file, and we will also introduce initial Python support for configuration. Consequently, it will be possible to write the configuration entirely in Python in a cube.py file. Moreover, this release will bring dynamic data models in YAML with Jinja and Python.
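To give a flavor of what dynamic YAML data models could look like, here is a speculative sketch using Jinja loops to generate several similar cubes; since v0.34 isn't out yet, the exact syntax may differ:

```yaml
cubes:
  # Generate one cube per table — the table names here are illustrative
  {% for table in ['orders', 'customers', 'line_items'] %}
  - name: base_{{ table }}
    sql_table: public.{{ table }}
  {% endfor %}
```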

You can try the new v0.33 release today in Cube Core with our Docker distribution. It's also available on the latest release channel in Cube Cloud.

As always, please don't hesitate to get in touch with us and share your feedback in our Slack community of more than 8,000 data practitioners.