cube_dbt package
The cube_dbt package simplifies defining the data model in the semantic layer
on top of dbt models. It provides convenient tools for
loading the metadata of a dbt project, inspecting dbt models, and rendering
them as cubes in YAML.
- Install the cube_dbt package from PyPI
- Check the source code in cube_dbt on GitHub
- Submit issues to cube on GitHub
Installation
Cube Cloud
Add the cube_dbt package to the requirements.txt file in the root
directory of your Cube project. Cube Cloud will install the dependencies
automatically.
Reference
Dbt class
Encapsulates tools for working with the metadata of a dbt project.
Dbt.__init__
The constructor accepts the metadata of a dbt project as a dict with the
contents of a manifest.json file.
import json
from cube_dbt import Dbt
manifest_path = './manifest.json'
with open(manifest_path, 'r') as file:
    manifest = json.loads(file.read())

dbt = Dbt(manifest)

Use the constructor when Dbt.from_file and Dbt.from_url aren't applicable,
e.g., when manifest.json is loaded from a private AWS S3 bucket.
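For the private-bucket case, a minimal sketch might look like the following. It assumes the third-party boto3 package and valid AWS credentials; the bucket name, the object key, and both helper functions are hypothetical and not part of cube_dbt.

```python
import json


def parse_manifest(raw: bytes) -> dict:
    """Decode manifest.json bytes and check that the top-level 'nodes' key exists."""
    manifest = json.loads(raw)
    if "nodes" not in manifest:
        raise ValueError("not a dbt manifest: 'nodes' key is missing")
    return manifest


def read_private_manifest(bucket: str, key: str) -> dict:
    """Hypothetical helper: download manifest.json from a private S3 bucket."""
    import boto3  # third-party; imported lazily so parse_manifest works without it

    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return parse_manifest(response["Body"].read())


# Usage (requires boto3, AWS credentials, and cube_dbt; names are placeholders):
#   from cube_dbt import Dbt
#   dbt = Dbt(read_private_manifest("my-private-bucket", "dbt/manifest.json"))
```

Keeping the parsing step separate from the download makes it possible to reuse the same check for a manifest obtained from any other private source.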
Dbt.from_file
This static method loads the metadata of a dbt project from a manifest.json
file by its path and returns an instance of the Dbt class.
from cube_dbt import Dbt
manifest_path = './manifest.json'
dbt = Dbt.from_file(manifest_path)

Dbt.from_url
This static method loads the metadata of a dbt project from a manifest.json
file by its URL and returns an instance of the Dbt class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)

Dbt.filter
This method filters loaded dbt models by their path prefixes, tags, or names.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url).filter(
    paths=['marts/'],  # Only models under the 'marts/' path
    tags=['cube'],     # Only models with the 'cube' tag
    names=['orders']   # Only the 'orders' model
)

Use this method to expose only the necessary dbt models to the semantic layer.
Note that values in paths should not be prefixed with models/.
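To make the three criteria concrete, the sketch below applies similar rules to a hand-written fragment shaped like the nodes section of manifest.json. It is an illustration, not the package's implementation: the select_models helper is hypothetical, and it assumes that the criteria are combined with AND, that a model must carry every listed tag, and that an omitted criterion matches everything.

```python
def select_models(nodes, paths=None, tags=None, names=None):
    """Hypothetical helper that mimics the filter criteria on manifest-like nodes."""
    selected = []
    for node in nodes.values():
        if node["resource_type"] != "model":
            continue
        # Assumption: 'path' is stored relative to the models/ directory,
        # which is why prefixes are given without 'models/'.
        if paths and not any(node["path"].startswith(p) for p in paths):
            continue
        # Assumption: every requested tag must be present on the model.
        if tags and not all(t in node["tags"] for t in tags):
            continue
        if names and node["name"] not in names:
            continue
        selected.append(node["name"])
    return selected


# Hand-written sample data, not taken from a real project.
nodes = {
    "model.shop.orders": {"resource_type": "model", "name": "orders",
                          "path": "marts/orders.sql", "tags": ["cube"]},
    "model.shop.stg_orders": {"resource_type": "model", "name": "stg_orders",
                              "path": "staging/stg_orders.sql", "tags": []},
    "test.shop.not_null": {"resource_type": "test", "name": "not_null",
                           "path": "not_null.sql", "tags": []},
}

print(select_models(nodes, paths=["marts/"], tags=["cube"]))  # ['orders']
print(select_models(nodes))  # ['orders', 'stg_orders']
```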
Dbt.models
This property exposes a list of loaded dbt models as instances of the
Model class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
for model in dbt.models:
    print(model)

Only dbt models that comply with Dbt.filter rules and are not
materialized as ephemeral will be returned.
Dbt.model
This method returns a loaded dbt model by its name as an instance of the
Model class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model)

Only dbt models that comply with Dbt.filter rules and are not
materialized as ephemeral will be returned.
Model class
Encapsulates tools for working with the metadata of a dbt model.
Model.name
This property exposes the name of a dbt model.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.name)
# For example, 'orders'

Model.description
This property exposes the description of a dbt model.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.description)
# For example, 'All Jaffle Shop orders'

Model.sql_table
This property exposes the fully-qualified SQL relation name of a dbt model
that can be used as the sql_table parameter of a cube.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.sql_table)
# For example, '"db"."public"."orders"'

Model.columns
This property exposes a list of columns that belong to this dbt model as
instances of the Column class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
for column in model.columns:
    print(column)

Model.column
This method exposes a column that belongs to this dbt model by its name as
an instance of the Column class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column)

Model.primary_key
This property exposes the primary key column, if this dbt model has one, as an
instance of the Column class. It is None if there's no primary key in
this dbt model.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.primary_key)

See Column.primary_key for details on the detection of
primary key columns.
Model.as_cube
This method renders this dbt model as a YAML snippet that can be inserted
into YAML data models. Includes name, description (if present), and
sql_table.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.as_cube())

In the returned multiline string, all lines except for the first one are left-padded with 4 spaces for easier use in YAML data models:

# Jinja template
cubes:
  - {{ model.as_cube() }}

# YAML
cubes:
  - name: orders
    description: All Jaffle Shop orders
    sql_table: '"db"."public"."orders"'

Model.as_dimensions
This method renders the list of columns that belong to this dbt model as a YAML snippet that can be inserted into YAML data models.
Optionally, it accepts a list of column names to ignore in the skip parameter.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.as_dimensions(skip=['status']))

See Column.as_dimension for details on the
dimension rendering.

In the returned multiline string, all lines except for the first one are left-padded with 6 spaces for easier use in YAML data models:

# Jinja template
cubes:
  - {{ model.as_cube() }}
    dimensions:
      {{ model.as_dimensions() }}

# YAML
cubes:
  - name: orders
    description: All Jaffle Shop orders
    sql_table: '"db"."public"."orders"'
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

Column class
Encapsulates tools for working with the metadata of a column that belongs to a dbt model.
Column.name
This property exposes the name of a column.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.name)
# For example, 'status'

Column.description
This property exposes the description of a column.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.description)
# For example, 'Order execution status: new, in progress, delivered'

Column.sql
This property exposes the name of a column that can be used as the
sql parameter of a dimension.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.sql)
# For example, 'status'

Column.type
This property exposes the data type of a column that can be used as the
type parameter of a dimension.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.type)
# For example, 'string'

The cube_dbt package applies a set of heuristics to map database-specific
types to dimension types. You can check the source code
for implementation details.
If a column type is not defined in the metadata of a dbt project, string
is used by default.
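As a rough illustration of what such a heuristic could look like, the sketch below maps a few common database type names to dimension types and falls back to string. The type sets and the to_dimension_type helper are hypothetical; the package's actual rules live in its source code and may differ.

```python
import re

# Hypothetical type names; the real package may recognise a different set.
NUMBER_TYPES = {"int", "integer", "bigint", "smallint", "numeric",
                "decimal", "float", "double", "real"}
TIME_TYPES = {"timestamp", "timestamptz", "date", "datetime"}
BOOLEAN_TYPES = {"bool", "boolean"}


def to_dimension_type(db_type):
    """Map a database-specific type name to a dimension type, defaulting to 'string'."""
    if not db_type:
        return "string"  # the type is missing in the dbt metadata
    # Keep only the base name: 'numeric(10, 2)' -> 'numeric',
    # 'timestamp with time zone' -> 'timestamp'.
    base = re.split(r"[\s(]", db_type.strip().lower(), maxsplit=1)[0]
    if base in NUMBER_TYPES:
        return "number"
    if base in TIME_TYPES:
        return "time"
    if base in BOOLEAN_TYPES:
        return "boolean"
    return "string"


for db_type in ("INTEGER", "numeric(10, 2)", "timestamp with time zone",
                "boolean", "varchar", None):
    print(db_type, "->", to_dimension_type(db_type))
```

Matching on the base name rather than a prefix avoids treating a type such as interval as a number just because it starts with "int".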
Column.meta
This property exposes the metadata of a column as a dict that can be
used as the meta parameter of a dimension.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.meta)
# For example, {'some': 'data'}

Column.primary_key
This property exposes a bool value that indicates if a column is
a primary key or not.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.primary_key)
# For example, False

By convention, a column is considered a primary key if it has the
primary_key tag in the metadata of a dbt project.
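The convention can be sketched against a hand-written column entry shaped like the ones found in manifest.json. The dictionaries and the is_primary_key helper below are hypothetical, and the package's own detection may read the metadata differently.

```python
def is_primary_key(column_entry):
    """Return True if a manifest-like column entry carries the 'primary_key' tag."""
    return "primary_key" in column_entry.get("tags", [])


# Hand-written sample entries, not taken from a real project.
id_column = {"name": "id", "data_type": "integer", "tags": ["primary_key"]}
status_column = {"name": "status", "data_type": "text", "tags": []}

print(is_primary_key(id_column))      # True
print(is_primary_key(status_column))  # False
```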
Column.as_dimension
This method renders this column as a YAML snippet that can be inserted
into YAML data models. Includes name, description (if present), sql,
type, primary_key (if True), and meta (if present).
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.as_dimension())

In the returned multiline string, all lines except for the first one are left-padded with 8 spaces for easier use in YAML data models:
# Jinja template
cubes:
  - {{ model.as_cube() }}
    dimensions:
      {% for column in model.columns %}
      - {{ column.as_dimension() }}
      {% endfor %}

# YAML
cubes:
  - name: orders
    description: All Jaffle Shop orders
    sql_table: '"db"."public"."orders"'
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
      - name: status
        description: 'Order execution status: new, in progress, delivered'
        sql: status
        type: string
        meta:
          some: data
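
The padding convention can be illustrated without the package. The sketch below left-pads every line except the first and substitutes the result into a template line whose placeholder already sits at the target indentation. The pad_after_first helper and the plain string replacement are hypothetical stand-ins for what the rendering methods and Jinja do.

```python
def pad_after_first(snippet, width):
    """Left-pad every line except the first with `width` spaces."""
    first, *rest = snippet.split("\n")
    return "\n".join([first] + [" " * width + line for line in rest])


# An unpadded dimension snippet and a template fragment (sample data).
dimension = "name: status\nsql: status\ntype: string"
template = "    dimensions:\n      - {{ dimension }}"

# The first line lands right after '- ' (column 8); the padded lines align with it.
rendered = template.replace("{{ dimension }}", pad_after_first(dimension, 8))
print(rendered)
```

Because the first line inherits its indentation from the template, only the remaining lines need padding, which is why each rendering method uses a width that matches its usual nesting depth.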