cube_dbt package
The cube_dbt package simplifies defining the data model in the semantic layer on top of dbt models. It provides convenient tools for loading the metadata of a dbt project, inspecting dbt models, and rendering them as cubes in YAML.
- Install the cube_dbt package from PyPI
- Check the source code in cube_dbt on GitHub
- Submit issues to cube on GitHub
Installation
Cube Cloud
Add the cube_dbt package to the requirements.txt file in the root directory of your Cube project. Cube Cloud will install the dependencies automatically.
Reference
Dbt class
Encapsulates tools for working with the metadata of a dbt project.
Dbt.__init__
The constructor accepts the metadata of a dbt project as a dict with the contents of a manifest.json file.
import json
from cube_dbt import Dbt
manifest_path = './manifest.json'
with open(manifest_path, 'r') as file:
    manifest = json.load(file)
dbt = Dbt(manifest)
Use the constructor when Dbt.from_file and Dbt.from_url aren't applicable, e.g., when manifest.json is loaded from a private AWS S3 bucket.
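As a minimal sketch of that custom-loading case, the helper below turns manifest bytes obtained from any source (for example, an object downloaded with your own storage client) into the dict the constructor expects. The helper name manifest_from_bytes and the sanity check on the nodes key are illustrative assumptions, not part of cube_dbt.

```python
import json


def manifest_from_bytes(payload: bytes) -> dict:
    """Parse raw manifest.json bytes into a dict (hypothetical helper)."""
    manifest = json.loads(payload.decode("utf-8"))
    # Assumption: a usable manifest is a JSON object with a "nodes" mapping.
    if not isinstance(manifest, dict) or "nodes" not in manifest:
        raise ValueError("payload does not look like a dbt manifest")
    return manifest


# Stand-in for bytes downloaded from private storage.
payload = b'{"nodes": {}}'
manifest = manifest_from_bytes(payload)
# With cube_dbt installed, the dict would then be passed on: Dbt(manifest)
```

Validating the payload before constructing Dbt keeps download or permission errors (such as an HTML error page returned instead of JSON) from surfacing later as confusing lookup failures.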
Dbt.from_file
This static method loads the metadata of a dbt project from a manifest.json file by its path and returns an instance of the Dbt class.
from cube_dbt import Dbt
manifest_path = './manifest.json'
dbt = Dbt.from_file(manifest_path)
Dbt.from_url
This static method loads the metadata of a dbt project from a manifest.json file by its URL and returns an instance of the Dbt class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
Dbt.filter
This method filters loaded dbt models by their path prefixes, tags, or names.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url).filter(
    paths=['marts/'],  # Only models under the 'marts/' path
    tags=['cube'],     # Only models with the 'cube' tag
    names=['orders']   # Only the 'orders' model
)
Use this method to expose only the necessary dbt models to the semantic layer. Note that values in paths should not be prefixed with models/.
Dbt.models
This property exposes a list of loaded dbt models as instances of the Model class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
for model in dbt.models:
    print(model)
Only dbt models that comply with Dbt.filter rules and are not materialized as ephemeral will be returned.
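To make these selection rules concrete, here is a sketch of how filtering by path prefix, tag, and name, plus skipping ephemeral models, might look over raw manifest nodes. The function select_models is hypothetical, the node layout is a simplified assumption about manifest.json, and requiring every requested tag (rather than any of them) is also an assumption; the package's own logic may differ.

```python
def select_models(nodes, paths=(), tags=(), names=()):
    """Return manifest nodes that pass the selection rules (hypothetical helper)."""
    selected = []
    for node in nodes.values():
        if node.get("resource_type") != "model":
            continue
        # Ephemeral models have no relation in the database, so skip them.
        if node.get("config", {}).get("materialized") == "ephemeral":
            continue
        # Paths are matched as prefixes, without a leading 'models/'.
        if paths and not any(node.get("path", "").startswith(p) for p in paths):
            continue
        # Assumption: a model must carry every requested tag.
        if tags and not all(t in node.get("tags", []) for t in tags):
            continue
        if names and node.get("name") not in names:
            continue
        selected.append(node)
    return selected


# Simplified stand-in for the "nodes" section of a manifest.
nodes = {
    "model.shop.orders": {
        "resource_type": "model", "name": "orders",
        "path": "marts/orders.sql", "tags": ["cube"],
        "config": {"materialized": "table"},
    },
    "model.shop.stg_orders": {
        "resource_type": "model", "name": "stg_orders",
        "path": "staging/stg_orders.sql", "tags": [],
        "config": {"materialized": "ephemeral"},
    },
    "model.shop.customers": {
        "resource_type": "model", "name": "customers",
        "path": "marts/customers.sql", "tags": [],
        "config": {"materialized": "view"},
    },
}

filtered_names = [n["name"] for n in select_models(nodes, paths=["marts/"], tags=["cube"])]
unfiltered_names = [n["name"] for n in select_models(nodes)]
```

With no filters, only the ephemeral stg_orders model is dropped; adding the path and tag filters narrows the result to orders.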
Dbt.model
This method returns a loaded dbt model by its name as an instance of the Model class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model)
Only dbt models that comply with Dbt.filter rules and are not materialized as ephemeral will be returned.
Model class
Encapsulates tools for working with the metadata of a dbt model.
Model.name
This property exposes the name of a dbt model.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.name)
# For example, 'orders'
Model.description
This property exposes the description of a dbt model.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.description)
# For example, 'All Jaffle Shop orders'
Model.sql_table
This property exposes the fully-qualified SQL relation name of a dbt model that can be used as the sql_table parameter of a cube.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.sql_table)
# For example, '"db"."public"."orders"'
Model.columns
This property exposes a list of columns that belong to this dbt model as instances of the Column class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
for column in model.columns:
    print(column)
Model.column
This method returns a column that belongs to this dbt model by its name as an instance of the Column class.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column)
Model.primary_key
This property returns the primary key column of this dbt model as an instance of the Column class, or None if the model has no primary key.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.primary_key)
See Column.primary_key for details on the detection of primary key columns.
Model.as_cube
This method renders this dbt model as a YAML snippet that can be inserted into YAML data models. Includes name, description (if present), and sql_table.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.as_cube())
In the returned multiline string, all lines except for the first one are left-padded with 4 spaces for easier use in YAML data models:
# Jinja template
cubes:
  - {{ model.as_cube() }}
# YAML
cubes:
  - name: orders
    description: All Jaffle Shop orders
    sql_table: '"db"."public"."orders"'
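The reason for padding every line except the first can be seen in a small sketch: the template already supplies the indentation for the first line (the "  - " prefix), so only the continuation lines need extra spaces to stay aligned. The helper pad_continuation_lines and the use of str.format in place of a Jinja template are illustrative assumptions, not the package's implementation.

```python
def pad_continuation_lines(snippet: str, spaces: int) -> str:
    """Left-pad every line except the first (hypothetical helper)."""
    first, *rest = snippet.split("\n")
    return "\n".join([first] + [" " * spaces + line for line in rest])


# An unpadded snippet, as it might look before rendering.
cube_yaml = "name: orders\nsql_table: orders"

# The template places the first line after '  - ', i.e. at column 4.
template = "cubes:\n  - {snippet}"
rendered = template.format(snippet=pad_continuation_lines(cube_yaml, 4))
print(rendered)
```

The same idea applies to the deeper nesting levels used for dimensions, only with a larger padding width.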
Model.as_dimensions
This method renders the list of columns that belong to this dbt model as a YAML snippet that can be inserted into YAML data models. Optionally, it accepts in skip a list of column names that should be ignored.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
print(model.as_dimensions(skip=['status']))
See Column.as_dimension for details on the dimension rendering.
In the returned multiline string, all lines except for the first one are left-padded with 6 spaces for easier use in YAML data models:
# Jinja template
cubes:
  - {{ model.as_cube() }}
    dimensions:
      {{ model.as_dimensions() }}
# YAML
cubes:
  - name: orders
    description: All Jaffle Shop orders
    sql_table: '"db"."public"."orders"'
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
Column class
Encapsulates tools for working with the metadata of a column that belongs to a dbt model.
Column.name
This property exposes the name of a column.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.name)
# For example, 'status'
Column.description
This property exposes the description of a column.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.description)
# For example, 'Order execution status: new, in progress, delivered'
Column.sql
This property exposes the name of a column that can be used as the sql parameter of a dimension.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.sql)
# For example, 'status'
Column.type
This property exposes the data type of a column that can be used as the type parameter of a dimension.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.type)
# For example, 'string'
The cube_dbt package applies a set of heuristics to map database-specific types to dimension types. You can check the source code for implementation details.
If a column type is not defined in the metadata of a dbt project, string is used by default.
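As a rough illustration of what such a mapping could look like, the sketch below normalizes a database type name and looks it up in a small table. The table contents and the function name to_dimension_type are assumptions made for this example and do not reproduce the package's heuristics. Falling back to string for unrecognized types is also a choice of this sketch; only the fallback for a missing type is stated above.

```python
# Illustrative mapping only; the real package covers more types and dialects.
ILLUSTRATIVE_TYPE_MAP = {
    "integer": "number",
    "bigint": "number",
    "numeric": "number",
    "float": "number",
    "boolean": "boolean",
    "date": "time",
    "timestamp": "time",
}


def to_dimension_type(data_type):
    """Map a database type name to a dimension type (hypothetical helper)."""
    if not data_type:
        # No type in the dbt metadata: default to string.
        return "string"
    # Drop precision/scale such as '(10, 2)' and ignore case.
    base = data_type.lower().split("(")[0].strip()
    # Sketch-only assumption: unrecognized types also become strings.
    return ILLUSTRATIVE_TYPE_MAP.get(base, "string")


print(to_dimension_type("NUMERIC(10, 2)"))
print(to_dimension_type(None))
```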
Column.meta
This property exposes the metadata of a column as a dict that can be used as the meta parameter of a dimension.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.meta)
# For example, '{some: "data"}'
Column.primary_key
This property exposes a bool value that indicates if a column is a primary key or not.
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.primary_key)
# For example, 'False'
By convention, the column is considered a primary key if it has the primary_key tag in the metadata of a dbt project.
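The tag convention can be sketched over plain column dicts as follows. The helper names and the simplified column layout (a tags list per column) are assumptions for illustration, and returning the first match when several columns are tagged is a simplification of this sketch rather than documented behavior.

```python
def is_primary_key(column: dict) -> bool:
    """True if the column carries the 'primary_key' tag (hypothetical helper)."""
    return "primary_key" in column.get("tags", [])


def find_primary_key(columns: dict):
    """Return the name of the tagged column, or None if there is none."""
    candidates = [name for name, column in columns.items() if is_primary_key(column)]
    # Sketch-only simplification: take the first tagged column.
    return candidates[0] if candidates else None


# Simplified stand-in for the columns of a model in the dbt metadata.
columns = {
    "id": {"name": "id", "tags": ["primary_key"]},
    "status": {"name": "status", "tags": []},
}

print(find_primary_key(columns))
```

In a dbt project, this corresponds to adding primary_key to the tags of the relevant column in the model's YAML properties file.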
Column.as_dimension
This method renders this column as a YAML snippet that can be inserted into YAML data models. Includes name, description (if present), sql, type, primary_key (if True), and meta (if present).
from cube_dbt import Dbt
manifest_url = 'https://bucket.s3.amazonaws.com/manifest.json'
dbt = Dbt.from_url(manifest_url)
model = dbt.model('orders')
column = model.column('status')
print(column.as_dimension())
In the returned multiline string, all lines except for the first one are left-padded with 8 spaces for easier use in YAML data models:
# Jinja template
cubes:
  - {{ model.as_cube() }}
    dimensions:
      {% for column in model.columns %}
      - {{ column.as_dimension() }}
      {% endfor %}
# YAML
cubes:
  - name: orders
    description: All Jaffle Shop orders
    sql_table: '"db"."public"."orders"'
    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
      - name: status
        description: 'Order execution status: new, in progress, delivered'
        sql: status
        type: string
        meta:
          some: data
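The conditional fields listed for as_dimension (description, primary_key, and meta appear only when applicable) can be sketched with a small stand-alone renderer. The function render_dimension and its input dict are hypothetical; it relies on JSON strings and objects also being valid YAML, so its quoting and meta layout differ from the output shown above.

```python
import json


def render_dimension(column: dict) -> str:
    """Render a column dict as unpadded dimension YAML (hypothetical helper)."""
    lines = [f"name: {column['name']}"]
    if column.get("description"):
        # A JSON string is a valid double-quoted YAML scalar.
        lines.append(f"description: {json.dumps(column['description'])}")
    # Assumption: the SQL expression is simply the column name.
    lines.append(f"sql: {column['name']}")
    lines.append(f"type: {column.get('type', 'string')}")
    if column.get("primary_key"):
        lines.append("primary_key: true")
    if column.get("meta"):
        # A JSON object is a valid YAML flow mapping.
        lines.append(f"meta: {json.dumps(column['meta'])}")
    return "\n".join(lines)


id_yaml = render_dimension({"name": "id", "type": "number", "primary_key": True})
status_yaml = render_dimension(
    {"name": "status", "description": "Order status: new", "meta": {"some": "data"}}
)
print(id_yaml)
print(status_yaml)
```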