Databricks

Databricks is a unified data intelligence platform.

Prerequisites

- The JDBC URL for the Databricks cluster or SQL warehouse
- A Databricks personal access token

Setup

Environment Variables

Add the following to a .env file in your Cube project:

CUBEJS_DB_TYPE=databricks-jdbc
# CUBEJS_DB_NAME is optional
CUBEJS_DB_NAME=default
# You can find this inside the cluster's configuration
CUBEJS_DB_DATABRICKS_URL=jdbc:databricks://dbc-XXXXXXX-XXXX.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXX/XXXXX;AuthMech=3;UID=token
# You can specify the personal access token separately from `CUBEJS_DB_DATABRICKS_URL`:
CUBEJS_DB_DATABRICKS_TOKEN=XXXXX
# This accepts the Databricks usage policy and must be set to `true` to use the Databricks JDBC driver
CUBEJS_DB_DATABRICKS_ACCEPT_POLICY=true
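
Alternatively, the personal access token can be embedded directly in the JDBC URL. As a sketch, this relies on the Databricks JDBC driver's AuthMech=3 convention, where UID is the literal string token and PWD carries the token itself; the host, path, and token values below are placeholders:

CUBEJS_DB_DATABRICKS_URL=jdbc:databricks://dbc-XXXXXXX-XXXX.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXX/XXXXX;AuthMech=3;UID=token;PWD=XXXXX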

Docker

Create a .env file as above, then extend the cubejs/cube:jdk Docker image to build a Cube image that includes the JDBC driver:

# Extend the Cube image that bundles a JDK for the JDBC driver
FROM cubejs/cube:jdk

# Copy the Cube project files into the image
COPY . .
# Install the project's Node.js dependencies
RUN npm install

You can then build and run the image using the following commands:

docker build -t cube-jdk .
docker run -it -p 4000:4000 --env-file=.env cube-jdk
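
If you prefer Docker Compose, a minimal docker-compose.yml equivalent to the commands above might look like this (the service name and build context are assumptions):

services:
  cube:
    # Build the image from the Dockerfile above
    build: .
    ports:
      - "4000:4000"
    # Load the same .env file created earlier
    env_file: .env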

Environment Variables

| Environment Variable | Description | Possible Values | Required |
| --- | --- | --- | --- |
| CUBEJS_DB_NAME | The name of the database to connect to | A valid database name | No |
| CUBEJS_DB_DATABRICKS_URL | The URL for a JDBC connection | A valid JDBC URL | Yes |
| CUBEJS_DB_DATABRICKS_ACCEPT_POLICY | Whether or not to accept the license terms for the Databricks JDBC driver | true, false | Yes |
| CUBEJS_DB_DATABRICKS_TOKEN | The personal access token used to authenticate the Databricks connection | A valid token | Yes |
| CUBEJS_DB_DATABRICKS_CATALOG | The name of the Databricks catalog to connect to | A valid catalog name | No |
| CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR | The path for the Databricks DBFS mount (not needed when using a Unity Catalog connection) | A valid mount path | No |
| CUBEJS_CONCURRENCY | The number of concurrent connections each queue has to the database (default: 2) | A valid number | No |
| CUBEJS_DB_MAX_POOL | The maximum number of concurrent database connections to pool (default: 8) | A valid number | No |
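
For example, a deployment that reads from a specific Unity Catalog and needs more query throughput might add the following to its .env file (the catalog name and numbers are purely illustrative):

CUBEJS_DB_DATABRICKS_CATALOG=my_catalog
# Run up to 4 concurrent queries per queue...
CUBEJS_CONCURRENCY=4
# ...backed by a larger connection pool
CUBEJS_DB_MAX_POOL=16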

Pre-Aggregation Feature Support

count_distinct_approx

Measures of type count_distinct_approx can be used in pre-aggregations when using Databricks as a source database. To learn more about Databricks' support for approximate aggregate functions, see the Databricks documentation.
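
For instance, a measure counting approximate unique users might be declared like this in a YAML data model (a minimal sketch; the orders cube and user_id column are assumptions, not part of this guide):

cubes:
  - name: orders
    sql_table: orders

    measures:
      # Approximate distinct count, eligible for pre-aggregations
      - name: unique_users
        type: count_distinct_approx
        sql: user_id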

Pre-Aggregation Build Strategies

To learn more about pre-aggregation build strategies, see the pre-aggregations documentation.

| Feature | Works with read-only mode? | Is default? |
| --- | --- | --- |
| Simple | No | Yes |
| Export Bucket | No | No |

By default, Databricks JDBC uses a simple strategy to build pre-aggregations.

Simple

No extra configuration is required to configure simple pre-aggregation builds for Databricks.

Export Bucket

Databricks supports using both AWS S3 and Azure Blob Storage for export bucket functionality.

AWS S3

To use AWS S3 as an export bucket, first complete the Databricks guide on connecting to cloud object storage using Unity Catalog.

Ensure the AWS credentials are correctly configured in IAM to allow reads and writes to the export bucket in S3.

CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
CUBEJS_DB_EXPORT_BUCKET=s3://my.bucket.on.s3
CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<AWS_KEY>
CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<AWS_SECRET>
CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<AWS_REGION>

Azure Blob Storage

To use Azure Blob Storage as an export bucket, follow the Databricks guide on connecting to Azure Data Lake Storage Gen2 and Blob Storage.

Retrieve the storage account access key from your Azure account and use it as follows:

CUBEJS_DB_EXPORT_BUCKET_TYPE=azure
CUBEJS_DB_EXPORT_BUCKET=wasbs://my-bucket@my-account.blob.core.windows.net
CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY=<AZURE_STORAGE_ACCOUNT_ACCESS_KEY>

SSL/TLS

Cube does not require any additional configuration to enable SSL/TLS for Databricks JDBC connections.

Additional Configuration

Cube Cloud

To accurately show partition sizes in the Cube Cloud APM, an export bucket must be configured.