Databricks JDBC
To connect Cube to Databricks via JDBC, you will need:
- A JDK installation
- The JDBC URL for the Databricks cluster
Add the following to a `.env` file in your Cube project:
```dotenv
CUBEJS_DB_TYPE=databricks-jdbc
# CUBEJS_DB_NAME is optional
CUBEJS_DB_NAME=default
# You can find this inside the cluster's configuration
CUBEJS_DB_DATABRICKS_URL=jdbc:databricks://dbc-XXXXXXX-XXXX.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXX/XXXXX;AuthMech=3;UID=token
# You can also specify the personal access token separately from `CUBEJS_DB_DATABRICKS_URL`:
CUBEJS_DB_DATABRICKS_TOKEN=XXXXX
# This accepts the Databricks usage policy and must be set to `true` to use the Databricks JDBC driver
CUBEJS_DB_DATABRICKS_ACCEPT_POLICY=true
```
Create a `.env` file as above, then extend the `cubejs/cube:jdk` Docker image tag to build a Cube image with the JDBC driver:
```dockerfile
FROM cubejs/cube:jdk
COPY . .
RUN npm install
```
You can then build and run the image using the following commands:
```bash
docker build -t cubejs-jdk .
docker run -it -p 4000:4000 --env-file=.env cubejs-jdk
```
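Once the container is up, you can check that Cube has finished booting before wiring up any clients. A minimal check, assuming the port mapping above; `/readyz` is Cube's readiness probe endpoint:

```bash
# Responds with HTTP 200 once the Cube API is ready to accept queries
curl -i http://localhost:4000/readyz
```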
| Environment Variable | Description | Possible Values | Required | Supports multiple data sources? |
| --- | --- | --- | --- | --- |
| `CUBEJS_DB_NAME` | The name of the database to connect to | A valid database name | ✅ | ✅ |
| `CUBEJS_DB_DATABRICKS_URL` | The URL for a JDBC connection | A valid JDBC URL | ✅ | ✅ |
| `CUBEJS_DB_DATABRICKS_ACCEPT_POLICY` | Whether or not to accept the license terms for the Databricks JDBC driver | `true`, `false` | ✅ | ✅ |
| `CUBEJS_DB_DATABRICKS_TOKEN` | The personal access token used to authenticate the Databricks connection | A valid token | ✅ | ✅ |
| `CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR` | The path for the Databricks DBFS mount | A valid mount path | ❌ | ✅ |
| `CUBEJS_CONCURRENCY` | The number of concurrent connections each queue has to the database. Default is `2` | A valid number | ❌ | ❌ |
| `CUBEJS_DB_MAX_POOL` | The maximum number of concurrent database connections to pool. Default is `8` | A valid number | ❌ | ✅ |
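The variables marked as supporting multiple data sources can be decorated on a per-data-source basis. A minimal sketch, assuming a hypothetical extra data source named `databricks` alongside the default one (the `CUBEJS_DATASOURCES` / `CUBEJS_DS_<NAME>_*` pattern is described in Cube's documentation on multiple data sources):

```dotenv
# Declare the data sources, then decorate the driver-specific variables
CUBEJS_DATASOURCES=default,databricks
CUBEJS_DS_DATABRICKS_DB_TYPE=databricks-jdbc
CUBEJS_DS_DATABRICKS_DB_DATABRICKS_URL=<JDBC_URL>
CUBEJS_DS_DATABRICKS_DB_DATABRICKS_ACCEPT_POLICY=true
```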
countDistinctApprox
Measures of type `countDistinctApprox` cannot be used in pre-aggregations when using Databricks as a data source.
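Where exact distinct counts are acceptable, switching the measure to the `countDistinct` type keeps the cube compatible with pre-aggregations. A minimal sketch, assuming a hypothetical `orders` table with a `user_id` column:

```javascript
cube(`Orders`, {
  sql: `SELECT * FROM orders`,

  measures: {
    uniqueUsers: {
      sql: `user_id`,
      // `countDistinctApprox` cannot be pre-aggregated on Databricks,
      // so the exact `countDistinct` type is used here instead
      type: `countDistinct`,
    },
  },

  preAggregations: {
    main: {
      measures: [CUBE.uniqueUsers],
    },
  },
});
```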
To learn more about pre-aggregation build strategies, refer to the Cube documentation on pre-aggregation builds.
| Feature | Works with read-only mode? | Is default? |
| --- | --- | --- |
| Simple | ❌ | ✅ |
| Export Bucket | ❌ | ❌ |
By default, Databricks JDBC uses a simple strategy to build pre-aggregations.
Simple
No extra configuration is required for simple pre-aggregation builds with Databricks.
Export Bucket
Databricks supports using both AWS S3 and Azure Blob Storage for export bucket functionality.
AWS S3
To use AWS S3 as an export bucket, first complete the Databricks guide on mounting S3 buckets to Databricks DBFS.
Ensure the AWS credentials are correctly configured in IAM to allow reads and writes to the export bucket in S3.
Then, set the following environment variables:

```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
CUBEJS_DB_EXPORT_BUCKET=s3://my.bucket.on.s3
CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<AWS_KEY>
CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<AWS_SECRET>
CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<AWS_REGION>
```
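If the export bucket is also mounted into DBFS per the guide above, the mount path can be provided via the optional variable from the reference table. A hedged sketch; `/dbfs/mnt/my-bucket` is a hypothetical mount point:

```dotenv
# Optional: the DBFS mount path corresponding to the export bucket
CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR=/dbfs/mnt/my-bucket
```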
Azure Blob Storage
To use Azure Blob Storage as an export bucket, follow the Databricks guide on mounting Azure Blob Storage to Databricks DBFS.
Retrieve the storage account access key from your Azure account and use it as follows:

```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=azure
CUBEJS_DB_EXPORT_BUCKET=wasbs://my-bucket@my-account.blob.core.windows.net
CUBEJS_DB_EXPORT_BUCKET_AZURE_KEY=<AZURE_STORAGE_ACCOUNT_ACCESS_KEY>
```
Cube does not require any additional configuration to enable SSL/TLS for Databricks JDBC connections.
Cube Cloud
To accurately show partition sizes in the Cube Cloud APM, an export bucket must be configured.