Configuration options

The following configuration options can be defined either in Python, in a cube.py file, or in JavaScript, in a cube.js file.

Note that configuration options follow the snake case convention in Python (base_path) and the camel case convention in JavaScript (basePath).

Every configuration option that is a function (e.g., query_rewrite) can be defined as either synchronous or asynchronous. Cube will await the completion of asynchronous functions.

It's wise to make functions that are called on each request as fast as possible to minimize the performance hit. Consider using caching when applicable and performing calculations outside of these functions.

Data model

schema_path

Path to data model files.

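For example, a minimal cube.js sketch pointing Cube at a hypothetical my-model directory:

module.exports = {
  schemaPath: 'my-model',
};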

Overrides CUBEJS_SCHEMA_PATH. The default value is model.

Use repository_factory for multitenancy or when a more flexible setup is needed.

context_to_app_id

It's a multitenancy option.

context_to_app_id is a function to determine an app ID, which is used as a caching key for various in-memory structures such as data model compilation results.

Called on each request.

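A cube.js sketch, assuming the security context carries a hypothetical tenantId property:

module.exports = {
  contextToAppId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.tenantId}`,
};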

repository_factory

This option allows you to customize the repository for Cube data model files. It is a function that accepts a context object and can dynamically provide data model files. Learn more about it in multitenancy.

Called only once per app_id.

You can use the convenient file_repository implementation to read files from a specified path:

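A cube.js sketch using the FileRepository helper from @cubejs-backend/server-core, assuming per-tenant subdirectories under model/ and a hypothetical tenantId in the security context:

const { FileRepository } = require('@cubejs-backend/server-core');
 
module.exports = {
  repositoryFactory: ({ securityContext }) =>
    new FileRepository(`model/${securityContext.tenantId}`),
};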

You can also provide file contents directly, e.g., after fetching them from a remote storage or via an API:

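A sketch that returns file contents directly; the repository object is assumed to expose a dataSchemaFiles function, and the orders cube below is hypothetical:

module.exports = {
  repositoryFactory: ({ securityContext }) => ({
    // Return data model files as { fileName, content } objects,
    // e.g., fetched from remote storage or an API
    dataSchemaFiles: async () => [
      {
        fileName: 'orders.yml',
        content: `
cubes:
  - name: orders
    sql_table: orders
    measures:
      - name: count
        type: count
`,
      },
    ],
  }),
};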

schema_version

schema_version can be used to tell Cube that the data model should be recompiled in case it depends on dynamic definitions fetched from some external database or API.

This function is called on each request; however, the RequestContext parameter is reused per application ID as determined by context_to_app_id. If the returned string changes, the data model will be recompiled. It can be used in both multi-tenant and single-tenant environments.

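A cube.js sketch, assuming a hypothetical version marker is available in the security context:

module.exports = {
  schemaVersion: ({ securityContext }) =>
    // When this value changes, the data model is recompiled
    securityContext.dataModelVersion,
};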

compiler_cache_size

Maximum number of compiled data models to persist in the in-memory cache. Defaults to 250, but the optimum value will depend on the deployment environment. When the maximum is reached, Cube will start dropping the least recently used data models from the cache.

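For example, in cube.js (the value shown is illustrative):

module.exports = {
  compilerCacheSize: 500,
};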

max_compiler_cache_keep_alive

Maximum length of time, in milliseconds, to keep compiled data models in memory. By default, data models are kept in memory indefinitely.

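For example, in cube.js (the value shown is illustrative):

module.exports = {
  maxCompilerCacheKeepAlive: 10000, // 10 seconds
};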

update_compiler_cache_keep_alive

Setting update_compiler_cache_keep_alive to True keeps frequently used data models in memory by resetting their max_compiler_cache_keep_alive every time they are accessed.

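For example, in cube.js:

module.exports = {
  updateCompilerCacheKeepAlive: true,
};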

allow_js_duplicate_props_in_schema

Boolean to enable or disable a check for duplicate property names in all objects of a data model. The default value is false, which means the compiler uses an additional transpiler to check for duplicates.

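For example, to allow duplicate property names in cube.js:

module.exports = {
  allowJsDuplicatePropsInSchema: true,
};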

Query cache & queue

cache_and_queue_driver

The cache and queue driver to use for the Cube deployment. Defaults to memory in development, cubestore in production.

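For example, in cube.js:

module.exports = {
  cacheAndQueueDriver: 'cubestore',
};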

context_to_orchestrator_id

In versions of Cube prior to v0.29, each tenant would have an individual instance of the query orchestrator.

context_to_orchestrator_id is a function used to determine a caching key for the query orchestrator instance. The query orchestrator holds database connections, execution queues, and pre-aggregation table caches. By default, the same instance is used for all tenants; override this property in situations where each tenant requires their own query orchestrator.

Please remember to override pre_aggregations_schema if you override context_to_orchestrator_id. Otherwise, you may end up with table name clashes for your pre-aggregations.

Called on each request.

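A cube.js sketch that gives every tenant its own orchestrator instance, assuming a hypothetical tenantId in the security context:

module.exports = {
  contextToAppId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.tenantId}`,
  contextToOrchestratorId: ({ securityContext }) =>
    `CUBE_APP_${securityContext.tenantId}`,
};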

driver_factory

A function to provide a custom configuration for the data source driver.

Called once per data_source for every orchestrator id.

Should be used to configure data source connections dynamically in multitenancy.

It is not recommended when multiple data sources can be configured statically. Use CUBEJS_DATASOURCES and decorated environment variables in that case.

In Python, it should return a dictionary; in JavaScript, it should return an object. It should contain the type element corresponding to the data source type, as well as other options that will be passed to the data source driver. You can look up supported options in the drivers' source code.

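A cube.js sketch returning Postgres connection options; the host is hypothetical, and the dataSource value is used as the database name:

module.exports = {
  driverFactory: ({ securityContext, dataSource }) => ({
    type: 'postgres',
    host: 'db.example.com',
    database: dataSource,
    user: process.env.CUBEJS_DB_USER,
    password: process.env.CUBEJS_DB_PASS,
  }),
};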

In JavaScript, custom driver implementations can also be loaded:

const VeryCustomDriver = require('cube-custom-driver');
 
module.exports = {
  driverFactory: ({ securityContext, dataSource }) => {
    return new VeryCustomDriver({
      /* options */
    })
  }
};

orchestrator_options

We strongly recommend leaving these options set to the defaults. Changing these values can result in application instability and/or downtime.

You can pass this object to set advanced options for the query orchestrator.

  • continueWaitTimeout: Long polling interval in seconds, maximum is 90. Default: 5
  • rollupOnlyMode: When enabled, an error will be thrown if a query can't be served from a pre-aggregation (rollup). Default: false
  • queryCacheOptions: Query cache options for DB queries. Default: {}
  • queryCacheOptions.refreshKeyRenewalThreshold: Time in seconds to cache the result of the refresh_key check. Default: defined by DB dialect
  • queryCacheOptions.backgroundRenew: Controls whether to wait in the foreground for refreshed query data if the refresh_key value has changed. Refresh key queries and pre-aggregations are never awaited in the foreground and are always processed in the background unless the cache is empty. If true, it immediately returns values from the cache, if available, without a refresh_key check to renew in the foreground. Default: false
  • queryCacheOptions.queueOptions: Query queue options for DB queries. Default: {}
  • preAggregationsOptions: Query cache options for pre-aggregations. Default: {}
  • preAggregationsOptions.maxPartitions: The maximum number of partitions each pre-aggregation in a cube can use. Default: 10000
  • preAggregationsOptions.queueOptions: Query queue options for pre-aggregations. Default: {}
  • preAggregationsOptions.externalRefresh: When running a separate instance of Cube to refresh pre-aggregations in the background, this option can be set on the API instance to prevent it from checking whether rollup data is current; it won't try to create or refresh pre-aggregations when this option is true. Default: false

queryCacheOptions are used while querying database tables, while preAggregationsOptions settings are used to query pre-aggregated tables.

Setting these options is highly discouraged as these are considered to be system-level settings. Please use CUBEJS_DB_QUERY_TIMEOUT and CUBEJS_CONCURRENCY environment variables instead.

Timeout and interval options' values are in seconds.

  • concurrency: Maximum number of queries to be processed simultaneously. For drivers with a connection pool, CUBEJS_DB_MAX_POOL should be adjusted accordingly; typically, the pool size should be at least twice the total concurrency across all queues. Default: 2
  • executionTimeout: Total timeout of a single query. Default: 600
  • orphanedTimeout: Query will be marked for cancellation if not requested during this period. Default: 120
  • heartBeatInterval: Worker heartbeat interval. If 4 * heartBeatInterval passes without reporting, the query gets cancelled. Default: 30
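
A cube.js sketch with illustrative values (keeping the defaults is strongly recommended):

module.exports = {
  orchestratorOptions: {
    continueWaitTimeout: 10,
    queryCacheOptions: {
      refreshKeyRenewalThreshold: 30,
      queueOptions: { concurrency: 4 },
    },
    preAggregationsOptions: {
      queueOptions: { executionTimeout: 1200 },
      externalRefresh: false,
    },
  },
};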

Pre-aggregations

pre_aggregations_schema

Database schema name to use for storing pre-aggregations.

Either a string or a function can be passed. Providing a function allows setting the schema name dynamically, depending on the security context.

Defaults to dev_pre_aggregations in development mode and prod_pre_aggregations in production.

It can also be set via the CUBEJS_PRE_AGGREGATIONS_SCHEMA environment variable.

It's strongly recommended to use different pre-aggregation schemas in development and production environments to avoid pre-aggregation table clashes.

Cube will wipe out the contents of this database schema before use. It should be used exclusively by Cube and should not be shared with any other application.

Called once per app_id.

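A cube.js sketch setting a per-tenant schema name, assuming a hypothetical tenantId in the security context:

module.exports = {
  preAggregationsSchema: ({ securityContext }) =>
    `pre_aggregations_${securityContext.tenantId}`,
};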

scheduled_refresh_timer

This is merely a refresh worker's heartbeat. It doesn't affect the freshness of pre-aggregations or refresh keys, nor how frequently Cube accesses the database. Setting this value to 30 seconds doesn't mean pre-aggregations or the in-memory cache would be refreshed every 30 seconds; it means the refresh key is checked for freshness every 30 seconds in the background. Please consult the cube refresh_key documentation and pre-aggregation refresh_key documentation on how to set data refresh intervals.

Setting this option enables refresh worker mode, which means it usually shouldn't be set to a constant value but should instead depend on your cluster environment. Setting it to a constant value in a cluster environment will lead to a refresh worker being instantiated on every Cube instance in your cluster, including API instances. This will usually lead to refresh race conditions and out-of-memory errors.

Cube enables background refresh by default using the CUBEJS_REFRESH_WORKER environment variable.

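For example, in cube.js (the interval shown is illustrative):

module.exports = {
  scheduledRefreshTimer: 60,
};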

The best practice is to run scheduled_refresh_timer in a separate Cube worker instance.

You may also need to configure scheduled_refresh_time_zones and scheduled_refresh_contexts.

scheduled_refresh_time_zones

This option specifies a list of time zones that pre-aggregations will be built for. It has an impact on pre-aggregation matching.

You can specify multiple time zones in the TZ Database Name format, e.g., America/Los_Angeles:

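For example, in cube.js:

module.exports = {
  scheduledRefreshTimeZones: ['America/Vancouver', 'America/Toronto'],
};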

The default value is a list containing a single time zone, UTC.

This configuration option can be also set using the CUBEJS_SCHEDULED_REFRESH_TIMEZONES environment variable.

scheduled_refresh_contexts

When trying to configure scheduled refreshes for pre-aggregations that use the securityContext inside context_to_app_id or context_to_orchestrator_id, you must also set up scheduled_refresh_contexts. This will allow Cube to generate the necessary security contexts prior to running the scheduled refreshes.

Leaving scheduled_refresh_contexts unconfigured will lead to issues where the security context will be undefined. This is because there is no way for Cube to know how to generate a context without the required input.

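A cube.js sketch returning one security context per tenant; the tenant IDs are hypothetical:

module.exports = {
  scheduledRefreshContexts: async () => [
    { securityContext: { tenantId: 'tenant-1' } },
    { securityContext: { tenantId: 'tenant-2' } },
  ],
};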

Querying

query_rewrite

This is a security hook to check your query just before it gets processed. You can use this very generic API to implement any type of custom security checks your app needs and rewrite the input query accordingly.

Called on each request.

For example, you can use query_rewrite to add a row-level security filter, if needed:

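A cube.js sketch; the orders.tenant_id member is a hypothetical dimension to filter on:

module.exports = {
  queryRewrite: (query, { securityContext }) => {
    if (securityContext.tenantId) {
      query.filters.push({
        member: 'orders.tenant_id',
        operator: 'equals',
        values: [securityContext.tenantId],
      });
    }
    return query;
  },
};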

Raising an exception would prevent a query from running:

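For example, in cube.js:

module.exports = {
  queryRewrite: (query, { securityContext }) => {
    if (!securityContext) {
      throw new Error('Security context is mandatory');
    }
    return query;
  },
};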

allow_ungrouped_without_primary_key

Setting allow_ungrouped_without_primary_key to True disables the primary key inclusion check for ungrouped queries.

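For example, in cube.js:

module.exports = {
  allowUngroupedWithoutPrimaryKey: true,
};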

APIs

base_path

The base path for the REST API.

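For example, in cube.js (the path shown is illustrative):

module.exports = {
  basePath: '/cube-api',
};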

The default value is /cubejs-api.

http.cors

CORS settings for the Cube REST API can be configured by providing an object with the options supported by the underlying CORS middleware:

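For example, in cube.js (the origin is illustrative):

module.exports = {
  http: {
    cors: {
      origin: 'https://example.com',
      credentials: true,
    },
  },
};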

web_sockets_base_path

The base path for the WebSocket server.

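For example, in cube.js (the path shown is illustrative):

module.exports = {
  webSocketsBasePath: '/websocket',
};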

The default value is / (the root path).

process_subscriptions_interval

This property controls how often WebSocket client subscriptions are refreshed. Defaults to 5000.

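For example, in cube.js (the value shown is illustrative):

module.exports = {
  processSubscriptionsInterval: 10000,
};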

context_to_api_scopes

This function is used to select accessible API scopes and effectively allow or disallow access to REST API endpoints, based on the security context.

Security context is provided as the first argument. An array of scopes that was set via CUBEJS_DEFAULT_API_SCOPES is provided as the second argument.

Called on each request.

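A cube.js sketch, assuming a hypothetical isAdmin flag in the security context:

module.exports = {
  contextToApiScopes: async (securityContext, defaultScopes) => {
    if (securityContext.isAdmin) {
      return ['meta', 'data', 'jobs'];
    }
    return defaultScopes;
  },
};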

extend_context

Option to extend the RequestContext with custom values. This method is called on each request.

The function should return an object which gets appended to the RequestContext. Make sure to register your value using context_to_app_id so it is used as a caching key for all possible values that your extend_context object keys can have.

extend_context is applied only to requests that go through API. It isn't applied to refresh worker execution. If you're looking for a way to provide global environment variables for your data model, please see Execution environment docs.

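A cube.js sketch that reads a hypothetical x-organization-id header and registers the resulting value via contextToAppId:

module.exports = {
  extendContext: (req) => ({
    activeOrganization: req.headers['x-organization-id'],
  }),
  contextToAppId: ({ activeOrganization }) =>
    `CUBE_APP_${activeOrganization}`,
};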

You can use the custom value from extend_context in your data model like this:

const { activeOrganization } = COMPILE_CONTEXT;
 
cube(`Users`, {
  sql: `SELECT * FROM users where organization_id=${activeOrganization}`,
});

check_auth

Used in both the REST and WebSocket APIs.

Called on each request.

The default implementation parses a JSON Web Token (JWT) from the Authorization header and sets its payload to securityContext if it's verified. More information on how to generate these tokens can be found in the security documentation.

You can set securityContext = userContextObj inside the middleware if you want to customize SECURITY_CONTEXT.

Currently, assigning to the security context doesn't work in Python. Please track this issue.

You can use an empty check_auth function to disable the built-in security or raise an exception to fail the authentication check.

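A cube.js sketch that accepts a single hard-coded token (hypothetical) and assigns a security context:

module.exports = {
  checkAuth: (ctx, authorization) => {
    if (authorization !== 'my-secret-token') {
      throw new Error('Access denied');
    }
    ctx.securityContext = { userId: 1 };
  },
};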

Currently, raising an exception would result in an HTTP response with the status code 500 for Cube Core and 403 for Cube Cloud. Please track this issue.

jwt

An object with options for JSON Web Token verification:

  jwt: {
    jwkUrl?: ((payload: any) => string) | string;
    key?: string;
    algorithms?: string[];
    issuer?: string[];
    audience?: string;
    subject?: string;
    claimsNamespace?: string;
  };

check_sql_auth

Used in the SQL API. The default implementation verifies the user name and password against the CUBEJS_SQL_USER and CUBEJS_SQL_PASSWORD environment variables; in development mode, validation is skipped.

Called on each new connection to the Cube SQL API, when the user is changed via SET USER or the __user field, and every CUBESQL_AUTH_EXPIRE_SECS seconds.

For example, you can use check_sql_auth to validate the user name and password. The password argument is provided only when new connections are established. A check_sql_auth implementation should gracefully handle a missing password field to support user-change and re-authentication flows. check_sql_auth should always return a password, as it is used to validate the password provided by the user. If a clear-text password can't be obtained, the best practice is to return the password provided as an argument after validating it. Only the security context is used for user-change and re-authentication flows, so the returned password isn't checked in those cases.

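A cube.js sketch with a single hard-coded user; the user name, password handling, and tenantId are illustrative:

module.exports = {
  checkSqlAuth: async (req, userName, password) => {
    if (userName === 'my_user') {
      return {
        // password may be undefined on re-authentication flows
        password: password || process.env.CUBEJS_SQL_PASSWORD,
        securityContext: { tenantId: 'tenant-1' },
      };
    }
    throw new Error('Incorrect user name or password');
  },
};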

Check this recipe for an example of using check_sql_auth to authenticate requests to the SQL API with LDAP.

can_switch_sql_user

Used in the SQL API. The default implementation depends on CUBEJS_SQL_SUPER_USER and returns true when it's equal to the session's user.

Called on each user change request from the Cube SQL API.

For example, you can use can_switch_sql_user to define your custom logic:

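A cube.js sketch where only a hypothetical admin user is allowed to switch to another user:

module.exports = {
  canSwitchSqlUser: async (currentUser, newUser) => {
    return currentUser === 'admin';
  },
};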

Utility

logger

A function to serve as a custom logger.

Accepts the following arguments:

  • message: the message to be logged
  • params: additional parameters
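
For example, in cube.js:

module.exports = {
  logger: (message, params) => {
    console.log(`${message}: ${JSON.stringify(params)}`);
  },
};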

telemetry

Cube collects high-level anonymous usage statistics for servers started in development mode. It doesn't track any credentials, data model contents, or queries issued. These statistics are used solely for the purpose of improving Cube.

You can opt out at any time by setting the telemetry option to False (Python) or false (JavaScript) or, alternatively, by setting the CUBEJS_TELEMETRY environment variable to false.

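For example, in cube.js:

module.exports = {
  telemetry: false,
};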

Deprecated

dbType

dbType is deprecated and will be removed in a future release. Use driverFactory instead.

Data source type. Called only once per appId.

module.exports = {
  // string
  dbType: 'snowflake',
 
  // function
  dbType: ({ securityContext }) => 'databricks',
};

Either a string or a function can be passed. Providing a function allows dynamically selecting a database type depending on the security context. It is usually used for multitenancy.

If not defined, Cube will look up the CUBEJS_DB_TYPE environment variable to resolve the data source type.