Configuration options
The following configuration options can be defined either using Python, in a cube.py file, or using JavaScript, in a cube.js file.
Note that configuration options follow the snake case convention in Python (base_path) and the camel case convention in JavaScript (basePath).
Every configuration option that is a function (e.g., query_rewrite) can be defined as either synchronous or asynchronous. Cube will await the completion of asynchronous functions.
It's wise to make functions that are called on each request as fast as possible to minimize the performance hit. Consider using caching when applicable and performing calculations outside of these functions.
Data model
schema_path
Path to data model files.
This configuration option can also be set using the CUBEJS_SCHEMA_PATH environment variable. The default value is model.
Use repository_factory for multitenancy or when a more flexible setup is needed.
context_to_app_id
It's a multitenancy option.
context_to_app_id is a function that determines an app id, which is used as a caching key for various in-memory structures such as data model compilation results.
Called on each request.
repository_factory
This option allows you to customize the repository for Cube data model files. It is a function that accepts a context object and can dynamically provide data model files. Learn more about it in multitenancy.
Called only once per app_id.
You can use the convenient file_repository implementation to read files from a specified path. You can also provide file contents directly, e.g., after fetching them from a remote storage or via an API.
schema_version
schema_version can be used to tell Cube that the data model should be recompiled in case it depends on dynamic definitions fetched from some external database or API.
This method is called on each request; however, the RequestContext parameter is reused per application ID, as determined by context_to_app_id. If the returned string is different, the data model will be recompiled. It can be used in both multi-tenant and single-tenant environments.
compiler_cache_size
Maximum number of compiled data models to persist in the in-memory cache. Defaults to 250, but the optimum value depends on the deployment environment. When the maximum is reached, Cube starts dropping the least recently used data models from the cache.
max_compiler_cache_keep_alive
Maximum time, in milliseconds, to keep compiled data models in memory. By default, data models are kept in memory indefinitely.
update_compiler_cache_keep_alive
Setting update_compiler_cache_keep_alive to True keeps frequently used data models in memory by resetting their max_compiler_cache_keep_alive timer every time they are accessed.
allow_js_duplicate_props_in_schema
Boolean that controls whether duplicate property names are allowed in objects of a data model. The default value is false, which means the compiler runs an additional transpiler to check for duplicates.
Query cache & queue
cache_and_queue_driver
The cache and queue driver to use for the Cube deployment. Defaults to memory in development and cubestore in production.
This configuration option can also be set using the CUBEJS_CACHE_AND_QUEUE_DRIVER environment variable.
context_to_orchestrator_id
In versions of Cube prior to v0.29, each tenant would have an individual instance of the query orchestrator.
context_to_orchestrator_id is a function used to determine a caching key for the query orchestrator instance. The query orchestrator holds database connections, execution queues, and pre-aggregation table caches. By default, the same instance is used for all tenants; override this property in situations where each tenant requires their own query orchestrator.
Please remember to override pre_aggregations_schema if you override context_to_orchestrator_id. Otherwise, you will end up with table name clashes for your pre-aggregations.
Called on each request.
driver_factory
A function to provide a custom configuration for the data source driver.
Called once per data_source for every orchestrator id.
Should be used to configure data source connections dynamically in multitenancy.
Not recommended when multiple data sources can be configured statically. Use CUBEJS_DATASOURCES and decorated environment variables in that case.
In Python, it should return a dictionary; in JavaScript, it should return an object. It should contain the type element corresponding to the data source type and other options that will be passed to a data source driver. You can look up supported options in the drivers' source code.
In JavaScript, custom driver implementations can also be loaded:
```javascript
const VeryCustomDriver = require('cube-custom-driver');

module.exports = {
  driverFactory: ({ securityContext, dataSource }) => {
    return new VeryCustomDriver({
      /* options */
    });
  },
};
```
orchestrator_options
We strongly recommend leaving these options set to the defaults. Changing these values can result in application instability and/or downtime.
You can pass this object to set advanced options for the query orchestrator.

| Option | Description | Default Value |
|---|---|---|
| continueWaitTimeout | Long polling interval in seconds, maximum is 90 | 5 |
| rollupOnlyMode | When enabled, an error will be thrown if a query can't be served from a pre-aggregation (rollup) | false |
| queryCacheOptions | Query cache options for DB queries | {} |
| queryCacheOptions.refreshKeyRenewalThreshold | Time in seconds to cache the result of the refresh_key check | defined by DB dialect |
| queryCacheOptions.backgroundRenew | Controls whether to wait in the foreground for refreshed query data if the refresh_key value has changed. Refresh key queries and pre-aggregations are never awaited in the foreground and are always processed in the background unless the cache is empty. If true, Cube immediately returns values from the cache, if available, without a refresh_key check to renew them in the foreground. | false |
| queryCacheOptions.queueOptions | Query queue options for DB queries | {} |
| preAggregationsOptions | Query cache options for pre-aggregations | {} |
| preAggregationsOptions.maxPartitions | The maximum number of partitions each pre-aggregation in a cube can use | 10000 |
| preAggregationsOptions.queueOptions | Query queue options for pre-aggregations | {} |
| preAggregationsOptions.externalRefresh | When running a separate instance of Cube to refresh pre-aggregations in the background, set this option on the API instance to prevent it from checking whether rollup data is current; when true, it won't try to create or refresh pre-aggregations | false |
queryCacheOptions are used while querying database tables, while preAggregationsOptions settings are used to query pre-aggregated tables.
Setting these options is highly discouraged as these are considered to be system-level settings. Please use the CUBEJS_ROLLUP_ONLY, CUBEJS_DB_QUERY_TIMEOUT, and CUBEJS_CONCURRENCY environment variables instead.
Both queueOptions objects (queryCacheOptions.queueOptions and preAggregationsOptions.queueOptions) accept the following options. Timeout and interval values are in seconds.

| Option | Description | Default Value |
|---|---|---|
| concurrency | Maximum number of queries to be processed simultaneously. For drivers with a connection pool, CUBEJS_DB_MAX_POOL should be adjusted accordingly: typically, the pool size should be at least twice the total concurrency across all queues. | 2 |
| executionTimeout | Total timeout of a single query | 600 |
| orphanedTimeout | A query will be marked for cancellation if it isn't requested during this period | 120 |
| heartBeatInterval | Worker heartbeat interval. If 4 * heartBeatInterval passes without a report, the query gets cancelled. | 30 |
Pre-aggregations
pre_aggregations_schema
Database schema name to use for storing pre-aggregations.
Either a string or a function can be passed. Providing a function allows you to set the schema name dynamically depending on the security context.
Defaults to dev_pre_aggregations in development mode and prod_pre_aggregations in production.
This configuration option can also be set using the CUBEJS_PRE_AGGREGATIONS_SCHEMA environment variable.
It's strongly recommended to use different pre-aggregation schemas in development and production environments to avoid pre-aggregation table clashes.
Cube will wipe out the contents of this database schema before use. It shall be used exclusively by Cube and shall not be shared with any application.
Called once per app_id.
scheduled_refresh_timer
This is merely a refresh worker's heartbeat. It doesn't affect the freshness of pre-aggregations or refresh keys, nor how frequently Cube accesses the database. Setting this value to 30s doesn't mean pre-aggregations or the in-memory cache would be refreshed every 30 seconds; instead, the refresh key is checked for freshness every 30 seconds in the background. Please consult the cube refresh_key documentation and the pre-aggregation refresh_key documentation on how to set data refresh intervals.
Setting this option enables refresh worker mode, which means it usually shouldn't be set to a constant value but should depend on your cluster environment. Setting it to a constant value in a cluster environment will instantiate a refresh worker on every Cube instance of your cluster, including API instances. This usually leads to refresh race conditions and out-of-memory errors.
Cube enables background refresh by default using the CUBEJS_REFRESH_WORKER environment variable.
Best practice is to run scheduled_refresh_timer in a separate worker Cube instance.
You may also need to configure scheduled_refresh_time_zones and scheduled_refresh_contexts.
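One way to follow this advice is to derive the timer from the environment instead of hard-coding it. A sketch, assuming refresh worker instances are started with CUBEJS_REFRESH_WORKER=true and that a numeric timer value is interpreted as seconds:

```javascript
// Sets the timer only on instances started as refresh workers, so API
// instances in the same cluster don't each spawn a refresh worker.
function buildConfig(env) {
  const isRefreshWorker = env.CUBEJS_REFRESH_WORKER === 'true';
  return {
    // Refresh keys are checked every 60 seconds, on the worker only.
    ...(isRefreshWorker ? { scheduledRefreshTimer: 60 } : {}),
  };
}

module.exports = buildConfig(process.env);
```

API instances get no scheduledRefreshTimer key at all, so only the worker polls refresh keys.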
scheduled_refresh_time_zones
This option specifies a list of time zones that pre-aggregations will be built for. It has an impact on pre-aggregation matching.
Either an array or a function returning an array can be passed. Providing a function allows you to set the time zones dynamically depending on the security context.
Time zones should be specified in the TZ Database Name format, e.g., America/Los_Angeles.
The default value is a list with a single time zone: UTC.
This configuration option can also be set using the CUBEJS_SCHEDULED_REFRESH_TIMEZONES environment variable.
scheduled_refresh_contexts
When trying to configure scheduled refreshes for pre-aggregations that use the securityContext inside context_to_app_id or context_to_orchestrator_id, you must also set up scheduled_refresh_contexts. This will allow Cube to generate the necessary security contexts prior to running the scheduled refreshes.
Leaving scheduled_refresh_contexts unconfigured will lead to issues where the security context is undefined. This is because there is no way for Cube to know how to generate a context without the required input.
Querying
query_rewrite
This is a security hook to check your query just before it gets processed. You can use this very generic API to implement any type of custom security check your app needs and rewrite the input query accordingly.
Called on each request.
For example, you can use query_rewrite to add a row-level security filter, if needed.
Raising an exception would prevent a query from running.
Currently, there's no built-in way to access the data model metadata in query_rewrite. Please track this issue and read about a workaround.
allow_ungrouped_without_primary_key
Setting allow_ungrouped_without_primary_key to True disables the primary key inclusion check for ungrouped queries.
This configuration option can also be set using the CUBEJS_ALLOW_UNGROUPED_WITHOUT_PRIMARY_KEY environment variable.
When query pushdown in the SQL API is enabled via the CUBESQL_SQL_PUSH_DOWN environment variable, this option is enabled as well for the best user experience.
APIs
base_path
The base path for the REST API.
The default value is /cubejs-api.
http.cors
CORS settings for the Cube REST API can be configured by providing an object with CORS options.
web_sockets_base_path
The base path for the WebSocket server.
The default value is / (the root path).
process_subscriptions_interval
This property controls how often WebSocket client subscriptions are refreshed.
Defaults to 5000.
context_to_api_scopes
This function is used to select accessible API scopes and effectively allow or disallow access to REST API endpoints, based on the security context.
The security context is provided as the first argument. An array of scopes that was set via CUBEJS_DEFAULT_API_SCOPES is provided as the second argument.
Called on each request.
extend_context
This function is used to extend the security context with additional data.
Called on each request.
It should return an object that gets appended to the request context, which is an object that contains securityContext and is passed as an argument to other functions like context_to_app_id or repository_factory.
When using extend_context, you should also define context_to_app_id so that all possible values of the extended context are reflected in the app id.
You can use the custom value from extend_context in your data model like this:
```yaml
{% set securityContext = COMPILE_CONTEXT['securityContext'] %}
cubes:
  - name: users
    sql: >
      SELECT *
      FROM users
      WHERE organization_id={{ securityContext['active_organization'] }}
```
extend_context is applied only to requests that go through APIs. It isn't applied to refresh worker execution. If you're looking for a way to provide global environment variables for your data model, please see the execution environment documentation.
check_auth
Used in the REST API. The default implementation parses the JSON Web Token in the Authorization header, verifies it, and sets its payload to the securityContext. Read more about JWT generation.
Called on each request.
You can return an object with the security_context field if you want to customize the security context.
You can use an empty check_auth function to disable built-in security or raise an exception to fail the authentication check.
Currently, raising an exception would result in an HTTP response with the status code 500 for Cube Core and 403 for Cube Cloud. Please track this issue.
jwt
```typescript
jwt: {
  jwkUrl?: ((payload: any) => string) | string;
  key?: string;
  algorithms?: string[];
  issuer?: string[];
  audience?: string;
  subject?: string;
  claimsNamespace?: string;
};
```
| Option | Description | Environment variable |
|---|---|---|
| jwkUrl | URL from which JSON Web Key Sets (JWKS) can be retrieved | Can also be set using CUBEJS_JWK_URL |
| key | JSON string that represents a cryptographic key. Similar to CUBEJS_API_SECRET | Can also be set using CUBEJS_JWT_KEY |
| algorithms | Any supported algorithm for decoding JWTs | Can also be set using CUBEJS_JWT_ALGS |
| issuer | Issuer value which will be used to enforce the iss claim from inbound JWTs | Can also be set using CUBEJS_JWT_ISSUER |
| audience | Audience value which will be used to enforce the aud claim from inbound JWTs | Can also be set using CUBEJS_JWT_AUDIENCE |
| subject | Subject value which will be used to enforce the sub claim from inbound JWTs | Can also be set using CUBEJS_JWT_SUBJECT |
| claimsNamespace | Namespace within the decoded JWT under which any custom claims can be found | Can also be set using CUBEJS_JWT_CLAIMS_NAMESPACE |
check_sql_auth
Used in the SQL API. The default implementation verifies the user name and password from the CUBEJS_SQL_USER and CUBEJS_SQL_PASSWORD environment variables, but in development mode it skips validation.
Called on each new connection to the Cube SQL API, on each user change via SET USER or the __user field, and every CUBESQL_AUTH_EXPIRE_SECS seconds.
For example, you can use check_sql_auth
to validate username and password.
The password argument is provided only when new connections are established. A check_sql_auth implementation should gracefully handle a missing password field to support change user and re-authentication flows.
check_sql_auth should always return password, as it is used to validate the password provided by the user. If the clear text password can't be obtained, best practice is to return the password provided as an argument after validating it.
Only the security context is used for change user and re-authentication flows, so the returned password isn't checked in this case.
Check this recipe for an example of using check_sql_auth to authenticate requests to the SQL API with LDAP.
can_switch_sql_user
Used in the SQL API. The default implementation depends on CUBEJS_SQL_SUPER_USER and returns true when it's equal to the session's user.
Called on each change user request from the Cube SQL API.
For example, you can use can_switch_sql_user to define your custom logic.
Utility
logger
A function to serve as a custom logger.
Accepts the following arguments:
- message: the message to be logged
- params: additional parameters

See also the CUBEJS_LOG_LEVEL environment variable.
telemetry
Cube collects high-level anonymous usage statistics for servers started in development mode. It doesn't track any credentials, data model contents, or queries issued. These statistics are used solely to continuously improve Cube.
You can opt out of it at any time by setting the telemetry option to False or, alternatively, by setting the CUBEJS_TELEMETRY environment variable to false.
Deprecated
dbType
dbType is deprecated and will be removed in a future release. Use driverFactory instead.
Data source type. Called only once per appId.
```javascript
module.exports = {
  // Either a string...
  dbType: 'snowflake',
  // ...or a function (define only one of the two):
  // dbType: ({ securityContext }) => 'databricks',
};
```
Either a string or a function can be passed. Providing a function allows you to dynamically select a database type depending on the security context. Usually used for multitenancy.
If not defined, Cube will look up the CUBEJS_DB_TYPE environment variable to resolve the data source type.