Overview
This section contains a general overview of deploying a Cube cluster in production. You can also check platform-specific guides for Cube Cloud and Docker.
If you are moving Cube to production, check out the Production Checklist.
Components
As shown in the diagram below, a typical production deployment of Cube includes the following components:
- One or multiple API instances
- A Refresh Worker
- A Cube Store cluster
API instances process incoming API requests and query either Cube Store for pre-aggregated data or the connected database(s) for raw data. The Refresh Worker builds and refreshes pre-aggregations in the background. Cube Store ingests pre-aggregations built by the Refresh Worker and responds to queries from API instances.
API instances and Refresh Workers can be configured via environment variables or the cube.js configuration file. They also need access to the data model files. Cube Store clusters can be configured via environment variables.
You can find an example Docker Compose configuration for a Cube deployment in the platform-specific guide for Docker.
API instances
API instances process incoming API requests and query either Cube Store for pre-aggregated data or connected data sources for raw data. API instances can be scaled horizontally, with a load balancer distributing incoming requests among them.
The Cube Docker image is used for API instances.
API instances can be configured via environment variables or the cube.js configuration file, and must have access to the data model files (as specified by schema_path).
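As a rough sketch, an API instance service in Docker Compose might look like the following. The Postgres connection values, the API secret, and the assumption that the project directory (including the data model folder) is mounted at /cube/conf are placeholders for illustration, not part of any specific deployment:

```yaml
services:
  cube_api:
    image: cubejs/cube:latest
    ports:
      - "4000:4000" # default HTTP API port
    environment:
      # Connection to the upstream data source (Postgres used here as an example)
      - CUBEJS_DB_TYPE=postgres
      - CUBEJS_DB_HOST=<DB_HOST>
      - CUBEJS_DB_NAME=<DB_NAME>
      - CUBEJS_DB_USER=<DB_USER>
      - CUBEJS_DB_PASS=<DB_PASS>
      # Secret used to sign and verify API tokens
      - CUBEJS_API_SECRET=<API_SECRET>
      # Address of the Cube Store router
      - CUBEJS_CUBESTORE_HOST=cubestore_router
    volumes:
      # Data model files must be available to every API instance
      - .:/cube/conf
```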
Refresh Worker
A Refresh Worker updates pre-aggregations and invalidates the in-memory cache in the background. It also keeps the refresh keys up to date for all data models and pre-aggregations. Note that the in-memory cache is only invalidated, not populated, by the Refresh Worker; it is populated lazily during querying. Pre-aggregations, on the other hand, are eagerly populated and kept up to date by the Refresh Worker.
The Cube Docker image can be used for creating Refresh Workers; to make the service act as a Refresh Worker, CUBEJS_REFRESH_WORKER=true should be set in the environment variables.
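A minimal sketch of such a service, assuming the same placeholder data source credentials and project mount as the API instance example above:

```yaml
services:
  cube_refresh_worker:
    image: cubejs/cube:latest
    environment:
      # Same data source and Cube Store settings as the API instances
      - CUBEJS_DB_TYPE=postgres
      - CUBEJS_DB_HOST=<DB_HOST>
      - CUBEJS_DB_NAME=<DB_NAME>
      - CUBEJS_DB_USER=<DB_USER>
      - CUBEJS_DB_PASS=<DB_PASS>
      - CUBEJS_CUBESTORE_HOST=cubestore_router
      # Makes this instance act as a Refresh Worker instead of serving API requests
      - CUBEJS_REFRESH_WORKER=true
    volumes:
      # Refresh Workers also need access to the data model files
      - .:/cube/conf
```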
Cube Store
Cube Store is the purpose-built pre-aggregations storage for Cube.
Cube Store uses a distributed query engine architecture. In every Cube Store cluster:
- one or more router nodes handle incoming connections, manage database metadata, build query plans, and orchestrate their execution
- multiple worker nodes ingest warmed-up data and execute queries in parallel
- local or cloud-based blob storage keeps pre-aggregated data in a columnar format
By default, Cube Store listens on port 3030 for queries coming from Cube. This port can be changed by setting the CUBESTORE_HTTP_PORT environment variable. If you use a custom port, make sure to also change the CUBEJS_CUBESTORE_PORT environment variable for Cube API instances and the Refresh Worker.
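For illustration, here is a sketch using an arbitrary custom port of 7070 on both sides; the port value itself is only an example:

```yaml
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      # Cube Store router now listens for Cube queries on a custom port
      - CUBESTORE_HTTP_PORT=7070
      # ...other router settings...

  cube_api:
    image: cubejs/cube:latest
    environment:
      # API instances (and the Refresh Worker) must point at the same port
      - CUBEJS_CUBESTORE_HOST=cubestore_router
      - CUBEJS_CUBESTORE_PORT=7070
      # ...other Cube settings...
```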
Both the router and worker use the Cube Store Docker image. The following environment variables should be used to manage the roles:
| Environment Variable | Specify on Router? | Specify on Worker? |
| --- | --- | --- |
| CUBESTORE_SERVER_NAME | ✅ Yes | ✅ Yes |
| CUBESTORE_META_PORT | ✅ Yes | — |
| CUBESTORE_WORKERS | ✅ Yes | ✅ Yes |
| CUBESTORE_WORKER_PORT | — | ✅ Yes |
| CUBESTORE_META_ADDR | — | ✅ Yes |
Looking for a deeper dive on Cube Store architecture? Check out this presentation by our CTO, Pavel.
Cube Store Router
The Router in a Cube Store cluster is responsible for receiving queries from Cube, managing metadata for the Cube Store cluster, and query planning and distribution for the Workers. It also provides a MySQL-compatible interface that can be used to query pre-aggregations from Cube Store directly. Cube only communicates with the Router, and does not interact with Workers directly.
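If you need to inspect pre-aggregations directly for debugging, one option is to expose the Router's MySQL-compatible port from the container and connect to it with any MySQL client. A minimal sketch, assuming the default MySQL protocol port of 3306 (configurable via CUBESTORE_PORT):

```yaml
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    ports:
      # Expose the MySQL-compatible interface (assumed default port 3306)
      # for ad-hoc queries against pre-aggregations
      - "3306:3306"
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      # ...other router settings...
```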
Cube Store Worker
Workers in a Cube Store cluster receive and execute subqueries from the Router, and directly interact with the underlying distributed storage for insertions, selections and pre-aggregation warmup. Workers do not interact with each other directly, and instead rely on the Router to distribute queries and manage any associated metadata.
Scaling
Although Cube Store can be run in single-instance mode, this is often unsuitable for production deployments. For high concurrency and data throughput, we strongly recommend running Cube Store as a cluster of multiple instances instead. Because the storage layer is decoupled from the query processing engine, you can horizontally scale your Cube Store cluster for as much concurrency as you require.
A sample Docker Compose stack setting up a Cube Store cluster might look like this:
```yaml
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data
    depends_on:
      - cubestore_worker_1
      - cubestore_worker_2

  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:10001
      - CUBESTORE_WORKER_PORT=10001
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data

  cubestore_worker_2:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_2:10002
      - CUBESTORE_WORKER_PORT=10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data
```
Storage
Cube Store makes use of a separate storage layer for storing metadata as well as for persisting pre-aggregations as Parquet files. Cube Store can use either AWS S3 or Google Cloud Storage, or, if desired, a local path on the server if all nodes of a cluster run on a single machine.
A simplified example using AWS S3 might look like this:
```yaml
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>

  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
    depends_on:
      - cubestore_router
```