Deployment Overview

This section contains a general overview of deploying a Cube cluster in production. You can find platform-specific guides for:

If you are moving Cube to production, check out the Production Checklist.

As shown in the diagram below, a typical production Cube cluster consists of one or more API instances, a Refresh Worker, and a Cube Store cluster.

[Diagram: Deployment Overview]

API instances process incoming API requests and query either Cube Store for pre-aggregated data or the connected database(s) for raw data. The Refresh Worker builds and refreshes pre-aggregations in the background. Cube Store ingests the pre-aggregations built by the Refresh Worker and responds to queries from API instances.

API instances and the Refresh Worker can be configured via environment variables or the cube.js configuration file. They also need access to the data schema files.

The Cube Store cluster is configured via environment variables.

Below you can find an example Docker Compose configuration for a Cube cluster:

version: '2.2'

services:
  cube_api:
    image: cubejs/cube
    ports:
      - 4000:4000
    environment:
      - CUBEJS_DB_TYPE=bigquery
      - CUBEJS_DB_BQ_PROJECT_ID=cubejs-k8s-cluster
      - CUBEJS_DB_BQ_CREDENTIALS=<BQ-KEY>
      - CUBEJS_DB_EXPORT_BUCKET=cubestore

      - CUBEJS_CUBESTORE_HOST=cubestore_router

      - CUBEJS_API_SECRET=secret
    volumes:
      - .:/cube/conf
    depends_on:
      - cubestore_worker_1
      - cubestore_worker_2
      - cube_refresh_worker

  cube_refresh_worker:
    image: cubejs/cube
    environment:
      - CUBEJS_DB_TYPE=bigquery
      - CUBEJS_DB_BQ_PROJECT_ID=cubejs-k8s-cluster
      - CUBEJS_DB_BQ_CREDENTIALS=<BQ-KEY>
      - CUBEJS_DB_EXPORT_BUCKET=cubestore

      - CUBEJS_CUBESTORE_HOST=cubestore_router

      - CUBEJS_API_SECRET=secret

      - CUBEJS_REFRESH_WORKER=true
    volumes:
      - .:/cube/conf

  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data

  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:10001
      - CUBESTORE_WORKER_PORT=10001
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data
    depends_on:
      - cubestore_router

  cubestore_worker_2:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_2:10002
      - CUBESTORE_WORKER_PORT=10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data
    depends_on:
      - cubestore_router

API Instances

API instances process incoming API requests and query either Cube Store for pre-aggregated data or connected data sources for raw data. API instances can be scaled horizontally, with a load balancer distributing incoming requests between them.
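
Any standard load balancer can spread requests across API instances. As a minimal sketch of round-robin distribution, assuming three hypothetical instances named cube_api_1 to cube_api_3 (these names do not appear in the examples in this section), the balancing logic amounts to cycling through the backend list:

```python
from itertools import cycle

# Hypothetical API instance addresses; in practice these are the hosts
# registered behind your load balancer.
API_INSTANCES = ["cube_api_1:4000", "cube_api_2:4000", "cube_api_3:4000"]


class RoundRobinBalancer:
    """Hands out backend addresses in a fixed, repeating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = cycle(backends)  # endless iterator over the list

    def next_backend(self):
        return next(self._backends)


balancer = RoundRobinBalancer(API_INSTANCES)
picks = [balancer.next_backend() for _ in range(4)]
# picks == ['cube_api_1:4000', 'cube_api_2:4000', 'cube_api_3:4000', 'cube_api_1:4000']
```

Round-robin is only one policy; production load balancers often also support least-connections or health-check-aware routing.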

API instances use the Cube Docker image (cubejs/cube).

An API instance is configured via environment variables and/or the cube.js file, and needs access to the data schema files.

Refresh Worker

A Refresh Worker updates pre-aggregations and the in-memory cache in the background. It also keeps the refresh keys up to date for all defined schemas and pre-aggregations.
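
Conceptually, a refresh key is a value (for example, the result of a cheap query such as the latest update timestamp of a table) that signals when source data has changed. The following is a simplified model, not Cube's implementation: PreAggregationRefresher and its callbacks are hypothetical, and a rebuild happens whenever the key differs from the last value seen.

```python
from typing import Any, Callable, Optional


class PreAggregationRefresher:
    """Simplified model: rebuild a pre-aggregation only when its refresh key changes."""

    def __init__(self, refresh_key: Callable[[], Any], build: Callable[[], None]) -> None:
        self._refresh_key = refresh_key  # e.g. something like SELECT MAX(updated_at) ...
        self._build = build              # hypothetical callback that rebuilds the data
        self._last_key: Optional[Any] = None
        self._built = False

    def tick(self) -> bool:
        """Evaluate the refresh key once; return True if a rebuild was triggered."""
        current = self._refresh_key()
        if self._built and current == self._last_key:
            return False  # source data unchanged: keep the existing pre-aggregation
        self._build()
        self._last_key = current
        self._built = True
        return True


# Usage: simulate a source table whose latest update timestamp changes once.
latest_update = {"value": 1}
builds = []
refresher = PreAggregationRefresher(
    refresh_key=lambda: latest_update["value"],
    build=lambda: builds.append(latest_update["value"]),
)
results = [refresher.tick(), refresher.tick()]
latest_update["value"] = 2
results.append(refresher.tick())
# results == [True, False, True]; builds == [1, 2]
```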

Refresh Workers also run on the Cube Docker image; to make a service act as a Refresh Worker, set CUBEJS_REFRESH_WORKER=true in its environment variables.
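
Since the API instance and the Refresh Worker share most of their configuration, one way to keep them in sync is to derive the worker's environment from the API instance's. A minimal sketch using the variables from the Docker Compose example above (`<BQ-KEY>` stays a placeholder); refresh_worker_env and to_compose_list are hypothetical helper names:

```python
# Environment of the cube_api service in the Docker Compose example above.
API_ENV = {
    "CUBEJS_DB_TYPE": "bigquery",
    "CUBEJS_DB_BQ_PROJECT_ID": "cubejs-k8s-cluster",
    "CUBEJS_DB_BQ_CREDENTIALS": "<BQ-KEY>",  # placeholder, not a real credential
    "CUBEJS_DB_EXPORT_BUCKET": "cubestore",
    "CUBEJS_CUBESTORE_HOST": "cubestore_router",
    "CUBEJS_API_SECRET": "secret",
}


def refresh_worker_env(api_env: dict) -> dict:
    """Copy the API instance environment and switch on Refresh Worker mode."""
    env = dict(api_env)  # copy so the API environment is left untouched
    env["CUBEJS_REFRESH_WORKER"] = "true"
    return env


def to_compose_list(env: dict) -> list:
    """Render an environment dict in the KEY=value list form used by Docker Compose."""
    return [f"{key}={value}" for key, value in env.items()]


worker_env = refresh_worker_env(API_ENV)
```

Deriving one environment from the other means a change to the database or secret settings cannot drift between the two services.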

Cube Store

Cube Store is the purpose-built pre-aggregation storage for Cube.

Cube Store uses a distributed query engine architecture. In every Cube Store cluster:

  • one or more router nodes handle incoming connections, manage database metadata, build query plans, and orchestrate their execution
  • multiple worker nodes ingest warmed-up data and execute queries in parallel
  • local or cloud-based blob storage keeps pre-aggregated data in a columnar format
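
The router/worker split can be pictured as a scatter-gather pattern: the router plans and distributes the work, workers compute partial results in parallel, and the router combines them. The sketch below is a toy model of that pattern for a SUM grouped by one dimension, not Cube Store's actual query engine; the function names and the round-robin assignment of partitions are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]   # (dimension value, measure value)
Partition = List[Row]


def worker_execute(partitions: List[Partition]) -> Counter:
    """Worker: aggregate only the partitions assigned to it (partial SUM per key)."""
    partial: Counter = Counter()
    for partition in partitions:
        for key, value in partition:
            partial[key] += value
    return partial


def router_execute(partitions: List[Partition], num_workers: int) -> Dict[str, int]:
    """Router: plan (assign partitions), run workers in parallel, merge partials."""
    assignments: List[List[Partition]] = [[] for _ in range(num_workers)]
    for i, partition in enumerate(partitions):
        assignments[i % num_workers].append(partition)  # round-robin plan

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_execute, assignments))

    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)


partitions = [[("us", 10), ("eu", 5)], [("us", 7)], [("eu", 1), ("apac", 3)]]
result = router_execute(partitions, num_workers=2)
# result == {"us": 17, "eu": 6, "apac": 3}
```

SUM works here because partial sums can be added together; aggregates that cannot be merged this way need a different plan.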

Cube Store architecture diagram

More information on Cube Store architecture can be found in this presentation.

By default, Cube Store listens on port 3030 for queries coming from Cube. You can change the port by setting the CUBESTORE_HTTP_PORT environment variable. If you use a custom port, make sure to set the CUBEJS_CUBESTORE_PORT environment variable to the same value for the Cube API instances and the Refresh Worker.
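
The two port settings must agree. A small consistency check, assuming both variables fall back to 3030 when unset; the helper names are hypothetical:

```python
DEFAULT_CUBESTORE_PORT = 3030  # Cube Store's default port, per the text above


def cubestore_listen_port(cubestore_env: dict) -> int:
    """Port Cube Store listens on: CUBESTORE_HTTP_PORT, else the default."""
    return int(cubestore_env.get("CUBESTORE_HTTP_PORT", DEFAULT_CUBESTORE_PORT))


def cube_target_port(cube_env: dict) -> int:
    """Port API instances and the Refresh Worker connect to: CUBEJS_CUBESTORE_PORT."""
    return int(cube_env.get("CUBEJS_CUBESTORE_PORT", DEFAULT_CUBESTORE_PORT))


def ports_match(cubestore_env: dict, cube_env: dict) -> bool:
    """True when Cube would connect to the port Cube Store is listening on."""
    return cubestore_listen_port(cubestore_env) == cube_target_port(cube_env)
```

For example, changing only CUBESTORE_HTTP_PORT to 3031 makes ports_match return False until CUBEJS_CUBESTORE_PORT is set to 3031 as well.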

Although Cube Store can be run in single-instance mode, this is often unsuitable for production deployments. For high concurrency and data throughput, we strongly recommend running Cube Store as a cluster of multiple instances instead. Because the storage layer is decoupled from the query processing engine, you can horizontally scale your Cube Store cluster for as much concurrency as you require.

Cube Store has two kinds of nodes:

  • a router node, which handles incoming client connections, manages database metadata, and serves simple queries
  • multiple worker nodes, which execute SQL queries received from Cube

Both the router and the workers use the Cube Store Docker image (cubejs/cubestore). Use the following environment variables to assign each node its role:

Environment Variable    | Specify on Router? | Specify on Worker?
CUBESTORE_SERVER_NAME   | Yes                | Yes
CUBESTORE_META_PORT     | Yes                | No
CUBESTORE_WORKERS       | Yes                | Yes
CUBESTORE_WORKER_PORT   | No                 | Yes
CUBESTORE_META_ADDR     | No                 | Yes
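
The table can be turned into a quick pre-flight check for a node's environment. A minimal sketch; check_role_env is a hypothetical helper that treats every variable marked Yes as required and every variable marked No as unexpected, and ignores variables the table does not list (such as CUBESTORE_REMOTE_DIR):

```python
from typing import Dict, List, Set

# Variables marked "Yes" in the table, per role.
REQUIRED: Dict[str, Set[str]] = {
    "router": {"CUBESTORE_SERVER_NAME", "CUBESTORE_META_PORT", "CUBESTORE_WORKERS"},
    "worker": {"CUBESTORE_SERVER_NAME", "CUBESTORE_WORKERS",
               "CUBESTORE_WORKER_PORT", "CUBESTORE_META_ADDR"},
}
# Variables marked "No" in the table, per role.
NOT_FOR_ROLE: Dict[str, Set[str]] = {
    "router": {"CUBESTORE_WORKER_PORT", "CUBESTORE_META_ADDR"},
    "worker": {"CUBESTORE_META_PORT"},
}


def check_role_env(role: str, env: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means env matches the table."""
    present = set(env)
    problems = [f"missing {name}" for name in sorted(REQUIRED[role] - present)]
    problems += [f"unexpected {name}" for name in sorted(NOT_FOR_ROLE[role] & present)]
    return problems


router_env = {  # the cubestore_router service from the first example
    "CUBESTORE_WORKERS": "cubestore_worker_1:10001,cubestore_worker_2:10002",
    "CUBESTORE_REMOTE_DIR": "/cube/data",
    "CUBESTORE_META_PORT": "9999",
    "CUBESTORE_SERVER_NAME": "cubestore_router:9999",
}
```

check_role_env("router", router_env) returns an empty list, while a worker environment that sets CUBESTORE_META_PORT but omits CUBESTORE_META_ADDR reports both problems.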

A sample Docker Compose stack that sets up a Cube Store cluster might look like this:

version: '2.2'

services:
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data
    depends_on:
      - cubestore_worker_1
      - cubestore_worker_2

  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:10001
      - CUBESTORE_WORKER_PORT=10001
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data

  cubestore_worker_2:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_2:10002
      - CUBESTORE_WORKER_PORT=10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - .cubestore:/cube/data
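
Every node carries the same CUBESTORE_WORKERS list, and each worker's CUBESTORE_SERVER_NAME and CUBESTORE_WORKER_PORT should match its entry in that list, so when adding workers it can help to generate these settings rather than edit them by hand. A minimal sketch that reproduces the environment of the example above for any number of workers; the function name and the convention of consecutive ports starting at 10001 are assumptions taken from the example, not requirements:

```python
from typing import Dict


def cubestore_cluster_env(num_workers: int, base_port: int = 10001,
                          meta_port: int = 9999,
                          remote_dir: str = "/cube/data") -> Dict[str, Dict[str, str]]:
    """Build per-service environments for one router and num_workers workers."""
    names = [f"cubestore_worker_{i}" for i in range(1, num_workers + 1)]
    ports = [base_port + i for i in range(num_workers)]
    # The same worker list goes on the router and on every worker.
    workers = ",".join(f"{name}:{port}" for name, port in zip(names, ports))
    router_addr = f"cubestore_router:{meta_port}"

    services = {
        "cubestore_router": {
            "CUBESTORE_WORKERS": workers,
            "CUBESTORE_REMOTE_DIR": remote_dir,
            "CUBESTORE_META_PORT": str(meta_port),
            "CUBESTORE_SERVER_NAME": router_addr,
        }
    }
    for name, port in zip(names, ports):
        services[name] = {
            "CUBESTORE_WORKERS": workers,
            "CUBESTORE_SERVER_NAME": f"{name}:{port}",  # matches its entry in the list
            "CUBESTORE_WORKER_PORT": str(port),
            "CUBESTORE_REMOTE_DIR": remote_dir,
            "CUBESTORE_META_ADDR": router_addr,
        }
    return services


env = cubestore_cluster_env(2)
```

With two workers, env contains exactly the variables shown in the Docker Compose example; the resulting dictionaries can then be rendered into whatever deployment format you use.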

Cube Store uses a separate storage layer for storing metadata and for persisting pre-aggregations as Parquet files. This layer can be AWS S3 or Google Cloud Storage, or, if all nodes of a cluster run on a single machine, a local path on the server.

A simplified example using AWS S3 might look like:

version: '2.2'
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
    depends_on:
      - cubestore_router
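
In both examples every node receives identical storage settings: either the same local directory or the same S3 bucket and credentials. A minimal sketch of hypothetical helpers that pick exactly one storage backend and apply it uniformly to all services; they cover only the variables shown in this section, and for simplicity the S3 branch requires the region and both credential variables, as in the example above:

```python
from typing import Dict, Optional


def storage_env(*, local_dir: Optional[str] = None,
                s3_bucket: Optional[str] = None, s3_region: Optional[str] = None,
                aws_access_key_id: Optional[str] = None,
                aws_secret_access_key: Optional[str] = None) -> Dict[str, str]:
    """Return storage variables for exactly one backend: a local path or S3."""
    if (local_dir is None) == (s3_bucket is None):
        raise ValueError("configure exactly one of local_dir or s3_bucket")
    if local_dir is not None:
        return {"CUBESTORE_REMOTE_DIR": local_dir}
    if s3_region is None or aws_access_key_id is None or aws_secret_access_key is None:
        raise ValueError("this sketch requires a region and both AWS credentials for S3")
    return {
        "CUBESTORE_S3_BUCKET": s3_bucket,
        "CUBESTORE_S3_REGION": s3_region,
        "CUBESTORE_AWS_ACCESS_KEY_ID": aws_access_key_id,
        "CUBESTORE_AWS_SECRET_ACCESS_KEY": aws_secret_access_key,
    }


def apply_storage(services: Dict[str, Dict[str, str]], storage: Dict[str, str]) -> None:
    """Add the same storage variables to every service, in place."""
    for env in services.values():
        env.update(storage)


# Usage with the placeholders from the S3 example above.
services = {
    "cubestore_router": {"CUBESTORE_META_PORT": "9999"},
    "cubestore_worker_1": {"CUBESTORE_WORKER_PORT": "9001"},
}
apply_storage(services, storage_env(
    s3_bucket="<BUCKET_NAME_IN_S3>", s3_region="<BUCKET_REGION_IN_S3>",
    aws_access_key_id="<AWS_ACCESS_KEY_ID>",
    aws_secret_access_key="<AWS_SECRET_ACCESS_KEY>",
))
```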
