Running in Production

Cube.js makes use of two different kinds of cache:

  • Redis, for in-memory storage of query results
  • Cube Store, for storing pre-aggregations

In development, Cube.js uses in-memory storage on the server. In production, we strongly recommend running Redis as a separate service.
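
As a minimal sketch of that recommendation, Redis could run as its own Docker Compose service next to Cube.js. The `redis` service name, the `redis:6` image tag, and the use of `CUBEJS_REDIS_URL` are assumptions that do not come from this page; check the variable against the Environment Variables reference for your Cube.js version before relying on it.

```yaml
version: '2.2'
services:
  # Assumption: a stock Redis image with default settings; the tag is illustrative
  redis:
    image: redis:6
    restart: always

  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      # Assumption: CUBEJS_REDIS_URL points Cube.js at the separate Redis service.
      # "redis" is the Compose service name above, resolved by Docker's internal DNS.
      - CUBEJS_REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    volumes:
      - .:/cube/conf
```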

Cube Store is enabled by default when running Cube.js in development mode. In production, Cube Store must run as a separate process. The easiest way to do this is to use the official Docker images for Cube.js and Cube Store.

Using Windows? We strongly recommend using WSL2 for Windows 10 to run the following commands.

You can run Cube Store in Docker using the following command:

docker run -p 3030:3030 cubejs/cubestore

Cube Store can further be configured via environment variables. To see a complete reference, please consult the Cube Store section of the Environment Variables reference.

Next, run Cube.js and tell it to connect to Cube Store running on localhost (on the default port 3030):

docker run -p 4000:4000 \
  -e CUBEJS_CUBESTORE_HOST=localhost \
  -v ${PWD}:/cube/conf \
  cubejs/cube

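In the command above, CUBEJS_CUBESTORE_HOST tells Cube.js where Cube Store is running. Because localhost inside a container refers to that container itself, this value only works when both containers share a network namespace (for example, with Docker's host networking); otherwise, use an address at which Cube Store is reachable from the Cube.js container.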

You can also use Docker Compose to achieve the same:

version: '2.2'
services:
  cubestore:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data

  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      - CUBEJS_CUBESTORE_HOST=cubestore
    depends_on:
      - cubestore
    links:
      - cubestore
    volumes:
      - ./schema:/cube/conf/schema

Cube Store can be run in single-instance mode, but this is usually unsuitable for production deployments. For high concurrency and data throughput, we strongly recommend running Cube Store as a cluster of multiple instances instead.

Scaling Cube Store for higher concurrency is relatively simple when running in cluster mode. Because the storage layer is decoupled from the query processing engine, you can horizontally scale your Cube Store cluster to whatever level of concurrency you require.

In cluster mode, Cube Store runs two kinds of nodes:

  • a single router node, which handles incoming client connections, manages database metadata and serves simple queries
  • multiple worker nodes, which execute SQL queries

The configuration required for each node can be found in the table below. More information about these variables can be found in the Environment Variables reference.

Environment Variable     Specify on Router?   Specify on Worker?
CUBESTORE_SERVER_NAME    Yes                  Yes
CUBESTORE_META_PORT      Yes                  -
CUBESTORE_WORKERS        Yes                  Yes
CUBESTORE_WORKER_PORT    -                    Yes
CUBESTORE_META_ADDR      -                    Yes

To fully take advantage of the worker nodes in the cluster, we strongly recommend using partitioned pre-aggregations.

A sample Docker Compose stack setting this up might look like:

version: '2.2'
services:
  cubestore_router:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001,cubestore_worker_2:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data
  cubestore_worker_1:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001,cubestore_worker_2:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    depends_on:
      - cubestore_router
    volumes:
      - .cubestore:/cube/data
  cubestore_worker_2:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_2:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001,cubestore_worker_2:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    depends_on:
      - cubestore_router
    volumes:
      - .cubestore:/cube/data
  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      - CUBEJS_CUBESTORE_HOST=cubestore_router
    depends_on:
      - cubestore_router
    volumes:
      - .:/cube/conf
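
Scaling the stack above horizontally means adding further worker services. The fragment below is a sketch of what a third worker might look like; the name `cubestore_worker_3` is hypothetical and simply follows the naming pattern of the existing workers. Since `CUBESTORE_WORKERS` is specified on the router and on every worker, the list would need to be extended on all existing nodes as well.

```yaml
# Fragment to merge into the "services" section of the stack above.
# cubestore_worker_3 is a hypothetical name for the additional worker.
  cubestore_worker_3:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_3:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      # The same extended list must also replace CUBESTORE_WORKERS on the
      # router and on cubestore_worker_1 and cubestore_worker_2.
      - CUBESTORE_WORKERS=cubestore_worker_1:9001,cubestore_worker_2:9001,cubestore_worker_3:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    depends_on:
      - cubestore_router
    volumes:
      - .cubestore:/cube/data
```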

Cube Store can only use one type of remote storage at runtime.

Cube Store uses a separate storage layer for storing metadata and for persisting pre-aggregations as Parquet files. This layer can be either AWS S3 or Google Cloud Storage (GCS). If desired, a path on the server can be used instead (this includes network shares and NFS mounts).

A simplified example using AWS S3 might look like:

version: '2.2'
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
    depends_on:
      - cubestore_router
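
For Google Cloud Storage, the same layout would apply with the S3 variables swapped for their GCS counterparts. The fragment below is a sketch only: the variable names `CUBESTORE_GCS_BUCKET` and `CUBESTORE_GCP_CREDENTIALS` do not appear on this page and are assumed from the Environment Variables reference, so verify them there before use. The placeholders in angle brackets are illustrative.

```yaml
# Fragment: environment for one Cube Store node using GCS instead of S3.
# Assumption: variable names as listed in the Environment Variables reference.
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_GCS_BUCKET=<BUCKET_NAME_IN_GCS>
      # Assumed to be a Base64-encoded service account JSON key
      - CUBESTORE_GCP_CREDENTIALS=<BASE64_ENCODED_JSON_KEY>
```

Every node in the cluster would carry the same storage settings, as in the S3 example.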

AWS

Cube Store can retrieve security credentials from instance metadata automatically. This means you can skip defining the CUBESTORE_AWS_ACCESS_KEY_ID and CUBESTORE_AWS_SECRET_ACCESS_KEY environment variables.

Cube Store currently does not take the key expiration time returned from instance metadata into account; instead the refresh duration for the key is defined by CUBESTORE_AWS_CREDS_REFRESH_EVERY_MINS, which is set to 180 by default.
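
When relying on instance metadata, a node's configuration might reduce to the sketch below. It assumes the container runs on an AWS instance whose role grants access to the bucket; the refresh interval of 60 minutes is an arbitrary illustrative value, not a recommendation.

```yaml
# Fragment: S3 storage with credentials taken from instance metadata.
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      # No CUBESTORE_AWS_ACCESS_KEY_ID / CUBESTORE_AWS_SECRET_ACCESS_KEY here:
      # credentials are retrieved from instance metadata.
      # Illustrative value: refresh every 60 minutes instead of the default 180.
      - CUBESTORE_AWS_CREDS_REFRESH_EVERY_MINS=60
```

Choosing a refresh interval shorter than the lifetime of the temporary credentials helps avoid requests being made with an expired key.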

Cube Store currently does not have any in-built authentication mechanisms. For this reason, we recommend running your Cube Store cluster on a network that only allows requests from the Cube.js deployment.
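
One way to approximate this with Docker Compose is to keep the Cube Store nodes on an internal network and publish no ports for them. The sketch below is an assumption about how such a setup could look, not an official configuration; the network names `cubestore_internal` and `public` are hypothetical.

```yaml
version: '2.2'
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    # No "ports" entry: Cube Store is not published to the host
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data
    networks:
      - cubestore_internal
  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    depends_on:
      - cubestore_router
    volumes:
      - .cubestore:/cube/data
    networks:
      - cubestore_internal
  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      - CUBEJS_CUBESTORE_HOST=cubestore_router
    depends_on:
      - cubestore_router
    volumes:
      - .:/cube/conf
    networks:
      # Cube.js is the only service attached to both networks
      - public
      - cubestore_internal
networks:
  public: {}
  cubestore_internal:
    # Internal networks have no route to or from the outside world
    internal: true
```

With this layout, only the Cube.js container can reach the router and worker nodes; in other environments the same effect would be achieved with firewall rules or security groups.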
