How you win by using Cube Store: Part 1

With this article, we start a multi-part series that will cover the spectrum of reasons to use Cube Store, a purpose-built distributed, fast, and reliable data store, as a part of your Cube deployments.

With more than 2 years in active development, Cube Store accounts for half of Cube’s Rust code base and supports some of the most critical features: in-memory cache, query queue management, and pre-aggregations. Thanks to these features, Cube provides fantastic support for embedded analytics and real-time analytics use cases, processing client queries to cloud data warehouses and streaming databases with sub-second latency and high concurrency while safeguarding these data sources from load bursts and overspending.

In this article, we’ll focus on the in-memory cache and query queue features. Last year, we announced plans to update these features to use Cube Store and, consequently, remove Redis®* from Cube deployments. Much work has been done on GitHub, and we recently ran a webinar covering these changes. Now, let’s recap and dig deeper.

* Redis is a registered trademark of Redis Ltd. Any rights therein are reserved to Redis Ltd. Any use by Cube is for referential purposes only and does not indicate any sponsorship, endorsement or affiliation between Redis and Cube.

In-memory cache in Cube

As the semantic layer for data applications, Cube is designed to sit between data sources (e.g., cloud data warehouses and traditional databases) and data consumers (e.g., BI tools, data notebooks, and front-end apps). It receives queries from these data consumers via any of the supported APIs (i.e., SQL API, REST API, and GraphQL API) and fulfills them in a timely manner.

Semantic layer

In the most trivial scenario, to fulfill a query, Cube would analyze the query, apply access control rules, transform it using the data model, generate a new SQL query, run this new query against the upstream data source, and deliver the result set back to the data consumer. It takes time, and querying the upstream data source is the most time-consuming step.

You can expect the latency to be up to a few seconds when using cloud data warehouses such as Snowflake or BigQuery; it can grow indefinitely when using traditional databases such as Postgres. It depends on multiple factors, including but not limited to the following: the complexity of the query, the volume of data to be processed, the type of data storage (row-based vs. columnar), the database architecture (single-node vs. distributed), and implemented optimizations.

It means that, to ensure low query latency and excellent user experience at the data consumer level, running queries against upstream data sources should be avoided whenever possible. Let’s consider one of these possibilities.

Often, multiple identical queries would hit Cube within a short time frame. It usually happens when multiple users open the same dashboard in a BI tool or the same page of a front-end application; it also happens when a single user refreshes a page or navigates back to it. In many cases, a result of a query that was run 5 seconds ago should still be relevant for the same query that has just arrived; moreover, query results often don't depend on the query time at all. For example, the revenue generated by the Oregon branch of your company last week or the volume of deposits withdrawn from Silicon Valley Bank on the week of March 5, 2023, don’t change as time passes.

Query deduplication

Cube deduplicates identical queries to its APIs.

Let’s unpack how it works:

When the first-of-a-kind query arrives, it will be fulfilled by querying the upstream data source. The result would also be put in the in-memory cache that Cube maintains.
Next, when a query identical to the first one arrives, it would be fulfilled from the in-memory cache—almost instantly. We’re talking about milliseconds vs. seconds in the case of upstream data sources.
As more and more identical queries hit Cube, almost counterintuitively, the overall performance would improve. Indeed, median and 95th percentile latencies for the query would naturally tend to the in-memory cache latency, not that of the upstream data source.
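The steps above can be sketched in a few lines of Python. This is a toy illustration, not Cube's actual implementation: the `DedupCache` class, the `fake_warehouse` function, and the idea of keying the cache by the raw query string are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Hashable


class DedupCache:
    """Toy in-memory cache keyed by the query itself (illustrative only)."""

    def __init__(self, run_upstream: Callable[[Hashable], Any]) -> None:
        self._run_upstream = run_upstream
        self._results: Dict[Hashable, Any] = {}
        self.upstream_calls = 0  # how many times the data source was actually hit

    def fetch(self, query: Hashable) -> Any:
        # An identical query seen before is served from memory.
        if query in self._results:
            return self._results[query]
        # A first-of-a-kind query goes to the upstream data source,
        # and its result is stored for the queries that follow.
        self.upstream_calls += 1
        result = self._run_upstream(query)
        self._results[query] = result
        return result


def fake_warehouse(query: Hashable) -> str:
    """Hypothetical stand-in for a slow upstream data source."""
    return f"rows for: {query}"


cache = DedupCache(fake_warehouse)
first = cache.fetch("SELECT SUM(amount) FROM orders")
second = cache.fetch("SELECT SUM(amount) FROM orders")
```

Both calls return the same result, but only the first one reaches `fake_warehouse`; `cache.upstream_calls` stays at 1 no matter how many identical queries follow.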

Please watch the following 10-minute demo to see how query deduplication works and how it affects query latencies:

Here’s the most insightful part of the demo above:

% k6 run scripts/1.js | grep iteration
     iterations.....................: 2950   588.92/s
     iteration_duration.............: min=2.46ms med=10.33ms p(95)=23.13ms max=1563.71ms

When thousands of queries are being run (specifically, 2950), the maximum latency is well over a second because one query has to hit the upstream data source; the median and p(95) latencies are two orders of magnitude lower, in the range of 10–23 milliseconds. Cube effectively employs its in-memory cache to deduplicate identical queries and provide the lowest query latency possible.
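To see why a single slow query barely moves the median and p(95), here is a small calculation on synthetic data. The numbers are round, made-up values chosen to resemble the run above (one query at 1500 ms, the rest at 10 ms); they are not measurements.

```python
import statistics

# Synthetic sample: one query hits the upstream data source,
# the remaining 2949 are served from the in-memory cache.
latencies_ms = [1500.0] + [10.0] * 2949

median_ms = statistics.median(latencies_ms)
# quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
p95_ms = statistics.quantiles(latencies_ms, n=100)[94]
max_ms = max(latencies_ms)
```

In this sample, the median and the 95th percentile both equal the cache latency, and only the maximum reflects the upstream query.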

It’s worth noting that the in-memory cache is, naturally, a cache, so it’s subject to one of the two hardest things in computer science: cache invalidation. By default, Cube would invalidate the entries in the in-memory cache after a time interval that depends on the upstream data source: every 2 minutes for BigQuery, Athena, Snowflake, and Presto; every 10 seconds for all other databases. It makes sense to deduplicate queries against data warehouses more aggressively since running them costs money and consumes limited resource quotas.
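Time-based invalidation with these defaults can be sketched as follows. It is a simplified model under stated assumptions: the `ExpiringCache` class, the `default_ttl_seconds` helper, and the lowercase data source names are hypothetical; only the 2-minute and 10-second intervals come from the text above.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Data sources that get the longer default interval, per the text above.
# The lowercase identifiers are an assumption made for this example.
LONG_INTERVAL_SOURCES = {"bigquery", "athena", "snowflake", "presto"}


def default_ttl_seconds(source_type: str) -> int:
    """Return 120 seconds for the warehouses listed above, 10 for the rest."""
    return 120 if source_type.lower() in LONG_INTERVAL_SOURCES else 10


class ExpiringCache:
    """Toy cache whose entries expire after a fixed interval."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # stale: the next query goes upstream
            return None
        return value


# A fake clock keeps the example deterministic.
now = [0.0]
cache = ExpiringCache(default_ttl_seconds("postgres"), clock=lambda: now[0])
cache.put("q", "rows")
now[0] = 5.0
fresh = cache.get("q")    # within the 10-second interval
now[0] = 11.0
expired = cache.get("q")  # past the interval, so the entry is dropped
```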

API idempotence and scalability

Cube’s architecture allows scaling its API instances from one to many. Regardless of which API instance was hit by an incoming query, Cube will check if its unified in-memory cache has a relevant cached result.

Such architecture makes Cube’s APIs idempotent. You can distribute the load between API instances however you want (a simple round-robin strategy works excellently), and no sticky behavior is needed. You can also implement more sophisticated load-balancing strategies, e.g., when running massive multitenant deployments in production. It means you can freely and quickly scale API instances up and down, and no cache warm-up is needed for new instances.
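Here is a minimal sketch of why any instance can serve any request. It assumes a plain Python dictionary as a stand-in for the shared cache store; the `ApiInstance` class, the `fake_warehouse` function, and the instance names are hypothetical.

```python
from itertools import cycle
from typing import Any, Callable, Dict, List


class ApiInstance:
    """Toy API instance that keeps no cache of its own."""

    def __init__(self, name: str, shared_cache: Dict[str, Any],
                 run_upstream: Callable[[str], Any]) -> None:
        self.name = name
        self._cache = shared_cache  # the same object for every instance
        self._run_upstream = run_upstream

    def handle(self, query: str) -> Any:
        if query not in self._cache:
            self._cache[query] = self._run_upstream(query)
        return self._cache[query]


upstream_log: List[str] = []


def fake_warehouse(query: str) -> str:
    """Hypothetical upstream data source that records every call."""
    upstream_log.append(query)
    return f"rows for: {query}"


shared_cache: Dict[str, Any] = {}
instances = [ApiInstance(f"api-{i}", shared_cache, fake_warehouse)
             for i in (1, 2, 3)]

# Simple round-robin load balancing: no sticky sessions required.
balancer = cycle(instances)
handled_by: List[str] = []
for _ in range(6):
    instance = next(balancer)
    instance.handle("SELECT COUNT(*) FROM orders")
    handled_by.append(instance.name)
```

Six requests are spread over three instances, yet the upstream data source is queried only once. A newly added instance pointing at the same `shared_cache` would see the cached result right away.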

Production deployments in Cube Cloud are able to auto-scale the number of API instances from 2 to 10 (and above), so you can process dozens or even hundreds of requests per second (RPS). Talk to us if you’d like to discuss your use case and performance requirements.

Query queue in Cube

When Cube needs to run a query against the upstream data source, it goes through the following steps:

First, the query is added to the query queue. Cube maintains a separate query queue for every upstream data source and your data model can surely work with multiple data sources at once.
Second, Cube will pick a query from the queue and execute it against the data source. The maximum number of simultaneously executed queries is defined by the CUBEJS_CONCURRENCY config option and depends on the upstream data source: 2 for Postgres, 4 for Redshift, 5 for Athena and Snowflake, 10 for BigQuery, etc. These sane defaults prevent the data source from overloading and data warehouse resource quotas from depleting.
Lastly, when the query execution is completed, Cube will put its results in the in-memory cache. Immediately thereafter, all API requests that were waiting for the results of this query will be fulfilled.
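The three steps above can be modeled with a small `asyncio` sketch. It is an illustration under assumptions, not Cube's queue: the `QueryQueue` class and the `fake_postgres` coroutine are hypothetical, and the concurrency limit of 2 mirrors the Postgres default mentioned above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class QueryQueue:
    """Toy queue for one data source with a cap on concurrent queries."""

    def __init__(self, concurrency: int,
                 run_upstream: Callable[[str], Awaitable[Any]]) -> None:
        self._slots = asyncio.Semaphore(concurrency)
        self._run_upstream = run_upstream
        self._in_flight: Dict[str, "asyncio.Future[Any]"] = {}
        self._cache: Dict[str, Any] = {}
        self.running = 0
        self.peak_running = 0

    async def _execute(self, query: str) -> Any:
        try:
            async with self._slots:  # wait for a free execution slot
                self.running += 1
                self.peak_running = max(self.peak_running, self.running)
                try:
                    result = await self._run_upstream(query)
                finally:
                    self.running -= 1
            self._cache[query] = result  # results go to the in-memory cache
            return result
        finally:
            self._in_flight.pop(query, None)

    async def submit(self, query: str) -> Any:
        if query in self._cache:
            return self._cache[query]
        task = self._in_flight.get(query)
        if task is None:  # first request for this query: enqueue it
            task = asyncio.ensure_future(self._execute(query))
            self._in_flight[query] = task
        return await task  # every waiting request gets the same result


async def demo():
    upstream_calls = []

    async def fake_postgres(query: str) -> str:
        upstream_calls.append(query)
        await asyncio.sleep(0.01)  # pretend the query takes a while
        return f"rows for: {query}"

    queue = QueryQueue(concurrency=2, run_upstream=fake_postgres)
    queries = ["q1", "q2", "q3", "q1", "q2", "q3"]
    results = await asyncio.gather(*(queue.submit(q) for q in queries))
    return results, len(upstream_calls), queue.peak_running


results, upstream_count, peak = asyncio.run(demo())
```

Six API requests arrive at once, but only three distinct queries reach the data source, and never more than two of them run at the same time.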

Please watch the following 3-minute demo to see how the query queue works and how it allows limiting query concurrency on the data source side:

Queuing queries instead of running them directly against the upstream data source brings substantial benefits. The query queue safeguards data sources from excess queries in case of burst load; it also allows servicing long-running interactive queries and canceling queries initiated by dropped API connections.

In-memory cache and query queue implementation

As evident from the previous sections, in-memory cache and query queue are essential parts of Cube that have been present from day one.

Initial architecture

Initially, they were implemented using Redis, an open-source in-memory data store, to keep the cache and queue entries. Cube API instances and refresh workers interacted with Redis to maintain the query queue and use the in-memory cache. The following architecture was used:

Architecture with Redis

It worked really well for many use cases. However, we collected enough evidence over time to decide that Cube needed to move on from Redis.

Reasons to move on from Redis

Please read the blog post with detailed reasoning by Pavel Tiunov, CTO at Cube.

Alternatively, refer to the following summary:

Operations and reliability considerations. Redis requires a complex setup for scalability and high availability. To maintain a Redis cluster, you must run at least three Redis and Redis Sentinel nodes. Moreover, cluster coordination consumes a lot of resources under high load.
Use case considerations. Redis works well as a data store for the in-memory cache. However, it was never designed for large columnar payloads and low bandwidth consumption. Also, Redis is less than optimally suited as a data store for the query queue given the complexity of data flow.
Client stability and code complexity considerations. Since the first days of Cube, Redis has required substantial effort to provide stable support on the client side. We had to introduce the support for multiple client libraries (node-redis and ioredis) and the related CUBEJS_REDIS_USE_IOREDIS config option. Moreover, we had to maintain complex and error-prone code on the client side to manage the query queue (see examples: one, two, three).

One more deciding factor was that many Cube deployments—in addition to Redis—include Cube Store, a purpose-built distributed, fast, and reliable data store. So, we updated the implementation of the in-memory cache and query queue to use Cube Store.

Up-to-date architecture

Up-to-date Cube deployments use Cube Store as the all-in-one data store: for in-memory cache, query queue, and pre-aggregations. Redis is not needed.

Architecture with Cube Store

If you still have Redis as a part of Cube deployment, read on for the migration guide.

How to migrate to Cube Store

Cube supports using Cube Store for in-memory cache and query queue starting from version 0.31.60. To migrate, you will need to upgrade Cube to this version or later and check the CUBEJS_CACHE_AND_QUEUE_DRIVER config option.

Migration scenarios slightly differ depending on where you run Cube: in Cube Cloud, in a self-hosted environment, or locally on your computer.

Migration in Cube Cloud

In Cube Cloud, the migration from Redis is seamless. Every production cluster in Cube Cloud includes a fully managed Cube Store with high availability.

Proceed through the following steps:

Pick a deployment, navigate to Settings / General, and ensure the Cube version is at least 0.31.60. Choose either the Stable (preferred) or the Latest channel.
Starting from 0.32.0, no additional configuration is needed. If you’ve chosen a version between 0.31.60 and 0.32.0, go to Settings / Configuration and ensure you have the CUBEJS_CACHE_AND_QUEUE_DRIVER config option set to cubestore.

That is all. Now you run Cube with Cube Store for the in-memory cache and query queue.

Migration when self-hosting

In a self-hosted environment, the migration is really easy if you already have Cube Store. If you don’t run it yet and don’t want to complicate your setup, consider using Cube Cloud, which provides a seamless migration experience.

Proceed through the following steps:

If you don’t run Cube Store yet, you should run at least a Cube Store router. Please refer to the Cube Store architecture overview and example Docker configuration.
Ensure the version is at least 0.31.60 for all Cube nodes: API instances, refresh workers, Cube Store routers, and Cube Store workers.
Starting from 0.32.0, no additional configuration is needed. If you’ve chosen a version between 0.31.60 and 0.32.0, ensure you have the CUBEJS_CACHE_AND_QUEUE_DRIVER config option set to cubestore for all API instances and refresh workers.
Remove all CUBEJS_REDIS_* config options from all API instances and refresh workers.
Remove Redis from your architecture.

That is all. Now you run Cube with Cube Store for the in-memory cache and query queue.
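If you'd like to double-check a self-hosted configuration, the checklist above can be expressed as a small script. This is a sketch under assumptions: the `migration_issues` and `parse_version` helpers are hypothetical and not part of Cube, the version parser only handles plain `X.Y.Z` strings, and `CUBEJS_REDIS_URL` appears here only as an example of a `CUBEJS_REDIS_*` option.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.31.60' or 'v0.31.60' into (0, 31, 60)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def migration_issues(version: str, env: Dict[str, str]) -> List[str]:
    """List what still needs attention for one API instance or refresh worker."""
    issues: List[str] = []
    parsed = parse_version(version)

    if parsed < (0, 31, 60):
        issues.append("upgrade Cube to 0.31.60 or later")
    elif parsed < (0, 32, 0):
        # Before 0.32.0, the driver has to be selected explicitly.
        if env.get("CUBEJS_CACHE_AND_QUEUE_DRIVER") != "cubestore":
            issues.append("set CUBEJS_CACHE_AND_QUEUE_DRIVER=cubestore")

    leftover = sorted(name for name in env if name.startswith("CUBEJS_REDIS_"))
    if leftover:
        issues.append("remove Redis options: " + ", ".join(leftover))

    return issues


# Example: a 0.31.60 node that still carries a Redis option.
old_node = migration_issues(
    "0.31.60", {"CUBEJS_REDIS_URL": "redis://localhost:6379"}
)
# Example: a 0.32.0 node with no Redis options needs nothing else.
new_node = migration_issues("0.32.0", {})
```

An empty list means the node matches the checklist; in the example, `old_node` reports both the missing driver option and the leftover Redis option.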

Migration on a local computer

If you run a single instance of Cube locally on your computer in development mode, the migration is really easy because you already have Cube Store running in-process.

Proceed through the following steps:

Ensure the Cube version is at least 0.31.60.
Starting from 0.32.0, no additional configuration is needed. If you’ve chosen a version between 0.31.60 and 0.32.0, ensure you have the CUBEJS_CACHE_AND_QUEUE_DRIVER config option set to cubestore.

That is all. Now you run Cube with Cube Store for the in-memory cache and query queue.

What’s next?

Run Cube Store as part of your Cube deployments. Get started today in Cube Cloud to try a fully managed Cube experience.

Enjoy the benefits of using Cube Store for the in-memory cache and query queue, including low query latency for identical queries and safeguarding data sources from burst load.

Please don’t hesitate to talk to us if you have any questions. Also, join our Slack community of 8,000 data practitioners at slack.cube.dev to discuss your experience with Cube Store.
