The Cube x Jobber user story.
Jobber, the leading provider of business management software, helps small home service businesses stay organized, connect with customers, grow revenue, and better compete against large corporations. The company's technology supports more than 50 industries, including HVAC, plumbing, lawn care, cleaning, and more. Since launching in 2011, businesses using Jobber have serviced over 15 million households in more than 47 countries.
One of the most important tools for Jobber's customers is the dashboard, which provides a snapshot of what's happening in their businesses so they can schedule their day, optimize routing, keep track of invoices, accept payments, and more, even when they're away from a traditional brick-and-mortar office. Today, more than 100,000 service professionals use Jobber's platform and rely on it to stay on top of—and grow—their businesses.
In the past, the Jobber team retrieved this information with direct queries against a production database from their Ruby on Rails application through the Active Record ORM. However, as the business scaled and accumulated close to a decade of data, dashboard performance started to degrade. The team worked on caching, query optimization, and database tuning to address the slowdowns but realized that more needed to be done.
When Jobber first discovered Cube.js, they studied the documentation and examples and liked the flexibility. The team also liked the two-level caching system in Cube.js, as they had previously implemented much of that with their own code. Now they use Cube.js almost as a black box and have been able to remove the optimization code they used to maintain. Jobber's first use case with Cube.js was daily rollup pre-aggregations combined with the two-level caching.
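To illustrate, a daily rollup pre-aggregation in a Cube.js data schema of that era might look like the sketch below. The cube, measure, and column names are illustrative, not Jobber's actual schema:

```javascript
// Illustrative Cube.js schema: a daily rollup pre-aggregation.
// Queries that match these measures/dimensions are served from the
// pre-built rollup table instead of hitting the source database.
cube(`Jobs`, {
  sql: `SELECT * FROM jobs`,

  measures: {
    count: { type: `count` },
  },

  dimensions: {
    status: { sql: `status`, type: `string` },
    createdAt: { sql: `created_at`, type: `time` },
  },

  preAggregations: {
    dailyRollup: {
      type: `rollup`,
      measureReferences: [count],
      dimensionReferences: [status],
      timeDimensionReference: createdAt,
      granularity: `day`,
    },
  },
});
```

With the two-level caching, the first query of the day reads from this rollup table, and repeated queries are answered from the in-memory/Redis query result cache on top of it.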
Another important and timely feature was Cube.js's support for external pre-aggregations. With their source database being a read-only replica, building the pre-aggregations inline within the source database wasn't an option.
As you can see in the architecture diagram below, Jobber has a single PostgreSQL database backend, and they recently started using React for the frontend. Jobber has a read-only replica of the database to populate the rollup pre-aggregations database and wraps the Cube.js REST API with their own API for handling things like authentication (Cube.js is a trusted sub-system in this configuration). Jobber also leverages multitenancy with a query transformer to enforce at runtime that all data queries filter to the authenticated tenant. Other security features include full certificate verification for database connections.
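One possible shape for such a tenant-enforcing query transformer is sketched below. The member name (Jobs.accountId) and context field are hypothetical, and the exact configuration hook has varied between Cube.js versions; the point is that the server appends the tenant filter to every query, so it cannot be bypassed by the client:

```javascript
// Sketch of a tenant-enforcing query transformer (illustrative names).
// Cube.js applies this to every incoming query at runtime.
function tenantQueryTransformer(query, { authInfo }) {
  const accountId = authInfo && authInfo.accountId;
  if (!accountId) {
    // fail closed: no authenticated tenant means no data
    throw new Error('Rejecting query with no authenticated tenant');
  }
  return {
    ...query,
    filters: [
      ...(query.filters || []),
      // force every query to the authenticated tenant's rows
      { member: 'Jobs.accountId', operator: 'equals', values: [String(accountId)] },
    ],
  };
}

module.exports = { tenantQueryTransformer };
```

In a Cube.js configuration of that era this would be wired in via the query transformer option, with authInfo populated from the (already authenticated) wrapping API.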
Splitting Cube.js like this—an API server which handles the queries and a scheduled refresh server which updates the pre-aggregation rollups periodically—required leveraging some special configuration options for the orchestrator:

- The API server runs CubejsServer, configured so that it only responds with data from the rollup pre-aggregations database, even if queries attempt to access data outside of the available rollups (rollupOnlyMode: true), and additionally skips the consistency checks and background refreshing of the rollups (scheduledRefreshTimer: false).
- The refresh server uses ServerCore (no need to run a web server here!) to execute runScheduledRefresh repeatedly (with a five-second cooldown) until the rollups have completed refreshing. Note that leveraging a read replica requires setting readOnly: true on the source database driver.
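Put together, the entry points of the two processes might look roughly like the sketch below. This is an illustrative sketch against the Cube.js APIs of that era, not Jobber's actual code; option and package names can differ between Cube.js versions:

```javascript
// api-server.js (sketch): serve queries strictly from the rollups
const CubejsServer = require('@cubejs-backend/server');

const server = new CubejsServer({
  // answer only from the rollup pre-aggregations database,
  // never falling through to the source replica at query time
  orchestratorOptions: { rollupOnlyMode: true },
  // this process does not refresh the rollups itself
  scheduledRefreshTimer: false,
});
server.listen();

// refresh-worker.js (sketch): refresh rollups, no HTTP server needed
const CubejsServerCore = require('@cubejs-backend/server-core');
const PostgresDriver = require('@cubejs-backend/postgres-driver');

const core = CubejsServerCore.create({
  // the source is a read-only replica: tell the driver not to write to it
  // (pre-aggregations are written to the external rollup database instead)
  driverFactory: () => new PostgresDriver({ readOnly: true }),
});

(async function refreshLoop() {
  for (;;) {
    await core.runScheduledRefresh();
    // five-second cooldown between refresh passes
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
})();
```

Keeping the refresh loop in its own process means a slow rollup rebuild never competes with query traffic for CPU on the API server.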
Jobber's architecture could be a viable option for anyone who is querying production databases (as opposed to dedicated big data sources).
Several team members at Jobber were involved in the Cube.js work: two developers, one designer, and a product manager. The team initially used Cube.js out of the box and utilized the Cube.js Developer Playground extensively, with a few additional refinements, such as converting the auto-generated project to TypeScript. Jobber uses Docker for their development environment and runs the Cube.js microservice in a container.
The Jobber team also developed tests for things like additional schema validations on their CI build. One of the things the team ran into early on was the table name truncation that PostgreSQL performs on long identifiers. So one of their tests asks Cube.js what table names it's going to generate, and if a table name is longer than what PostgreSQL allows, the test fails early in the CI process.
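A minimal version of such a check might look like the sketch below. The real test would obtain the generated names from Cube.js's compiled schema; here they are simply passed in, and the function names are illustrative:

```javascript
// PostgreSQL silently truncates identifiers longer than 63 bytes
// (NAMEDATALEN - 1), which can make two long pre-aggregation table
// names collide after truncation.
const PG_MAX_IDENTIFIER_BYTES = 63;

// Return every table name that PostgreSQL would truncate.
function findOverlongTableNames(tableNames) {
  return tableNames.filter(
    (name) => Buffer.byteLength(name, 'utf8') > PG_MAX_IDENTIFIER_BYTES
  );
}

// In CI: fail the build early if any generated name would be truncated.
function assertTableNamesFit(tableNames) {
  const overlong = findOverlongTableNames(tableNames);
  if (overlong.length > 0) {
    throw new Error(
      `Table names exceed PostgreSQL's 63-byte limit: ${overlong.join(', ')}`
    );
  }
}
```

Checking byte length rather than character length matters here, since PostgreSQL's limit is measured in bytes and identifiers may contain multi-byte characters.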
The Jobber team also did two rounds of performance testing. First, the team wanted to understand where the bottlenecks for Cube.js were (e.g., CPU vs. memory) when saturated with incoming requests and observed how it responded while ramping up simulated concurrent requests.
In the load test graph below, you can see how their test showed signs of achieving maximum throughput (for the given server configuration), with response times (blue) starting to get shaky as the number of concurrent requests (yellow) ramps up. This particular test was getting 100% Redis cache hits, and the bottleneck was the CPU.
Next, they used simulated traffic in the frontend to mimic actual customers' traffic (and data requests) more accurately and began experimenting with different scaling options (vertical and horizontal). This allowed them to observe more complex performance characteristics because both caching tiers were getting exercised.
Once on beefier servers, the CPU was no longer the bottleneck, and the next bottleneck encountered was related to memory! This manifested in a runaway memory ramp (at a certain amount of traffic) where Node.js would consume all the available memory and eventually pause the entire runtime for several seconds to run a "mark sweep" style garbage collection.
After some research into how Node.js' memory configuration works—it's complicated, often described incorrectly online, and changed between versions 10 and 12—the Jobber team optimized the usage of the server resources by running concurrent copies of the Node.js process using a library called throng.
The graph below shows the service's response time shape over three days. You actually see that each day, as the traffic starts increasing up to the peak volume (represented as the vertical white lines), the Cube.js response time actually goes down (almost counterintuitively). This is because the first hit of the day is going against the rollup pre-aggregations database (level 2), but subsequent looks at the dashboard hit the faster query result cache in Redis (level 1).
Jobber's team members were also very active contributors to Cube.js, and in many ways co-developers, as they evaluated and deployed it. This GitHub query shows contributions from two team members at Jobber; note that it only captures contributions made via pull requests and doesn't include Slack or other channels.
This type of active engagement from users clearly demonstrates the value of open source. The contributions help accelerate development and, more importantly, provide valuable insight to the Cube Dev team on actual use cases in production environments.
If you want to see Cube.js in action at Jobber today, you can visit Jobber's Help Center page. In the screenshot below, Cube.js powers the four line-series charts.
After the successful deployment, the Jobber team is continuing to explore other areas where Cube.js can reduce engineering costs while delivering great features for their customers.
Explore Cube.js examples & tutorials and get started today. To jumpstart your efforts, please join us on Discourse & Slack, follow us on Twitter, and get engaged with the growing Cube.js community.