Performance Insights

The Performance page in Cube Cloud displays charts that help analyze the performance of your deployment and fine-tune its configuration. It's recommended to review Performance Insights when the workload changes or if you face any performance-related issues with your deployment.

Performance Insights are available in Cube Cloud on Premium and above (opens in a new tab) product tiers. You can also choose a Query History tier.

Charts

Charts provide insights into different aspects of your deployment.

API instances

The API instances chart shows the number of API instances that served queries to the deployment.

You can use this chart to fine-tune the auto-scaling configuration of API instances, e.g., increase the minimum and maximum number of API instances.

For example, the following chart shows a deployment with sane auto-scaling limits that don't need adjusting. It looks like the deployment needs to sustain just a few infrequent load bursts per day and auto-scaling to 3 API instances does the job just fine:

The next chart shows a deployment with auto-scaling limits that definitely need an adjustment. It looks like the load is so high that most of the time this deployment has to use at least 4-6 API instances. So, it would be wise to increase the minimum auto-scaling limit to 6 API instances:

When in doubt, consider using a higher minimum auto-scaling limit: when an additional API instance starts, it needs some time to compile the data model before it would be able to serve the requests. So, over-provisioning API instances with a higher minimum auto-scaling limit would allow to decrease the number of requests that had to wait for the data model compilation.

Also, you can use this chart to fine-tune the auto-suspension configuration, e.g., by turning auto-suspension off or increasing the auto-suspension threshold. For example, the following chart shows a Development Instance deployment that is only accessed a few times a day and automatically suspends after a short period of inactivity:

The next chart shows a misconfigured Production Cluster deployment that serves the requests throughout the whole day but was configured to auto-suspend with a tiny threshold:

Cache type

The Requests by cache type chart shows the number of API requests that were fulfilled by specific cache types, e.g., pre-aggregations, in-memory cache, no cache, etc. For example, the following chart shows a deployment that fulfills about 50% of requests by using pre-aggregations:

The Avg. response time by cache type shows the difference in the response time for requests that hit pre-aggregations, in-memory cache, or no cache (i.e., the upstream data source). The next chart shows that pre-aggregations usually provide sub-second response times while queries to the data source take much longer:

You can use these charts to see if you'd like to have more queries that hit the cache and have lower response time. In that case, consider adding more pre-aggregations in Cube Store or fine-tune the existing ones, e.g., by using indexes to speed up pre-aggregations with suboptimal query plans.

Data model compilation

The Requests by data model compilation chart shows the number of API requests that had or had not to wait for the data model compilation. For example, the following chart shows a deployment that only has a tiny fraction of requests that require the data model to be compiled:

The Wait time for data model compilation chart shows the total time requests had to wait for the data model compilation. The next chart shows that at certain points of time requests had to wait dozens of seconds while the data model was being compiled:

You can use these charts to fine-tune the auto-suspension configuration (e.g., turn it off or increase the threshold so that API instances suspend less frequently), identify multitenancy misconfiguration (e.g., suboptimal bucketing via context_to_app_id), or consider using a multi-cluster deployment to distribute requests to different tenants over a number of Production Cluster deployments.

Cube Store

The Saturation for queries by Cube Store workers chart shows if Cube Store workers are overloaded with serving queries. High saturation for queries prevents Cube Store workers from fulfilling requests and results in wait time displayed at the Wait time for queries by Cube Store workers chart.

For example, the following chart shows a deployment that uses 4 Cube Store workers and almost never lets them come to saturation, resulting in no wait time for queries:

Similarly, the Saturation for jobs by Cube Store workers and Wait time for jobs by Cube Store workers charts show if Cube Store Workers are overloaded with serving jobs, i.e., building pre-aggregations or performing internal tasks such as data compaction.

For example, the following chart shows a misconfigured deployment that uses 8 Cube Store workers and keeps them at full saturation during prolonged intervals, resulting in huge wait time and, in case of jobs, delayed refresh of pre-aggregations:

The next chart shows that oversaturated Cube Store workers might yield hours of wait time for queries and jobs:

You can use these charts to fine-tune the number of Cube Store workers used by your deployment, e.g., increase it until you see that there's no saturation and no wait time for queries and jobs.

Pre-aggregations Monitoring Integrations