Nowadays, data pipelines resemble Lego sets, built from diversely colored data tools born from the foam of the unbundling of Airflow. Unlike nicely interlocking Lego bricks, though, data tools often need WD-40 and excessive person-hours to integrate properly. Unsurprisingly, this makes a strong case for data orchestration tools, which are immensely helpful for gluing data pipelines together.

At Cube, we strive to ensure that our semantic layer integrates and interoperates with other tools in the modern data stack. With an extensive set of supported data sources and data consumers, we’re always eager to hear from our users and customers, dig into their use cases, and help them with relevant integrations.


Today, we’re presenting an update that enables Cube’s semantic layer to work with data orchestration tools and to blend nicely into your data pipelines.

Orchestration API

Over the last three years, Cube has developed a set of APIs, including the SQL API, REST API, and GraphQL API, to deliver data from the semantic layer to all kinds of data consumers.

Now, Cube also has the Orchestration API to work with data orchestration tools and let them push changes from upstream data sources to Cube. The Orchestration API enables Cube to be part of Airflow pipelines, Dagster jobs, Prefect workflows, and basically any data plumbing. Previously, one would configure Cube to pull updated data from upstream data sources on a schedule or condition. Now, a data orchestration tool can take over and push updated data to Cube when needed.

The Orchestration API has a single /v1/pre-aggregations/jobs endpoint that lets you trigger pre-aggregation build jobs or retrieve the statuses of such jobs. You can choose to rebuild all pre-aggregations (effectively invalidating the whole cache in Cube) or work at a more granular level and rebuild pre-aggregations with data from particular data sources or with members of specific cubes.
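For illustration, here’s a minimal sketch of calling the endpoint with Python’s requests library. The deployment URL, auth token, and cube names are placeholders, and the payload shape follows the Orchestration API documentation at the time of writing; double-check it against the docs for your Cube version.

```python
import requests

# Hypothetical deployment URL and auth token; replace with your own.
CUBE_API_URL = "https://example.cubecloud.dev/cubejs-api/v1"
HEADERS = {"Authorization": "<your-jwt-token>", "Content-Type": "application/json"}

# Trigger pre-aggregation build jobs. The selector narrows the scope:
# omit "cubes" to rebuild more broadly, or add "dataSources",
# "preAggregations", etc. to be more specific.
response = requests.post(
    f"{CUBE_API_URL}/pre-aggregations/jobs",
    headers=HEADERS,
    json={
        "action": "post",
        "selector": {
            "contexts": [{"securityContext": {}}],
            "timezones": ["UTC"],
            "cubes": ["orders", "customers"],  # hypothetical cube names
        },
    },
)
job_ids = response.json()  # a list of job tokens to poll

# Poll the same endpoint for the status of the triggered jobs.
status = requests.post(
    f"{CUBE_API_URL}/pre-aggregations/jobs",
    headers=HEADERS,
    json={"action": "get", "resIds": job_ids},
)
print(status.json())
```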

In the documentation, you will find guides and links to integration packages that simplify using the Orchestration API with the three most popular data orchestration tools: Airflow, Dagster, and Prefect. These integration packages were originally contributed by members of the Cube community, for which we are very grateful.
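If you’d rather not pull in a dedicated package, the endpoint can also be called with stock Airflow operators. Here’s a minimal sketch, assuming a recent Airflow 2.x with the HTTP provider installed and a hypothetical HTTP connection named cube_api that points at your Cube deployment and carries the auth header:

```python
import json
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="rebuild_cube_pre_aggregations",
    start_date=datetime(2023, 1, 1),
    schedule=None,  # triggered by upstream pipelines rather than a schedule
) as dag:
    # POST to the Orchestration API to kick off pre-aggregation builds.
    trigger_builds = SimpleHttpOperator(
        task_id="trigger_cube_builds",
        http_conn_id="cube_api",  # hypothetical connection with base URL and auth
        endpoint="cubejs-api/v1/pre-aggregations/jobs",
        method="POST",
        headers={"Content-Type": "application/json"},
        data=json.dumps({
            "action": "post",
            "selector": {
                "contexts": [{"securityContext": {}}],
                "timezones": ["UTC"],
            },
        }),
    )
```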

Here’s what Alessandro Lollo, Data Engineering Manager at Cloud Academy and contributor of the Prefect integration, thinks:

When building a data solution, you usually want to keep your data warehouse / data lake and your pre-aggregations in sync so that the changes to data are reflected on your frontend in a timely fashion.

If you’re using Prefect as your data orchestration tool, you can leverage the integration with Cube’s Orchestration API, which lets you trigger pre-aggregation refresh directly from your data pipelines.

Imagine the following scenario:

Your data lands in the data lake; it is then loaded into your cloud DWH, where you run data transformations to build or refresh dimensional and fact tables.

Once the data is in the warehouse, you’d like to refresh pre-aggregations in Cube to make the most up-to-date data available to your end users.

With Prefect, you can organize all of this as a flow composed of tasks: a task that checks for new data in the data lake, a task that triggers your ETL logic based on the result of the previous task, and a task that calls Cube’s Orchestration API once the ETL task completes successfully.

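To make the scenario Alessandro describes concrete, here’s a minimal sketch of such a flow in Prefect 2.x. It calls the Orchestration API over plain HTTP rather than through the Prefect integration package, and the task bodies, URL, and token are hypothetical placeholders:

```python
import requests
from prefect import flow, task

CUBE_API_URL = "https://example.cubecloud.dev/cubejs-api/v1"  # hypothetical

@task
def new_data_available() -> bool:
    # Placeholder: check the data lake, e.g. list newly arrived files.
    return True

@task
def run_etl() -> None:
    # Placeholder: load data into the DWH and run transformations.
    ...

@task
def refresh_pre_aggregations() -> list:
    # Trigger pre-aggregation builds through the Orchestration API.
    response = requests.post(
        f"{CUBE_API_URL}/pre-aggregations/jobs",
        headers={"Authorization": "<your-jwt-token>"},
        json={
            "action": "post",
            "selector": {"contexts": [{"securityContext": {}}], "timezones": ["UTC"]},
        },
    )
    return response.json()  # job tokens that could be polled for status

@flow
def data_pipeline():
    if new_data_available():
        run_etl()
        refresh_pre_aggregations()

if __name__ == "__main__":
    data_pipeline()
```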

We’re curious to learn how the Orchestration API would help you integrate Cube with data orchestration tools and blend the semantic layer into your data pipelines.

The Orchestration API is a Cube Core feature. It’s available to all Cube Core and Cube Cloud users, and you can try it today. Join our Slack community at slack.cube.dev to share your thoughts and opinions about this update.