Today, we're happy to announce that Cube now works with Trino, the fast distributed SQL query engine for big data analytics, formerly known as PrestoSQL.
Meet the Cube team at Trino Summit on November 10, 2022. We'll be happy to chat and explore how Cube can augment your Trino experience.
What is Trino?
Trino is an open-source query engine designed to work with big data.
It's fast, scalable, SQL-compliant, and has almost universal connectivity to all kinds of data sources and business intelligence tools. Let's explore these in more detail:
- Scale. Trino is capable of querying exabyte-scale data lakes and massive data warehouses, as confirmed by Trino installations at companies such as LinkedIn, Netflix, and Shopify.
- Performance. Trino is a highly parallel and distributed query engine with two types of cluster nodes: coordinator nodes, responsible for parsing, planning, and scheduling queries, and worker nodes, responsible for fetching and processing data. You can fine-tune your Trino cluster to allow for efficient, low-latency analytics.
- Connectivity. Trino can natively query data from a plethora of data sources without the need to extract and load the data into a data warehouse, even if you'd like to do a cross-database join. Also, Trino is an ANSI SQL-compliant query engine that works with BI tools such as Tableau, Power BI, and Superset as well as headless BI tools such as Cube.
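The cross-database join mentioned under Connectivity can be sketched as follows. Trino addresses tables as catalog.schema.table, so a single statement can join tables that live in different data sources. This is a minimal sketch: the catalog, schema, table, and column names are hypothetical and only illustrate the shape of such a query.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Trino table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql() -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    # Hypothetical catalogs: one backed by PostgreSQL, one backed by Hive.
    orders = qualified("postgresql", "public", "orders")
    views = qualified("hive", "web", "page_views")
    return (
        "SELECT o.customer_id, count(v.view_id) AS views\n"
        f"FROM {orders} AS o\n"
        f"JOIN {views} AS v ON v.customer_id = o.customer_id\n"
        "GROUP BY o.customer_id"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The resulting statement can be submitted through any Trino client; neither table needs to be copied into a warehouse first.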
The most common use cases for Trino include ad-hoc analytics at interactive speeds, massive multi-hour batch queries, and high-volume apps that perform sub-second queries.
Trino was originally designed and developed at Facebook in 2013 and bore the name Presto at that time. In 2019, Presto development forked and two query engines emerged: PrestoDB, maintained by Facebook, and PrestoSQL, maintained by the Presto Software Foundation and the original creators. Later, in 2020, PrestoSQL was renamed to Trino.
What is Cube?
Cube is the headless BI platform for accessing data from modern data stores (including Trino), organizing it into consistent metrics definitions, and delivering them to downstream applications.
Cube is designed to sit at the center of the data pipeline, delivering consistent data to all downstream teams and data consumers. It serves as a source of truth for the metrics definitions, access control rules, and caching settings. Regardless of how many data consumers you have (e.g., front-end applications with embedded analytics or BI tools), Cube will deliver consistent data to all of them with its REST API, GraphQL API, or SQL API.
How Cube works with Trino
In a typical data pipeline, Trino is placed upstream of Cube, providing unrestricted access to all data sources as well as data federation capabilities. You can use Cube to create an additional semantic layer or a last-mile caching layer on top of Trino.
More importantly, you can use the set of APIs that Cube provides, including the REST API and the GraphQL API, to deliver the data directly to custom-built front-end applications while retaining low latency and high concurrency.
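To make the REST API path more concrete, here is a minimal sketch of how an application might prepare a request to Cube's `/cubejs-api/v1/load` endpoint, which takes the query as a JSON-encoded `query` parameter and an API token in the `Authorization` header. The deployment URL, the token, and the `orders` members are placeholders, not values from a real deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_load_request(base_url: str, token: str, query: dict) -> Request:
    """Prepare (but do not send) a GET request to Cube's REST API load endpoint."""
    params = urlencode({"query": json.dumps(query)})
    url = f"{base_url}/cubejs-api/v1/load?{params}"
    return Request(url, headers={"Authorization": token})


# Hypothetical query: count orders grouped by status.
example_query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "limit": 10,
}

# Placeholder deployment URL and token.
request = build_load_request("https://example.cubecloud.dev", "API_TOKEN", example_query)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. Cube translates the query into SQL for the upstream data source, which in this setup is Trino.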
Try Cube with Trino
Start by signing up for a free Cube Cloud account. You'll be prompted to select the cloud provider (AWS, GCP, or Azure) and a region for your deployment.
Then, pick Trino from the data source options.
Lastly, provide the credentials for the Trino connection.
In a few seconds, you'll get your Cube Cloud deployment up and running, ready to query your Trino installation and deliver data to downstream applications.
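Outside of the Cube Cloud setup flow, the same connection details are typically supplied to Cube as environment variables. The sketch below assembles them in Python. The variable names follow Cube's `CUBEJS_DB_*` convention as we understand it and should be checked against the Cube documentation for your version; the host, user, catalog, and schema values are placeholders.

```python
def trino_env(host: str, port: int, user: str, catalog: str, schema: str,
              password: str = "", ssl: bool = True) -> dict:
    """Collect Trino connection settings as Cube-style environment variables."""
    env = {
        "CUBEJS_DB_TYPE": "trino",
        "CUBEJS_DB_HOST": host,
        "CUBEJS_DB_PORT": str(port),
        "CUBEJS_DB_USER": user,
        "CUBEJS_DB_PRESTO_CATALOG": catalog,  # assumed name, shared with the Presto driver
        "CUBEJS_DB_SCHEMA": schema,
        "CUBEJS_DB_SSL": "true" if ssl else "false",
    }
    if password:
        # Only emit the password variable when one is actually set.
        env["CUBEJS_DB_PASS"] = password
    return env


def to_dotenv(env: dict) -> str:
    """Render the settings as KEY=value lines, one per variable."""
    return "\n".join(f"{key}={value}" for key, value in env.items())


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(to_dotenv(trino_env("trino.example.com", 443, "cube", "hive", "default")))
```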