Modern data teams face growing complexity: multiple data sources, various BI tools, and an ever-increasing demand for self-service analytics. Organizations must keep their metrics consistent while making data accessible to business users. This is where a semantic layer shines: it provides a single source of truth for business metrics while abstracting away the underlying complexity.
In Part 1, we explored the semantic layer through the lens of MVC, discovering how it acts as both a controller for data access and a model for business metrics. While this architectural pattern helps us understand its role, the real value of a semantic layer comes from its everyday use.
This article examines how semantic layers fit into modern data architectures, the critical benefits they bring, from API-driven access to enhanced governance, and why they've become essential in today's data stack.
Understanding the Modern Semantic Layer
A semantic layer is a tool that makes complex data easier to understand and use. It translates technical data into simple terms and concepts that everyone in a business can understand.
My definition goes something like this:
A semantic layer acts as an intermediary, translating complex data into business concepts that users understand. It bridges the gap between raw data in databases (such as sales data with various attributes) and actionable insights (such as revenue per store or popular brands). This layer helps business users access and interpret data using familiar terms, without needing deep technical knowledge.
Evolution from Traditional to Modern
A semantic layer is nothing entirely new, but there is a modern version of it. Let's call it the modern semantic layer, as opposed to the traditional one, invented in 1991 and first used in SAP BusinessObjects, which lives inside almost every BI tool out there.
The novelty is that a modern semantic layer lives outside the BI tool. It is open source and usually comes with a data modeling language, such as LookML, for defining metrics declaratively.
💡 Restaurant À la Carte Analogy
Imagine you sit in a restaurant and have the menu in front of you: you choose from a catalog of delicious dishes. You order your favorite, and it's delivered. You don't know how it's made, only what was made, and it's good.
A semantic layer works the same way: predefined metrics can be chosen in your BI tool of choice, and correct, consistent data is delivered. The semantic layer is the layer that maps the metrics to their physical tables, queries them, and serves the dish. You don't know how, but everyone is happy as long as it's correct.
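To make the analogy concrete, here is a toy Python sketch of that mapping. The metric names, SQL expressions, and the orders table are all made up for illustration; a real semantic layer does far more (joins, security, caching), but the division of labor is the same:

```python
# A toy "menu": business metrics mapped to their SQL definitions.
# Metric names, expressions, and the orders table are hypothetical.
METRICS = {
    "total_revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}

def build_query(metric: str, dimension: str, table: str = "orders") -> str:
    """Translate a metric 'order' into SQL against the physical table."""
    return (
        f"SELECT {dimension}, {METRICS[metric]} AS {metric} "
        f"FROM {table} GROUP BY {dimension}"
    )

# The consumer only knows the metric's name, not how it is made.
print(build_query("total_revenue", "store_id"))
# SELECT store_id, SUM(amount) AS total_revenue FROM orders GROUP BY store_id
```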
Core Components: Different Types of a Semantic Layer
There are different levels of semantic layers. You can have a thin layer within your BI dashboard tool or a thicker layer encompassing transformation logic with modeling languages such as LookML, caching, APIs, etc. Both abstract the how for you unless you are responsible for building the metrics.
The different types of layers:
- BI and Analytics Semantic Layer: Defines business concepts, relationships, metrics, and calculations. Think of the thin semantic layer within a BI tool, broken out into a separate, open tool, previously also aptly called headless BI.
- Data Warehouse Semantic Layer: This layer centralizes data, providing a single source of truth for various departments and use cases.
- Data Catalog and Asset Layer: An overview and list of all the data tables, assets, and artifacts available across your landscape. A single index for finding data within your company, like a Google search for your internal metadata.
- Data Orchestration Layer: An interface for orchestration tools to update BI metrics and data assets declaratively, extracting metadata such as names and the organization of data models.
- API Semantic Layer: Standardizes data access and interpretation across different APIs, ensuring consistent meaning and format.
- Data Virtualization and Caching Layer: If you use Trino or Dremio, you are essentially creating a semantic layer across your data sources and speeding up query times across diverse data sources.
- AI and Natural Language Layer: A natural language query interface that allows anyone, especially non-technical people, to query the data, generating SQL or YAML automatically and speeding up metrics generation by an order of magnitude.
A thick semantic layer, such as Cube, provides all these features within a Universal Semantic Layer. This layer connects all operational data to various analysis tools and is a homogeneous interface for most data requirements.
The Platinum Layer: Integration with Data Stack
You may ask, how does a semantic layer fit into the current data landscape, and do we need one ourselves?
Think of it as a platinum layer in the Medallion Architecture, sitting on top of your gold layer with a superset of functions. In a classical data warehouse architecture, it's the layer on top of the marts, similar to SSAS cubes in the old days, when we had a fast, unified layer for all dashboard or Excel integrations that required sub-second responses.
Remember: The number of layers in your architecture depends on your requirements and business needs. This is a default architecture where the semantic layer acts as an interface to the data.
The semantic layer contains this information and syncs it into your BI tool, data applications, or the ML model you're building. Usually, in larger companies with a high demand for data and many people who want to work with it, you have several downstream apps and BI tools that require the same KPIs. Wouldn't it be great to define and maintain them once, ensuring they reach all downstream tools?
This is what a modern semantic layer tries to achieve. It's a semantic SQL layer, but with tools like Cube, instead of putting the logic into spaghetti SQL that gets duplicated in each dashboard, you can define all metrics in a declarative fashion (usually YAML), empowering reusability. Everyone can verify and update the definitions. It also allows you to version, automate, and apply established software engineering practices to your metrics.
This platinum layer lets your different data applications access and aggregate the same business metrics consistently. For example, your profit metric might be a massive SQL query. On one hand, it might take ages to process; on the other, you want to avoid managing that metric in each of your BI tools or recreating it as part of your ML model, your customer-facing data app, or maybe in Excel. With a semantic layer, you have clear interfaces: you define the metric once, and each use case can pull the aggregated metric via the SQL, REST, or GraphQL API.
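As a rough illustration, here is a minimal sketch of pulling such a metric over Cube's REST API from Python. The deployment URL and token are placeholders, and the orders cube with its total_revenue measure and created_at time dimension are assumptions for the example rather than anything from this article:

```python
import requests

# Placeholder deployment URL and auth token; /cubejs-api/v1/load is
# Cube's REST API entry point for queries.
CUBE_API = "https://example-deployment.cubecloud.dev/cubejs-api/v1/load"
API_TOKEN = "<your-api-token>"

# The measure is defined once in the semantic layer; the consumer
# only references it by name in a JSON query.
query = {
    "measures": ["orders.total_revenue"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {"dimension": "orders.created_at", "granularity": "month"}
    ],
}

response = requests.post(
    CUBE_API,
    headers={"Authorization": API_TOKEN},
    json={"query": query},
    timeout=30,
)
response.raise_for_status()

# Each row combines the dimension values with the aggregated measure.
for row in response.json()["data"]:
    print(row)
```

The same query could be expressed over the SQL or GraphQL API; the point is that the aggregation logic lives in the model, not in the client.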
It also abstracts away the complexity of the data storage layer into a single layer. You may have some data in a Postgres database, other data in an aggregated OLAP cube, and more in an S3 bucket; as the consumer of a data app, you don't need to care where the data comes from. A nice additional feature is that Cube, for example, comes with a sophisticated out-of-the-box cache layer called Cube Store.
💡 Visual Modeler
The new Visual Modeler, recently added to Cube, is another advanced tool for modeling data and bridging the gap between business and code.
Example: Building an Analytics API
In my previous company, we developed an Analytics API similar to Cube's before semantic layers were a thing. We switched from SSAS to Druid, a modern OLAP solution, to handle diverse business metrics queries from Tableau, Notebooks, and our Web App.
Druid's inability to store metrics as part of the cube, as SSAS and others do, would have forced us to reimplement our complex business logic and measures in each tool. Hundreds of metrics would have needed to be recreated within Tableau for reporting, within the web app for showing the current state, and within the Jupyter notebooks for the data scientists to play with the data.
Instead, we implemented our own semantic layer and called it the Analytics API. It did only a fraction of the great things mentioned above that current semantic layers are capable of, but it allowed us to define and automate metrics as code, offering a universal, open-standard approach to handling business metrics.
Trade-offs and Considerations
As with any technology or architectural approach, there are also downsides to it. Things to consider are:
- Integration with existing systems: One more system to integrate into the data stack.
- Adoption and change management: People need to adopt a model where measures are managed centrally instead of inside the BI tool.
- Maintaining flexibility while ensuring consistency: If you change measures, when do you sync BI tools with the changes, and how do you communicate them downstream?
Key Benefits of a Semantic Layer
So, what are the benefits of a semantic layer? And doesn't adding more layers add complexity?
Yes, it does. But what if you had to redefine and recalculate each metric in every one of your data products, whether it's a dashboard, a mobile app, or an embed on a website? What if a metric changes, and you need to maintain it across all these different products, tools, and architectures?
What if you had one central place to change and maintain them: one place where people implement the complex calculation of a particular sales metric, a supply chain cost estimate, or just some website analytics?
If you have only one BI tool for all your metrics, you won't benefit much. But if you have several across your organization, a semantic layer can simplify things. Usually, I'd recommend storing all your complex business transformations in your warehouse and ETL.
However, this is usually impossible with modern OLAP cubes (Druid, ClickHouse), where metrics are defined at query time. You can't store the sales metrics and pre-calculate them for all dimensions. These cubes are usually so fast that it's not a performance issue but a handling problem: you need to define all queries during ad-hoc requests.
With a dimensional model, you have ten, twenty, or more dimensions that can all be combined; this can become overwhelming. The total number of shipped items varies greatly across dimensions such as location, time, and product, so you need a strong query engine. One reason an OLAP system exists is to deliver many unique queries in subseconds.
In a sense, a semantic layer is an OLAP cube, but with the advantage that metrics can be predefined and consistently maintained in a central place.
💡 A Simple Example of the Cardinality of Dimensions
With typical dimensional modeling, you model your data in facts and dimensions. Imagine three dimensions in a model, each with its own columns and distinct cardinality: time with 12 values (months), location with 10 values (various stores), and product category with 5 values.
The total number of possible combinations is the product of the dimensions' cardinalities:
12 (time) × 10 (location) × 5 (product category) = 600
possible combinations and, therefore, rows. The complexity escalates with the addition of more columns and dimensions. To illustrate: with time and location alone, we had 12 × 10 = 120 combinations; adding the product category causes a surge of 480 additional combinations.
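The same arithmetic in a few lines of Python, showing how each added dimension multiplies the number of possible rows:

```python
from math import prod

# Cardinality of each dimension: 12 months, 10 stores, 5 categories.
cardinalities = {"time": 12, "location": 10, "product_category": 5}

total = prod(cardinalities.values())
print(total)                      # 600 possible combinations

# Each added dimension multiplies the count: 12 x 10 = 120 before,
# 600 after adding the product category, i.e. 480 additional rows.
without_category = 12 * 10
print(total - without_category)   # 480
```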
Let's look at some of these capabilities in detail.
Unified Data Access Through APIs
Probably the most significant benefit is the abstraction of complex enterprise logic behind a single API, or rather a set of APIs that all share the same security and metric definitions.
Data architectures can quickly become heterogeneous and intertwined. I'm not saying it's the norm, but having multiple databases and systems, even several different ones for analytics, is quite common. For example, you may have a fast but less frequently updated cube for dashboards and another that is updated often but slower to query, as it sits on distributed files in S3.
The image below illustrates what that would mean with something like Cube. Making the data available through interfaces such as GraphQL, REST, SQL, the Orchestration API, the AI API, and even a native MDX API for access through Excel adds a massive plus to overall governance and data management.
Enterprise-Ready Excel Integration (MDX)
As Excel famously never dies, let's explore the Excel integration. A new user's first question is usually, "Can I export this dashboard to Excel?" With this integration, they can not only export but also connect directly via the data connection wizard in Excel, choose an OLAP cube/MDX data source, and get daily live updates, including all company-wide business measures and ultra-fast response times.
Integration with commonly used LDAP authentication is also supported, ensuring user access stays consistent with organizational policies. This also works for Power BI and other tools that support the MDX API.
Integrating Excel into the semantic layer meets users where they are and lets people leverage familiar tools, such as pivot tables, for sophisticated data analysis. This unlocks traditional analytical capabilities within the comfort of Excel, including dimension hierarchies and measure groups, all while maintaining consistent security and data access controls across Cube's various APIs.
📝 How Does the Connection Work?
With Cube Cloud's MDX API, Excel can directly query large datasets through the universal semantic layer. The MDX queries are translated into optimized SQL, processed by Cube, and the results are returned in a format Excel can interpret.
This integration combines the benefits of a semantic layer with Excel's ubiquitous analysis capabilities, making advanced data analysis accessible to all businesses and users.
Modern API Support: GraphQL, REST, and AI
Besides Excel, other APIs enable fast data analytics with governed metrics management. Cube, for example, has these six:
- SQL API - Delivers data over the Postgres-compatible protocol to BI tools (see the sketch after this list).
- REST API - Delivers data over the HTTP protocol to embedded analytics applications.
- GraphQL API - Delivers data over the HTTP protocol to GraphQL-enabled data applications.
- MDX API - Provides a native interface for Microsoft Excel connections.
- Orchestration API - Enables data orchestration tools to push changes from upstream data sources to Cube directly.
- AI API - Provides an interface with large language models (LLMs), enabling text-to-semantic layer queries. Recently, Cube Copilot was also announced, which assists in designing and editing data models faster.
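As a small illustration of the first of these, the sketch below queries the SQL API from Python with psycopg2, since the API speaks the Postgres wire protocol. Host, credentials, and the orders cube are placeholders; MEASURE() is the SQL API's way of referencing a measure defined in the semantic layer instead of re-implementing the aggregation client-side:

```python
import psycopg2

# Placeholder connection details for a Cube SQL API endpoint, which
# is compatible with the Postgres wire protocol.
conn = psycopg2.connect(
    host="example-deployment.cubecloud.dev",
    port=5432,
    user="cube",
    password="<your-password>",
    dbname="db",
)

with conn.cursor() as cur:
    # MEASURE() pulls the centrally defined measure; no aggregation
    # logic is duplicated on the client side.
    cur.execute(
        "SELECT status, MEASURE(total_revenue) AS total_revenue "
        "FROM orders GROUP BY status"
    )
    for row in cur.fetchall():
        print(row)

conn.close()
```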
To ease discovery, a semantic data catalog allows a unified search across existing data assets and even returns data lineage and relationships to entities and metrics. DuckDB and MotherDuck can also be integrated.
Visual Designer and Data Model Engine
When I started as a business intelligence engineer, data modeling and visual design were at the core of my job. Over time, they lost some of their appeal, and the data stack evolved accordingly. With the recent Visual Designer, that focus comes back inside your semantic layer: it helps people who are less technical, but experts in their domain, take part in shaping the data models. This allows for a much better business outcome without writing code.
The visual designer offers a canvas-style interface where you model the data logically.
The neat implementation automatically converts each change into code, preserving the benefits of both the central metrics repository and the declarative approach.
Enhanced Performance Through Caching
In addition to the Visual Designer, the new data modeling engine, Tesseract, which is built in Rust, helps with performance. Tesseract allows for fast, performant multi-stage calculations, with many more features to come.
Together with the likewise Rust-based cache store, this brings another massive benefit for any BI dashboard: sub-second response times across all your data sources. Instead of waiting for the aggregation to run each time, the cache speeds up common queries and makes dashboard visualization more enjoyable. This is not trivial to achieve with plain modeling techniques using persistent tables in Postgres or with dbt, unless you materialize every table used for a BI dashboard, which leads to high maintenance and high costs for your data pipeline, as every change is a code change and a re-run of your pipelines. Having that feature out of the box can enable many use cases without data movement.
Simplified Security and Data Governance
API-based data access is beneficial because it provides enhanced security through robust authentication mechanisms and fine-grained access controls. It also abstracts complex database structures into simplified interfaces while ensuring good performance through optimized queries and caching that stays consistent across applications.
It also improves data governance through controlled data access and a single repository for defining and versioning mission-critical business metrics. With this integrated data access layer served through the different APIs, we have one place to govern and restrict access. Compared to granting access in each BI tool or via GRANT-level permissions on a database, it's an elegant way to restrict access in one place and implement it once; there is no need to reimplement it in your data app or your notebook.
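As a hedged sketch of what "implementing it once" can look like: Cube supports configuration in Python via a cube.py file, including a query_rewrite hook that can enforce row-level security for every API at once. The tenant_id member and the shape of the security context below are assumptions for illustration, not taken from this article:

```python
# cube.py -- a sketch of row-level security enforced once for all
# APIs (SQL, REST, GraphQL, MDX). The orders.tenant_id member and
# the security-context shape are hypothetical.
from cube import config

@config('query_rewrite')
def query_rewrite(query: dict, ctx: dict) -> dict:
    # The security context is decoded from the caller's auth token.
    security_context = ctx.get('securityContext') or {}
    tenant_id = security_context.get('tenant_id')
    if tenant_id is None:
        raise Exception('tenant_id is required in the security context')
    # Append a mandatory filter so every query is scoped to the tenant.
    query.setdefault('filters', []).append({
        'member': 'orders.tenant_id',
        'operator': 'equals',
        'values': [str(tenant_id)],
    })
    return query
```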
What's Next: AI and Natural Language
In Part 1, we examined semantic layers through the lens of MVC patterns. In this article, we explored their enterprise capabilities and integrations. We've seen how semantic layers create a unified source of truth for metrics while simplifying data access through modern APIs. The combination of visual modeling, intelligent caching, and centralized governance has made semantic layers an essential component of modern data architectures.
Looking ahead to Part 3, we'll explore how artificial intelligence is transforming semantic layers. From natural language querying to AI-assisted modeling with tools like Cube Copilot, we'll examine how these advances make data more accessible while reducing the complexity of building and maintaining semantic layers. In the meantime, feel free to give Cube a try. You'll find all the information in the quick-start guide.