The promise of a semantic layer is to bring consistency to interactions between data sources and data tools. The core functionality of a semantic layer is to make it easy to build data models (defining dimensions, measures, and metrics) that deliver consistent metrics to all of an organization’s data experiences. A fully featured semantic layer also includes security and data access controls as well as pre-aggregations and query caching, which together guarantee that consumers of data have a secure, consistent, and fast experience. Metrics and business logic should be the same from tool to tool. Access control and performance characteristics should be the same as well.

To deliver consistently on the promise of semantic layers, solution providers must support the widest range of data sources and data applications. A semantic layer may have an amazing set of functionality, but if it doesn’t integrate with the rest of the data stack, it is unlikely to provide much value. Unfortunately for the developers of semantic layers, there is an ever-expanding set of technologies that customers expect them to integrate with. One of my colleagues recently remarked, “No one said it was going to be easy,” and while I agree with him, there is something we can adopt from other areas of technology with competing implementations: standardization.

Development of an open standard for semantic layers could provide needed consistency on how business intelligence tools, data science notebooks, embedded analytics systems, catalogs, and other consumption methods should interact with the semantic layer. New data applications are in development right now that would benefit from more clearly defined integration patterns to inform their design if they want to treat the semantic layer as a first-class source. Standards reduce the burden on software vendors of adding semantic layer support and also benefit consumers by reducing switching costs and vendor lock-in. For this reason, we also believe it is important that the standard be an open standard that is not beholden to any one vendor.

What should a standard cover? We believe there are three major areas that an open semantic layer standard should address:

  • Specification of Objects
  • Querying Protocols
  • Metadata Exchange Protocols

Specification of Objects: Metrics-Centric vs. Dataset-Centric

While there are many differences in internal semantic layer object representations (code-based vs. UI-based, supported data modeling languages, and specific object definition syntax), we are most interested in the outward representation of the semantic layer. With this in mind, there are currently two main approaches on the market for designing semantic layers: metrics-centric and dataset-centric.

Metrics-centric. In the metrics-centric approach, metrics are first-class objects. The benefit is that it’s closer to how people think and talk about data: we look at metrics, analyze them, and set them as KPIs. The biggest challenge with this approach is that most data consumption tools, including BI tools, don’t have a notion of metrics; they operate on tables or cubes with measures and dimensions.

Dataset-centric. The dataset-centric semantic layer exposes tables containing measures (metrics) and dimensions as first-class objects. While it might look like a more complicated framework, it offers greater flexibility and better compatibility with the existing suite of tools.

At Cube, we believe in the dataset-centric approach. It enables wider interoperability with the existing data ecosystem, because virtually every data tool is centered on tabular data. It also gives data teams the flexibility to design the outward representation of the semantic layer either as entity-first or as metrics-first, effectively giving them the best of both worlds.

In the entity-first approach, entities are denormalized tables containing the many measures and dimensions needed to fully describe the entity. In Cube, we use views to build the outward representation of the semantic layer. The example below illustrates a view designed to support the entity-first approach.

views:
  - name: orders_view
    cubes:
      - join_path: orders
        includes:
          - status
          - created_at
          - completed_count
          - count
          - total_amount
          - average_order_value
      - join_path: orders.users
        prefix: true
        includes:
          - city
          - age
          - gender
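
Once defined, such a view can be consumed like any other table. As a quick illustration (a sketch, not part of the original example), a consumer could query it through the SQL interface discussed later in this post; it assumes that prefix: true exposes the user dimensions under prefixed names such as users_city:

SELECT
  users_city,             -- dimension from orders.users, assumed to be exposed with the users_ prefix
  MEASURE(total_amount)   -- measure from the orders cube, aggregated by the semantic layer
FROM orders_view
GROUP BY 1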

In the metrics-first approach, metrics tables are denormalized tables containing one measure (metric) and all the relevant dimensions. In Cube, that can be achieved by creating a view for a specific metric, like average_order_value in the example below.

views:
  - name: average_order_value
    cubes:
      - join_path: orders
        includes:
          - average_order_value
          - status
          - created_at
      - join_path: orders.users
        prefix: true
        includes:
          - city
          - age
          - gender

You can learn more about designing metrics with Cube in the documentation.

Querying Protocols: SQL and GraphQL

The semantic layer should provide a full-fledged querying interface to fulfill the promise of delivering metrics to all data consumption tools. Currently, most data tools use the SQL protocol to query data from databases and data warehouses. Given that, the semantic layer should expose a SQL endpoint so all existing tools can connect to it and query metrics. Additionally, it is important that the semantic layer provide an HTTP-based interface for custom application development and embedded analytics. GraphQL has been gaining popularity as an open-source data query and manipulation language for APIs, so it makes sense for the semantic layer to support it as an HTTP-based API.

The challenge with using SQL as a querying interface for the semantic layer is the lack of a notion of a metric in the SQL language. SQL is designed to work with tabular, not multidimensional, data. Supporting metrics would require extending SQL with a MEASURE type and a corresponding aggregate function for querying measures, as in the example below.

SELECT
  status,
  MEASURE(average_order_value)
FROM orders_view
GROUP BY 1

Supporting metric queries in SQL will enable interoperability with the wide range of BI and visualization tools that already use SQL to query data from data warehouses.
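
To make the difference concrete, here is a sketch of the kind of query a BI tool typically has to generate today, when the aggregation logic lives in the tool rather than in the semantic layer (the orders table and its amount and id columns are hypothetical):

-- Without a semantic layer, each tool re-implements the metric definition
-- in the SQL it generates (hypothetical orders table with amount and id columns).
SELECT
  status,
  SUM(amount) / COUNT(DISTINCT id) AS average_order_value
FROM orders
GROUP BY 1

With the MEASURE function, the tool only references the measure by name, and the semantic layer resolves the aggregation from its single definition in the data model.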

Metadata Exchange Between the Semantic Layer and the Front End

While there are many benefits to having the semantic layer decoupled from the various BI tools and data apps, it brings challenges as well. One of the biggest challenges of implementing a standalone semantic layer is the connection with the front-end UI and user experience. When the semantic layer is integrated as a component of a specific BI tool, it influences and works seamlessly with the UI and end-user experience. For example, when you define an Explore object with LookML (Looker’s integrated semantic layer), it appears in the UI so end users can navigate and explore it. It is a challenge to provide the same level of deep integration for a decoupled semantic layer because every BI tool’s front-end user interface has different names, components, and functionality.

For the industry to truly reach the goal of delivering consistent and accurate data to all of an organization’s BI tools and data apps, this integration needs to be improved. If semantic layer and BI vendors work together to build a more standard connection, it will allow better and deeper integration between the front end and the semantic layer, giving end users the best possible experience across all of a company’s BI tools.

Join the Conversation

At Cube, we believe semantic layers need exceptional interoperability to give users choice. We endeavor to support the widest ecosystem of data sources and data consumers that we can for our customers. But the best way to do that is to work more broadly with the data community.

To that end, we are beginning discussions on the design of an open semantic layer standard and are eager to include other semantic layer vendors, business intelligence and dashboard tool vendors, and any other data product vendors in the conversation. By doing this work collectively and transparently, we can build a better future for our users and the data community.