Data-driven decision-making is crucial for business success, but organizations face growing complexity and data governance challenges that make it difficult to access data in a unified way.
In Part 1, we explored the semantic layer through the lens of MVC, and in Part 2, we outlined its benefits. In this final piece of the series, we examine the integration of a semantic layer with artificial intelligence and why it might be the best place to start with GenAI. Let's look at how semantic layers fit together with existing data infrastructure and AI.
The Power of Combining Semantic Layers with AI
The semantic layer is the ideal foundation for natural language queries, providing a structured bridge between data sources and business requirements that AI capabilities can build on. Its declarative definition of metrics, comprehensive data models, and metadata management enable the large language model to understand and interpret the business context accurately. This promises higher-quality results for natural language queries, human interaction, and data insights.
This is the interface and how it looks in Cube: Natural language interface, discovering your data | See more on Announcing Cube's AI Assistant: Empowering Every User with Data Intelligence - Cube Blog
Generative AI makes natural language queries a practical way to discover complex business insights through simple questions. Cube is pioneering natural language querying, especially with the acquisition and integration of DelphiLab. It makes data accessible to more non-technical users and enhances data exploration, so they get faster insights without the technical bottlenecks.
How Semantic Layers Enable Natural Language Querying
How does it work? If we take a top-down approach, we start with an instruction: a prompt that gets augmented. The system uses existing database schemas and data, queries them with SQL to get the answer, and finally translates the result back into text. This would look something like this:
An example from instruction to answer | More on Natural language AI queries drive customer analytics
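The flow from instruction to answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Cube implements it: the metric names in `SEMANTIC_MODEL`, the `fake_llm_pick_metric` stand-in (a keyword match instead of a real model call), and the in-memory SQLite table are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical semantic-layer metadata: metric name -> SQL expression.
SEMANTIC_MODEL = {
    "orders.total_revenue": "SUM(orders.amount)",
    "orders.count": "COUNT(*)",
}


@dataclass
class Answer:
    sql: str
    text: str


def augment_prompt(question: str) -> str:
    """Step 1: augment the instruction with the metrics the layer knows about."""
    metrics = ", ".join(sorted(SEMANTIC_MODEL))
    return f"Available metrics: {metrics}\nQuestion: {question}"


def fake_llm_pick_metric(prompt: str) -> str:
    """Step 2: stand-in for an LLM call; a naive keyword match, for illustration."""
    question = prompt.split("Question:", 1)[1].lower()
    return "orders.total_revenue" if "revenue" in question else "orders.count"


def to_sql(metric: str) -> str:
    """Step 3: build SQL from the metric definition, not from free text."""
    table = metric.split(".")[0]
    return f"SELECT {SEMANTIC_MODEL[metric]} AS metric_value FROM {table}"


def answer(question: str, run_sql) -> Answer:
    """Steps 1-4: prompt -> metric -> SQL -> result rendered back as text."""
    metric = fake_llm_pick_metric(augment_prompt(question))
    sql = to_sql(metric)
    return Answer(sql=sql, text=f"{metric} is {run_sql(sql)}.")


def make_demo_runner():
    """Create a tiny in-memory table so the sketch runs end to end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (amount REAL)")
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (32.5,)])
    return lambda sql: conn.execute(sql).fetchone()[0]


run_sql = make_demo_runner()
result = answer("What is our revenue?", run_sql)
print(result.sql)
print(result.text)
```

Note that the model only chooses among known metrics; the SQL itself comes from the definitions, which is what keeps the answer tied to the business logic.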
Many different interfaces are involved, and using natural language isn't as simple as it looks. So, the question is, how do they work?
Architecture and Components
The simplified architecture below shows how Cube turns text into SQL through Cube's API, based on retrieval-augmented generation (RAG). Cube makes sure this process is deterministic. It uses metadata from the semantic layer and the data catalog: the data warehouse holds our data and Cube holds our semantics about the data, which allows Cube to interact with text with the help of a generative large language model such as the models from OpenAI:
A simple architecture of how the Cube Universal Semantic Layer integrates OpenAI | More on Cube's AI Assistant: Empowering Every User with Data Intelligence
There are more components involved. To give a better understanding, below are the critical components for generative and natural interaction with a semantic layer:
- Semantic layer (like Cube) - providing declarative metric definitions, data modeling, and metadata
- Large language models (like those from OpenAI) and natural language response generators - for natural language processing, generation, and turning results into human-readable text
- RAG (Retrieval Augmented Generation) - to extend the model with internal business knowledge and context
- A Semantic Catalog lets users search, understand, and reuse trusted data products from Cube and external sources such as cloud data warehouses and business intelligence tools.
- API and SQL Transpiler: The Cube API compiles user prompts deterministically to SQL. With the intermediate API, Cube adds constraints for the request format and reading from semantic layer artifacts to enable simpler output generation for the AI model.
- Query execution and planner engine - running the SQL and returning results
- Cache layer - for query optimization and performance
- Access control & governance - managing security and permissions
- Visualization/dashboard tools - for displaying results graphically
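To make the "API and SQL transpiler" idea more concrete, here is a minimal sketch of why a constrained request format helps. The model only has to produce a small structured query, which is validated against the semantic layer and then compiled to SQL deterministically. The `measures`/`dimensions` shape is loosely modeled on Cube's query format, but `META`, `compile_query`, and `InvalidQuery` are hypothetical names for illustration, not Cube's implementation.

```python
# Hypothetical semantic-layer artifact: one cube with measures and dimensions.
META = {
    "orders": {
        "sql_table": "public.orders",
        "measures": {"count": "COUNT(*)", "total_amount": "SUM(amount)"},
        "dimensions": {"status": "status", "city": "city"},
    }
}


class InvalidQuery(ValueError):
    """Raised when a request references something the semantic layer does not define."""


def _resolve(member: str, kind: str):
    """Look up 'cube.member' in META and return (cube name, SQL expression)."""
    cube, _, name = member.partition(".")
    try:
        return cube, META[cube][kind][name]
    except KeyError:
        raise InvalidQuery(f"unknown {kind[:-1]}: {member}") from None


def compile_query(query: dict) -> str:
    """Compile a structured query to SQL; the same input always yields the same SQL."""
    measures = query.get("measures", [])
    dimensions = query.get("dimensions", [])
    if not measures:
        raise InvalidQuery("at least one measure is required")

    cubes, select, group_by = set(), [], []
    for member in dimensions:
        cube, expr = _resolve(member, "dimensions")
        cubes.add(cube)
        select.append(f'{expr} AS "{member}"')
        group_by.append(expr)
    for member in measures:
        cube, expr = _resolve(member, "measures")
        cubes.add(cube)
        select.append(f'{expr} AS "{member}"')

    if len(cubes) != 1:
        raise InvalidQuery("joins are out of scope for this sketch")

    sql = f"SELECT {', '.join(select)} FROM {META[cubes.pop()]['sql_table']}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


print(compile_query({"measures": ["orders.total_amount"], "dimensions": ["orders.status"]}))
```

A request that mentions an undefined member fails validation instead of producing plausible but wrong SQL, which is the main benefit of putting the API between the model and the warehouse.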
A lot is going on behind our intuitive human interface, and many of these areas are constantly improving and changing. Luckily, Cube has integrated all of this. If you'd like to try it, you can get started right away. There is an OSS and a cloud version; not every feature is available in both.
📝 Evolution of the Semantic Layer
To get an even better overview, here is a short recap of the evolution we've gone through until we arrived where we are today (long story here):
- SAP BusinessObjects Universe and BI semantic layer (1991)
- SSAS and MDX, whose logical modeling layer defines business metrics and dimensions in a structured way (1997)
- Kimball discussed the concept of a semantic layer in #158 Making Sense of the Semantic Layer, around 2013
- Maturing BI tools with an integrated semantic layer, such as Tableau, TARGIT, Power BI, Apache Superset, etc., each with their own metrics layer definition, around 2016
- Looker and LookML were popularized as the first semantic layer around 2019
- Declarative semantics and metrics with tools such as Cube, MetriQL, MetricFlow, Minerva, and dbt arose with the explosion of data tools around 2022
- The evolution of data querying and natural language queries with the integration of LLMs and catalogs around 2023/24
What are AI and Semantic Layer Use Cases?
Now that we understand how it works and what is needed under the hood, let's explore typical use cases for a semantic layer with AI. The best-suited use cases for a semantic layer involve centralized metrics definition, unified data modeling, query optimization with a cache layer, and security and access control governance.
Typical Use Cases
The chosen use cases below illustrate the power and suitability of a semantic layer, among many others:
💡 AI-Powered Semantic Layer Use Cases
Key scenarios where semantic layers enhance AI and natural language capabilities
🗣️ Natural Language Data Exploration
- Transform business questions into accurate SQL queries through LLMs
- Context-aware query generation using metadata and business definitions
- Extended context through RAG with internal systems like CRM and ERP
- Real-time data exploration through conversational interfaces
- Automated insight discovery and anomaly detection
- Enhanced data accessibility for non-technical users
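As a rough illustration of context-aware query generation, the sketch below retrieves the metric definitions most relevant to a question and places them in the prompt. It is a toy under stated assumptions: the `CATALOG` entries are invented, and the word-overlap scoring stands in for the embedding-based retrieval a real RAG setup would more likely use.

```python
import re

# Hypothetical metric catalog: member name -> business description.
CATALOG = {
    "orders.total_revenue": "Sum of paid order amounts, revenue in USD",
    "orders.count": "Number of orders placed",
    "customers.churn_rate": "Share of customers who cancelled in the period",
}


def tokens(text: str) -> set:
    """Lowercase the text and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, k: int = 2) -> list:
    """Return up to k catalog entries that share the most words with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(f"{name} {desc}")), name) for name, desc in CATALOG.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:k]]


def build_context(question: str) -> str:
    """Assemble the augmented prompt: retrieved definitions first, then the question."""
    lines = [f"- {name}: {CATALOG[name]}" for name in retrieve(question)]
    return "Relevant metrics:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


print(build_context("What was our revenue from orders last month?"))
```

The point of the retrieval step is that the model sees only trusted, named definitions for the question at hand rather than the whole warehouse schema.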
🎯 Rapidly Creating Business Dashboards through GenAI
- Quick response to new data sources or management report requests
- Requirements engineering & business needs validation
- Dimension/fact definition with the appropriate granularity
- Dashboard-oriented metric building
- Accelerated transition from business request to functional dashboard
🎯 Unified Data Governance
- Ideal for both fresh starts and consolidating fragmented data landscapes
- Seamlessly connects siloed departmental data sources into a cohesive structure
- Easy integration of heterogeneous data sources (internal & external)
- Natural language and intelligent metric management
- Automated discovery of related metrics and dimensions with API abstraction
- Unified access with built-in features: query pushdown, caching optimization
- Organization-wide DRY (Don't Repeat Yourself) definitions
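One way to picture unified governance is a single rewrite step that every query passes through, whether it comes from a dashboard or a chatbot. The sketch below adds a tenant filter for non-admin users. The filter shape is loosely modeled on Cube's query filters, while `apply_row_level_security`, the `role` and `tenant_id` fields, and the `orders.tenant_id` member are hypothetical names for illustration.

```python
import copy


def apply_row_level_security(query: dict, user: dict) -> dict:
    """Return a copy of the query that is restricted to the user's tenant.

    Admins see everything; everyone else gets an extra equality filter.
    The input query is never mutated.
    """
    secured = copy.deepcopy(query)
    if user.get("role") != "admin":
        secured.setdefault("filters", []).append(
            {
                "member": "orders.tenant_id",  # hypothetical dimension
                "operator": "equals",
                "values": [str(user["tenant_id"])],
            }
        )
    return secured


ai_generated_query = {"measures": ["orders.count"]}
print(apply_row_level_security(ai_generated_query, {"role": "analyst", "tenant_id": 7}))
```

Because the rule lives in one place, an AI-generated query is subject to the same permissions as a hand-written one.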
🤖 AI Chatbot Integration
- Embedded analytics with AI chatbot solutions
- Customizable dashboard implementations
- Natural language query capabilities
- Real-world case studies: find more on Customer Stories.
Copilot and AI Assistant
Unfortunately, every organization is complex in its own way, so AI on its own has difficulty producing the correct measures and data models. With generative AI, we can further automate this complex business process and make it more accessible for non-technical users, especially the domain experts who have worked in the industry for a long time and know the business inside and out.
A semantic layer has an edge here. By combining declarative metric definitions with the know-how of business experts, we have the chance to enhance the model with critical business context from internal systems such as ERP, CRM, or any others, and to iterate quickly toward more concise and correct output.
With Cube, we can generate the first iterations of the data model, including measures, joins, and so on, with Cube Copilot, for example. It can suggest metrics in real time based on the current context, which makes data practitioners who work with business requirements and SQL more productive.
Cube Copilot in action, where you define your needed measure as a comment
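Generated measures still deserve a quick check before they enter the data model. The sketch below shows one possible review step for a suggested measure. It is an assumption-laden illustration rather than Copilot's behavior: `review_suggestion`, `accept`, the `columns` field, and the reduced list of measure types are hypothetical.

```python
# A reduced, illustrative set of aggregation types.
ALLOWED_TYPES = {"count", "sum", "avg", "min", "max", "count_distinct"}


def review_suggestion(cube: dict, suggestion: dict) -> list:
    """Return a list of problems with a suggested measure; empty means it looks fine."""
    problems = []
    if suggestion.get("type") not in ALLOWED_TYPES:
        problems.append(f"unsupported type: {suggestion.get('type')}")
    if suggestion.get("name") in {m["name"] for m in cube["measures"]}:
        problems.append(f"duplicate measure: {suggestion.get('name')}")
    # A plain count needs no column; every other type must reference a known one.
    if suggestion.get("type") != "count" and suggestion.get("sql") not in cube["columns"]:
        problems.append(f"unknown column: {suggestion.get('sql')}")
    return problems


def accept(cube: dict, suggestion: dict) -> dict:
    """Return a new cube including the suggestion, or raise if the review fails."""
    problems = review_suggestion(cube, suggestion)
    if problems:
        raise ValueError("; ".join(problems))
    return {**cube, "measures": cube["measures"] + [suggestion]}


orders = {
    "name": "orders",
    "columns": {"amount", "status"},
    "measures": [{"name": "count", "type": "count"}],
}
updated = accept(orders, {"name": "total_amount", "type": "sum", "sql": "amount"})
print([m["name"] for m in updated["measures"]])
```

Because the model is declarative, a check like this can run on the definition itself before any query is executed.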
You can generate and ask questions with the AI Assistant using text. Generated code and explorations can be integrated into the Playground, Cube's built-in web-based tool for validating and executing queries, including previews as charts and dashboards.
AI-Powered Semantic Layer and Its Future
Looking ahead to what might be possible, we see that AI is here to stay. So, the question is, how can we use this power in our everyday lives? The semantic layer is one of the first areas I would start experimenting with; as we have seen, the intersection between data sources and the business already offers excellent opportunities with Copilot, the AI Assistant, and many more features.
But what else would be possible? We could ask our questions directly to our data apps to generate the dashboard we'd like to see, and AI would take over and create everything autonomously and entirely correctly. This will still take time. Until then, AI acts as an assistant to humans: we supervise the outcome (as we did with the stewardship of master data management) and approve or decline generations, as we do with code generation and other applications of AI today.
However, the most significant thing GenAI with a semantic layer will enable is much faster iteration. It makes the data model more approachable for business experts and leads to more accurate outcomes, because it is fed with more context than any other tool in the data stack. On top of that, it adds sub-second response times to the data warehouse, like an OLAP cube with a solid business context and an AI assistant.
Above all, natural language querying will keep improving, and data lineage and governance will improve along the way with quality checks and automatic tests.
Despite the challenges of implementing a central tool for metrics or using it for a small landscape, the evolution shows that the need for semantics and a universal layer to define business logic has been here for a long time and that a new semantic layer with new generative capabilities will evolve even further.
Conclusion: The Path Forward
In closing this three-part series, we've taken a comprehensive journey through the evolution and potential of semantic layers. We began by comparing the semantic layer to a Model-View-Controller pattern, explored the key benefits and components of a universal semantic layer with centralized metrics definition, and finally demonstrated how semantic layers are becoming critical tools for implementing natural language integrations due to their unique position bridging business logic and technical implementation.
Integrating AI with semantic layers represents a new way for organizations to interact with their data. By providing a unified data modeling layer, optimized query performance, and robust governance, semantic layers are positioning themselves as essential components in the data stack, creating alignment between business and technical teams while ensuring data consistency and governance.
Looking ahead, the combination of semantic layers with AI capabilities promises to democratize data access while maintaining control and accuracy. I'm sure this evolution will continue to enhance business intelligence capabilities, making data more accessible to non-technical users while preserving the rigorous standards required for enterprise-scale operations.
Thank you for following all three parts of this series. We'd love to hear your thoughts, experiences, or questions about implementing semantic layers in your organization. What topics would you like to see covered next? Please share your feedback and reach out to me or Cube directly.
Those interested in learning more about Cube's semantic layer capabilities can visit Cube.dev or check out the documentation to get started.