How to Directly Query Large Datasets with Excel

Intro

The MDX API in Cube Cloud is a strategic enhancement designed to meet the evolving needs of our enterprise customers. Many users are accustomed to the seamless integration and robust analytics capabilities of traditional OLAP solutions like SQL Server Analysis Services.

By enabling direct connections from Excel to Cube Cloud using MDX queries, anyone with Excel skills can perform sophisticated data analysis in a familiar environment. This integration facilitates the visualization of data in pivot tables and the utilization of Excel (or any other tool that speaks MDX) as a powerful front-end tool for slicing and dicing data across dimensions and measures.

What is MDX

Multidimensional Expressions (MDX) is a query language specifically designed for querying and managing multidimensional data structures, such as OLAP cubes. Originally developed by a team of engineers, including Mosha Pasumansky, one of the principal architects behind SQL Server Analysis Services (SSAS), MDX was introduced with the release of OLAP Services 7.0 in the late 1990s.

Unlike SQL, which is tailored for two-dimensional, tabular datasets, MDX is designed to work with complex, multi-dimensional data structures. These structures utilize measures, dimensions, hierarchies, and attributes to support advanced data analysis. Although MDX and SQL both use clauses like SELECT, FROM, and WHERE, their purposes and applications are quite distinct, reflecting the different types of data they are intended to query.

Architecture overview

The native MDX API in Cube Cloud enhances the universal semantic layer by enabling seamless integration with Excel's analytical tools. This support allows users to leverage Excel's powerful data analysis features while maintaining a connection to Cube’s robust data infrastructure. Traditionally, analysts faced challenges when dealing with large datasets in Excel. They often had to pre-process and aggregate data before analysis, which disconnected their workbooks from live data sources. This process could have been more convenient and efficient, especially when updates or sharing were required.

With Cube Cloud's MDX API, Excel can directly query large datasets through the universal semantic layer. The MDX queries are translated into optimized SQL, processed by Cube, and the results are returned in a format Excel can interpret.

Customers can extend the benefits of the universal semantic layer by allowing anyone to connect to and reuse trusted data assets and perform multi-dimensional analysis on governed data with Microsoft Excel using a live connection–no exports needed. This integration eliminates the need for pre-processing, ensures real-time data access, and streamlines complex data transformations, allowing analysts to focus on deriving insights and storytelling.

Enhanced Data Analysis

The MDX API in Cube Cloud unlocks advanced analytical capabilities, empowering users to perform more sophisticated and meaningful data analysis. Just like everything in Cube Cloud, the data model is the flexible foundation upon which everything else is derived. Users can gain deeper insights and more precise control over their data by leveraging features such as dimension hierarchies, measure groups, and key-based hierarchies defined in the data model.

Dimension hierarchies

Dimension hierarchies allow users to define and navigate multiple levels of data granularity within their analyses. This powerful feature enables structured and intuitive data exploration, making uncovering insights and trends across different dimensions easier.

For example, consider a view in Cube Cloud named orders_view that contains data about orders, including the amount, count, and breakdown by status and geography. By defining a Geography hierarchy, this data can be organized into levels such as country, state, and city. This hierarchical structure facilitates a standardized drill-down path in Excel for deeper and more granular analysis.

views:
  - name: orders_view
    description: "Data about orders, amount, count and breakdown by status and geography."
    meta:
      hierarchies:
        - name: "Geography"
          levels:
            - country
            - state
            - city

Measure groups

Measure groups, supported by the MDX API, allow for the organization of measures into logical groups or folders within the dataset's schema. This organizational feature enhances the clarity and usability of data analyses by grouping related metrics together.

You can define measure groups in the view's schema file:

views:
  - name: orders_view
    description: "Data about orders, amount, count and breakdown by status and geography."
    meta:
      folders:
          - name: "Folder A"
            members:
              - total_amount
              - average_order_value
          - name: "Folder B"
            members:
              - completed_count
              - completed_percentage

Key-Based Hierarchies

Utilize the label_member data model param to create key-based hierarchies, ensuring that business-friendly text names are uniquely identifiable by their keys, even when multiple entries share the same text name.

You can define measure groups in the view's schema file:

dimensions:
  - name: status
    sql: STATUS
    type: string
    description: The status of the order (completed etc)
    meta:
      label_member: <another dimension name here>

Security and data access controls

Authentication and authorization work the same as for the SQL API. While this may seem obvious or not particularly special, it brings significant value for enterprise-grade companies that need fine-grained data access policies that are compliant with organizational policies.

Once you set up security policies using Cube’s data modeling techniques, you instantly get the same policies across all your APIs - REST, SQL, MDX, and even AI. To maximize its use, utilize the full power of modeling techniques provided by the Cube semantic layer, like Row-Level Security and query rewrite with the current user’s permissions.

Authenticating Requests with an LDAP Catalog

In enterprise environments, LDAP (Lightweight Directory Access Protocol) is commonly used to authenticate and manage users. Integrating LDAP authentication with the MDX API ensures that user access is consistent with organizational policies and that sensitive data remains protected. To authenticate requests to the MDX API using an LDAP catalog, check this recipe.

Caching

Caching is essential for enhancing the performance and responsiveness of the MDX API in Cube Cloud. It minimizes redundant database access, which saves money, and speeds up data retrieval, improving user experience. Cube Cloud implements two primary levels of caching: in-memory caching and pre-aggregations.

In-Memory Caching

In-memory caching, enabled by default, stores query results in memory for quick retrieval. When a query is made, Cube Cloud checks the in-memory cache first. If the result is available and valid, it returns instantly, bypassing the database and reducing latency.

Pre-Aggregations

Pre-aggregations require explicit configuration and pre-compute aggregated data, drastically speeding up query performance. Defined in the data model, they are stored in the Cube Store. When a query is received, Cube Cloud checks for relevant pre-aggregations. If found, it serves the query from pre-aggregated data; if not, the database executes it, and the result is cached.

Caching in Cube Cloud offers significant advantages: it improves performance by minimizing database queries, scales efficiently to handle more users and larger datasets without performance degradation, and reduces operational costs by lowering query frequency. Additionally, it enhances user experience by delivering faster query responses, allowing analysts to focus on insights. Leveraging both in-memory caching and pre-aggregations, Cube Cloud ensures fast, scalable, and efficient data analysis, even with large datasets in Excel.

Conclusion

Cube Cloud’s MDX API not only enhances data analysis capabilities but also integrates seamlessly with Excel and underlying data sources, leveraging the power of the universal semantic layer. This layer serves as a bridge between data sources and data consumers, ensuring consistent and trusted insights across all endpoints—from BI tools to embedded analytics and AI agents.

Cube Cloud, designed with a code-first, developer-oriented approach, supports a wide range of deployment options, data connectivity, and native APIs. It empowers data engineers to apply software engineering best practices like CI/CD, version control, and code reviews, thereby boosting productivity and performance in data management.

By utilizing Cube Cloud’s Semantic Catalog, users gain a unified view of connected data assets that they can analyze in Excel, facilitating effortless exploration and understanding of data lineage, entities, relationships, dimensions, and metrics. This comprehensive approach not only enhances collaboration between data engineers and analysts but also streamlines data modeling and analysis processes.

In summary, Cube Cloud’s MDX API feature, combined with its universal semantic layer and robust capabilities, sets a new standard for modern multidimensional analysis in spreadsheets. It enables organizations to make informed business decisions faster, with confidence in their data integrity and insights derived from a unified data environment.