How COTA uses Cube semantic layer to create a single source of truth for data

Industry: Healthcare

Employees: 101-250

HQ: New York, New York, United States

Stack: Google Cloud, Angular

Use Cases: Semantic Layer

Founded in 2011, COTA combines oncology expertise with advanced technology and analytics to organize real-world medical treatment data that guides cancer research and care.

COTA builds solutions to better support oncologists' clinical decision-making and researchers' development of new drugs and therapies. One such product is the Real World Analytics (RWA) solution, which helps clinicians and researchers make sense of fragmented and often incomplete electronic health record (EHR) data and provides insight into patient populations, treatment patterns, disease outcomes, and more.

COTA has access to millions of electronic oncology patient records with a vast amount of associated data unmatched in the oncology healthcare industry. In their recent Series D funding announcement, COTA reported expanding their data access by more than 300%.

Data analytics is thus critical for COTA, and they strive to apply innovative ways to improve the technology used in the healthcare space. The COTA team wanted to replace off-the-shelf solutions like Qlik and Tableau, which require heavy customization and specialty configuration knowledge, with a more developer-friendly ecosystem.

In Cube, they found that match. They leveraged its reference architecture to save development cycles that were previously spent on writing custom queries, custom data munging, and processing. The COTA team discovered Cube in 2019 and became early adopters soon after it was open sourced. You can see Cube in action today in this RWA demo video.

“As COTA continues to democratize the use of real world data in the healthcare ecosystem, Cube is helping us accelerate cancer research and improve quality of patient care.”

Shivam Mathura, Product & Strategy at COTA

How COTA is using Cube

Today, Cube is a foundational tool for COTA's products, with Cube Data Schema serving as the single source of truth for their data and Cube API powering their applications.

The COTA team uses Google Cloud Platform to host and run their applications, including the Cube deployment. Their deep, longitudinal datasets feature upwards of 150 tables from structured and unstructured sources, which come together in their data products to form a cohesive, comprehensive patient journey.

While assessing possible architecture options for the Cube deployment, the COTA team initially explored using Google Cloud Functions to run Cube in serverless mode, and they were among the first Cube users who were interested in support for Google Cloud Functions. However, they ultimately decided to use microservices hosted on Google Compute Engine.

COTA architecture

The COTA team uses Angular on the front end of their solutions. They were early proponents of Angular support in Cube and initially integrated Cube via its vanilla JavaScript library, which provides framework-agnostic methods for querying the Cube API. Today they use Angular 10 and leverage TypeScript support in Cube. They also use the Plotly charting library, wrapped with the angular-plotly.js component, for data visualization in their applications.

Since users often run multiple queries against the Cube API while navigating the applications, the COTA team wanted to ensure that queries can be canceled to conserve resources. For example, when a user issues several queries in a row, they want to cancel the ongoing queries and run only the current (latest) one. To achieve that, they used mutex support in the Cube client, which synchronizes concurrent requests to the Cube API via the mutexKey and mutexObj options:

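// Requests that share a mutexKey are synchronized through mutexObj,
// so a newer request supersedes the ones still in flight.
const resultSet = this.cubejsClient.load(
  {
    dimensions,
    measures,
    filters,
    timeDimensions,
    limit,
    order,
  },
  {
    mutexKey,
    mutexObj: this.mutexObject, // mutexObject is initialised as {}
  }
);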
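
The cancel-on-new-query behavior described above can be illustrated without the Cube client. The following is a minimal sketch of the general "latest request wins" idea, not Cube's actual implementation. The helpers startRequest and loadLatest are hypothetical names introduced only for this example.

```javascript
// Hypothetical helper, not part of the Cube client: register a request
// under mutexKey and report later whether a newer one has replaced it.
function startRequest(mutexObj, mutexKey) {
  const token = Symbol(mutexKey);
  mutexObj[mutexKey] = token; // the newest request owns the key
  return {
    isStale: () => mutexObj[mutexKey] !== token,
  };
}

// Hypothetical wrapper: run fetchFn, but reject the result if another
// request with the same mutexKey started in the meantime.
async function loadLatest(mutexObj, mutexKey, fetchFn) {
  const request = startRequest(mutexObj, mutexKey);
  const result = await fetchFn();
  if (request.isStale()) {
    throw new Error('Superseded by a newer request');
  }
  return result;
}

// Two requests share the key 'table'; the first one becomes stale.
const mutexObject = {};
const first = startRequest(mutexObject, 'table');
const second = startRequest(mutexObject, 'table');
console.log(first.isStale(), second.isStale()); // true false
```

Using a separate key per widget (for example, one for a table and one for a chart) keeps unrelated requests from canceling each other.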

In-memory caching and pre-aggregations in Cube also helped the COTA team achieve the desired performance, since the same table can be viewed in different ways depending on the application. They are happy with the performance so far; however, they also found pre-aggregations to be one of the more complicated features in Cube. Using all pre-aggregation features correctly required a fair amount of effort and community support on Slack.

The COTA team also employs dynamic schema generation with zero-setup originalSql pre-aggregations to materialize results for every CustomerID and ProductID pair of dimensions. This approach significantly reduces the response time for their per-customer or per-product queries:

asyncModule(async () => {
  // JSON paths inside the Order column to expose as dimensions
  const types = ['CustomerID', 'OrderDetail.ProductID'];

  cube(`GeneticTest`, {
    sql: `SELECT * FROM Order`,

    measures: {
      count: { sql: `orderid`, type: `countDistinct` }
    },
    dimensions: {
      orderId: { sql: `orderid`, type: `number` },
      // One generated dimension per JSON path; getDimensionName is a
      // helper that is not shown in this excerpt
      ...types
        .map(type => ({
          [getDimensionName(type)]: {
            sql: `JSON_VALUE(${CUBE}.Order, '$.${type}')`,
            type: `string`
          }
        }))
        .reduce((a, b) => Object.assign(a, b), {})
    },
    preAggregations: {
      // Materialize the cube's SQL so queries read from the stored result
      main: { type: `originalSql` }
    }
  })
});
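
The dimension-generation step in the schema above can be tried out as plain JavaScript, outside of Cube. This is only a sketch: the body of getDimensionName is an assumption (the source does not show it), buildDimensions is a hypothetical wrapper, and the cubeAlias parameter stands in for the ${CUBE} reference that Cube resolves inside a real schema file.

```javascript
// Assumed implementation (not shown in the source): derive a dimension
// name from a JSON path, e.g. 'OrderDetail.ProductID' -> 'productID'.
function getDimensionName(path) {
  const last = path.split('.').pop();
  return last.charAt(0).toLowerCase() + last.slice(1);
}

// Hypothetical wrapper: build one string dimension per JSON path and
// merge the single-key objects into one dimensions object.
function buildDimensions(paths, cubeAlias) {
  return paths
    .map(path => ({
      [getDimensionName(path)]: {
        sql: `JSON_VALUE(${cubeAlias}.Order, '$.${path}')`,
        type: 'string',
      },
    }))
    .reduce((acc, dim) => Object.assign(acc, dim), {});
}

const generated = buildDimensions(
  ['CustomerID', 'OrderDetail.ProductID'],
  'CUBE'
);
console.log(Object.keys(generated)); // [ 'customerID', 'productID' ]
```

Keeping the list of JSON paths in one array means that adding a dimension is a one-line change rather than another hand-written block.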

The COTA team also provided early feedback on the advanced boolean logic, as this feature made it easier for the team to read and write code. Before the advanced boolean logic, it wasn't easy to do complex nested queries, so they used multiple queries by getting an inner query result and computing another query on that result set. With the advanced boolean logic, they took advantage of the Angular-QueryBuilder to create complex logic to get a result with a single API call.
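
As a rough illustration of what such a single-call query can look like, the sketch below nests filters under the or and and keys of the Cube query format. All member names (Patients.count, Patients.cancerType, and so on) and the three small helper functions are hypothetical and are not taken from COTA's schema.

```javascript
// Small helpers for readable nesting; hypothetical, not part of Cube.
const or = (...filters) => ({ or: filters });
const and = (...filters) => ({ and: filters });
const filter = (member, operator, values) => ({ member, operator, values });

// Hypothetical members: patients with a given cancer type, OR patients
// who are both late-stage AND aged 65 or older at diagnosis.
const query = {
  measures: ['Patients.count'],
  filters: [
    or(
      filter('Patients.cancerType', 'equals', ['Multiple Myeloma']),
      and(
        filter('Patients.stage', 'equals', ['III', 'IV']),
        filter('Patients.ageAtDiagnosis', 'gte', ['65'])
      )
    ),
  ],
};

console.log(JSON.stringify(query, null, 2));
```

Before nested boolean filters, logic like this would typically require one query per branch and a merge of the result sets on the client.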

Deploying Cube at COTA

Four people were involved in the Cube implementation at COTA: a product manager, a data analyst, and two software engineers. The release of the minimum viable product (MVP) for their RWA product took about 6 to 8 months.

As the COTA team was working on the Cube implementation, they took advantage of community support and insights on Slack and the community forum. They also appreciated the ability to file bugs and feature requests on GitHub. More recently, they learned from feature demo discussions during Monthly Community Calls on ways to fine-tune their Cube implementations (e.g., optimizing countDistinct pre-aggregations).

Future plans

As an area for improvement, the COTA team would like to see more out-of-the-box Angular support in Cube to speed up integration at COTA. For example, implementing AngularHttpTransport would help them leverage features like HttpInterceptor to simplify canceling requests.

Internally, the COTA team also wants to migrate Cube to containerized Docker deployments to align with the rest of their infrastructure at the company.

Interested in joining COTA's success with Cube?

Explore Cube examples & tutorials and get started today. To jumpstart your efforts, please join us on Slack, follow us on Twitter, and get engaged with the growing Cube community.