Deepnote tutorial: Collaborative data analysis in a Jupyter-compatible data notebook

Use Deepnote with Cube to analyze massive datasets consistently.


Last month, just two days after our update to the SQL API, Deepnote announced its general availability to the world's best data teams.

What is Deepnote?

Deepnote represents a new generation of Jupyter-compatible notebooks that enables real-time collaboration between teams of data engineers, data scientists, and analysts. With Deepnote, you can explore, analyze, and present your data in infinite ways while using Cube to consistently organize your data upstream.

You can think of Deepnote as the "Google Docs" of data science: it's a data notebook that allows for instantaneous collaboration through shared notebooks and workspaces. The application runs in the cloud, which provides all compute resources, so you don't need to provision your own hardware. In a way, Deepnote is a fully managed multi-user Jupyter server done right, for your convenience.

What is Cube?

Cube is the headless BI platform that sits between your data source and downstream data applications. Because of its position in the data stack, Cube is able to provide consistent metrics definitions for all your data flowing downstream.

Cube works with all modern data stores, creates a metrics layer that standardizes the data for any purpose, and delivers it downstream to any data notebook, application, or BI tool via SQL, REST, or GraphQL APIs.

Consistency for Data Teams

One of the biggest benefits of using Deepnote and Cube together is providing a common starting line for various teams working together within an organization.

Deepnote and Cube

Companies that work cross-functionally and collaboratively often face a similar problem: inconsistent data, especially across different teams. It generally originates from varying metrics definitions across the many tools that each team uses. At some point, teams wrangling the same data realize that they use different SQL queries to calculate metrics and, unfortunately, mislead their users with inconsistent insights.

Let’s go through a super high-level example. Imagine both the product and customer success teams want to calculate product success "over the last year". The product team queries the 2021 calendar year (01/01/2021 to 12/31/2021), whereas customer success queries the trailing twelve months (06/01/2021 to 06/01/2022).

Since the data pipelines of both teams are siloed, neither team necessarily has visibility into how the other defined the query; it’s likely these teams use different tools, too. This discrepancy in query results can easily misalign teams that are working towards a single goal but from disparate data.
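To make the discrepancy concrete, here is a small sketch (with hypothetical date windows matching the example above) showing how little the two definitions of "the last year" actually have in common:

```python
from datetime import date

# Hypothetical windows from the example above.
product_window = (date(2021, 1, 1), date(2021, 12, 31))  # calendar year 2021
success_window = (date(2021, 6, 1), date(2022, 6, 1))    # trailing twelve months

# The two windows only partially overlap, so any metric aggregated
# over them can legitimately come out different for each team.
overlap_start = max(product_window[0], success_window[0])
overlap_end = min(product_window[1], success_window[1])
overlap_days = (overlap_end - overlap_start).days
print(overlap_days)  # 213 of roughly 365 days are shared
```

Both teams are "right" by their own definition, which is exactly why the definition itself needs to live in one shared place.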

Now, imagine a company that uses both Deepnote and Cube. Cube maintains, stores, and defines the data models upstream of the teams’ data applications. Meanwhile, Deepnote allows for cross-functional collaboration, exploration, and visualization of those consistently defined metrics. No more data or workflow silos—just reliable data to drive the whole organization in the same direction.

Defining Consistent Metrics

Let’s illustrate how Cube complements Deepnote in the data engineering pipeline. In this example, we’ll use an e-commerce dataset and build a data model around it. We’ll also customize the data model a bit to show how powerful Deepnote and Cube are together.

Running Cube. To keep things simple and save time, we'll run fully managed Cube in Cube Cloud. Please proceed to Cube Cloud’s sign-up page and follow the instructions on connecting to your data source and generating a data model.

You can use the following credentials to get up and running in no time:

Hostname: demo-db-examples.cube.dev
Port: 5432
Database: ecom
Username: cube
Password: 12345

This is what your deployment in Cube will look like:

Cube deployment

Defining the data model. To see the data model you created above, navigate to the "Schema" tab. You will see files named LineItems.js, Orders.js, Users.js, etc. under the "schema" folder, which define the cubes in your data model.

Data schema

Key ideas on what we’re seeing above:

  • The cube is a logical entity that groups measures (quantitative data) and dimensions (qualitative data) together into metrics definitions.
  • In this case, we have defined a cube over the entire public.line_items table.
  • Measures are defined as aggregations (e.g. count, sum, etc.) over columns in the table.
  • Dimensions are defined over textual, numeric, or temporal columns in the table.
  • You can define complex measures and dimensions with custom SQL statements and references to other measures and dimensions.

Customizing the data model. While Cube Cloud has a ton of time-saving features, let’s play around with a particularly useful one: the rolling window measure. A rolling window measure aggregates values over a date range that moves with the query’s time dimension; with an unbounded trailing window, it becomes a running (cumulative) total.

Let’s assume you want to calculate the total revenue brought in from sold line items over all time. This can be done by writing a SQL query in your local Jupyter notebook or any other BI tool you may be using. But what happens if you want to observe the total revenue over different time granularities? Or add another dimension to observe the revenue data?

This is where your "LineItems" cube can be modified to abstract away your varied data requests. Simply define the metrics in your data model once, and then use them time and again across teams to achieve your business goals.

For this example, we’ll customize the data model in the cloud by adding the "total revenue" as a measure to the LineItems cube.

// A rolling window measure
revenue: {
  sql: `price`,
  type: `sum`,
  rollingWindow: {
    trailing: `unbounded`,
  },
}

To do this, click on the Enter Development Mode button to unlock the schema files for modifications. This essentially creates a development environment (or "branch") where you can make changes to the data model.

The following code snippet shows the contents of the LineItems.js file (which defines the metrics for the "LineItems" cube) with the "revenue" rolling window added to the measures section.

Navigate to the LineItems.js file and replace its contents with the snippet below.

cube(`LineItems`, {
  sql: `SELECT * FROM public.line_items`,

  joins: {
    Products: {
      sql: `${CUBE}.product_id = ${Products}.id`,
      relationship: `belongsTo`
    },
    Orders: {
      sql: `${CUBE}.order_id = ${Orders}.id`,
      relationship: `belongsTo`
    }
  },

  measures: {
    count: {
      type: `count`,
      drillMembers: [id, createdAt]
    },
    quantity: {
      sql: `quantity`,
      type: `sum`
    },
    price: {
      sql: `price`,
      type: `sum`
    },
    // A rolling window measure
    revenue: {
      sql: `price`,
      type: `sum`,
      rollingWindow: {
        trailing: `unbounded`,
      }
    }
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primaryKey: true
    },
    createdAt: {
      sql: `created_at`,
      type: `time`
    }
  }
});

Click Save All to apply the changes to the development version of your API. Lastly, click Commit & Push to merge your changes back to the main branch. You will see your changes deployed in the "Overview" tab.

Awesome! You’ve just built a customized data model to suit your project. The next part involves connecting your data to Deepnote.

Connecting Deepnote to Cube

Cube offers three different APIs (SQL, REST, and GraphQL) to deliver your data to downstream applications, and the SQL API is the best option for bringing standardized data into Deepnote.

Click on the Deploy SQL API button to generate your credentials:

SQL API

First, sign up for a Deepnote account if you don't have one already. Then, follow these instructions to connect Deepnote to Cube. (Please note that you have to select the PostgreSQL integration type, contrary to what's shown in the video.)
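Once the integration is configured, you can query the measures defined in Cube straight from a notebook cell. The sketch below assumes the `revenue` measure from the LineItems cube above and the psycopg2 driver; the connection parameters are placeholders for the credentials generated by the Deploy SQL API dialog, not real values:

```python
def cube_sql() -> str:
    # MEASURE() asks Cube's SQL API to evaluate a measure
    # defined in the data model (here, LineItems.revenue).
    return "SELECT MEASURE(revenue) FROM LineItems"

def fetch_revenue(host: str, user: str, password: str):
    # Cube's SQL API speaks the PostgreSQL wire protocol, which is why
    # Deepnote's PostgreSQL integration (or plain psycopg2) can talk to it.
    import psycopg2  # assumed available in the notebook environment
    with psycopg2.connect(host=host, port=5432, dbname="db",  # "db" is a placeholder
                          user=user, password=password) as conn:
        with conn.cursor() as cur:
            cur.execute(cube_sql())
            return cur.fetchall()
```

From there, the result set lands in your notebook like any other Postgres query, ready for pandas or Deepnote's built-in charts.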


You can also visualize, add comments, and explore the full palette of features in Deepnote.

Wrapping up

By connecting Deepnote with Cube upstream, you have consistently defined data as the starting line for your data science or business analysis projects.

Want to see how Deepnote and Cube can power your project? Check the docs and create a free Cube Cloud account today.

Have questions or feedback? Drop us a line, join our Slack community of more than 5000 data folks, and give Cube a star on GitHub.
