Implementing Funnel Analysis

This content is being moved to the Cube.js community forum. We encourage you to follow the content and discussions in the new forum post.

Funnels represent a series of events that lead users towards a defined goal. Funnel analysis is an approach commonly used in product, marketing and sales analytics.

Regardless of the domain, every funnel has the following traits:

  • The identity of the object moving through the funnel – e.g. user or lead
  • A set of steps, through which the object moves
  • The date and time of each step
  • The time to convert between steps

Since funnels have a pretty standard structure, they are good candidates for being extracted into reusable packages. Cube.js comes pre-packaged with a standard funnel package.

// First step is to require the Funnel package
const Funnels = require(`Funnels`);

cube(`PurchaseFunnel`, {
  extends: Funnels.eventFunnel({
    userId: {
      sql: `user_id`,
    },
    time: {
      sql: `timestamp`,
    },
    steps: [
      {
        name: `view_product`,
        eventsView: {
          sql: `select * from events where event = 'view_product'`,
        },
      },
      {
        name: `purchase_product`,
        eventsView: {
          sql: `select * from events where event = 'purchase_product'`,
        },
        timeToConvert: '1 day',
      },
    ],
  }),
});

Cube.js will generate an SQL query for this funnel. Since funnel analysis in SQL is not straight forward, the SQL code itself is quite complicated, even for such a small funnel.

Show Funnel's SQL
SELECT
  purchase_funnel.step "purchase_funnel.step",
  count(purchase_funnel.user_id) "purchase_funnel.conversions"
FROM
  (
    WITH joined_events AS (
      select
        view_product_events.user_id view_product_user_id,
        purchase_product_events.user_id purchase_product_user_id,
        view_product_events.t
      FROM
        (
          select
            user_id user_id,
            timestamp t
          from
            (
              select
                *
              from
                events
              where
                event = 'view_product'
            ) e
        ) view_product_events
        LEFT JOIN (
          select
            user_id user_id,
            timestamp t
          from
            (
              select
                *
              from
                events
              where
                event = 'purchase_product'
            ) e
        ) purchase_product_events ON view_product_events.user_id = purchase_product_events.user_id
        AND purchase_product_events.t >= view_product_events.t
        AND (
          purchase_product_events.t :: timestamptz AT TIME ZONE 'America/Los_Angeles'
        ) <= (
          view_product_events.t :: timestamptz AT TIME ZONE 'America/Los_Angeles'
        ) + interval '1 day'
    )
    select
      user_id,
      first_step_user_id,
      step,
      max(t) t
    from
      (
        SELECT
          view_product_user_id user_id,
          view_product_user_id first_step_user_id,
          t,
          'View Product' step
        FROM
          joined_events
        UNION ALL
        SELECT
          purchase_product_user_id user_id,
          view_product_user_id first_step_user_id,
          t,
          'Purchase Product' step
        FROM
          joined_events
      ) as event_steps
    GROUP BY
      1,
      2,
      3
  ) AS purchase_funnel
WHERE
  (
    purchase_funnel.t >= '2018-07-01T07:00:00Z' :: timestamptz
    AND purchase_funnel.t <= '2018-07-31T06:59:59Z' :: timestamptz
  )
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  5000

A unique key to identify the users moving through the funnel.

userId: {
  sql: `user_id`;
}

In the situation where user_id changes between steps, you can pass a unique key to join two adjacent steps. For example, if a user signs in after having been tracked anonymously until that point in the funnel, you could use nextStepUserId to define a funnel where users are tracked by anonymous ID on the first step and then by an identified user ID on subsequent steps.

const Funnels = require(`Funnels`);

cube(`OnboardingFunnel`, {
  extends: Funnels.eventFunnel({
    userId: {
      sql: `id`,
    },
    time: {
      sql: `timestamp`,
    },
    steps: [
      {
        name: `View Page`,
        eventsView: {
          sql: `select anonymous_id as id, timestamp from pages`,
        },
      },
      {
        name: `Sign Up`,
        eventsView: {
          sql: `select anonymous_id as id, user_id, timestamp from sign_ups`,
        },
        nextStepUserId: {
          sql: `user_id`,
        },
        timeToConvert: '1 day',
      },
      {
        name: `Action`,
        eventsView: {
          sql: `select user_id as id from actions`,
        },
        timeToConvert: '1 day',
      },
    ],
  }),
});

A timestamp of the event.

time: {
  sql: `timestamp`;
}

An array of steps. Each step has 2 required and 1 optional parameters:

  • name (required) - Name of the step. It must be unique within a funnel.
  • eventsView (required) - Events table for the step. It must contain userId and time fields. For example, if we have defined the userId as user_id and time as timestamp, we need to have these fields in the table we're selecting from.
  • timeToConvert (optional) - A time window during which conversion should happen. Set it depending on your funnel logic. If this is set to 1 day, for instance, it means the funnel will include only users who made a purchase within 1 day of visiting the product page.
steps: [
  {
    name: `purchase_product`,
    eventsView: {
      sql: `select * from events where event = 'purchase_product'`,
    },
    timeToConvert: '1 day',
  },
];

In order to provide additional dimensions, funnels can be joined with other cubes using user_id at the first step of a funnel. This will always use a belongsTo relationship, so hence you should always join with the corresponding user cube. Here, by 'user' we understand this to be any entity that can go through a sequence of steps within funnel. It could be a real web user with an auto assigned ID or a specific email sent by an email automation that goes through a typical flow of events like 'sent', 'opened', 'clicked', and so on. For example, for our PurchaseFunnel we can add a join to another funnel as following:

cube(`PurchaseFunnel`, {
  joins: {
    Users: {
      relationship: `belongsTo`,
      sql: `${CUBE}.first_step_user_id = ${Users}.id`,
    },
  },

  extends: Funnels.eventFunnel({
    // ...
  }),
});

Cube.js is based on multidimensional analysis

Funnel-based cubes have the following structure:

  • conversions - Count of conversions in the funnel. The most useful when broken down by steps. It's the classic funnel view.
  • conversionsPercent - Percentage of conversions. It is useful when you want to inspect a specific step, or set of steps, and find out how a conversion has changed over time.

  • step - Describes funnels' steps. Use it to break down conversions or conversionsPercent by steps, or to filter for a specific step.
  • time - time dimension for the funnel. Use it to filter your analysis for specific dates or to analyze how conversion changes over time.

In the following example, we use the conversions measure along with the steps dimension to display a classic bar chart showing the funnel's steps.

Funnel joins are extremely heavy for most modern databases and complexity grows in a non-linear way with the addition of steps. However, if the cardinality of the first event isn't too high, very simple optimization can be applied: originalSql pre-aggregation.

It is best to use partitioned rollups to cache the steps instead. Add one to the PurchaseFunnel cube as follows:

cube(`PurchaseFunnel`, {
  extends: Funnels.eventFunnel({
    // ...
  }),

  preAggregations: {
    main: {
      type: `originalSql`,
    },
  },
});

Did you find this page useful?