GraphQL federation with Hasura GraphQL Engine

Use Hasura Remote Schemas to build a low-latency data application.

Cover of the 'GraphQL federation with Hasura GraphQL Engine' blog post

Hasura GraphQL engine helps build modern apps & APIs 10x faster. In this article, we’ll see how you can build a data-intensive app with Hasura that would work 10x faster. To achieve that, we’ll use Hasura Remote Schemas with Cube and its GraphQL API to boost query performance.

You can find the complete source code for this application on GitHub.

demo app preview

What are we building?

In this tutorial, you’ll learn how to extend Hasura with a Remote Schema by using Cube as a caching layer. This will enable delivering data in milliseconds to end users of a web application built with React and Apollo GraphQL.

cube diagram

We’ll use an example dataset of fraudulent financial transactions. The dataset is collected from Kaggle and only has one table called fraud. To read more, check out the full explanation of the dataset.

Let’s jump in!

What is Hasura?

Hasura is an open-source GraphQL engine that helps developers build modern APIs and apps faster. You can run Hasura locally or in the cloud, and connect it to your new or existing databases to instantly get a production-grade GraphQL API.

The Hasura GraphQL engine is open-source with ~28,000 GitHub stars!

Hasura recommends running Hasura Cloud for your production environment with the option to use Hasura Enterprise, with added security and performance features, if you need to deploy it on-premises.

Hasura currently supports four databases: PostgreSQL, Microsoft SQL Server, Google BigQuery, and AWS Aurora. And, four more are coming soon: Oracle, Mongo, MySQL, Elastic.

Hasura has built-in authorization and caching. The row-level security authorization engine allows you to conveniently specify authorization rules at a model level, and safely expose the GraphQL API to developers inside or outside your organization. Both Hasura Cloud and Enterprise provide support for caching query responses, to improve performance for queries that are executed frequently. This includes actions and queries against remote schemas as well!

Hasura support both GraphQL and REST APIs. You can create idiomatic REST endpoints based on GraphQL templates. With Hasura you don’t need to be a GraphQL expert upfront. It saves time and effort in shipping and maintaining the APIs for you.

Getting Started with Hasura

The easiest way to get started with Hasura is by using Hasura Cloud. Head over to hasura.io and create a new account if you already haven't done so.

Start by creating a new instance.

Let’s connect the database to Hasura. From your Hasura console, select the data icon and click the Connect Database button. Next, select connect to existing database options.

hasura data manager

You will get a form to enter your database credentials. Here’s where you add the credentials to the database we provided and click “Connect Database”.

Hostname: demo-db-examples.cube.dev
Port: 5432
Database: fraud
Username: cube
Password: 12345

enter database credentials

Once you connect the database, click the Track button to create a GraphQL API from the fraud table.

track db schema

Once tracking is enabled, you can modify columns and preview the data.

browse rows

Let’s jump into Hasura and run a few analytical queries.

Running analytical queries in Hasura

Navigate to the API tab in the Hasura console. Open the GraphQLi IDE. Let’s use aggregate fields to fetch aggregated data. Hasura is great to expose this aggregation feature!

query MyQuery {
fraud_aggregate (limit: 10000) {
aggregate {
sum {
amount
}
}
nodes {
step
type
}
}
}

Notice that it takes about 3 seconds to load the data.

api analytical query

Hasura exposes a caching feature, but again, it’s difficult to cache queries that often have varying filters.

api running analytical query

Caching alone can only cover a finite number of use cases. If I add a where clause, I get the same issue again even though caching is enabled.

api with cache

We can also create aggregations with views in Hasura.

Once again go to the Data tab in Hasura and click the SQL tab in the left-hand navigation.

Here you can add raw SQL to create a view.

CREATE VIEW fraud_amount_sum_frauds AS
SELECT
"fraud"."isFraud" "fraud__is_fraud",
"fraud"."step" "fraud__step",
"fraud"."type" "fraud__type",
sum("fraud"."amount") "fraud__amount_sum"
FROM public.fraud AS "fraud"
WHERE "fraud"."isFraud" = 1
GROUP BY 1, 2, 3
ORDER BY 1 ASC;
CREATE VIEW fraud_amount_sum_non_frauds AS
SELECT
"fraud"."isFraud" "fraud__is_fraud",
"fraud"."step" "fraud__step",
"fraud"."type" "fraud__type",
sum("fraud"."amount") "fraud__amount_sum"
FROM public.fraud AS "fraud"
WHERE "fraud"."isFraud" = 0
GROUP BY 1, 2, 3
ORDER BY 1 ASC;

raw sql create view

But, also enable tracking as well. This will let you query the data using the GraphQL API.

query MyQuery{
fraud_amount_sum_frauds(
order_by: { fraud__step: asc }
limit: 1000
) {
fraud__amount_sum
fraud__step
fraud__type
}
}

query with view

The query now takes around 2 seconds to return a response. Building front-end visualizations with charts that take this long to load are unacceptable.

Let’s try getting to the bottom of why running analytical queries is slow with Hasura and PostgreSQL.

Why are the analytical queries slow in Hasura?

Data, a lot of it. Postgres is a traditional row-oriented database. Row-oriented databases store information in the disk row by row.

Analytical queries require aggregating the data by column. Row-oriented databases don’t perform well in this case because all columns in every row need to be read from the disk instead of a few ones involved in a query. Also, certain optimizations like compression can only be effectively applied to columnar data.

You can learn more about how row-oriented databases work and their limitations in this blog post.

You need a caching mechanism that’s reliable and flexible. User experience research consistently shows a positive correlation between faster response times and higher customer satisfaction. Half of your users will immediately drop off if the response time is over 3 seconds if we believe Google.

What we’ll do to mitigate this is to add Cube into our data stack. It’ll help us model data and accelerate queries with caching and pre-aggregations.

What is Cube?

Cube is an open-source API-first headless business intelligence platform that connects to your data sources and makes queries fast, responsive, cost-effective, and consistent across your applications.

It enables data engineers and application developers to access and organize data to build performant data-intensive applications.

https://cubedev-blog-images.s3.us-east-2.amazonaws.com/0ec5f24c-a0e1-4202-8d35-5531be1a60e3.jpg

Cube’s API layer can efficiently aggregate your data and serve it to applications. Instead of querying complex, large datasets directly from your Postgres database, you can use Cube as a middleware layer. Cube performs caching, pre-aggregation, and much more, making your analytical queries faster and more efficient.

Moreover, Cube has a GraphQL API that easily connects to your Hasura engine as a Remote Schema. This way you can unify all your GraphQL endpoints in Hasura and get the added benefit of performant analytical queries.

Setting up Cube

The easiest way to get started with Cube is with Cube Cloud. It provides a fully managed, ready-to-use Cube cluster. However, if you prefer self-hosting, follow this guide in the docs.

Let’s move on and create a new Cube deployment in Cube Cloud. You can select a cloud platform of your choice.

cube setup

Next, select + Create to get started with a fresh instance from scratch.

cube create from scratch

Next, provide the database connection information. Select PostgreSQL.

select db in cube

Now enter the same database credentials we used above when setting up Hasura, and select continue.

Hostname: demo-db-examples.cube.dev
Port: 5432
Database: fraud
Username: cube
Password: 12345

add db creds in cube

Cube auto-generates a Data Schema from your SQL tables. Similar to Hasura’s Track . feature mentioned above. It’s used to model raw data into meaningful business definitions.

Select the fraud table for schema generation, and click Generate . It takes a few minutes for the Cube instance to get provisioned.

generate schema in cube

Now, we can move on to defining our data model and accelerating queries with pre-aggregations.

Centralized data modeling

In your Cube deployment, select  Schema in the left-hand navigation and click Enter Development Mode . Now let’s edit the Fraud.js schema definition to add a measure for the sum of transaction amounts.

schema editor

Once Development Mode is enabled, go ahead and paste the code below into the Fraud.js schema file.

cube(`Fraud`, {
sql: `SELECT * FROM public.fraud`,
preAggregations: {},
joins: {},
measures: {
count: {
type: `count`,
drillMembers: [nameorig, namedest]
},
amountSum: {
sql: `${CUBE}."amount"`,
type: `sum`,
},
},
dimensions: {
step: {
sql: `${CUBE}."step"`,
type: `number`
},
newbalancedest: {
sql: `${CUBE}."newbalanceDest"`,
type: `string`
},
nameorig: {
sql: `${CUBE}."nameOrig"`,
type: `string`
},
oldbalanceorg: {
sql: `${CUBE}."oldbalanceOrg"`,
type: `string`
},
namedest: {
sql: `${CUBE}."nameDest"`,
type: `string`
},
newbalanceorg: {
sql: `${CUBE}."newbalanceOrg"`,
type: `string`
},
oldbalancedest: {
sql: `${CUBE}."oldbalanceDest"`,
type: `string`
},
type: {
sql: `${CUBE}."type"`,
type: `string`
},
amount: {
sql: `${CUBE}."amount"`,
type: `number`
},
isFraud: {
sql: `${CUBE}."isFraud"`,
type: `boolean`
},
isFlaggedFraud: {
sql: `${CUBE}."isFlaggedFraud"`,
type: `boolean`
}
}
});

schema editor dev mode

Save the changes, and the data model will be updated. Next, commit and push these changes. Cube uses Git for version control. You can revert your changes anytime you like.

git commit and push

Moving over to the Playground, you can run the same analytical query we ran previously in Hasura.

query in cube

The query will also be mapped to a GraphQL query through the GraphiQL IDE that’s exposed within Cube Cloud.

query in cube with graphqli

Now we have a centralized data model where we can reliably handle business definitions without splitting up logic between PostgreSQL and Hasura.

But, we’re only halfway there. Let’s add query acceleration with pre-aggregations as well.

Adding pre-aggregations to increase query performance

One of Cube’s most powerful features is pre-aggregations. They can reduce the execution time of a query drastically. In our case with this tutorial, we’ll be reducing the response time to well below 200 ms, or even less, for queries that took above 3 seconds.

In Cube, pre-aggregations are condensed versions of the source data. They are materialized ahead of time and persisted as tables separately from the raw data. To learn more about pre-aggregations, please follow this tutorial.

We also highly recommend you check these in-depth video workshops on pre-aggregations: Mastering Cube Pre-Aggregations and Advanced Pre-aggregations in Cube.

But now, let’s jump back into Development Mode*. Select the Fraud.js schema file again. Update the preAggregations section to add a pre-aggregation definition.

cube(`Fraud`, {
sql: `SELECT * FROM public.fraud`,
preAggregations: {
fraudAmountSum: {
measures: [
Fraud.amountSum
],
dimensions: [
Fraud.isFraud,
Fraud.step,
Fraud.type
]
}
},
...
});

Save the changes, click Commit and push , and the pre-aggregation will be built for our analytical query. Here’s what the pre-aggregation should look like once the schema has been updated.

pre-aggregation

When you run the query next time in Cube, the data will be pre-aggregated and saved in Cube's caching layer inside of Cube Store.

Running this query again, you’ll see a massive performance increase.

pre-aggregation query in cube

The true power lies in still retaining query acceleration when using filters. That’s why pre-aggregations are so much more powerful than basic caching strategies.

pre-aggregation query in cube with filter

At this point, your Cube instance is ready to be hooked up with Hasura as a remote graph.

Add Cube as Remote Schema in Hasura

Hasura can merge remote GraphQL schemas and provide a unified GraphQL API. Think of it like automated schema stitching. Any GraphQL service that lives outside of Hasura is considered a remote graph.

You can effortlessly merge Cube's GraphQL schema using the Remote Schema option in Hasura.

From the Hasura console, select Remote Schemas, then select Add.

screely-1662385730162.png

To add a remote schema, first copy the GraphQL API and Authorization token from your Cube deployment. You can find the GraphQL API endpoint and Authorization token by clicking the How to connect your application button in your Cube deployment Overview tab.

cube graphql api and token

Give the Remote Schema a name and paste the GraphQL API and Authorization token in the input fields in Hasura. Click Add when you’re done.

hasura remote schema

If you go back to the API tab in Hasura, you can now run Cube queries directly. Let’s run the same query we ran a moment ago in the Cube Cloud Playground.

query CubeQuery {
cube {
fraud(orderBy: {step: asc } ) {
amountSum
step
type
}
}
}

You will notice a significant performance improvement in your query this time.

hasura api with cube as remote schema

⚠️ Caution: You may not see a significant performance improvement for the first few queries if you use a free tier of Hasura and a free tier of Cube. After the first 2/3 queries, it should work as expected. The free tier for both services goes into sleep mode when not used for some time.

Building data visualization with Apollo and React

For the front-end app, we’ll use React and Apollo Client, and query the Hasura GraphQL server that’s connected to Cube as a remote schema. We’ll use the nivo charting library, a modern production-ready data visualization tool.

You can check the full source code on GitHub and instantly run it with yarn dev. You'll get a copy of this demo application.

demo app preview

The entry point is src/index.js, and it uses a LineChart.jsx file to generate the nivo line chart.

We decided to showcase the power of pre-aggregations by generating queries that filter the steps into pages of 50 each, as well as choosing whether to show valid or fraudulent transactions.

Even though the query uses filters, it will still be accelerated due to using pre-aggregations in Cube!

Let’s walk through the contents of the React files. First the index.js.

import React from 'react';
import { useState, useEffect } from 'react'
import * as classes from './index.module.css'
import * as ReactDOM from 'react-dom/client';
import {
ApolloClient,
InMemoryCache,
ApolloProvider,
useQuery,
gql,
createHttpLink,
ApolloLink,
from,
} from '@apollo/client';
import { setContext } from '@apollo/client/link/context';
import {
range,
tablePivotCube,
tablePivotHasura,
availableStepRanges,
defaultIsFraudSelection,
defaultStepSelection,
DisplayFraudAmountSum,
randomIntFromInterval,
} from './utils/utils';
const httpLink = createHttpLink({
uri: `${process.env.HASURA_GRAPHQL_API_URL}`,
});
const authLink = setContext((_, { headers }) => {
return {
headers: {
...headers,
'x-hasura-role': `${process.env.X_HASURA_ROLE}`,
}
}
});
let timestampsGlobal = {};
const roundTripLink = new ApolloLink((operation, forward) => {
operation.setContext({ start: new Date() });
timestampsGlobal = {};
return forward(operation).map((data) => {
timestampsGlobal[operation.operationName] = new Date() - operation.getContext().start;
return data;
});
});
const client = new ApolloClient({
link: from([roundTripLink, authLink.concat(httpLink)]),
cache: new InMemoryCache()
});
ReactDOM
.createRoot(document.getElementById('app'))
.render(
<ApolloProvider client={client}>
<App />
</ApolloProvider>,
)
function App() {
const [ timestamps, setTimestamps ] = useState(0);
useEffect(() => {
setTimestamps(timestampsGlobal)
}, [ timestampsGlobal ]);
const [ fraudChartDataCube, setFraudChartDataCube ] = useState([])
const [ fraudChartDataHasura, setFraudChartDataHasura ] = useState([])
const [ stepSelection, setStepSelection ] = useState(defaultStepSelection);
const selectedStep = availableStepRanges.find(x => x.id === stepSelection);
const selectedStepRange = range(selectedStep.start, selectedStep.end);
const [ isFraudSelection, setIsFraudSelection ] = useState(defaultIsFraudSelection);
const shuffleAndRun = () => {
setStepSelection(randomIntFromInterval(1, 14));
setIsFraudSelection(randomIntFromInterval(0, 1));
}
const GET_FRAUD_AMOUNT_SUM_CUBE_REMOTE_SCHEMA = gql`
query CubeQuery {
cube(
where: {fraud: {AND: [
{step: {gte: ${selectedStep.start} }},
{step: {lte: ${selectedStep.end} }},
{isFraud: {equals: "${isFraudSelection}" }}
]}},
orderBy: {fraud: {step: asc}}
) {
fraud {
amountSum
step
type
}
}
}
`;
const {
loading: loadingFraudDataCube,
error: errorFraudDataCube,
data: fraudDataCube,
} = useQuery(GET_FRAUD_AMOUNT_SUM_CUBE_REMOTE_SCHEMA);
useEffect(() => {
if (loadingFraudDataCube) { return; }
setFraudChartDataCube(tablePivotCube(fraudDataCube));
}, [ fraudDataCube ]);
const GET_FRAUD_AMOUNT_SUM_HASURA_FRAUDS = gql`
query HasuraQuery{
fraud_amount_sum_frauds(
where: {
fraud__step: {_in: [${selectedStepRange}]}
}
order_by: { fraud__step: asc }
) {
fraud__amount_sum
fraud__step
fraud__type
}
}
`;
const GET_FRAUD_AMOUNT_SUM_HASURA_NON_FRAUDS = gql`
query HasuraQuery{
fraud_amount_sum_non_frauds(
where: {
fraud__step: {_in: [${selectedStepRange}]}
}
order_by: { fraud__step: asc }
) {
fraud__amount_sum
fraud__step
fraud__type
}
}
`;
let GET_FRAUD_AMOUNT_SUM_HASURA;
if (isFraudSelection) {
GET_FRAUD_AMOUNT_SUM_HASURA = GET_FRAUD_AMOUNT_SUM_HASURA_FRAUDS;
} else {
GET_FRAUD_AMOUNT_SUM_HASURA = GET_FRAUD_AMOUNT_SUM_HASURA_NON_FRAUDS;
}
const {
loading: loadingFraudDataHasura,
error: errorFraudDataHasura,
data: fraudDataHasura,
} = useQuery(GET_FRAUD_AMOUNT_SUM_HASURA);
useEffect(() => {
if (loadingFraudDataHasura) { return; }
if (isFraudSelection) {
setFraudChartDataHasura(tablePivotHasura(fraudDataHasura.fraud_amount_sum_frauds));
} else {
setFraudChartDataHasura(tablePivotHasura(fraudDataHasura.fraud_amount_sum_non_frauds));
}
}, [ fraudDataHasura ]);
return <>
<div style={{display: 'flex', justifyContent: 'center'}}>
<select
className={classes.select}
value={stepSelection}
onChange={e => setStepSelection(parseInt(e.target.value))}
>
<option value="" disabled>Select transaction step in time...</option>
{availableStepRanges.map(stepRange => (
<option key={stepRange.id} value={stepRange.id}>
Transactions from {stepRange.start} to {stepRange.end}
</option>
))}
</select>
<select
className={classes.select}
value={isFraudSelection}
onChange={e => setIsFraudSelection(parseInt(e.target.value))}
>
<option value="" disabled>Select if it's a fraudulent transaction...</option>
<option key={0} value={0}>
Non-fraudulent transactions
</option>
<option key={1} value={1}>
Fraudulent transactions
</option>
</select>
<div className={`${classes.buttonwrp}`}>
<button className={`Button Button--size-s Button--pink`} onClick={shuffleAndRun}>
Shuffle and Run!
</button>
</div>
</div>
<table style={{ width: '100%' }}>
<tbody>
<tr>
<td style={{ width: '50%' }}>
<div style={{ height: '375px', margin: '20px 0' }}>
<h3 style={{display: 'flex', justifyContent: 'center'}}>Hasura + Cube {timestamps.CubeQuery ? `(${timestamps.CubeQuery / 1000}s)` : ``}</h3>
<DisplayFraudAmountSum
loading={loadingFraudDataCube}
error={errorFraudDataCube}
chartData={fraudChartDataCube}
/>
</div>
</td>
<td style={{ width: '50%' }}>
<div style={{ height: '375px', margin: '20px 0' }}>
<h3 style={{display: 'flex', justifyContent: 'center'}}>Hasura + PostgreSQL {timestamps.HasuraQuery ? `(${timestamps.HasuraQuery / 1000}s)` : ``}</h3>
<DisplayFraudAmountSum
loading={loadingFraudDataHasura}
error={errorFraudDataHasura}
chartData={fraudChartDataHasura}
/>
</div>
</td>
</tr>
</tbody>
</table>
</>
}

Let me explain the main points of the code above.

  • We use @apollo/client and wrap the React <App /> in <ApolloProvider>...<ApolloProvider/>.
    • This includes using httpLink and authLink to load the Hasura GraphQL API endpoint and secret token.
  • A typical API interaction flow in a React app with React hooks looks like this:
    • use useState to create a state variable (e.g., fraudChartData);
    • compose a GraphQL query (e.g., GET_FRAUD_AMOUNT_SUM_CUBE_REMOTE_SCHEMA);
    • call useQuery to fetch the result set (e.g., fraudDataCube);
    • use useEffect to await for the data and to transform it into fraudChartDataCube to be loaded into LineChart;
    • assign the data to the state variable (e.g., with setFraudChartDataCube).
  • We configure the GET_FRAUD_AMOUNT_SUM_CUBE_REMOTE_SCHEMA GraphQL query to load parameters dynamically from the two dropdown selectors.
  • Lastly, the data is rendered by using DisplayFraudAmountSum with LineChart.

That's it!

Final thoughts

Cube comes packed with features that can help you build data-intensive apps fast. Cube supports features like multiple database connections, multi-tenancy, SQL API for connecting BI tools, and more. By using Hasura’s remote schema, you can keep using your Hasura GraphQL server even when faced with running time-consuming analytical queries!

You can sign up for Cube Cloud for free and try it for yourself. To learn more about how Cube can help you to build your project, head over to the official documentation page.

If you have questions or feedback, we would love to hear what you have to say! Come join our Slack community. Click here to join!

That’s all for today. Feel free to leave Cube a ⭐ on GitHub if you liked this article!

Share this article