There are many ways to visualize data, but when it comes to location-based (or geospatial) data, map-based data visualizations are the most comprehensible and graphic.
We'll also learn how to make this map data visualization interactive (or dynamic), allowing users to control what data is being visualized on the map.
To make this guide even more interesting, we'll use the Stack Overflow open dataset, publicly available in Google BigQuery and on Kaggle. With this dataset, we'll be able to find answers to the following questions:
- Where do Stack Overflow users live?
- Is there any correlation between Stack Overflow users' locations and their ratings?
- What are the total and average Stack Overflow users' ratings by country?
- Is there any difference between the locations of people who ask and answer questions?
So, that's our plan — and let's get hacking! 🤘
Oh, wait! Here's what our result is going to look like! Amazing, huh?
Dataset and API
The original Stack Overflow dataset contains locations as strings of text. However, Mapbox works best with locations encoded as GeoJSON, an open standard for geographical features based (surprise!) on JSON.
That's why we've used the Mapbox Search API to perform geocoding. As the geocoding procedure has nothing to do with map data visualization, we're just providing the ready-to-use dataset with embedded GeoJSON data (the file size is about 600 MB).
We've also set up a public Postgres instance that we'll use throughout this tutorial so you don't need to set it up yourself.
Setting Up an API 📦
Let's use Cube, an open-source analytical API platform, to serve this dataset over an API. Run this command:
Cube uses environment variables for configuration. To set up the connection to our database, we need to specify the database type and name.
In the newly created stackoverflow__example folder, please replace the contents of the .env file with the following:
Now we're ready to start the API with this simple command:
To check if the API works, please navigate to http://localhost:4000 in your browser. You'll see the Cube Developer Playground, a powerful tool which greatly simplifies data exploration and query building.
The last thing left to make the API work is to define the data schema: it describes what kind of data we have in our dataset and what should be available in our application.
Let’s go to the data schema page and check all tables from our database. Then, please click on the plus icon and press the “generate schema” button. Voila! 🎉
Now you can spot a number of new *.js files in the
So, our API is set up, and we're ready to create map data visualizations with Mapbox!
Frontend and Mapbox
Navigate to the templates page and choose one of the predefined templates or click "Create your own". In this guide, we'll be using React, so choose accordingly.
After a few minutes spent installing all dependencies (oh, these node_modules), you'll have the new dashboard-app folder. Run this app with the following commands:
Great! Now we're ready to add Mapbox to our front-end app.
Setting Up Mapbox 🗺
Install react-map-gl with this command:
To connect this package to our front-end app, replace the contents of src/App.jsx with the following:
You can see that MAPBOX_TOKEN needs to be obtained from Mapbox and put in this file.
At this point we have an empty world map and can start to visualize data. Hurray!
Planning the Map Data Visualization 🔢
Here's how you can build any map data visualization using Mapbox and Cube:
- load data to the front-end with Cube
- transform data to GeoJSON format
- load data to Mapbox layers
- optionally, customize the map using the properties object to set up data-driven styling and manipulations
In this guide, we'll follow this path and create four independent map data visualizations:
- a heatmap layer based on users' location data
- a points layer with data-driven styling and dynamically updated data source
- a points layer with click events
- a choropleth layer based on different calculations and data-driven styling
Let's get hacking! 😎
Okay, let's create our first map data visualization! 1️⃣
A heatmap layer is a suitable way to show data distribution and density. That's why we'll use it to show where Stack Overflow users live.
However, some Stack Overflow users have amazing locations like "in the cloud", "Interstellar Transport Station", or "on a server far far away". Surprisingly, we can't translate all these fancy locations to GeoJSON, so we're using the SQL WHERE clause to select only users from the Earth. 🌎
Here's what the schema/Users.js file should look like:
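As a rough sketch of such a schema, the object below filters out users without geocoded locations and exposes a count measure plus a geometry dimension. The table name `public.users` and the column name `geometry` are assumptions about the dataset, and in a real schema file the object would be passed to Cube's global `cube()` helper rather than stored in a variable.

```javascript
// Sketch of a cube definition for schema/Users.js.
// In a real schema file this object is passed to Cube's global helper:
//   cube(`Users`, usersCube);
// The table and column names below are assumptions about the dataset.
const usersCube = {
  // Keep only users whose location was successfully geocoded
  sql: 'SELECT * FROM public.users WHERE geometry IS NOT NULL',

  measures: {
    // Number of users in each group
    count: { type: 'count' },
  },

  dimensions: {
    // GeoJSON geometry, assumed to be stored as a string column
    geometry: { sql: 'geometry', type: 'string' },
  },
};
```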
Also, we'll need the dashboard-app/src/components/Heatmap.js component with the following source code. Let's break down its contents!
First, we're loading data to the front-end with a convenient Cube hook:
To make map rendering faster, with this query we're grouping users by their locations.
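A query of roughly this shape would do the grouping: combining a measure with a dimension makes Cube aggregate per distinct dimension value. The member names `Users.count` and `Users.geometry` are assumptions that depend on how the schema is defined.

```javascript
// One measure plus one dimension: Cube groups rows by the dimension,
// so we get one row per distinct location instead of one row per user.
// `Users.count` and `Users.geometry` are assumed member names.
const heatmapQuery = {
  measures: ['Users.count'],
  dimensions: ['Users.geometry'],
};
```

In a React component, an object like this is typically passed to the `useCubeQuery` hook from `@cubejs-client/react`.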
Then, we transform query results to GeoJSON format:
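Here's a minimal sketch of that transformation. It assumes the rows come from `resultSet.tablePivot()`, that the geometry arrives as a GeoJSON string, and that the members are named `Users.count` and `Users.geometry`; the sample rows are made up for illustration.

```javascript
// Turn flat query rows into a GeoJSON FeatureCollection.
function rowsToFeatureCollection(rows) {
  const features = rows.map((row) => ({
    type: 'Feature',
    // Measures usually arrive as strings, so convert to a number
    properties: { value: Number.parseInt(row['Users.count'], 10) },
    // The geometry column is assumed to hold a serialized GeoJSON geometry
    geometry: JSON.parse(row['Users.geometry']),
  }));
  return { type: 'FeatureCollection', features };
}

// Hand-written rows shaped like the output of resultSet.tablePivot()
const sampleRows = [
  { 'Users.count': '12', 'Users.geometry': '{"type":"Point","coordinates":[13.4,52.5]}' },
  { 'Users.count': '3', 'Users.geometry': '{"type":"Point","coordinates":[-0.1,51.5]}' },
];

const heatmapData = rowsToFeatureCollection(sampleRows);
```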
After that, we feed this data to Mapbox. With react-map-gl, we can do it this way:
Note that here we use Mapbox data-driven styling: we defined the heatmap-weight property as an expression that depends on "properties.value":
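A layer definition along these lines could express that dependency. The layer id, the stop values, and the radius are illustrative assumptions, not tuned settings.

```javascript
// Heatmap layer whose weight is read from each feature's properties.value.
const heatmapLayer = {
  id: 'users-heatmap', // hypothetical layer id
  type: 'heatmap',
  paint: {
    // Linear interpolation: value 1 maps to weight 0, value 2000 maps to weight 1.
    // The stops are assumptions chosen for illustration.
    'heatmap-weight': ['interpolate', ['linear'], ['get', 'value'], 1, 0, 2000, 1],
    'heatmap-radius': 20,
  },
};
```

With react-map-gl, an object like this can be spread into a `Layer` component nested in a `Source` of type `geojson`.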
You can find more information about expressions in Mapbox docs.
Here's the heatmap we've built:
- Heatmap layer example at Mapbox documentation
- Heatmap layers params descriptions
- Some theory about heatmap layers settings, palettes
Dynamic Points Visualization
The next question was: is there any correlation between Stack Overflow users' locations and their ratings? 2️⃣
Spoiler alert: no, there isn't 😜. But it's a good question to understand how dynamic data loading works and to dive deep into Cube filters.
We need to tweak the schema/Users.js data schema to look like this:
Also, we'll need the dashboard-app/src/components/Points.js component with the following source code. Let's break down its contents!
First, we need to query the API to find the initial range of user reputations:
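One way to sketch this step is a query with two measures and a small helper that reads the range from the single returned row. The measure names `Users.min` and `Users.max` are assumptions, and the sample row contains made-up numbers.

```javascript
// Ask only for the two aggregate measures; no dimensions means a single row.
const rangeQuery = { measures: ['Users.min', 'Users.max'] };

// Read [min, max] from rows shaped like resultSet.tablePivot() output.
// Returns null when the result is empty.
function extractRange(rows) {
  if (rows.length === 0) return null;
  const [row] = rows;
  return [Number(row['Users.min']), Number(row['Users.max'])];
}

// Made-up row for illustration
const initialRange = extractRange([{ 'Users.min': '1', 'Users.max': '250000' }]);
```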
Then, we create a Slider component from Ant Design, a great open-source UI toolkit. On every change to this Slider's value, the front-end will make a request to the database:
To make map rendering faster, with this query we're grouping users by their locations and showing only the user with the maximum rating.
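Such a query could be assembled from the slider's current range, as in the sketch below. The member names (`Users.max`, `Users.geometry`, `Users.value`) are assumptions; the filter shape with `member`, `operator`, and string `values` follows Cube's query format.

```javascript
// Build a Cube query for the range currently selected on the slider.
function buildPointsQuery([min, max]) {
  return {
    // Highest rating within each location group
    measures: ['Users.max'],
    // Grouping by location keeps the number of rendered points down
    dimensions: ['Users.geometry'],
    // Filter values are passed as strings
    filters: [
      { member: 'Users.value', operator: 'gte', values: [String(min)] },
      { member: 'Users.value', operator: 'lte', values: [String(max)] },
    ],
  };
}

const pointsQuery = buildPointsQuery([1000, 50000]);
```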
Then, like in the previous example, we transform query results to GeoJSON format:
Please note that we've also applied data-driven styling to the layer properties, so each point's radius now depends on the rating value.
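As an illustration, a circle layer could scale its radius between the smallest and largest rating. The layer id and the pixel sizes are assumptions; the guard is there because interpolation stops have to be in strictly ascending order.

```javascript
// Circle layer whose radius grows with properties.value.
function buildPointsLayer(min, max) {
  if (!(min < max)) {
    throw new Error('min must be smaller than max');
  }
  return {
    id: 'users-points', // hypothetical layer id
    type: 'circle',
    paint: {
      // Radius goes linearly from 4px at `min` to 24px at `max` (assumed sizes)
      'circle-radius': ['interpolate', ['linear'], ['get', 'value'], min, 4, max, 24],
      'circle-opacity': 0.7,
    },
  };
}

const pointsLayer = buildPointsLayer(1, 250000);
```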
When the data volume is moderate, it's also possible to use only Mapbox filters and still achieve the desired performance. We can load data with Cube once and then filter rendered data with these layer settings:
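A client-side filter of this kind might look like the sketch below, which keeps only features whose `value` property falls within the selected range. The layer id and the sample bounds are assumptions.

```javascript
// Mapbox filter expression: min <= properties.value <= max
function buildRangeFilter(min, max) {
  return ['all', ['>=', ['get', 'value'], min], ['<=', ['get', 'value'], max]];
}

// The data source stays the same; only the layer's filter changes with the slider
const filteredLayer = {
  id: 'users-points', // hypothetical layer id
  type: 'circle',
  filter: buildRangeFilter(1000, 50000),
};
```

The trade-off is that all points are downloaded up front, so this suits datasets that comfortably fit in the browser.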
Here's the visualization we've built:
Points and Events Visualization
Here we wanted to show the distribution of answers and questions by country, so we rendered the most viewed Stack Overflow questions and the highest-rated answers. 3️⃣
When a point is clicked, we render a popup with information about a question.
Due to the dataset structure, we don't have the user geometry info in the
We need to add the following code to the
Then, we need to have the dashboard-app/src/components/ClickEvents.js component to contain the following source code. Here are the most important highlights!
The query to get questions data:
Then we use some pretty straightforward code to transform the data into GeoJSON:
The next step is to catch the click event and load the point data. The following code is specific to the react-map-gl wrapper, but the logic is just to listen to map clicks and filter by layer id:
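The filtering part can be sketched as a small helper that is independent of React. It assumes the click event carries a `features` array, as react-map-gl events do when interactive layers are configured, and that each feature has a `layer.id`. The layer id itself is a hypothetical name.

```javascript
const POINTS_LAYER_ID = 'questions-points'; // hypothetical layer id

// Return the first clicked feature that belongs to the given layer, or null.
function findClickedFeature(event, layerId) {
  const features = event.features || [];
  const match = features.find((feature) => feature.layer && feature.layer.id === layerId);
  return match || null;
}

// Simulated click event for illustration
const fakeEvent = {
  features: [
    { layer: { id: 'background' }, properties: {} },
    { layer: { id: 'questions-points' }, properties: { value: 42 } },
  ],
};

const clicked = findClickedFeature(fakeEvent, POINTS_LAYER_ID);
```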
When we catch a click event on some point, we request questions data filtered by point location and update the popup.
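A sketch of that follow-up request is shown below. It assumes the original geometry string was copied into the feature's `properties.geometry` when building the GeoJSON, since the geometry of a rendered feature may not match the stored value exactly. All member names (`Questions.title`, `Questions.views`, `Users.geometry`) and the limit are hypothetical.

```javascript
// Build a query for the questions that belong to the clicked location.
function buildPopupQuery(feature) {
  return {
    dimensions: ['Questions.title', 'Questions.views'],
    filters: [
      {
        member: 'Users.geometry',
        operator: 'equals',
        // Original geometry string, assumed to be kept in the feature properties
        values: [feature.properties.geometry],
      },
    ],
    order: { 'Questions.views': 'desc' },
    limit: 5,
  };
}

const popupQuery = buildPopupQuery({
  properties: { geometry: '{"type":"Point","coordinates":[13.4,52.5]}' },
});
```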
So, here's our glorious result:
Finally, choropleth. This type of map chart is suitable for regional statistics, so we're going to use it to visualize total and average users’ rankings by country. 4️⃣
To accomplish this, we'll need to complicate our schema a bit with a few transitive joins.
First, let's update the
The next file is schema/Mapbox.js, which contains country codes and names:
Then comes schema/MapboxCoords.js, which holds polygon coordinates for map rendering:
Please note that we have a join in
And another one in
With the Stack Overflow dataset, the most suitable column in the Mapbox table is geounit, but in other cases postal codes or iso_a2 could work better.
That's all in regard to the data schema. You don't need to join the Users cube with the MapboxCoords cube directly. Cube will make all the joins for you.
The source code is contained in the dashboard-app/src/components/Choropleth.js component. Breaking it down for the last time:
The query is quite simple: we have a measure that calculates the sum of users’ rankings.
Then we need to transform the result to GeoJSON:
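Here's a minimal sketch of the transformation for polygons. It assumes each row carries the country name, the total rating, and an outer ring serialized as "lng,lat;lng,lat;..." in a `MapboxCoords.coordinates` member. Both the member names and that encoding are assumptions for illustration.

```javascript
// Turn per-country rows into a FeatureCollection of polygons.
function rowsToCountryFeatures(rows) {
  const features = rows.map((row) => ({
    type: 'Feature',
    properties: {
      name: row['Users.country'],
      value: Number.parseInt(row['Users.total'], 10),
    },
    geometry: {
      type: 'Polygon',
      // "lng,lat;lng,lat;..." becomes [[[lng, lat], ...]] with one outer ring
      coordinates: [
        row['MapboxCoords.coordinates']
          .split(';')
          .map((pair) => pair.split(',').map(Number)),
      ],
    },
  }));
  return { type: 'FeatureCollection', features };
}

// Made-up row: a small triangle instead of a real country outline
const countryData = rowsToCountryFeatures([
  { 'Users.country': 'Example', 'Users.total': '1500', 'MapboxCoords.coordinates': '0,0;1,0;1,1;0,0' },
]);
```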
After that we define a few data-driven styles to render the choropleth layer with a chosen color palette:
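One possible way to derive such a style is to spread a palette evenly across the value range and interpolate between the colors. The palette, the range, and the layer id below are assumptions picked for illustration.

```javascript
// Build a fill-color expression that maps properties.value onto a palette.
function buildFillColor(palette, min, max) {
  if (palette.length < 2 || !(min < max)) {
    throw new Error('need at least two colors and min < max');
  }
  const step = (max - min) / (palette.length - 1);
  // Produces [stop0, color0, stop1, color1, ...] with ascending stops
  const stops = palette.flatMap((color, i) => [min + i * step, color]);
  return ['interpolate', ['linear'], ['get', 'value'], ...stops];
}

const choroplethLayer = {
  id: 'countries-fill', // hypothetical layer id
  type: 'fill',
  paint: {
    'fill-color': buildFillColor(['#ffffcc', '#41b6c4', '#253494'], 0, 1000),
    'fill-opacity': 0.8,
  },
};
```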
And that's basically it!
Here's what we're going to behold once we're done:
Looks beautiful, right?
The glorious end
So, here our attempt to build a map data visualization comes to its end.
We hope that you liked this guide. If you have any feedback or questions, feel free to join the Cube community on Slack, where we'll be happy to assist you.
Also, if you liked the way the data was queried via the Cube API, visit the Cube website and give it a shot. Cheers! 🎉