ShopBack is the largest shopping rewards and discovery platform in Asia-Pacific, with a presence in Australia, Indonesia, South Korea, the Philippines, Singapore, Malaysia, Taiwan, Thailand, and Vietnam.
From a small team of six in 2014, we now house over six hundred employees across the region, having scaled to nine countries to provide a more rewarding shopping experience for over 28 million users. Our platform enables users to make better purchase decisions while delivering performance-based marketing with high and measurable return on investment to merchants.
ShopBack continues to innovate, evolve, and create value for our users and merchant partners. In 2020 alone, we powered close to US$3 billion in sales for over 5,000 merchant partners across the Asia-Pacific. To date, we have also awarded over US$200 million of cashback to our users.
For all transactions on our ShopBack sites, we need to analyze metrics such as purchase value and sales volume. In January 2020, we started working on a new project that required extensive dashboard reporting of aggregated data for both internal and external (e.g., merchant partner) application users. One of the options considered was building an in-house application and storing the data in a graph database. However, in consultation with Yann AïtBachir (Head of Data at ShopBack), our engineers decided to go with Cube: our data was highly relational and would be pre-aggregated into OLAP cubes for analytics, which made Cube a better fit for our use case.
Since we first started using Cube, we have continually added new Cube schemas, measures, and so on as our data grew, while also working to improve query performance. To address performance concerns, we implemented pre-aggregations in early 2021. Before pre-aggregations, our p95 query loading time could be as long as ~50 seconds, which made for a very poor user experience. Once we started implementing pre-aggregations, we brought p95 down to ~20 seconds. After further optimizing our pre-aggregations with indexes, modifying our approach to pre-aggregations for unique queries, and making a couple of optimizations outside of Cube, we got p95 below 5 seconds.
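To make the idea behind a pre-aggregation with an index more concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. It is not Cube's API and not our production setup: Cube manages pre-aggregations declaratively in the schema, whereas this sketch builds the rollup by hand. The table names (`transactions`, `daily_outlet_rollup`), the column names, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical raw transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        " id INTEGER PRIMARY KEY,"
        " outlet_id INTEGER NOT NULL,"
        " day TEXT NOT NULL,"
        " purchase_value REAL NOT NULL)"
    )
    rows = [  # (outlet_id, day, purchase_value); made-up sample data
        (1, "2021-01-01", 10.0),
        (1, "2021-01-01", 15.0),
        (1, "2021-01-02", 20.0),
        (2, "2021-01-01", 5.0),
        (2, "2021-01-02", 7.5),
    ]
    conn.executemany(
        "INSERT INTO transactions (outlet_id, day, purchase_value) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def build_rollup(conn: sqlite3.Connection) -> None:
    """Materialize a per-outlet, per-day rollup and index it by outlet and day."""
    conn.execute("DROP TABLE IF EXISTS daily_outlet_rollup")
    conn.execute(
        "CREATE TABLE daily_outlet_rollup AS "
        "SELECT outlet_id, day,"
        " COUNT(*) AS sales_volume,"
        " SUM(purchase_value) AS purchase_value "
        "FROM transactions GROUP BY outlet_id, day"
    )
    # The index lets filters on outlet_id avoid scanning the whole rollup.
    conn.execute(
        "CREATE INDEX idx_rollup_outlet_day ON daily_outlet_rollup (outlet_id, day)"
    )


def outlet_totals(conn: sqlite3.Connection, outlet_id: int) -> tuple:
    """Answer a dashboard-style query from the rollup instead of the raw table."""
    row = conn.execute(
        "SELECT SUM(sales_volume), SUM(purchase_value) "
        "FROM daily_outlet_rollup WHERE outlet_id = ?",
        (outlet_id,),
    ).fetchone()
    # SUM over zero rows yields NULL, so fall back to zero totals.
    return (row[0] or 0, row[1] or 0.0)


if __name__ == "__main__":
    conn = build_demo_db()
    build_rollup(conn)
    print(outlet_totals(conn, 1))
```

The point of the sketch is that dashboard queries read the small, indexed rollup table rather than re-aggregating every raw transaction on each request. The rollup has to be rebuilt or refreshed when new transactions arrive, which the sketch handles naively by dropping and recreating the table.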
We still have a few more optimizations planned to reach our target of <1 second p95, but this goes to show that proper application of pre-aggregations can drastically improve the performance of your Cube queries. The performance improvement with pre-aggregations is further demonstrated in the two figures below:
Since we started working with Cube in 2020, we have had nine people working on this project full-time: six engineers, two product managers, and one QA engineer. In addition, several other engineers have helped over the past year and a half as we implemented Cube at ShopBack.
The figure above is an architectural diagram of our Cube implementation.
While working with Cube, here are some of the challenges/learnings that we came across:
outletId) helped contribute significantly to speeding up our queries.
The next major Cube project for us is implementing Cube Store to achieve lower latency and higher concurrency. We are looking forward to implementing Cube Store after our region’s shopping campaigns/festivals and once our team has enough bandwidth (you can check out what else our tech team is working on here). From our investigation into Cube Store, we know that we will need to explore some workarounds:
CUBESTORE_AWS_SECRET_ACCESS_KEY, and these are causing unexpected behaviors with our network calls.
Another key benefit of working with Cube is its community, which we believe is a great example of an active one. I worked with other open source communities before joining ShopBack, and I typically saw unanswered questions and issues pile up. With the Cube community, by contrast, our questions typically get answered within 24 hours of posting.
Also, the community members are quite collaborative in terms of helping each other out. I have had a number of direct messages on Slack with other community members to discuss how they did their implementations/deployments. In return, I also try to reply to threads that I feel I can help with.