Speeding Up Query Performance and Managing Massive Amounts of Data

Industry: Manufacturing

Employees: 4,000

Stack: Databricks, Highcharts

Use Cases: Embedded Analytics

This Fortune 500 Manufacturing Company is a leader in high-quality affordable bottled beverages with production facilities throughout the United States. The dedication to operational improvements, a consistent recipe of hard work, integrity, attention to detail, and the relentless quest for perfection has driven the company to become one of the largest bottled beverage companies in the United States.

This dedication to operational improvements led the team, including their Application Manager, to dream of building the most advanced “internet of things” sensor-based warehouse in the US and maybe in the world. Three years ago, they began installing sensors or “tags” in all the shop floor equipment. With 1,000 or so tags, they started collecting data and moving it to the cloud to visualize it. However, once the team realized the value of this data, they wanted to expand it to more and more devices in the plant - and then to all their plants across the network.

“That's when we realized that the amount of data which was streaming to the cloud is way, way, way more than what we expected.”

Traditional BI tools can’t handle the large data load

Originally, they used Power BI on top of their Databricks lakehouse to visualize this data, but at some point it could no longer handle the volume: visualizations would not even render, and the analytics that did come back were slow.

The second challenge was that the cost of storing and processing all that data was much higher than what they were willing to spend. “We had to change technologies not only on the compute front, but also on the storage front.”

Embedded Analytics for an internal use case

The team started by doing a proof of concept (POC) with both GoodData and Cube Cloud, looking at how the semantic layer could deliver data to a custom front end they would build in Highcharts.

Cube was able to connect to Highcharts and significantly improve the speed of queries with pre-aggregations. But the company had 40+ plants that wanted to use these analytics tools, and each plant has a significant amount of data. Running that amount of data through one production cluster would eventually fail, and creating a separate production cluster for each individual plant would essentially create as many data models, environments, endpoints, and REST API URLs as there are plants - an inefficient and error-prone approach.

Cube, however, can help the company scale horizontally: its multi-cluster deployment capability allows them to spin up separate Cube API instances and separate Cube Store infrastructure within just one deployment. The result is many production clusters within one deployment, all sharing one REST API URL, one data model, and one environment. The company even has control over how they bucket the data, choosing which plants will be in which bucket.
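The bucketing idea can be pictured with a small sketch. This is not Cube's configuration API and not the company's actual setup; the plant IDs, bucket names, and helper functions below are all hypothetical, chosen only to show how an explicit plant-to-bucket mapping keeps the assignment under the team's control.

```python
from collections import defaultdict

# Hypothetical plant-to-bucket assignment. The plant IDs and bucket names
# are invented for illustration and do not come from the case study.
PLANT_BUCKETS = {
    "plant_001": "bucket_a",
    "plant_002": "bucket_a",
    "plant_003": "bucket_b",
}

# Assumed fallback for plants that have not been assigned yet.
DEFAULT_BUCKET = "bucket_default"


def bucket_for(plant_id: str) -> str:
    """Return the bucket that should serve queries for the given plant."""
    return PLANT_BUCKETS.get(plant_id, DEFAULT_BUCKET)


def group_by_bucket(plant_ids):
    """Group plant IDs by bucket, preserving input order within each bucket."""
    groups = defaultdict(list)
    for plant_id in plant_ids:
        groups[bucket_for(plant_id)].append(plant_id)
    return dict(groups)
```

Keeping the mapping explicit, rather than hashing plant IDs, is one way to let a team place plants with heavy data volumes in their own bucket.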

Cube Cloud delivers game-changing support

GoodData’s technology was not able to match Cube’s pre-aggregation or multi-cluster functionality. But the Application Manager adds, “The real game changer was the support provided by the Cube team in this evaluation period. That kind of support we never got from GoodData. The support from Cube Cloud was tremendous.”

The company noted that the support team also proactively reaches out to help optimize queries that seem to be taking longer than expected. “This kind of engagement we typically don't get from other vendors.”

Innovation needed to scale to 50 plants and half-a-million sensors

Today, the company has half a million tags across its 50 plants. This includes all the motors, pumps, chillers, palletizers, packers, and every single piece of equipment in the plant.

“The plant teams love the data and we’ve seen that they don't even want to wait until a new plant is settled before starting to get it. They want to go live as soon as possible and start collecting and visualizing the data from day one.” - Application Manager

The path to rolling out their data stack to all of their plants was not always straightforward because of the massive volume of data that they were collecting and analyzing.

Databricks and Cube managing massive amounts of data

With the help of the Cube team, the company settled on an architecture consisting of Databricks for their core lakehouse storage and data processing; Cube Cloud for their semantic layer, security, pre-aggregations, and APIs; and Highcharts for visualizations.

This pattern allows them to handle the immense amount of data generated by their plant monitoring and retain historical data as efficiently as possible, while still feeding their reports and alerts the real-time data critical to detecting potential device failures and production interruptions.

Big wins for the data team

As the company has rolled out Cube Cloud and their data platform to more and more plants, they have started to see tangible results.

Getting ahead of catastrophe

Recently, there have been several cases where the analytics system alerted the team that a specific motor would likely break down in the next 48 hours. Since the plant managers could see this data, they were able to alert the maintenance team and avoid the downtime that a failing motor would have caused.

Cost savings with condition-based maintenance

Prior to Cube Cloud and their data solution, maintenance was calendar-based. A motor would be replaced every 30 days, irrespective of whether the replacement was needed. Now they have enough data to see that a motor is still at 50% efficiency and that replacing it can be delayed - offering huge cost savings across 50 plants and thousands of motors.
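The difference between the two policies can be sketched in a few lines. The 30-day interval comes from the description above; the `Motor` type, the field names, and the 40% efficiency threshold are assumptions made up for this illustration, not the company's actual rule.

```python
from dataclasses import dataclass


@dataclass
class Motor:
    motor_id: str
    days_since_replacement: int
    efficiency: float  # measured efficiency, from 0.0 to 1.0


CALENDAR_INTERVAL_DAYS = 30   # interval described in the case study
EFFICIENCY_THRESHOLD = 0.40   # hypothetical cutoff, chosen for illustration


def due_calendar_based(motor: Motor) -> bool:
    """Calendar-based rule: replace once the fixed interval has elapsed."""
    return motor.days_since_replacement >= CALENDAR_INTERVAL_DAYS


def due_condition_based(motor: Motor) -> bool:
    """Condition-based rule: replace only when measured efficiency drops
    below the (assumed) threshold, regardless of elapsed time."""
    return motor.efficiency < EFFICIENCY_THRESHOLD


motor = Motor("motor_17", days_since_replacement=30, efficiency=0.50)
calendar_says_replace = due_calendar_based(motor)
condition_says_replace = due_condition_based(motor)
```

Under these assumptions, a motor at 50% efficiency on day 30 is flagged by the calendar rule but not by the condition rule, which is the kind of deferred replacement the savings come from.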

Granular data delivers big results

With one expensive motor, the data analysis detected a vibration and pinpointed exactly what the cause was. It was a small issue to fix, only a couple of hundred dollars for some parts. But if they hadn’t had the data to analyze the issue and these small parts had failed, they would have had to replace the entire motor, which costs at least $100k. Being able to recognize the issue and address it quickly - before the entire motor was affected - meant the company avoided a potential loss of $100k+ as well as the lost productivity in the plant.

Right place, right technology, right partner

“This was an exciting journey with a great deal of changes in the technologies. What we started with, we no longer use. The landscape has changed, but right now I feel we are in the right place with the right technologies and right partner in Cube.”

The company plans to add at least three to five plants a year. Apart from growing the number of plants, they also add lines to existing plants. “The data which we are collecting right now will exponentially increase - there is no way we are going to stagnate,” the Application Manager said. “I have high confidence that the Cube team will be able to fully support our growth.”