customer story
Achieving demand forecasting at scale

Leveraging big data and AI to optimize operations across 500,000 stores

INDUSTRY: Consumer packaged goods

SOLUTION: Demand forecasting

TECHNICAL USE CASE: Data ingest and ETL, machine learning

REGION: UK & Europe

As a multinational consumer goods manufacturing company serving millions of retail customers, Reckitt Benckiser Group (RB) struggled with the complexity of forecasting demand, with large volumes of different types of data across many disjointed pipelines. Today, Azure Databricks provides RB with a Unified Data Analytics Platform that enables its data teams to deliver ML-powered insights to the business, improving the support of neighborhood grocery stores through predictive analytics, product placement, and business forecasting.


RB distributes their products to consumers across 60+ countries. One of their key market segments is called traditional trade or neighborhood grocery stores. This market is highly fragmented and consists of millions of small mom and pop stores, mostly in emerging markets in Asia, Africa, and South America. To serve this market, they have a team of over 16,000 reps who visit these stores with the goal of helping store owners select the best products to meet the unique needs of their markets.

Data is one of the most critical assets they have to improve demand forecasting. However, RB struggled with large volumes of different types of data spread across many disjointed pipelines, which made it difficult to extract the insights that help sellers on the streets operate efficiently and drive more business.

  • Process over 2TB of data every day across 250+ data pipelines that run 24×7.
  • Internal business teams (finance, sales, operations) struggle to access and process external data sets such as point-of-sale, ecommerce, Nielsen, and consumer analytics data.
  • Hadoop infrastructure proved to be complex, cumbersome, and costly to scale. This legacy system suffered from poor performance and made it hard to deploy new data sets. As a result, the DevOps team was kept busy monitoring and fixing issues, which made it difficult to deliver timely insights.


Azure Databricks provides RB with a Unified Data Analytics Platform that has fostered a scalable and collaborative environment across data science and engineering, allowing data teams to more quickly innovate and deliver ML-powered insights to the business.

  • Fully managed platform with automated cluster management simplifies the infrastructure and operations at any scale.
  • Collaborative notebook environment with support for multiple languages (SQL, Scala, Python, R) enables a diverse team of users to work together in their preferred language.
  • Native support for Delta Lake allowed them to compress their data sets, greatly reducing storage space and cost.


With Databricks, RB has seen significant performance gains and cost management improvements which have allowed them to scale their business and uncover new opportunities faster.

  • Improved cost optimization: Using Delta Lake, RB compressed their data from 80TB to about 2TB, which greatly improved cost management while also accelerating pipelines for downstream analytics.
  • Faster time-to-insight: Databricks has improved pipeline performance, accelerating 24×7 jobs by roughly 2x (the time to run all of their pipelines dropped from 24 hours to 13 hours). This has greatly reduced DevOps costs and freed those resources to focus on additional use cases.
  • Increased market share: With the support of Databricks, RB has increased its ability to support its customers by over 10x. Before Databricks, their maximum capacity was around 45,000 stores. With Databricks, they are quickly scaling to nearly 500,000 stores.
  • 10x
    Increased capacity to support business volume
  • 98%
    Data compression from 80TB to 2TB, reducing operational costs
  • 2x
    Faster data pipeline performance for 24×7 jobs
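
The headline figures above are rounded. As a quick sanity check, the short Python sketch below recomputes them from the raw numbers quoted in this story (80TB to 2TB, 24 hours to 13 hours, 45,000 to 500,000 stores). The function and variable names are illustrative only and are not part of any RB or Databricks tooling.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100.0


def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


# Figures quoted in the story (approximate, as reported).
compression_pct = percent_reduction(before=80.0, after=2.0)      # TB stored
pipeline_speedup = 24.0 / 13.0                                   # hours per full run
capacity_growth = growth_factor(before=45_000, after=500_000)    # stores supported

print(f"Data compression:  {compression_pct:.1f}%")
print(f"Pipeline speedup:  {pipeline_speedup:.2f}x")
print(f"Capacity growth:   {capacity_growth:.1f}x")
```

Under these inputs the compression works out to 97.5% (reported as 98%), the pipeline speedup to a little under 2x, and the capacity growth to just over 11x, which is consistent with the "over 10x" claim.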

“Databricks is the key enabler for us to experiment fast and then scale quickly. That’s how the platform is adding value to the business and helping us grow.”

– Atif Ahmed, Director of Advanced Analytics, RB
