Starbucks serves millions of coffee lovers every day across the globe. To maintain the highest level of customer service, the company focuses on building lasting customer connections, creating product innovations, and accelerating the digital experience for customers. The key to its success lives within its data, which can impact multiple domains, from improving inventory management to personalizing the digital experience. With Databricks, Starbucks has put in place a unified data and analytics infrastructure that can be leveraged enterprise-wide. It builds fast, petabyte-scale data pipelines that allow teams to rapidly build ML models that improve inventory management and unlock new product and service innovations.
Data is crucial at Starbucks. Across 30,000+ stores, the company generates billions of transactional data points that can fuel data-driven innovations and operational improvements. Its data strategy and guiding principles rest on three pillars: 1. A single version of the truth; 2. Data and analytics enablement; and 3. Trusted data. However, extracting value from that data was the first and foremost challenge.
With petabytes of data to be ingested for downstream machine learning and analytics, their architecture struggled to handle the scale. They also dealt with a variety of structured and unstructured data that was fast-changing and fragmented across various systems, making it difficult to gain a complete view of the customer and business.
With such a variety of data sources and types, data reliability and governance were of utmost importance but difficult to achieve. They needed a way to combine historical data with live aggregations to ensure they were delivering real-time, accurate insights to their stores and partners.
They also struggled to provision clusters to support their data needs. Data engineers were often overwhelmed with spinning up and maintaining clusters. “Our engineering services were not optimal,” explained Vishwanath Subramanian, Director of Data Engineering and Analytics at Starbucks. “We struggled to scale compute in a timely manner, often taking over 30 minutes to scale clusters.”
Once the data did make it downstream to the data science and analytics teams, the lack of a unified user experience impeded innovation, blocking exploration, experimentation, and reproducibility. To truly create meaningful connections with their customers, they needed to remove these barriers.
To address these challenges, Starbucks developed BrewKit, a zero-friction analytics framework built on top of Azure Databricks. The goals were to democratize data by creating a single source of truth and to foster cross-team collaboration that unlocks the possibilities of machine learning at scale.
“We wanted to make sure the smallest of teams at Starbucks had the ability to do data science and data engineering at scale,” said Subramanian. “The only way to enable that was to empower them with a massively scalable unified analytics platform.”
With Azure Databricks and Delta Lake, their data engineers are able to build pipelines that support batch and real-time workloads on the same platform. This has enabled their data science teams to blend various data sets to train new models that improve the customer experience. Most importantly, data processing performance has improved dramatically, allowing them to deploy environments and deliver insights in minutes.
From a data science perspective, interactive notebooks have enabled users to onboard quickly, collaborate more efficiently, and manage various use cases more easily. As models are developed, MLflow allows teams to experiment with and test them rapidly. “From a data team collaboration productivity standpoint, this has been huge. The tooling has been collaborative. We also now foster a culture of experimentation and self-service, and maintain shared responsibility across environments,” said Subramanian.
With a unified data analytics platform at its core, Starbucks’ entire data strategy has been transformed. Data now flows seamlessly through pipelines and models, allowing new ideas and solutions to flourish. The processing power of Databricks and Delta Lake, paired with Azure services, has increased performance 50-100x, giving data science and analytics teams the data they need faster.
Delta Lake provides a trusted, persistent storage layer that securely delivers quality data for downstream analytics. This allows the teams to explore analytics use cases across the board, such as store operations, quality-of-service analysis, demand forecasting and inventory management, personalized shopping experiences, and much more.
“With Databricks, we can now take a strategic view into data analytics,” said Subramanian. “So much so, that our teams can now focus on business problems up the value chain rather than simply moving data from point A to point B.”
As Starbucks continues to focus on providing world-class customer experiences, Subramanian is excited about the impact Databricks will continue to have in achieving their mission. “At Starbucks, we are elevating customer connections through the convergence of data and AI,” concluded Subramanian. “As we extend our channels for delivery and adjust to the new norms in today’s new era, data will play an extremely crucial role in this effort.”