We recently hosted a live webinar, How Starbucks Forecasts Demand at Scale with Facebook Prophet and Databricks. During the webinar, we learned why demand forecasting is critical to retail and CPG firms and how it enables 22 other use cases. Brendan O'Shaughnessy, Data Science Manager at Starbucks, walked us through how Starbucks does demand forecasting at scale. We also gave a step-by-step demo of how to perform fine-grained demand forecasts at the day/store/SKU level with Databricks and Facebook Prophet.
The slide deck for the webinar is available here.
Why Granular Demand Forecasting, and How Does Starbucks Do It?
Performing fine-grained forecasts at the day/store/SKU level is beyond the ability of legacy, data-warehouse-based forecasting tools. Demand varies by product, store and day, yet traditional demand forecasting solutions perform their forecasts at the aggregate market, week and promo-group levels.
With the introduction of the Databricks Unified Data Analytics Platform, retailers are seeing double-digit improvements in forecast accuracy. They can perform fine-grained forecasts at the SKU, store and day level, and include hundreds of additional features to improve model accuracy. They can further enhance their forecasts through localization and the easy inclusion of additional data sets. And they run these forecasts daily, giving planners and retail operations teams timely data for better execution.
In this webinar, we reviewed:
- How to perform fine-grained demand forecasts on a day/store/SKU level with Databricks
- How to forecast time series data precisely using Facebook's Prophet
- How Starbucks does custom forecasting with relative ease
- How to train a large number of models using Apache Spark™, the de facto distributed data processing engine
- How to present the results to analysts and managers using BI tools, enabling the decisions that drive the desired business outcomes
At the end of the webinar, we held a Q&A. Below are the questions and answers:
Q: What model versioning techniques do you apply to show how models are being improved over time?
Many of our customers use MLflow to track their experiments. They can use MLflow to record the parameters associated with each model and compare performance metrics across models. This helps them track improvements over time, as well as which libraries they used to produce each result. MLflow also helps move these models from experimentation to production faster.
Q: Why use UDFs instead of MLlib? Is this in order to access scikit-learn models?
We are using UDFs so we have the flexibility to leverage any number of libraries. Facebook Prophet is very popular right now, but there are numerous libraries we can use for time series, and some are better suited to certain scenarios than others. By using UDFs, we get maximum flexibility while still leveraging parallelization.
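The pattern behind the UDF approach is "split by key, fit one model per group, recombine." The demo does this with a Spark UDF wrapping Prophet; the sketch below only illustrates the same idea in plain Python. It assumes hypothetical `(store, item, day, sales)` rows, a simple moving-average model as a stand-in for Prophet, and a thread pool standing in for Spark's executors. The function names are illustrative, not part of any library.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical input rows: (store, item, day_index, sales)
Row = Tuple[int, int, int, float]
Forecast = Tuple[int, int, int, float]


def moving_average_model(history: List[float], horizon: int, window: int = 7) -> List[float]:
    """Stand-in forecaster (not Prophet): repeat the mean of the last `window` values."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def forecast_group(key: Tuple[int, int], rows: List[Row], horizon: int) -> List[Forecast]:
    """Fit one model for a single (store, item) group and return forecast rows."""
    rows = sorted(rows, key=lambda r: r[2])  # order the group's history by day
    history = [r[3] for r in rows]
    last_day = rows[-1][2]
    preds = moving_average_model(history, horizon)
    store, item = key
    return [(store, item, last_day + i + 1, p) for i, p in enumerate(preds)]


def forecast_all(rows: List[Row], horizon: int = 7, workers: int = 4) -> List[Forecast]:
    """Group rows by (store, item) and forecast each group independently in parallel."""
    groups: Dict[Tuple[int, int], List[Row]] = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)

    results: List[Forecast] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(forecast_group, key, grp, horizon) for key, grp in groups.items()]
        for fut in futures:
            results.extend(fut.result())
    return results


if __name__ == "__main__":
    # 2 stores x 3 items x 28 days of synthetic sales with a weekly pattern
    data = [(s, i, d, float(10 * s + i + d % 7)) for s in (1, 2) for i in (1, 2, 3) for d in range(28)]
    out = forecast_all(data, horizon=3)
    print(len(out))  # 6 groups x 3 forecast days = 18 rows
```

Because the orchestration only calls a per-group function, swapping the moving average for Prophet, ARIMA or any other library does not change the surrounding code. In recent PySpark versions, the equivalent distributed step is typically expressed with `groupBy(...).applyInPandas(...)`, which lets Spark schedule each group's model on its executors.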
Q: How does Delta Lake help with Demand Forecasting?
A common question is: if I am going to go big, how much is this going to cost me? One thing we clearly want to do is take advantage of the cloud, leverage those resources, and run our forecasts at scale as quickly and aggressively as possible. Then we want to release those resources back to the cloud provider so that we are no longer paying for them. When I do that, what happens to my forecasts? I don't want to lose the insights I draw from running the models. Those results live in a DataFrame, which means they ultimately reside in memory. So we persist that data to storage, and our preferred format is Delta Lake. Delta Lake allows me to quickly interact with this data and open it up as a table. Because the data is persisted, I can later bring a scaled-down cluster to it for interactive queries, and use BI tools to make these forecasts available to store or distribution managers.
Q: Facebook's Prophet is a good solution for seasonal time series. How about non-seasonal time series? How is forecasting accuracy determined?
I agree that Facebook Prophet works well with seasonal data. With UDFs, you can use ARIMA and other common libraries as well. To figure out which model works better for your data, you can compare evaluation metrics such as RMSE. Prophet also comes with its own tools for assessing accuracy.
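As a rough illustration of using RMSE to choose between models, the sketch below holds out the last points of a daily series and compares two simple baselines: a non-seasonal naive forecast and a weekly seasonal-naive forecast. These baselines are stand-ins, not Prophet or ARIMA; the function names and the seven-day season are assumptions for illustration.

```python
import math
from typing import Callable, Dict, List

Forecaster = Callable[[List[float], int], List[float]]


def naive(history: List[float], horizon: int) -> List[float]:
    """Non-seasonal baseline: repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history: List[float], horizon: int, season: int = 7) -> List[float]:
    """Seasonal baseline: repeat the value from the same position in the last season."""
    return [history[len(history) - season + (h % season)] for h in range(horizon)]


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error between equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pick_model(series: List[float], holdout: int, candidates: Dict[str, Forecaster]) -> str:
    """Fit each candidate on all but the last `holdout` points; return the lowest-RMSE name."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: rmse(test, f(train, holdout)) for name, f in candidates.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    weekly = [10.0, 12.0, 14.0, 16.0, 18.0, 30.0, 35.0] * 4  # strong weekly pattern
    candidates = {"naive": naive, "seasonal_naive": seasonal_naive}
    print(pick_model(weekly, 7, candidates))  # seasonal_naive
```

On a strongly seasonal series the seasonal baseline wins, while on a steadily trending, non-seasonal series the naive forecast can score better; the same holdout comparison applies unchanged when the candidates are real models.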
Everything Bilal demoed is documented in detail in our blog post. In the post, we create a second UDF that calculates evaluation metrics. You can evaluate your models in any number of ways and bring those metrics back for consideration as you review your forecast results.
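An evaluation step can follow the same per-group pattern as the forecasting step. The sketch below is not the code from the blog post; it is a plain-Python approximation that assumes forecasts have already been joined with actuals into hypothetical `(store, item, actual, predicted)` rows, and returns MAE, MSE, RMSE and MAPE for each store/item combination. Skipping zero actuals in MAPE is an assumption; other conventions exist.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical joined rows: (store, item, actual, predicted)
EvalRow = Tuple[int, int, float, float]


def evaluate_group(rows: List[EvalRow]) -> Dict[str, float]:
    """Compute common error metrics for one (store, item) group."""
    errors = [actual - predicted for _, _, actual, predicted in rows]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE ignores rows whose actual value is zero to avoid division by zero
    pct = [abs(e / row[2]) for e, row in zip(errors, rows) if row[2] != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "mape": mape}


def evaluate_all(rows: List[EvalRow]) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Group rows by (store, item) and return one metrics record per group."""
    groups: Dict[Tuple[int, int], List[EvalRow]] = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)
    return {key: evaluate_group(grp) for key, grp in groups.items()}


if __name__ == "__main__":
    joined = [(1, 1, 10.0, 8.0), (1, 1, 20.0, 22.0), (2, 1, 5.0, 5.0)]
    for key, metrics in evaluate_all(joined).items():
        print(key, metrics)
```

Returning one metrics record per store/item makes it straightforward to store the metrics next to the forecasts and filter for the combinations where a model performs poorly.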
Additional Retail/CPG and Demand Forecasting Resources
- Sign up for a free trial and download these notebooks to start experimenting
- Take a self-guided tour of our Demand Forecasting resources.
- Read our recent blog Fine-Grained Time Series Forecasting At Scale With Facebook Prophet And Apache Spark to learn how the Databricks Unified Data Analytics Platform addresses forecasting challenges in a timely manner and at a level of granularity that allows the business to make precise adjustments to product inventories
- Download our Guide to Data Analytics and AI at Scale for Retail and CPG
- Visit our Retail and CPG page to learn how Dollar Shave Club and Zalando are innovating with Databricks