Helping brands grow through real-time, actionable insights
2 days to process 5 years of data
(reduced from several months)
3 million+ receipts processed per month
Across industries from retail to media and entertainment, it is essential to make intimate connections with customers to drive engagement and conversion. Through the use of analytics and AI, the biggest brands in the world are delivering more personalized experiences that delight their customers. A global business with more than 27,000 associates, Kantar is a trusted partner to more than half of the Fortune 500, providing data-driven insights to help clients understand people, shape the future and inspire growth. In 2020, Kantar turned to Databricks to develop a new data processing platform (DPP) for its Worldpanel Plus research service, which translates shopping data from large samples of consumers into actionable insights for its clients. Replacing three architectures that had been created in-house, the Databricks Data Intelligence Platform on Azure has transformed the way Kantar realizes its vision of helping brands grow through meaningful data and offering strategic brand advisory at speed and scale.
High-speed, high-volume data processing to meet client demands
On a daily basis, Kantar’s Worldpanel Plus research service collects data from more than 110,000 consumer purchase receipts through its dedicated Shoppix app, equating to 3 million+ receipts per month. Each receipt is scanned through an optical character recognition (OCR) system before reaching the data processing platform as structured, uncleaned data, together with demographics for each customer. This data is then transformed into information that represents the purchasing behavior of the UK population across all channels and industries. The main challenge for the team at Kantar was to find a scalable architecture that could handle ever-growing volumes of data, solve difficult machine learning problems, and reprocess five years of data whenever processes change or a new client brief arrives. Enter the Databricks Data Intelligence Platform on Azure, which is able to easily handle large-scale volumes, combine multiple data sources, and leverage machine learning and AI technology to deliver clean, accurate data.
Richard Goldsby-West, Global Product Owner at Kantar, commented, “Our previous platform based on SQL Server was able to process new receipts but struggled with reprocessing of historical data. One of our key services is providing trends over time, such as market share changes, so the ability to reprocess the last five years’ data consistently and efficiently was crucial to delivering these insights. With the Databricks Data Intelligence Platform, we are able to process five years of data in two days instead of potentially taking several months.”
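The volume figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the 30-day month, the 365-day year and, above all, the idea that receipt volume stayed constant over the five years are assumptions made here, not figures from Kantar, and the function names are hypothetical.

```python
# Figures quoted in the text.
RECEIPTS_PER_DAY = 110_000   # "more than 110,000" receipts collected daily
REPROCESS_DAYS = 2           # five years of data reprocessed in two days
YEARS = 5

# Assumptions for illustration only (not from the source).
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365


def monthly_receipts(per_day: int = RECEIPTS_PER_DAY) -> int:
    """Receipts per month, assuming a 30-day month."""
    return per_day * DAYS_PER_MONTH


def backlog_receipts(per_day: int = RECEIPTS_PER_DAY) -> int:
    """Five-year backlog, ASSUMING volume was constant the whole time."""
    return per_day * DAYS_PER_YEAR * YEARS


def reprocess_rate(per_day: int = RECEIPTS_PER_DAY) -> float:
    """Receipts per day the platform must handle to clear the backlog."""
    return backlog_receipts(per_day) / REPROCESS_DAYS


if __name__ == "__main__":
    print(f"Monthly receipts:     {monthly_receipts():,}")
    print(f"Five-year backlog:    {backlog_receipts():,}")
    print(f"Reprocessing per day: {reprocess_rate():,.0f}")
    print(f"Multiple of daily intake: {reprocess_rate() / RECEIPTS_PER_DAY:.1f}x")
```

Under these assumptions, the monthly figure comes to 3.3 million, which is consistent with the quoted "3 million+", and a two-day reprocessing run has to sustain roughly 900 times the normal daily intake. The real backlog is likely smaller if receipt volume grew over the period.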
Data engineering challenges solved in days to meet business requirements
According to André Gabriel Garrido, Head of Software Development at Kantar, there is no business requirement that cannot be answered now that the Databricks Data Intelligence Platform is in place. “It’s so simple and easy to set up workloads, and we’ve been able to respond to tight SLAs. Out-of-the-box functionalities such as Unity Catalog have improved our operations significantly. We now have more control over permissions and the flow of data. It’s enabled us to conduct thorough monitoring and auditing of our entire data estate, encompassing multiple data domains. Leveraging Databricks SQL Serverless helps us have the right clusters for the right workloads, so we can be more focused on producing the code and adding value to the business, instead of concentrating on the infrastructure. And it helps speed up development.” As more serverless features become available, the team plans to take advantage of them.
Garrido added that Delta Sharing has given Kantar the ability to share data with specific clients regardless of platform or cloud, so that clients can use the tools of their choice to consume the data. This has allowed Kantar to easily create new ways to deliver data to its clients. The process took just one week to deploy.
Streaming will become even more prevalent for Kantar in the future. Garrido confirmed that Delta Live Tables makes streaming simple and workflows more visible, as users are able to see the workflow in the UI. The data team is able to see all the connections and how the data is flowing between the server models. Together, these features keep the platform simple and let users see exactly what has been built. He also added that streaming can be switched on and off when needed, which keeps costs contained.
The lakehouse architecture was fundamental to Kantar’s success during the COVID-19 pandemic, when consumer shopping behavior suddenly became erratic. With enormous changes in patterns, such as panic and bulk buying, Kantar’s clients needed real-time daily reporting instead of the usual four-weekly insights. Kantar was able to turn this around immediately, which was made possible by Databricks. Garrido commented, “During this time, it was extremely easy to create these new reports. With Databricks, there are no boundaries with what we can do with our data.”
Improved machine learning speed, visibility and collaboration
Kantar’s SAS system was too rigid, complex and costly to maintain and scale, and machine learning models were simply not possible. The introduction of MLflow with the Databricks architecture answered the company’s complex ML needs. Ana Portelo, Lead Data Scientist at Kantar, explained, “MLflow is extremely straightforward when it comes to model development and deployment. When developing a model, we need to carry out experiments to test our hypothesis. MLflow stores all the data, models and parameters for each experiment in a single location. After running all experiments, we can rapidly reproduce them and deploy the chosen model. Databricks Model Serving allows our teams to move much faster.”
Kantar worked closely with the software development team at Databricks to implement the new platform and ML capabilities. Databricks was there to support Kantar whenever a new challenge arose. Portelo confirmed that the Kantar data teams were able to look at the same code with minimal effort using notebooks, which also improved collaboration and efficiency.
Goldsby-West added, “A significant benefit has been the visibility of code within notebooks to all platform users. End users can go in and see the actual code running in the notebook that stores the cleaning rules, for example, so they can easily understand the choices that were made. They can see the real logic that has run on that data. It’s hugely beneficial, as the documentation is the code, so you know it’s right, and it’s easier to suggest changes or improvements to the rules.”
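To make the "documentation is the code" point concrete, here is a minimal sketch of what readable cleaning rules might look like in a notebook. It is not Kantar's code: the rule names, the rules themselves and the sample receipt line are all hypothetical, and it uses plain Python rather than any Databricks API.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    """One cleaning rule: a name, a plain-language description, and the logic."""
    name: str
    description: str
    apply: Callable[[str], str]


# Hypothetical rules, applied in order. The description sits next to the logic,
# so a reader sees both what a rule is meant to do and what it actually does.
RULES: List[Rule] = [
    Rule("trim_whitespace",
         "Collapse runs of whitespace and strip both ends.",
         lambda s: " ".join(s.split())),
    Rule("uppercase",
         "Upper-case the text so the same product always matches.",
         str.upper),
    Rule("decimal_comma",
         "Rewrite a decimal comma in a price (1,99) as a point (1.99).",
         lambda s: re.sub(r"(\d),(\d{2})\b", r"\1.\2", s)),
]


def clean(line: str) -> Tuple[str, List[str]]:
    """Apply every rule in order; return the cleaned line and the rules that changed it."""
    fired: List[str] = []
    for rule in RULES:
        updated = rule.apply(line)
        if updated != line:
            fired.append(rule.name)
        line = updated
    return line, fired


if __name__ == "__main__":
    cleaned, fired = clean("  milk   2l  1,99 ")
    print(cleaned)  # MILK 2L 1.99
    print(fired)    # ['trim_whitespace', 'uppercase', 'decimal_comma']
```

Returning the list of rules that fired alongside the cleaned value is one way to let an end user trace why a given record looks the way it does, and suggesting a change means editing a single, named rule.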
A blueprint architecture for a secure data-driven future
Implementation of the Databricks Data Intelligence Platform has brought new levels of collaboration, agility and value to Kantar data teams, including the ability to reprocess five years of data in two days rather than in months. The new DPP lakehouse architecture is considered so robust that it has become a blueprint for the whole of the Kantar Group.
Looking to the future, Dan Kinneally, Business Operations Director at Kantar, commented, “We would look to duplicate this approach for new research panels going forward. The Databricks Data Intelligence Platform is cost-effective, scalable and future-proof, with the ability to leverage automation and machine learning.” He added that security and compliance will continue to be a focus as Kantar creates future data-as-a-service solutions for its clients, and that both will be assured by the first-class security and governance capabilities of the Databricks Data Intelligence Platform.