Digital technologies are rapidly transforming the way retailers engage with customers and differentiate their brands. To help retailers stand out from the crowd, Flipp has emerged as a one-stop online marketplace that aggregates weekly shopping circulars so that consumers get the best deals in their area, without having to clip any actual coupons. Flipp’s primary data sources — content from retail partners and user-generated behavior — were siloed, making the data difficult to wrangle and interpret. After implementing the Databricks Lakehouse Platform, Flipp’s data teams were finally able to access and democratize their data, enabling them to do their jobs more easily and effectively as well as bring better deals to users and more meaningful insights to partners.
As retailers continue to evolve into e-commerce companies, they’re looking for new opportunities to engage customers where they spend the most time — their devices. Enter Flipp, an innovative technology company on a mission to put a new twist on traditional coupons and advertising to boost customer engagement and conversion. Their platform aggregates weekly circulars from retailers to attract users into retail locations.
Flipp’s primary data sources are the content they receive from retail partners and user-generated behavior, such as how shoppers respond to different ads. Unfortunately, this data was unstructured, hard to access and almost impossible to work with. For example, each individual user-generated event had to be logged and parsed, and cleaning up and responding to alerts took an exceptionally long time when bad data crashed the pipelines.
To make things even more complicated, when it came time to do data analysis, data teams were using spreadsheets to keep track of their experiments. “Spreadsheets helped keep track of progress but were incredibly tedious,” explained Jake Greene, Flipp’s staff data scientist. “Imagine entering in every parameter and every value for that parameter per line. And without standardization across the different projects, it was very hard to compare work that was being done by different data scientists.”
In fact, Flipp’s old approach was so much work that data scientists often wouldn’t track their experiments at all, causing a posterity problem: Without a record of what had already been tried, future data scientists were likely to repeat the same experiments or do similar work.
Flipp uses a lakehouse data architecture on Databricks and Delta Lake, which enables them to get data into the system fast, refine it, and deliver it to multiple audiences depending on their use cases. Business analysts and sales teams can see how partners and customers are progressing, data science teams can build powerful predictive analytics for their recommendation engines, and engineering teams can create new product features that delight and engage customers. The lakehouse approach makes all the data in the data lake available to these groups for both regular reporting and ad hoc investigations.
With analytic workloads running quickly and smoothly off Delta Lake and visualized with Tableau through reports and dashboards, things quickly began falling into place. “The first and most stark difference was the rapid access to data,” said Greene. “It’s now extremely easy to directly access any kind of data I could ever want using very simple Apache Spark™ APIs.”
In the past, Flipp’s limited access to their own data meant they were only able to pull high-level insights for dashboard solutions, automated PDFs, etc. Without easy access to raw data, more complex analysis was impossible. With Databricks, data engineers and analysts have a common platform to collaborate and iterate faster on the development lifecycle. Analysts can consume data from any layer in Delta Lake to create new data products, and they’re able to deploy directly to Flipp’s internal reporting tool: Tableau.
With access to all their data, the data science team can also deliver models that power personalization solutions, serving real-time offers designed to draw users into retail locations.
Meanwhile, MLflow has made it incredibly easy for Flipp’s data teams to track experiments automatically, including the different iterations as they’re executed and how they track against defined metrics.
Today, Flipp’s data teams can easily perform both regularly cadenced reporting on their efforts as well as complex, ad hoc exploration through Tableau. They use the Extract feature to provide a set of dashboards for ongoing performance monitoring and the Live Connection feature to drive collaboration between the business development team and data analysts for answers to ad hoc questions.
Bringing Delta Lake and Databricks to Flipp has enabled them to democratize the data they were already collecting, allowing them to not only do their jobs more easily and effectively but also bring better deals to Flipp users and more meaningful insights to partners.
As for future usage of Databricks, the data teams at Flipp have high hopes as they continue to identify new use cases and workloads to run off Databricks. “As anyone in our domain knows, some of the greatest challenges we face in machine learning are reproducibility and model serving,” said Greene. “Databricks provides us with the foundation to efficiently unify data with our analytical needs to meet our business requirements more easily.”
“Bringing Databricks to Flipp has enabled us to democratize data more effectively, not only making our jobs easier, but also bringing better deals to our customers and more meaningful insights to our partners.”
– Guenia Izquierdo, Data Engineering Manager, Flipp