CUSTOMER STORY

Transforming data to elevate food deliveries and experiences

67% reduction in processing and storage costs

70% reduction in maintenance efforts

30% less coding time

Product descriptions:

Lakeflow Spark Declarative Pipelines

iFood, a leading food delivery platform in Brazil, operates at a staggering scale: 300,000 drivers, 55 million users and 350,000 restaurant partners. This ecosystem generates an immense volume of streaming data daily, spanning customer orders, delivery logistics and app interactions. With a wide range of products and platforms collecting data across the company, iFood processes billions of data records daily for analysis and modeling to support strategic business decisions. With ambitions to expand and enhance their offerings, iFood needed a more reliable, scalable and efficient data platform to meet the growing demand for real-time insights and innovation.

Data silos and complex architecture stifled agility and accuracy

Prior to Databricks, iFood grappled with a fragmented data architecture that hindered their ability to scale effectively. The company relied on a complex infrastructure with multiple systems to manage vast amounts of users’ journey data — comprising billions of records from various sources, including the company’s order management system, consumer app and driver app. As the ecosystem grew and new business models emerged, the volume and variety of data grew, too, further complicating data processing. With these datasets spread across disparate systems, consolidating and accessing critical information quickly and reliably became a considerable challenge.

This fragmentation led to significant operational inefficiencies, as iFood’s teams struggled to keep track of scattered event data. With the data spread across numerous systems and lacking proper governance, troubleshooting and ensuring data accuracy became time-consuming and error-prone. The data engineering team had to manually code, optimize and maintain complex data processing workflows, and spent countless hours troubleshooting errors and coordinating with multiple teams to implement even minor changes.

This constant firefighting drained resources, hindered innovation and left engineers with little time for strategic work. The situation was further complicated by iFood’s explosive data growth. A legacy architecture built to handle 100 million events per day was now overwhelmed by a surge to between 8 billion and 10 billion events daily. As the company began training real-time models to analyze app user journeys and surface actionable insights, low latency at scale became a critical requirement.

Streamlining pipelines, reducing maintenance and enabling real-time insights

Spark Declarative Pipelines proved to be a game changer for iFood, enabling a shift to a declarative approach for pipeline development. Engineers could now describe their desired transformations in simple code and let Spark Declarative Pipelines automatically handle the operational complexity behind these pipelines, including execution, scaling and monitoring. “So far we’ve reduced coding time by approximately 30% using the declarative approach, allowing us to build pipelines significantly faster than before,” said Thiago Julião, Data Architecture Specialist at iFood. This transition also simplified and consolidated iFood’s data architecture, reducing the number of tables from nearly 4,000 to just 100, which in turn made governance more manageable and laid the foundation for improved data quality.
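To make the declarative idea concrete, here is a minimal sketch in plain Python. It is not the Spark Declarative Pipelines API and not iFood's code: the `table` decorator, `run_pipeline` and the two table names are hypothetical, invented only for illustration. The point it shows is that each table is declared together with its inputs, and the framework, not the engineer, works out the execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical mini-framework for illustration only.
# This is NOT the Spark Declarative Pipelines API.
_TABLES = {}  # table name -> (defining function, names of upstream tables)


def table(*, depends_on=()):
    """Register a function as the declaration of a table."""
    def register(fn):
        _TABLES[fn.__name__] = (fn, tuple(depends_on))
        return fn
    return register


def run_pipeline():
    """Derive the execution order from declared dependencies, then build each table."""
    graph = {name: set(deps) for name, (_, deps) in _TABLES.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = _TABLES[name]
        results[name] = fn(*(results[dep] for dep in deps))
    return results


# Declared before its input on purpose: declaration order does not matter.
@table(depends_on=("raw_events",))
def order_events(raw_events):
    return [event for event in raw_events if event["type"] == "order"]


@table()
def raw_events():
    return [
        {"type": "order", "id": 1},
        {"type": "click", "id": 2},
        {"type": "order", "id": 3},
    ]


tables = run_pipeline()
print(tables["order_events"])
```

A production framework layers scaling, retries and monitoring on top of the same principle; the sketch covers only dependency resolution.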

Before Spark Declarative Pipelines, iFood’s pipelines were hindered by out-of-memory errors during high-volume event ingestions, leading to frequent Spark driver shutdowns and operational disruptions. These recurring failures required constant attention from the data engineering team. Since Spark Declarative Pipelines went into production, those failures have dropped to near zero.

“With Spark Declarative Pipelines, we gained greater ease in tracking the user’s journey in the application while ensuring high performance in data usage by consumer teams; this was a game changer in our process,” said Maristela Albuquerque, Data Manager at iFood.

iFood’s unified data architecture

iFood’s technical architecture is now designed to process streaming data at an immense scale, ensuring efficiency, governance and scalability. Here’s a detailed breakdown of the architecture and how its components work together to handle 10 billion daily events while delivering real-time insights.

The data pipeline starts with real-time ingestion of events from iFood’s ecosystem, including the consumer app, delivery driver app and partner portal. These events flow through Amazon Kinesis queues, where approximately 10 billion records are ingested daily.
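For a sense of what these daily totals mean as sustained throughput, the following back-of-the-envelope calculation converts them to average events per second. It uses only the figures quoted in this story (100 million events per day for the legacy architecture, roughly 10 billion today) and assumes a perfectly even load, which real traffic will not follow; peak rates are likely higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events are spread evenly across the day."""
    return events_per_day / SECONDS_PER_DAY


LEGACY_EVENTS_PER_DAY = 100_000_000      # legacy capacity quoted in the story
CURRENT_EVENTS_PER_DAY = 10_000_000_000  # approximate current daily volume

legacy_rate = average_events_per_second(LEGACY_EVENTS_PER_DAY)
current_rate = average_events_per_second(CURRENT_EVENTS_PER_DAY)
growth_factor = CURRENT_EVENTS_PER_DAY / LEGACY_EVENTS_PER_DAY

print(f"legacy:  ~{legacy_rate:,.0f} events/s")   # ~1,157 events/s
print(f"current: ~{current_rate:,.0f} events/s")  # ~115,741 events/s
print(f"growth:  {growth_factor:.0f}x")           # 100x
```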

The ingestion pipeline, powered by Spark Declarative Pipelines, enables real-time data ingestion with scalability, resilience and quality. By adopting Spark Declarative Pipelines, iFood reduced ingestion latency from hours to seconds — ensuring low-latency data availability critical for real-time analytics, such as training models and extracting insights during the user journey without delays. “Pipelines now run error-free, delivering reliable performance even under the heaviest workloads,” said Julião. “The shift from frequent errors to near-zero issues moving to Spark Declarative Pipelines has not only improved operational efficiency but also freed up our team to focus on strategic initiatives instead of firefighting. We’ve reduced data pipeline maintenance efforts by about 70% by consolidating all pipelines to Spark Declarative Pipelines.”

iFood’s architecture uses a structured medallion approach to manage massive data volumes effectively. The Bronze layer consolidates data from various platforms into a single table per product, using a predefined schema partitioned by processing date. Acting as a staging zone, it retains data longer than the message queues do.
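The Bronze layout described above can be pictured with a small plain-Python sketch. It is an illustration under assumptions, not iFood's implementation: the function name, the event fields and the product names are hypothetical, and schema enforcement is left out for brevity. Each product gets one table, and rows inside it are grouped by the date on which they were processed.

```python
from collections import defaultdict
from datetime import date


def land_in_bronze(events, processing_date):
    """Group raw events into one table per product, partitioned by processing date."""
    bronze = defaultdict(lambda: defaultdict(list))
    partition = processing_date.isoformat()
    for event in events:
        bronze[event["product"]][partition].append(event)
    return bronze


# Hypothetical sample events; field and product names are illustrative only.
events = [
    {"product": "consumer_app", "name": "order_placed"},
    {"product": "driver_app", "name": "route_accepted"},
    {"product": "consumer_app", "name": "cart_viewed"},
]

bronze = land_in_bronze(events, date(2025, 1, 15))
print(sorted(bronze))                                # one table per product
print(len(bronze["consumer_app"]["2025-01-15"]))     # rows in that day's partition
```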

In the Silver layer, iFood applies Spark Declarative Pipelines expectations and quality rules to validate the data. By replacing traditional partitioning with Liquid Clustering, iFood consolidated all events for a product into a single table, significantly improving performance and usability. This optimization allows iFood to manage massive datasets, such as their largest table, which spans 210 TB and 800 billion records, while maintaining high data quality and governance. “Previously, managing two separate environments required constant communication across teams, making even small changes challenging. Now, with everything under our control, the process is streamlined and more efficient,” said Gabriel Campos, Head of Data and AI at iFood.
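As a rough illustration of how expectation-style quality rules behave, the sketch below emulates the idea in plain Python. It does not use the Spark Declarative Pipelines API, and the rule names, fields and sample rows are hypothetical. It assumes three possible reactions to a failing row: record the violation but keep the row ("warn"), discard the row ("drop"), or stop processing ("fail").

```python
def apply_expectations(rows, expectations):
    """Validate rows against named rules.

    expectations maps a rule name to (predicate, action), where action is
    "warn" (count the violation, keep the row), "drop" (count it and discard
    the row) or "fail" (raise immediately).
    Returns (kept_rows, violation_counts).
    """
    kept = []
    violations = {name: 0 for name in expectations}
    for row in rows:
        keep = True
        for name, (predicate, action) in expectations.items():
            if predicate(row):
                continue
            violations[name] += 1
            if action == "fail":
                raise ValueError(f"expectation {name!r} failed for row {row!r}")
            if action == "drop":
                keep = False
        if keep:
            kept.append(row)
    return kept, violations


# Hypothetical rules and rows, for illustration only.
rules = {
    "has_event_id": (lambda r: r.get("event_id") is not None, "drop"),
    "has_session": (lambda r: r.get("session_id") is not None, "warn"),
}
rows = [
    {"event_id": "e1", "session_id": "s1"},
    {"event_id": None, "session_id": "s2"},   # dropped: missing event_id
    {"event_id": "e3", "session_id": None},   # kept, but counted as a warning
]

kept, violations = apply_expectations(rows, rules)
print([r["event_id"] for r in kept])  # ['e1', 'e3']
print(violations)                     # {'has_event_id': 1, 'has_session': 1}
```

Keeping the violation counts alongside the surviving rows is what lets a team monitor data quality over time rather than only filtering bad records.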

The new architecture has enabled iFood to centralize data governance and enhance data quality without compromising performance or usability. Additionally, it has significantly reduced processing and storage costs, achieving a 67% cost reduction — cutting expenses from tens of thousands to just thousands of dollars per month.
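The arithmetic behind a 67% reduction is simple enough to check by hand. In the sketch below, the $30,000 starting figure is a purely hypothetical monthly cost chosen for illustration; the story gives only orders of magnitude, not actual amounts.

```python
def cost_after_reduction(monthly_cost: float, reduction_percent: int) -> float:
    """Monthly cost remaining after cutting it by reduction_percent."""
    return monthly_cost * (100 - reduction_percent) / 100


REDUCTION_PERCENT = 67            # figure quoted in the story
hypothetical_before = 30_000      # assumption for illustration, not a real iFood figure

after = cost_after_reduction(hypothetical_before, REDUCTION_PERCENT)
print(f"before: ${hypothetical_before:,.0f}/month")
print(f"after:  ${after:,.0f}/month")  # $9,900/month
```

Under this assumption, a bill in the tens of thousands of dollars lands in the thousands, which matches the order-of-magnitude shift described above.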

[Figure: Architecture comparison]

With Spark Declarative Pipelines automating and streamlining data pipeline management, iFood’s business analysts from the growth and product teams can now access data directly from the Silver layer to create analytical Gold tables and generate business-critical insights. This supports user journey analysis and A/B testing on consumer behavior at various stages of the journey, enabling data-driven strategies that enhance the customer experience across iFood’s ecosystem. For example, data from the driver app gives the logistics team critical insights into how drivers interact with the app, helping them optimize its usability. These insights allow iFood to fine-tune both consumer-facing and operational processes, ensuring a seamless experience for customers and efficiency for drivers.

iFood plans to further enhance their Databricks implementation by leveraging Databricks Asset Bundles (DABs) for streamlined development and serverless computing for greater flexibility. Upcoming initiatives include implementing column masking for sensitive data in the consumption layer and optimizing table performance with variant type handling for complex data structures like structs and maps.

iFood’s transformation to a modern, unified data architecture has redefined how the company processes and leverages their vast data ecosystem. By adopting Spark Declarative Pipelines, iFood streamlined their operations, eliminated inefficiencies and established a foundation for real-time insights and enhanced governance. This shift has not only improved the reliability and agility of the company’s data pipelines but also freed up their teams to focus on innovation and delivering value to the business. With a scalable, efficient and future-ready architecture, iFood is now equipped to respond to the demands of a dynamic market while continuing to elevate the customer experience.
