Daniel holds a master’s degree in Computer Science from the Federal University of Minas Gerais. He co-founded Hekima, a data science consulting company that has completed over 100 projects with large Brazilian companies, spanning logistics, sentiment analysis, entity deduplication, candidate screening, and schedule optimization, among other areas. He currently works at iFood, Latin America’s largest foodtech company, as the technical leader of the ML Platform team.
iFood is the largest foodtech company in Latin America, serving more than 26 million orders each month from more than 150 thousand restaurants. As such, we generate large amounts of data every second: which dishes were requested and by whom, each driver's location updates, and much more. To provide the best possible user experience and maximize the number of orders, we built several machine learning models that answer questions such as: how long an order will take to be completed; which restaurants and dishes to recommend to a consumer; whether a payment is fraudulent; among others. Generating the training datasets for those models, and serving features in real time so the models' predictions can be made correctly, requires efficient, distributed data processing pipelines. In this talk, we will present how iFood built a real-time feature store, using Databricks and Spark Structured Streaming to process event streams and persist them both to a historical Delta Lake table and to a low-latency Redis cluster, and how we structured our development process so the result is production-grade, reliable, validated code.
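The dual-write pattern the abstract describes (one stream feeding both a historical Delta Lake table and a low-latency Redis cluster) can be sketched with Spark Structured Streaming's foreachBatch sink. This is a minimal illustration, not iFood's actual code: the Kafka topic, event schema, paths, Redis key convention, and host names are all assumptions made up for the example.

```python
def feature_key(entity: str, entity_id: str) -> str:
    """Illustrative Redis key convention for a feature row, e.g. features:restaurant:42."""
    return f"features:{entity}:{entity_id}"


def write_batch(batch_df, batch_id):
    """Persist one micro-batch to both storage layers (hypothetical paths/hosts)."""
    import redis  # local import: only needed when the streaming job runs

    # 1) Append the micro-batch to the historical Delta Lake table (training datasets).
    batch_df.write.format("delta").mode("append").save("/delta/features/restaurant")

    # 2) Push the latest values to Redis for low-latency online serving.
    client = redis.Redis(host="feature-store-redis", port=6379)
    for row in batch_df.collect():  # fine for small batches; use foreachPartition at scale
        client.hset(
            feature_key("restaurant", row["restaurant_id"]),
            mapping={"avg_prep_time": row["avg_prep_time"], "event_ts": str(row["event_ts"])},
        )


if __name__ == "__main__":
    # Spark-specific imports kept here so the helper above is usable standalone.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

    event_schema = StructType([
        StructField("restaurant_id", StringType()),
        StructField("avg_prep_time", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    spark = SparkSession.builder.appName("feature-store-sketch").getOrCreate()

    # Read the raw event stream (assumed Kafka topic) and parse the JSON payload.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")
        .option("subscribe", "order-events")
        .load()
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    # foreachBatch gives us one hook per micro-batch to write to both sinks.
    (
        events.writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", "/checkpoints/restaurant-features")
        .start()
        .awaitTermination()
    )
```

The checkpoint location is what lets Structured Streaming resume from the last processed offset after a failure; the Delta append stays exactly-once per batch thanks to Delta's transaction log, while the Redis writes are idempotent because each key simply holds the latest value.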