Anthony Awuley

Machine Learning Lead, Flashfood

I hold an MSc in computer science from Brock University in Ontario, Canada with a research area in Genetic Algorithms. I co-authored an IEEE CEC publication titled “Feature Selection And Classification Using ALPS Genetic Programming”. I currently work as the lead machine learning engineer at Flashfood and have a huge interest in Big Data and Machine Learning.

Past sessions

Summit 2021 Code Once Use Often with Declarative Data Pipelines

May 26, 2021 03:15 PM PT

Did you know 160,000,000,000 pounds of food ends up in North American landfills each year? Flashfood is helping reduce food waste by providing a mobile marketplace where grocers can sell food nearing its best before date. In 2020 alone Flashfood diverted 11.2 million pounds of food waste while saving shoppers 29 million dollars on groceries.

To operate and optimize the marketplace, Flashfood ingests, processes, and surfaces a wide variety of data from the core application, partners, and external sources. As the volume, variety and velocity of sources and sinks proliferate, the complexity of scheduling and maintaining jobs increases in tandem. We noticed this complexity largely stemmed from different implementations of core ETL mechanics, rather than business logic itself.

We’ve implemented declarative data pipelines following a mantra of ‘code once use often’ to solve for this complexity. We started by building a highly configurable Apache Spark application which is initialized with details of the source, file type, transformation, load destination, etc. We then used Airflow to extend on the DatabricksRunSubmitOperator which allowed us to customize the cluster and parameters used in execution. Finally, we used airflow-declartive to generate DAGs in YAML, enabling us to set configurations, instantiate jobs, and orchestrate execution in a human readable file.

The declarative nature means less specialized personnel are able to set up an ETL with confidence, no longer requiring a deep knowledge of Apache Spark intricacies. Additionally, by ensuring that boilerplate logic was only implemented once, we reduced maintenance and increased delivery speed by 80%.

 

In this session watch:
Anthony Awuley, Machine Learning Lead, Flashfood
Carter Kilgour, Data Engineer, Flashfood

[daisna21-sessions-od]