Avrilia is a senior scientist at Microsoft’s Gray Systems Lab (GSL).
Her research lies broadly in data management, with a recent focus on machine learning model management and large-scale stream processing. Her current work aims to simplify the data science lifecycle by automating some of the tasks that data scientists perform manually today. She also works on systems problems that arise at very large scale, such as improving the performance of complex streaming pipelines and the resource utilization of cloud deployments. She actively contributes to the design of the Dhalion library, which has been used to tackle some of these problems efficiently in production. Avrilia has also made open-source contributions to Apache Heron (as a committer) and to MLflow.
Avrilia received her Ph.D. in Computer Science from the University of Wisconsin-Madison. Before joining Microsoft, she spent three years at the IBM Almaden Research Center working on SQL-on-Hadoop engines and natural language interfaces for databases.
The data science lifecycle consists of multiple iterative steps, including data collection, data cleaning and exploration, feature engineering, model training, model deployment, and scoring. The process is often tedious and error-prone, and it requires considerable human effort. Beyond these challenges, leveraging ML in enterprise applications, especially in regulated environments, brings a very high level of scrutiny for data handling, model fairness, user privacy, and debuggability. In this talk, we present the basic features of Flock, an end-to-end platform that facilitates the adoption of ML in enterprise applications. We refer to this new class of applications as Enterprise Grade Machine Learning (EGML). Flock leverages MLflow to simplify and automate some of the steps involved in supporting EGML applications, allowing data scientists to spend most of their time improving their ML models. Flock uses MLflow for model and experiment tracking, and extends and complements it with automatic logging, deeper integration with relational databases (which often store confidential data), model optimizations, and support for the ONNX model format and the ONNX Runtime for inference. We will also present our ongoing work on automatically tracking lineage between data and ML models, which is crucial in regulated environments. We will showcase Flock's features through a demo using Microsoft's Azure Data Studio and MLflow.