Subru is a Principal Architect in the GSL team at Microsoft, currently focusing on platforms that automate the data science lifecycle. Before that, he was a Principal Research Engineer at Microsoft, working on several aspects of YARN scheduling, in particular scaling it to 50K+ nodes and providing SLA guarantees. This work was a critical driver for Microsoft's internal Cosmos big data clusters, which have scheduled nearly one trillion tasks processing close to a zettabyte of production data.
Prior to Microsoft, Subru worked at Yahoo!, where he contributed to Oozie's precursor, near real-time stream processing on Hadoop, and HBase replication. He is also a member of the Apache Hadoop PMC and has been actively contributing to Hadoop since 2007, with an emphasis on YARN resource management. Subru's research interests include large-scale distributed systems, Systems-for-ML, and ML-for-Systems.
The data science lifecycle consists of multiple iterative steps, including data collection, data cleaning and exploration, feature engineering, model training, model deployment, and scoring. The process is often tedious and error-prone, and it requires considerable human effort. Beyond these challenges, when ML is used in enterprise applications, especially in regulated environments, the level of scrutiny for data handling, model fairness, user privacy, and debuggability is very high. In this talk, we present the basic features of Flock, an end-to-end platform that facilitates the adoption of ML in enterprise applications. We refer to this new class of applications as Enterprise Grade Machine Learning (EGML). Flock leverages MLflow to simplify and automate some of the steps involved in supporting EGML applications, allowing data scientists to spend most of their time improving their ML models. Flock uses MLflow for model and experiment tracking, and extends and complements it with automatic logging, deeper integration with the relational databases that often store confidential data, model optimizations, and support for the ONNX model format and ONNX Runtime for inference. We will also present our ongoing work on automatically tracking lineage between data and ML models, which is crucial in regulated environments. Finally, we will showcase Flock's features through a demo using Microsoft's Azure Data Studio and MLflow.