Nate is a Data Architecture and ML Engineering consultant at Accenture. He leads the design and technical delivery of complex ML applications. With his background in productionizing research applications, he helps enterprise clients develop their playbook for transitioning from promising research results to high-value industrialized deployments.
Delivering AI solutions at enterprise scale involves bringing together a variety of perspectives to augment an organization's data science capabilities. In this session, Accenture describes how to industrialize your ML applications to accelerate model delivery and optimize the data science workflow while maintaining the highest standards of model governance. Participants will take away an understanding of how ML Engineering can help industrialize your ML pipelines and what is involved in managing the end-to-end ML pipeline (not just model management). We will demonstrate how the Databricks Lakehouse platform is an ideal tool for delivering ML pipelines at scale.
The 'feature store' is an emerging concept in data architecture motivated by the challenge of productionizing ML applications. The rapid iteration in experimental, data-driven research applications creates new challenges for data management and application deployment. These challenges are compounded in production ML pipelines with interdependent modeling and featurization stages. Large tech companies have published popular reference architectures for 'feature stores' that address some of these challenges, and an active open source ecosystem provides a full workbench of power tools. Still, the abstract role of the feature store can be a barrier to implementation. We demonstrate an implementation of a feature store as an orchestration engine for a mesh of ML pipeline stages using Spark and MLflow. This role is broader than that of a metadata repository for feature discovery. The metadata in a feature store allows us to break the unit of deployment down to the level of the ML pipeline stage, so that we can break the anti-pattern of 'clone and own' ML pipelines. We isolate concerns of pipeline orchestration and provide tooling for deployment management, A/B testing, discovery, telemetry, and governance. We provide novel algorithms for pipeline stage orchestration, data models for feature stage metadata, and concrete systems designs you can use to create a similar feature store using open source tools.
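To make the orchestration idea concrete, the following is a minimal sketch, not the session's actual implementation: it assumes a hypothetical `FeatureStage` metadata record (name, version, upstream dependencies) and shows how stage metadata alone is enough to resolve the dependency mesh and execute stages in topological order, which is what lets each stage be deployed and versioned independently rather than cloned into a monolithic pipeline.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # stdlib, Python 3.9+
from typing import Callable, Dict, List

# Hypothetical metadata record for one featurization or model stage.
# A real feature store would persist this (e.g. alongside MLflow runs)
# rather than hold it in memory.
@dataclass
class FeatureStage:
    name: str
    version: str
    inputs: List[str]                   # names of upstream stages consumed
    transform: Callable[[dict], dict]   # stage logic: upstream outputs -> output

def run_pipeline(stages: Dict[str, FeatureStage]) -> Dict[str, dict]:
    """Resolve the stage mesh from metadata and execute in dependency order."""
    graph = {name: set(stage.inputs) for name, stage in stages.items()}
    outputs: Dict[str, dict] = {}
    for name in TopologicalSorter(graph).static_order():
        stage = stages[name]
        upstream = {dep: outputs[dep] for dep in stage.inputs}
        outputs[name] = stage.transform(upstream)
    return outputs

# Usage: a raw-data stage feeding a feature stage feeding a model stage.
stages = {
    "raw": FeatureStage("raw", "1.0", [], lambda up: {"x": [1, 2, 3]}),
    "scaled": FeatureStage("scaled", "1.0", ["raw"],
                           lambda up: {"x": [v * 2 for v in up["raw"]["x"]]}),
    "model": FeatureStage("model", "1.0", ["scaled"],
                          lambda up: {"pred": sum(up["scaled"]["x"])}),
}
print(run_pipeline(stages)["model"])  # {'pred': 12}
```

Because each `FeatureStage` carries its own version and dependency metadata, swapping one stage (for example, an A/B variant of `scaled`) does not require redeploying the rest of the mesh.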
Key Takeaways: