Nate is a Data Architecture and ML Engineering consultant at Accenture. He leads the design and technical delivery of complex ML applications. With his background in productionizing research applications, he helps enterprise clients develop their playbook for transitioning from promising research results to high-value, industrialized deployments.
May 27, 2021 05:00 PM PT
The feature store is a data architecture concept used to accelerate data science experimentation and harden production ML deployments. Nate Buesgens and Bryan Christian describe a practical approach to building a feature store on Delta Lake at a large financial organization. This implementation has reduced feature engineering "wrangling" time by 75% and has increased the rate of production model delivery by 15x. The approach focuses on practicality: it is informed by innovative approaches such as Feast, but our primary goal is evolutionary extensions of existing patterns that can be applied to any Delta Lake architecture.
- Understand the key use cases that motivate the feature store from both a data science and engineering perspective.
- Consider edge cases, such as "online" predictions, where there may be opportunities for simplification.
- Review a typical logical data model for a feature store and how that can be applied to your business domain.
- Consider options for physical storage of the feature store in Delta Lake.
- Understand common access patterns including metadata-based feature discovery.
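The metadata-based feature discovery mentioned above can be sketched in miniature. The snippet below models a feature registry as in-memory records with a simple filter-by-entity-and-tag lookup; in a Delta Lake implementation this metadata would typically live in its own Delta table and be queried with Spark SQL. All names and fields here (`FeatureMetadata`, `REGISTRY`, `discover`, the example features) are hypothetical illustrations, not part of any published API.

```python
from dataclasses import dataclass, field

# Hypothetical registry entry. In a Delta Lake feature store, this metadata
# would be stored in a Delta table alongside the feature data itself.
@dataclass
class FeatureMetadata:
    name: str
    entity: str            # business entity the feature keys on, e.g. "customer"
    description: str
    tags: set = field(default_factory=set)
    source_table: str = "" # Delta table holding the feature values

# A toy registry with two illustrative features.
REGISTRY = [
    FeatureMetadata("txn_count_30d", "customer",
                    "Transaction count over trailing 30 days",
                    {"transactions", "velocity"}, "features.txn_agg"),
    FeatureMetadata("avg_balance_90d", "customer",
                    "Average account balance over trailing 90 days",
                    {"balance"}, "features.balance_agg"),
]

def discover(entity=None, tag=None):
    """Filter the registry by entity and/or tag -- the core operation
    behind metadata-based feature discovery."""
    hits = REGISTRY
    if entity is not None:
        hits = [m for m in hits if m.entity == entity]
    if tag is not None:
        hits = [m for m in hits if tag in m.tags]
    return [m.name for m in hits]

print(discover(entity="customer", tag="velocity"))  # ['txn_count_30d']
```

In practice, the same filter becomes a `WHERE` clause over the registry table, so data scientists can discover features without reading pipeline source code.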
May 27, 2021 12:10 PM PT
Delivering AI solutions at an enterprise scale involves bringing together a variety of perspectives to augment an organization's data science capabilities. Accenture describes how to industrialize your ML applications to accelerate model delivery and optimize the data science workflow while maintaining the highest standards of model governance. Participants will take away an understanding of how ML Engineering can help you industrialize your ML pipelines and what is involved in the management of the end-to-end ML pipeline (not just model management). We will demonstrate how the Databricks Lakehouse platform is an ideal tool for delivering ML pipelines at scale.
June 23, 2020 05:00 PM PT
The 'feature store' is an emerging concept in data architecture, motivated by the challenge of productionizing ML applications. The rapid iteration of experimental, data-driven research applications creates new challenges for data management and application deployment. These challenges are compounded by production ML pipelines with interdependent modeling and featurization stages. Large tech companies have published popular reference architectures for feature stores that address some of these challenges, and an active open source ecosystem provides a full workbench of power tools. Still, the abstract role of the feature store can be a barrier to implementation.

We demonstrate an implementation of a feature store as an orchestration engine for a mesh of ML pipeline stages using Spark and MLflow. This role is broader than that of a metadata repository for feature discovery: the metadata in a feature store allows us to break the unit of deployment down to the level of the ML pipeline stage, so that we can break the anti-pattern of 'clone and own' ML pipelines. We isolate concerns of pipeline orchestration and provide tooling for deployment management, A/B testing, discovery, telemetry, and governance. We provide novel algorithms for pipeline stage orchestration, data models for feature stage metadata, and concrete system designs you can use to create a similar feature store using open source tools.
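The idea of orchestrating a mesh of interdependent pipeline stages from metadata can be illustrated with a minimal sketch. Here each stage declares the features it consumes and produces, and an execution order is derived from those declarations rather than from hard-coded pipeline scripts. The stage names, metadata shape, and `execution_order` helper are all hypothetical, and the standard-library topological sort stands in for a real scheduler running Spark jobs.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical stage metadata: each stage declares consumed/produced features.
STAGES = {
    "featurize_txns":    {"consumes": {"raw_txns"},
                          "produces": {"txn_count_30d"}},
    "featurize_balance": {"consumes": {"raw_accounts"},
                          "produces": {"avg_balance_90d"}},
    "score_model":       {"consumes": {"txn_count_30d", "avg_balance_90d"},
                          "produces": {"risk_score"}},
}

def execution_order(stages):
    """Derive a valid stage execution order from feature metadata.

    Maps each feature to its producing stage, builds a dependency graph
    (stage -> stages producing its inputs), and topologically sorts it.
    """
    producers = {f: name for name, s in stages.items() for f in s["produces"]}
    graph = {
        name: {producers[f] for f in s["consumes"] if f in producers}
        for name, s in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())

order = execution_order(STAGES)
# Both featurization stages must run before the model scoring stage.
assert order.index("score_model") > order.index("featurize_txns")
assert order.index("score_model") > order.index("featurize_balance")
```

Because the graph is derived from declared metadata, adding a stage means registering its inputs and outputs rather than cloning and editing an existing pipeline, which is the 'clone and own' anti-pattern the abstract describes.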