Productionizing real-time ML models poses unique data engineering challenges for enterprises coming from batch-oriented analytics. Enterprise data, which has traditionally been centralized in data warehouses and optimized for BI use cases, must now be transformed into features that provide meaningful predictive signals to ML models. Enterprises then face the operational challenges of deploying these features in production: building the data pipelines, then processing and serving the features to support production models. ML data engineering is a complex and brittle process that can consume upwards of 80% of data science effort, all too often slowing ML innovation to a crawl.
Drawing on our experience building the Uber Michelangelo platform and our current work on next-generation ML infrastructure at Tecton.ai, we’ll share insights on building a feature platform that empowers data scientists to accelerate the delivery of ML applications. Spark and Databricks provide a powerful and massively scalable foundation for data engineering. Building on this foundation, a feature platform extends your data infrastructure to support ML-specific requirements. It enables ML teams to track and share features through a version-controlled repository, process and curate feature values into a single, centralized source of data, and instantly serve features for model training, batch predictions, and real-time predictions.
Atlassian will join us to provide a first-hand perspective from an enterprise that has successfully deployed a feature platform in production. The platform powers real-time, ML-driven personalization and search services for a popular SaaS application.
Mike Del Balso is the co-founder of Tecton.ai, where he is focused on building next-generation data infrastructure for Operational ML. Before Tecton.ai, Mike was the PM lead for the Uber Michelangelo ML platform. He was also a product manager at Google, where he managed the core ML systems that power Google's Search Ads business. Before that, he worked on Google Maps. He holds a BSc in Electrical and Computer Engineering, summa cum laude, from the University of Toronto.
Geoff is a Principal Data Scientist at Atlassian, the software company behind Jira, Confluence & Trello. He works with the product teams and focuses on delivering smarter in-product experiences and recommendations to our millions of active users by using machine learning at scale. Prior to this, he was in the Customer Support & Success division, leveraging a range of NLP techniques to automate and scale the support function. Before joining Atlassian, Geoff applied data science methodologies across the retail, banking, media, and renewable energy industries. He began his foray into data science as a research astrophysicist, studying astronomy from the coldest & driest location on Earth: Antarctica.