Machine Learning, Alternative Data, Delta Lake and More: My Picks for Data + AI Summit 2021

Published: May 14, 2021

Events / 3 min read


The Data + AI Summit has become an essential conference for analysts, data scientists, developers, data engineers and data teams across the globe. Once again, I’ve had the pleasure of collaborating with Jules Damji and Jen Aman to put together the agenda for the conference. Built for the data community, Data + AI Summit offers keynotes from leading technologists, hands-on training, 200+ technical deep dives and AMA sessions. Here are just a few of the sessions that I’m looking forward to attending:

Commercializing Alternative Data: Jay Bhankharia (Head of Marketplace Platforms) and Srinivasa Podugu (Head of Marketplace Technology Platforms) of S&P Global explain the end-to-end lifecycle to productize and commercialize alternative datasets at S&P Global Market Intelligence.

Massive Data Processing in Adobe using Delta Lake: Yeshwanth Vijayakumar (Sr. Engineering Manager/Architect at Adobe Experience Platform) describes how the data team built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake to manage petabytes of data.

Object Detection with Transformers: Liam Li, who recently completed a PhD in Machine Learning from Carnegie Mellon, dives into cutting-edge methods that use transformers to drastically simplify object detection pipelines in computer vision, while maintaining predictive performance.

Model Monitoring at Scale with Apache Spark and Verta: Manasi Vartak, Founder & CEO at Verta, explains why model monitoring is fundamentally different from application performance monitoring or data monitoring. Attendees will get a deeper understanding of what model monitoring must achieve for batch and real-time model serving use cases.

Real-world Strategies for Debugging Machine Learning Systems: Patrick Hall, Principal Scientist at bnh.ai, introduces model debugging, an emergent discipline focused on finding and fixing errors in the internal mechanisms and outputs of ML models.

FrugalML: Using ML APIs more accurately and cheaply: Lingjiao Chen, PhD Researcher at Stanford University, introduces a principled framework that jointly learns the strengths and weaknesses of each API on different data, and performs an efficient optimization to automatically identify the best sequential strategy for adaptively using the available APIs within a budget constraint.

The Rise of Vector data: Edo Liberty, Founder & CEO of Pinecone, discusses the need for infrastructure for managing high-dimensional vectors. Edo walks through the algorithmic and engineering challenges in working with vector data at scale, and explores open problems we still have no adequate solutions for.

Observability for Data Pipelines with OpenLineage: Julien Le Dem, Co-Founder & CEO of Datakin, discusses Marquez, an open source project that instruments data pipelines to collect lineage and metadata and enable observability use cases. Marquez implements the OpenLineage API and provides context by making dependencies across organizations and technologies visible as they change over time.

Becoming a Data-driven Organization with Modern Lakehouse: A year after we formally introduced the lakehouse, we are seeing more companies adopt this exciting data management paradigm. Vini Jaiswal, Customer Success Engineer, explains how you can leverage the Lakehouse platform to make data a part of each business function.

Building an ML Platform with Ray and MLflow: Amog Kamsetty (Software Engineer) and Archit Kulkarni (Software Engineer) of Anyscale describe how two open source projects, Ray and MLflow, work together to make it easy for ML platform developers to add scaling and experiment management to their platform.

Scaling Online ML Predictions At DoorDash: Hien Luu, Sr. Engineering Manager at DoorDash, describes his journey of building and scaling a machine learning platform, particularly the prediction service, covering the optimizations tried, lessons learned, technical decisions and tradeoffs.

These technical sessions are just a glimpse at what will be covered at Data + AI Summit 2021. Throughout the week, industry leaders will dive into all things AI, MLOps, open source, data use cases and so much more. I’m also incredibly excited about the keynotes we have lined up this year, including:

Bill Inmon
Malala Yousafzai
Michael Lewis and Charity Dean
Manuela Veloso
Shafi Goldwasser
DJ Patil
