A Guide to Data + AI Summit Sessions: Machine Learning, Data Engineering, Apache Spark and More
We are only a few weeks away from Data + AI Summit, returning May 24–28. If you haven’t signed up yet, take advantage of free registration for five days of virtual engagement: training, talks, meetups, AMAs and community camaraderie.
To help you navigate hundreds of sessions, here are some of the talks, including deep dives, that I’m excited about.
- Misusing MLflow to Help Deduplicate Data at Scale: Robin Oliva-Kraft and Maya Livshits of Intuit will show how effective MLflow can be for experimentation beyond traditional ML model lifecycle management.
- Consolidating MLOps at One of Europe’s Biggest Airports: Floris Hoogenboom and Sebastiaan Grasdijk, data scientists at Amsterdam Airport Schiphol, will discuss how they use MLflow together with Airflow to orchestrate model training at scale.
- Building Data Quality Pipelines With Apache Spark™ and Delta Lake: Sandy May, CTO and lead data engineer at Elastacloud, will demonstrate how to draw intelligence and insights directly from Delta Lake rather than relying on a data warehouse, an approach known as the Lakehouse pattern.
- Data Discovery at Databricks With Amundsen: Tao Feng and Tianru Zhou, data engineers at Databricks, will show how to explore and discover your data using Amundsen, an open source metadata discovery tool.
- Getting Started With Databricks SQL Analytics: Simon Whiteley, a cloud solutions architect, will walk you through a typical data analyst’s journey and introduce easy ways to analyze and query data in the Lakehouse.
- Tensors Are All You Need: Faster Inference With Hummingbird: Karla Saur and Matteo Interlandi of Gray Systems Lab will dive into how Hummingbird converts traditional ML models into tensor-based models (PyTorch and TVM) to achieve 1000x inference speedups on GPUs.
- BOTS TESTING BOTS: From Manual to Automated Testing for Conversational AI: Christoph Börner, co-founder of Botium, will tackle questions like: “Why are bots failing?”, “What and how should you test?” and, of course, “How can we automate the testing and training?”
- The Critical Missing Component in the Production ML Stack: Alessya Visnjic, co-founder of WhyLabs, will show what logging looks like in the ML stack, using PyTorch and MLflow.
- How Adobe Uses Structured Streaming at Scale: Yeshwanth Vijayakumar, data architect and engineer at Adobe, will discuss methods for ingesting terabytes of data per day using Structured Streaming, along with lessons learned.
- YOLO With Data-Driven Software: Brooke Wenig, practice lead for ML at Databricks, and Tim Hunter, senior AI specialist at ABN AMRO Bank, will introduce a new paradigm of data-driven software development.
Below are a few more picks among developer-focused Apache Spark talks. Use the code JulesDAIS2021 for 25% off pre-conference training!
- Deep Dive Into the New Features of Apache Spark™ 3.1
- Monitor Apache Spark™ 3 on Kubernetes Using Metrics and Plugins
- Efficient Distributed Hyperparameter Tuning With Apache Spark™
- The Rise of ZStandard: Apache Spark™/Parquet/ORC/Avro
- Project Zen: Making Data Science Easier in PySpark
Grow your knowledge by joining 100,000 of your fellow data professionals at Data + AI Summit.