Ben Wilson

Practice Lead, Databricks

Ben Wilson is the creator and lead developer of Databricks Labs AutoML. He currently serves as Practice Lead within the Resident Solutions Architects group at Databricks, specializing in Machine Learning Engineering and Data Engineering. Prior to his current role, he was the Data Science architect at Rue Gilt Groupe. His interests are in automation, concurrency, and creating solutions to ease production deployment of ML projects.



AutoML Toolkit – Deep Dive (Summit 2020)

Tired of doing the same ole feature engineering tasks or tuning your models over and over? Come watch how Databricks Labs is solving this. We will explore how this toolkit automates and accelerates:

  • Feature Engineering/Culling
  • Feature Importances Selection
  • Model Selection & Tuning
  • Model Serving/Deployment
  • Model Documentation (MLflow)
  • Inference & Scoring

From Data to Insights in Seconds: How to Build a Streaming ETL Pipeline on the Databricks Unified Analytics Platform

Rue Gilt Groupe strives to be the most engaging e-commerce website in the world. Their goal is to create an individualized shopping experience through the use of big data and machine learning. With over 400 GB of clickstream data generated per day, they needed a way to process that data and feed it into their models in near real time. Without the right tools and support, that can be a resource-intensive and costly proposition. This talk will detail how Rue Gilt Groupe built a streaming ETL pipeline with Databricks Delta, a powerful new offering within the Databricks Unified Analytics Platform, which allowed them to dramatically accelerate processing times while simplifying the ability to tap into the power of machine learning at scale. This talk will highlight:

  • The challenges Rue Gilt Groupe faced trying to build a data pipeline that could deliver the performance required by their near-real-time use case.
  • How Databricks’ Unified Analytics Platform allowed them to easily build a streaming ETL pipeline while simplifying data science at scale.
  • The engineering and business impact Databricks has had, including reducing ETL output times from 30 minutes to 10 seconds and contributing to a 10x increase in purchase engagement.
Session hashtag: #EntSAIS12