Han Wang

Tech Lead, Lyft Inc.

Han Wang is the tech lead of Lyft Machine Learning Platform, focusing on distributed computing and machine learning solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon and Quantlab. Han is the founder of the Fugue project, aiming at democratizing distributed computing and machine learning.

Past sessions

Hyperparameter tuning is critical in model development, and its general form, tuning arbitrary parameters against an objective function, is widely used across industry. Meanwhile, Apache Spark handles massive parallelism well, and Apache Spark ML is a solid machine learning solution.

So why have we not seen a general, intuitive distributed parameter tuning solution built on Apache Spark?

  1. Not every tuning problem involves Apache Spark ML models. How can Apache Spark handle arbitrary models?
  2. Not every tuning problem is a parallelizable grid or random search. Bayesian optimization is inherently sequential; how can Apache Spark help in that case?
  3. Not every tuning problem is single-epoch; deep learning, for example, is not. How do algorithms such as Hyperband and ASHA fit into Apache Spark?
  4. Not every tuning problem is a machine learning problem; simulation plus tuning, for example, is also common. How do we generalize?
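To make the first and last points concrete, here is a minimal, framework-free sketch of a general tuning problem: a black-box objective function plus a search space. The names (`objective`, `grid_search`) and the toy quadratic objective are illustrative assumptions, not Fugue-Tune's API; the point is that every grid trial is independent, which is exactly the part a distributed engine like Spark can parallelize.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def objective(params: dict) -> float:
    # Any black box works here: an ML model's validation loss, a
    # simulation's error, etc. This toy quadratic is an assumption
    # used purely for illustration.
    x, y = params["x"], params["y"]
    return (x - 3) ** 2 + (y + 1) ** 2

def grid_search(space: dict, objective, max_workers: int = 4):
    # Expand the cartesian product of candidate values. Every trial
    # is independent of the others, so the whole grid can be evaluated
    # in parallel -- the "distributable part" of a tuning problem.
    keys = list(space)
    trials = [dict(zip(keys, vals)) for vals in itertools.product(*space.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(objective, trials))
    # Return the (score, params) pair with the lowest objective value.
    return min(zip(scores, trials), key=lambda t: t[0])

space = {"x": [0, 1, 2, 3, 4], "y": [-2, -1, 0, 1]}
print(grid_search(space, objective))  # -> (0, {'x': 3, 'y': -1})
```

Bayesian optimization breaks this picture because each new trial depends on the results of previous ones, which is why it needs special treatment on a parallel engine.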

In this talk, we are going to show how using Fugue-Tune and Apache Spark together can eliminate these pain points:

  1. Fugue-Tune, like Fugue, is a "super framework": an abstraction layer unifying existing solutions such as Hyperopt and Optuna.
  2. It first models the general tuning problem, independent of machine learning.
  3. It is designed for both small- and large-scale problems, and it always fully parallelizes the distributable part of a tuning problem.
  4. It works for both classical and deep learning models. With Fugue, running Hyperband and ASHA on Apache Spark becomes possible.

In the demo, you will see how to do any type of tuning in a consistent, intuitive, scalable and minimal way, along with a live demonstration of the performance gains.

In this session watch:
Han Wang, Tech Lead, Lyft Inc.


While struggling to choose among computing and machine learning frameworks such as Spark, Dask, Scikit-learn and TensorFlow for your ETL and machine learning projects, have you considered unifying them into one ecosystem? In this talk, we will present such a framework we developed: Fugue. It is an abstraction layer on top of different frameworks that also provides a SQL-like language, extensible with Python, which can represent your pipelines end to end. With Fugue, it is much easier and faster to create reliable, performant and portable pipelines than with native Spark, especially for non-expert users.
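The abstraction-layer idea can be illustrated with a small sketch. The business logic is a plain Python function over pandas DataFrames, with no Spark or Dask imports; the function name, column names and data are assumptions made for this example. Fugue's published pattern is to hand such a function to `fugue.transform` and switch backends via the `engine` argument, shown here only in a comment since it requires Fugue and Spark to be installed.

```python
import pandas as pd

# A framework-agnostic step: plain Python with type annotations and no
# Spark-specific code. An abstraction layer can adapt it to any backend.
def add_fare_per_mile(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["fare_per_mile"] = df["fare"] / df["miles"]
    return df

rides = pd.DataFrame({"fare": [10.0, 24.0], "miles": [2.0, 8.0]})

# Runs locally on pandas, with no cluster involved:
print(add_fare_per_mile(rides))

# With Fugue installed (e.g. `pip install "fugue[spark]"`), the identical
# function could be distributed without code changes, along the lines of:
#   from fugue import transform
#   transform(rides, add_fare_per_mile,
#             schema="*,fare_per_mile:double", engine="spark")
```

Because the logic never imports a distributed framework, it stays testable on small local data and portable across engines, which is the property the talk emphasizes.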

In this talk we will demonstrate how we implemented the Node2Vec algorithm on top of Fugue, so that it runs on different computing frameworks and, with Spark as the backend, can process graphs with 100 million vertices and 3 billion edges in a few hours.
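For readers unfamiliar with Node2Vec, its core is a second-order biased random walk over the graph, which is the embarrassingly parallel part that a distributed engine can scale out (one walk per starting vertex). Below is a minimal single-machine sketch of that walk, following the weighting scheme from the Node2Vec paper; the adjacency-list format and the tiny example graph are assumptions for illustration, not the talk's implementation.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0):
    # Second-order biased walk: given the previous node `prev` and the
    # current node `cur`, each neighbor x of `cur` is weighted
    #   1/p if x == prev        (return to where we came from)
    #   1   if x neighbors prev (stay close)
    #   1/q otherwise           (explore outward)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            walk.append(random.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = [
            1.0 / p if x == prev else 1.0 if x in adj[prev] else 1.0 / q
            for x in nbrs
        ]
        walk.append(random.choices(nbrs, weights=weights)[0])
    return walk

# Toy undirected graph as adjacency lists (illustrative data).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
random.seed(7)
print(node2vec_walk(adj, start=0, length=6, p=0.5, q=2.0))
```

In the distributed setting, walks like this are generated for every vertex and then fed to a word2vec-style trainer; generating them independently per vertex is what makes the 100-million-vertex scale feasible.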

We have also built a unified interactive environment based on Kubernetes, Spark and Fugue, and will demonstrate the significant performance improvements on projects migrated into this system. We will also discuss the future plans of the Fugue Project, including Fugue ML and Fugue Streaming. Our goal is to create a unified ecosystem for distributed computing and machine learning.