John Zedlewski is the director of GPU-accelerated machine learning on the RAPIDS team. Previously, he worked on deep learning for self-driving cars at NVIDIA, deep learning for radiology at Enlitic, and machine learning for structured healthcare data at Castlight. He has an MA/ABD in economics from Harvard with a focus in computational econometrics and an AB in computer science from Princeton.
When combined with scale-out cloud infrastructure, modern hyperparameter optimization (HPO) libraries let data scientists apply more compute power to improve model accuracy, running hundreds or thousands of model variants with minimal code changes. HPO has traditionally run into two barriers: the complexity of model management and computational cost.
In this talk, we walk through a detailed example that addresses these challenges by combining two open source libraries. We use MLflow to simplify model management and RAPIDS, a GPU-accelerated data science library, to reduce compute time. RAPIDS provides pandas-compatible and scikit-learn-compatible Python APIs that let users port existing code easily while accelerating both data preprocessing and machine learning training scripts. The example builds a pipeline that predicts flight delays from FAA data using random forests and gradient-boosted decision trees. It demonstrates a dramatic speedup in model building compared with a non-accelerated version, and it uses MLflow APIs to select the best model and prepare it for deployment.