Solving a data science problem is about more than making a model. It entails data cleaning, exploration, modeling and tuning, production deployment, and the workflows governing each of these steps. In this simple example, we’ll look at how health data can be used to predict life expectancy. The example starts with data engineering in Apache Spark, then moves on to data exploration and to model tuning and logging with Hyperopt and MLflow. It continues with examples of how the model registry governs model promotion, and ends with simple deployment to production with MLflow as a job or dashboard.
Sean is a data scientist at Databricks. He is an Apache Spark committer and PMC member, and co-author of Advanced Analytics with Spark. Previously, he was director of Data Science at Cloudera and an engineer at Google.