Kevin is a software engineer at RStudio working on open source software for big data analytics and machine learning. Previously, he worked as a data scientist at a variety of companies, including Honeywell, where he led machine learning projects using IIoT data, and KPMG, where he worked with insurers to develop predictive models for pricing.
We provide a recap of the MLflow R interface, which was announced at Spark+AI Summit Europe, and discuss recent developments. The session includes a live demo that combines big data (via Spark) and deep learning (via TensorFlow) and shows how MLflow can manage the end-to-end lifecycle, from prototyping to deployment.
We provide an overview of the R interface for MLflow, an open source platform for managing the end-to-end machine learning lifecycle. Through concrete use cases, we demonstrate the framework's three components: experiment tracking, project packaging, and model deployment. Session hashtag: #SAISDS5
Sparklyr has enabled data scientists to use familiar R and tidyverse syntax to interactively analyze data and build models at scale via Apache Spark. However, a common pain point in organizations is operationalizing these models, in either a batch prediction or a real-time scoring setting. With support for Spark ML pipelines in sparklyr, data scientists can use a familiar R API to build pipelines that are fully interoperable with Scala. For real-time scoring, an R interface to MLeap, an open source engine for serializing and serving Spark ML models, is provided. These capabilities facilitate collaboration between data scientists and implementation engineers and shorten the time to production. We discuss the mechanics of sparklyr ML pipelines and demonstrate an end-to-end example. Session hashtag: #ML5SAIS