
Databricks and RStudio Introduce New Version of MLflow with R Integration

Contributions from RStudio Expand the Ecosystem of the Industry’s First Open Source Framework for Machine Learning Lifecycle

October 3, 2018

LONDON – October 3, 2018 – Databricks, the leader in unified analytics founded by the original creators of Apache Spark™, and RStudio today announced a new release of MLflow, an open source multi-cloud framework for the machine learning lifecycle, now with R integration. RStudio has partnered with Databricks to develop an R API for MLflow v0.7.0, which was showcased today at Spark + AI Summit Europe. This new integration adds to features that have already been released, making MLflow the most comprehensive open source machine learning platform, with support for multiple programming languages, integrations with popular machine learning libraries, and support for multiple clouds.

Before MLflow, the industry did not have a standard process or end-to-end infrastructure to develop and productionize machine learning applications in a simple and consistent way. With MLflow, organizations can package their code as reproducible runs, execute and compare hundreds of parallel experiments, and leverage any hardware or software platform for training, tuning, hyperparameter search and more. Additionally, organizations can deploy and manage models in production on a variety of clouds and serving platforms. As a testament to MLflow’s design as an open platform, RStudio’s contribution extends the MLflow platform to the large community of data scientists who use RStudio and the R programming language.

“In many organizations machine learning workflows are far too ad-hoc, with no systematic tracking of experiments, inadequate protocols around reproducibility, and no consistent way to package and deploy models. MLflow helps address these issues in a uniform fashion across languages and frameworks,” said JJ Allaire, chief executive officer at RStudio. “Integration of R with MLflow will significantly broaden the reach of the project by allowing a broader community to use and contribute to MLflow.”

In the four months since MLflow launched, community engagement and contributions have led to an impressive array of new features and integrations, including:

  • Support for Multiple Programming Languages: To give developers a choice, MLflow supports Python, Java and Scala in addition to R, as well as a REST server interface that can be used from any language.
  • Integration with Popular Machine Learning Libraries and Frameworks: MLflow has built-in integrations with the most popular machine learning libraries such as scikit-learn, TensorFlow, Keras, PyTorch, H2O, and Apache Spark MLlib to help teams build, test, and deploy machine learning applications.
  • Cross-cloud Support: Organizations can use MLflow to quickly deploy machine learning models to multiple cloud services, including Databricks, Azure Machine Learning, and Amazon SageMaker, based on their needs. MLflow leverages AWS S3, Google Cloud Storage, and Azure Data Lake Storage, allowing teams to easily track and share artifacts from their code.

“With MLflow, data science teams can systematically package and reuse models across frameworks, track and share experiments locally or in the cloud, and deploy models virtually anywhere,” according to Matei Zaharia, chief technologist at Databricks, original creator of Apache Spark, and Tech Lead of MLflow. “The flurry of interest and contributions we’ve seen from the data science community validates the need for an open source framework to streamline the machine learning lifecycle.”

MLflow on Databricks’ Unified Analytics Platform

Databricks provides MLflow as a managed service, and early adopters are experiencing increased efficiency across the machine learning lifecycle. By leveraging MLflow within Databricks’ Unified Analytics Platform, users can easily initiate runs from their on-premises environment or from Databricks notebooks. MLflow’s tight integration with Databricks Delta enables data science teams to track the large-scale data that fed the models, along with all the other model parameters, and then reliably reproduce training runs. By integrating MLflow as part of its Unified Analytics Platform, Databricks is bringing the overall benefits of one common security model to the entire machine learning lifecycle.

About Databricks

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the original creators of Apache Spark, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Viacom, Shell and HP. For more information, visit www.databricks.com.

Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.
