
Today, we’re excited to announce MLflow v0.3.0, which we released last week with several features requested by internal clients and open source users. MLflow 0.3.0 is already available on PyPI, and the documentation has been updated. If you run pip install mlflow as described in the MLflow quickstart guide, you will get the latest release.

In this post, we’ll describe a couple of new features and enumerate other items and bug fixes filed as issues on the GitHub repository.

GCP-Backed Artifact Support

We’ve added support for storing artifacts in Google Cloud Storage, through the --default-artifact-root parameter to the mlflow server command. This makes it easy to run MLflow training jobs on multiple cloud instances and track results across them. The following example shows how to launch the tracking server with a GCS artifact store. You will also need to set up authentication as described in the documentation. This closes issue #152.

mlflow server --default-artifact-root gs://my-mlflow-google-bucket/
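With the server running, client machines only need to be pointed at it; artifacts logged during a run are then written to the GCS bucket. A minimal sketch, assuming a hypothetical host name and MLflow's default port:

```shell
# Point the MLflow client at the tracking server (hypothetical host name).
export MLFLOW_TRACKING_URI=http://tracking-server.example.com:5000

# Runs started by this training script will now record metrics on the
# tracking server and store artifacts in gs://my-mlflow-google-bucket/.
python train.py
```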

Apache Spark MLlib Integration

As part of MLflow’s Model component, we have added Spark MLlib models as a model flavor.
This means that you can export Spark MLlib models as MLflow models. Exported models are saved using MLlib’s native serialization, so they can be deployed and loaded back either as Spark MLlib models or as a generic Python function within MLflow. To save and load these models, use the mlflow.spark API. This addresses issue #72. For example, you can save a Spark MLlib model as shown in the code snippet below:

import mlflow.spark
from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.regression import LinearRegression

tokenizer = Tokenizer(inputCol="review", outputCol="words")
hashingTF = HashingTF(inputCol="words", outputCol="features")
lasso = LinearRegression(labelCol="rating", elasticNetParam=1.0, maxIter=20)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lasso])
# dataset: a Spark DataFrame with "review" and "rating" columns
model = pipeline.fit(dataset)
...

mlflow.spark.log_model(model, "spark-model")

Now we can access this MLlib persisted model in an MLflow application.

import mlflow.spark

model = mlflow.spark.load_model("spark-model")
df = model.transform(test_df)
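Because the Spark flavor also registers a python_function flavor, the same logged model can in principle be scored without a SparkSession on the caller's side. A minimal sketch, assuming the model path used above; the exact pyfunc loading API has changed across MLflow versions, so check the docs for your release:

```python
import pandas as pd
import mlflow.pyfunc

# Load the logged model as a generic Python function
# (early MLflow releases expose this as load_pyfunc).
pyfunc_model = mlflow.pyfunc.load_pyfunc("spark-model")

# Score a pandas DataFrame that has the same input columns used in training.
test_pdf = pd.DataFrame({"review": ["great product", "would not buy again"]})
predictions = pyfunc_model.predict(test_pdf)
```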

Other Features and Bug Fixes

In addition to these features, other items, bugs and documentation fixes are included in this release. Some items worthy of note are:

  • [SageMaker] Support for deleting and updating applications deployed via SageMaker (issue #145)
  • [SageMaker] The pushed MLflow SageMaker container is now tagged with the MLflow version it was published with (issue #124)
  • [SageMaker] Simplified SageMaker deployment parameters by providing sane defaults (issue #126)

The full list of changes and contributions from the community can be found in the CHANGELOG. We welcome more input on [email protected] or by filing issues or submitting patches on GitHub. For real-time questions about MLflow, we’ve also recently created a Slack channel for MLflow.

Read More

For an overview of what we’re working on next, take a look at the roadmap slides in our presentation from last week’s Bay Area Apache Spark Meetup or watch the meetup presentation.

Credits

MLflow 0.3.0 includes patches from Aaron Davidson, Andrew Chen, Bill Chambers, Brett Nekolny, Corey Zumar, Denny Lee, Emre Sevinç, Greg Gandenberger, Jules Damji, Juntai Zheng, Mani Parkhe, Matei Zaharia, Mike Huston, Siddharth Murching, Stephanie Bodoff, Sue Ann Hong, Tomas Nykodym, Vahe Hakobyan
