Databricks Runtime for ML is built on top of Databricks Runtime and is updated with every Databricks Runtime release. It is now generally available across all Databricks product offerings, including Azure Databricks and Databricks on AWS, on both GPU and CPU clusters.
To use Databricks Runtime for ML, simply select the ML version of the runtime when you create your cluster:
Conda Managed Runtime
Benefit from Conda integration for Python package management. All Python packages are installed in a single environment.
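Because everything lives in one Conda-managed environment, you can inspect what the runtime provides directly from a notebook cell. The sketch below uses only the Python standard library; the package names checked at the end are illustrative, and on a machine without the ML runtime they will simply report as missing.

```python
# List the installed Python distributions in the current environment
# using only the standard library (Python 3.8+).
from importlib import metadata

installed = sorted(
    name.lower()
    for dist in metadata.distributions()
    if (name := dist.metadata["Name"])  # skip distributions with no name
)

# Spot-check a few libraries the ML runtime ships with (illustrative
# names; outside Databricks these may well be absent).
for pkg in ("tensorflow", "keras", "mlflow", "scikit-learn"):
    print(pkg, "installed" if pkg in installed else "missing")
```

Since the runtime keeps all packages in a single environment, this one listing reflects exactly what your training code will import.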
ML Frameworks Integration
The most popular ML libraries and frameworks are provided out of the box, including TensorFlow / TensorBoard, Keras, PyTorch, MLflow, Horovod / HorovodRunner, GraphFrames, scikit-learn, XGBoost, NumPy, MLeap, and pandas.
Benefit from the CUDA-optimized TensorFlow build on GPU clusters and the Intel MKL-DNN-optimized TensorFlow package on Intel CPUs for maximum performance.
Quickly migrate your single-node deep learning training code to run on a Databricks cluster with HorovodRunner, a simple API that hides the complexity of setting up Horovod for distributed training.
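The migration pattern can be sketched as follows. The toy `train` function below stands in for a real single-node training loop and runs anywhere; the HorovodRunner invocation is shown in comments because it requires a Databricks ML cluster, and the exact Horovod hooks you add inside `train` depend on your framework.

```python
# A minimal sketch of moving single-node training code onto HorovodRunner.
# The distributed pieces are commented out: they need a Databricks ML
# cluster. The single-node function below runs anywhere.

def train(learning_rate=0.1, steps=100):
    """Toy gradient descent on f(w) = (w - 3)^2, standing in for a
    real deep-learning training loop."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)      # df/dw
        w -= learning_rate * grad
    return w

# On a Databricks ML cluster, the same function is launched with:
#
#   from sparkdl import HorovodRunner
#   hr = HorovodRunner(np=2)    # np = number of Horovod processes
#   hr.run(train, learning_rate=0.1)
#
# Inside `train`, Horovod calls (hvd.init(), hvd.rank(), a distributed
# optimizer wrapper) would replace the plain single-node update above.

print(round(train(), 4))  # prints 3.0
```

The key point is that `train` stays an ordinary Python function: HorovodRunner handles process launch and MPI wiring, so migrating mostly means adding the Horovod hooks inside the function rather than restructuring your job.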
Optimized MLlib Logistic Regression and Tree Classifiers
The most popular MLlib estimators have been optimized as part of Databricks Runtime for ML, delivering up to a 40% speedup compared to Apache Spark 2.4.0.
Run GraphFrames 2-4 times faster, with up to a 100x speedup for graph queries, depending on the workload and data skew.
Included in the runtime, Databricks Delta – the next-generation analytics engine – lets teams build robust, performant ML pipelines, including ETL and data preparation at scale.
Included in the runtime, MLflow provides end-to-end management of the ML lifecycle, from experiment tracking to project reproducibility and model deployment.
GPU clusters are supported cross-cloud, on both Amazon EC2 P2/P3 instances and the Azure NC and NCv3 series.