Announcing General Availability of Ray on Databricks

We released Ray support in public preview last year, and since then hundreds of Databricks customers have been using it for a variety of use...

Announcing Ray Autoscaling support on Databricks and Apache Spark™

Ray is an open-source unified compute framework that simplifies scaling AI and Python workloads in a distributed environment. Since we introduced support for...

Introducing Apache Spark™ 3.5

Today, we are happy to announce the availability of Apache Spark™ 3.5 on Databricks as part of Databricks Runtime 14.0. We extend our...

Announcing Ray support on Databricks and Apache Spark Clusters

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter...

Simplify Data Conversion from Apache Spark to TensorFlow and PyTorch

June 16, 2020 by Liang Zhang and Weichen Xu
Petastorm is a popular open-source library from Uber that enables single machine or distributed training and evaluation of deep learning models from datasets...

Introducing Built-in Image Data Source in Apache Spark 2.4

December 10, 2018 by Tomas Nykodym and Weichen Xu
With recent advances in deep learning frameworks for image classification and object detection, the demand for standard image processing in Apache Spark...