
Announcing Ray support on Databricks and Apache Spark Clusters

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter...

Databricks Connect: Bringing the capabilities of hosted Apache Spark™ to applications and microservices

June 14, 2019 by Eric Liang
In this blog post we introduce Databricks Connect, a new library that allows you to leverage native Apache Spark APIs from any...

Introducing Databricks Optimized Autoscaling on Apache Spark™

Databricks is thrilled to announce our new optimized autoscaling feature. The new Apache Spark™-aware resource manager leverages Spark shuffle and executor statistics to...

Declarative Infrastructure with the Jsonnet Templating Language

June 26, 2017 by Eric Liang and Aaron Davidson
This blog post is part of our series of internal engineering blogs on Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. At...

Databricks Serverless: Next Generation Resource Management for Apache Spark

As the amount of data in an organization grows, more and more engineers, analysts and data scientists need to analyze this data using...

Transactional Writes to Cloud Storage on Databricks

In another blog post published today, we showed the top five reasons for choosing S3 over HDFS. With the dominance of simple...

Next Generation Physical Planning in Apache Spark

"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." (Andrew Tanenbaum, 1981) Imagine a cold, windy...

Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1

December 15, 2016 by Eric Liang, Michael Allman and Wenchen Fan
Apache Spark 2.1 is just around the corner: the community is going through the voting process for the release candidates. This blog post discusses...

Notebook Workflows: The Easiest Way to Implement Apache Spark Pipelines

August 30, 2016 by Dave Wang, Eric Liang and Maddie Schults
Today we are excited to announce Notebook Workflows in Databricks. Notebook Workflows is a set of APIs that allow users to chain notebooks...

Generalized Linear Models in SparkR and R Formula Support in MLlib

October 5, 2015 by Eric Liang
To get started with SparkR, download Apache Spark 1.5 or sign up for a 14-day free trial of Databricks today. Apache Spark...