Engineering blog

Announcing Ray support on Databricks and Apache Spark Clusters

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter...
Company blog

Databricks Connect: Bringing the capabilities of hosted Apache Spark™ to applications and microservices

June 14, 2019 by Eric Liang in Company Blog
In this blog post we introduce Databricks Connect, a new library that allows you to leverage native Apache Spark APIs from any...
Company blog

Introducing Databricks Optimized Autoscaling on Apache Spark™

Databricks is thrilled to announce our new optimized autoscaling feature. The new Apache Spark™-aware resource manager leverages Spark shuffle and executor statistics to...
Engineering blog

Declarative Infrastructure with the Jsonnet Templating Language

This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. At...
Company blog

Databricks Serverless: Next Generation Resource Management for Apache Spark

As the amount of data in an organization grows, more and more engineers, analysts and data scientists need to analyze this data using...
Engineering blog

Transactional Writes to Cloud Storage on Databricks

In another blog post published today, we showed the top five reasons for choosing S3 over HDFS. With the dominance of simple...
Engineering blog

Next Generation Physical Planning in Apache Spark

Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway. — Andrew Tanenbaum, 1981 Imagine a cold, windy...
Engineering blog

Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1

Apache Spark 2.1 is just around the corner: the community is going through the voting process for the release candidates. This blog post discusses...
Company blog

Notebook Workflows: The Easiest Way to Implement Apache Spark Pipelines

August 30, 2016 by Dave Wang, Eric Liang and Maddie Schults in Company Blog
Today we are excited to announce Notebook Workflows in Databricks. Notebook Workflows is a set of APIs that allow users to chain notebooks...
Engineering blog

Generalized Linear Models in SparkR and R Formula Support in MLlib

October 5, 2015 by Eric Liang in Engineering Blog
To get started with SparkR, download Apache Spark 1.5 or sign up for a 14-day free trial of Databricks today. Apache Spark...