Databricks Runtime 3.0 Beta Delivers Cloud Optimized Apache Spark
May 24, 2017 by Reynold Xin in Product
A major value Databricks provides is the automatic provisioning, configuration, and tuning of clusters of machines that process data. Running on these machines...
Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?
February 16, 2017 by Reynold Xin, Ala Luszczak and Bogdan Raducanu in Engineering Blog
This blog post describes our experience debugging a failing test case caused by a cross join query running “too fast.” Because the root...
Databricks and Apache Spark 2016 Year in Review
January 4, 2017 by Reynold Xin, Jules Damji, Dave Wang and Matei Zaharia in Company Blog
Spark Summit will be held in Boston on Feb 7-9, 2017. Check out the full agenda and get your ticket before it sells...
Introducing Apache Spark 2.1
December 29, 2016 by Reynold Xin in Engineering Blog
Spark Summit will be held in Boston on Feb 7-9, 2017. Check out the full agenda and get your ticket before it sells...
$1.44 per terabyte: setting a new world record with Apache Spark
November 14, 2016 by Reynold Xin in Engineering Blog
We are excited to share with you that a joint effort by Nanjing University, Alibaba Group, and Databricks set a new world record...
Spark Structured Streaming
July 28, 2016 by Matei Zaharia, Tathagata Das, Michael Armbrust and Reynold Xin in Engineering Blog
Apache Spark 2.0 adds the first version of a new higher-level API, Structured Streaming, for building continuous applications. The main goal is...
Introducing Apache Spark 2.0
July 26, 2016 by Reynold Xin, Michael Armbrust and Matei Zaharia in Engineering Blog
Today, we're excited to announce the general availability of Apache Spark 2.0 on Databricks. This release builds on what the community has learned...
Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop
May 23, 2016 by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering Blog
When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...
Technical Preview of Apache Spark 2.0 Now on Databricks
May 11, 2016 by Reynold Xin in Engineering Blog
For the past few months, we have been busy contributing to the next major release of the big data open source software we...
The Unreasonable Effectiveness of Deep Learning on Apache Spark
April 1, 2016 by Miles Yucht and Reynold Xin in Engineering Blog
Update: this post is an April Fools joke. It is not an actual project we're working on. For the past three years, our...