Articles by Reynold Xin - Databricks Blog

Databricks Runtime 3.0 Beta Delivers Cloud Optimized Apache Spark

May 24, 2017 by Reynold Xin in Product

A major value Databricks provides is the automatic provisioning, configuration, and tuning of clusters of machines that process data. Running on these machines...

Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?

February 16, 2017 by Reynold Xin, Ala Luszczak and Bogdan Raducanu in Engineering

This blog post describes our experience debugging a failing test case caused by a cross join query running “too fast.” Because the root...

Databricks and Apache Spark 2016 Year in Review

January 3, 2017 by Reynold Xin, Jules Damji, Dave Wang and Matei Zaharia in Company

Spark Summit will be held in Boston on Feb 7-9, 2017. Check out the full agenda and get your ticket before it sells...

Introducing Apache Spark 2.1

December 28, 2016 by Reynold Xin in Engineering

$1.44 per terabyte: setting a new world record with Apache Spark

November 14, 2016 by Reynold Xin in Engineering

We are excited to share with you that a joint effort by Nanjing University, Alibaba Group, and Databricks set a new world record...

Spark Structured Streaming

July 28, 2016 by Matei Zaharia, Tathagata Das, Michael Lumb and Reynold Xin in Engineering

Apache Spark 2.0 adds the first version of a new higher-level API, Structured Streaming, for building continuous applications. The main goal is...

Introducing Apache Spark 2.0

July 26, 2016 by Reynold Xin, Michael Lumb and Matei Zaharia in Engineering

Today, we're excited to announce the general availability of Apache Spark 2.0 on Databricks. This release builds on what the community has learned...

Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop

May 23, 2016 by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering

When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...

Technical Preview of Apache Spark 2.0 Now on Databricks

May 11, 2016 by Reynold Xin in Engineering

For the past few months, we have been busy contributing to the next major release of the big data open source software we...

The Unreasonable Effectiveness of Deep Learning on Apache Spark

March 31, 2016 by Miles Yucht and Reynold Xin in Engineering

Update: this post is an April Fools joke. It is not an actual project we're working on. For the past three years, our...