Introducing Apache Spark 2.3
February 28, 2018 by Sameer Agarwal, Xiao Li, Reynold Xin and Jules Damji in Engineering Blog
Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want...

Cost Based Optimizer in Apache Spark 2.2
August 31, 2017 by Ron Hu, Zhenhua Wang, Wenchen Fan and Sameer Agarwal in Engineering Blog
This is a joint engineering effort between Databricks’ Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei’s engineering team (Ron Hu...

Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop
May 23, 2016 by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering Blog
When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...