SQL Subqueries in Apache Spark 2.0
June 17, 2016 by Davies Liu and Herman van Hövell in Engineering Blog
In the upcoming Apache Spark 2.0 release, we have substantially expanded the SQL standard capabilities. In this brief...

Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop
May 23, 2016 by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering Blog
When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...

Apache Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs
September 16, 2015 by Michael Armbrust, Yin Huai, Davies Liu and Reynold Xin in Engineering Blog
To try the new features highlighted in this blog post, download Spark 1.5 or sign up for a 14-day free trial of Databricks today...

Improvements to Kafka integration of Spark Streaming
March 30, 2015 by Cody Koeninger, Davies Liu and Tathagata Das in Engineering Blog
Apache Kafka is rapidly becoming one of the most popular open source stream ingestion platforms. We see the same trend among the users...

Introducing DataFrames in Apache Spark for Large Scale Data Science
February 17, 2015 by Reynold Xin, Michael Armbrust and Davies Liu in Engineering Blog
Today, we are excited to announce a new DataFrame API designed to make big data processing even easier for a wider audience. When...