Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop
May 23, 2016 by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering Blog
When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...
Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles
May 19, 2016 by Tim Hunter, Hossein Falaki and Joseph Bradley in Solutions
Introduction: Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...
Technical Preview of Apache Spark 2.0 Now on Databricks
May 11, 2016 by Reynold Xin in Engineering Blog
For the past few months, we have been busy contributing to the next major release of the big data open source software we...
New Content in Databricks Community Edition
April 12, 2016 by Ion Stoica in Engineering Blog
At the Spark Summit New York, we announced Databricks Community Edition (CE) beta. CE is a free version of the Databricks service...
The Unreasonable Effectiveness of Deep Learning on Apache Spark
April 1, 2016 by Miles Yucht and Reynold Xin in Engineering Blog
Update: this post is an April Fools joke. It is not an actual project we're working on. For the past three years, our...
Apache Spark Trending in the Stack Overflow Survey
March 22, 2016 by Reynold Xin in Solutions
Last week, Stack Overflow released the result of their 2016 developer survey. This is one of the most significant surveys in the...
On-Time Flight Performance with GraphFrames for Apache Spark
March 16, 2016 by Joseph Bradley, Bill Chambers and Denny Lee in Engineering Blog
Introduction: Graph structures are a more intuitive approach to many classes of data problems. Whether traversing social networks, restaurant recommendations, or flight paths...
Introducing GraphFrames
March 3, 2016 by Ankur Dave, Joseph Bradley and Tim Hunter in Engineering Blog
We would like to thank Ankur Dave from UC Berkeley AMPLab for his contribution to this blog post. Databricks is excited to announce...
Reshaping Data with Pivot in Apache Spark
February 9, 2016 by Andrew Ray in Engineering Blog
Spark Summit East is just around the corner! If you haven’t registered yet, you can get tickets and here’s a promo code for...
Auto-scaling scikit-learn with Apache Spark
February 8, 2016 by Tim Hunter and Joseph Bradley in Engineering Blog
Data scientists often spend hours or days tuning models to get the highest accuracy. This tuning typically involves running a large number of...