Databricks Bi-Weekly Digest: 7/18/16
July 18, 2016 by Jules Damji in Engineering Blog
Today, we're kicking off a new series: the Databricks Bi-Weekly Digest. Our goal with this digest is to summarize Spark-related content, compiled...

SparkR Tutorial at useR 2016
July 7, 2016 by Hossein Falaki and Shivaram Venkataraman in Solutions
AMPLab and Databricks gave a tutorial on SparkR at the useR conference. The conference was held from June 27 - June 30 at...

Apache Spark Key Terms, Explained
June 22, 2016 by Jules Damji and Denny Lee in Engineering Blog
This article was originally posted on KDnuggets. The Spark Summit Europe call for presentations is open; submit your idea today. As observed in...

Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles
May 19, 2016 by Tim Hunter, Hossein Falaki and Joseph Bradley in Solutions
Introduction: Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...

New Content in Databricks Community Edition
April 12, 2016 by Ion Stoica in Engineering Blog
At the Spark Summit New York, we announced Databricks Community Edition (CE) beta. CE is a free version of the Databricks service...

The Unreasonable Effectiveness of Deep Learning on Apache Spark
April 1, 2016 by Miles Yucht and Reynold Xin in Engineering Blog
Update: this post is an April Fools joke. It is not an actual project we're working on. For the past three years, our...

Apache Spark Trending in the Stack Overflow Survey
March 22, 2016 by Reynold Xin in Solutions
Last week, Stack Overflow released the result of their 2016 developer survey. This is one of the most significant surveys in the...

Apache Spark 2015 Year In Review
January 5, 2016 by Reynold Xin, Matei Zaharia and Patrick Wendell in Solutions
To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016. 2015 has been a year of...

Introducing Redshift Data Source for Spark
October 19, 2015 by Sameer Wadkar and Josh Rosen in Engineering Blog
This is a guest blog from Sameer Wadkar, Big Data Architect/Data Scientist at Axiomine. The Spark SQL Data Sources API was introduced in...

Guest blog: PMML Support in Apache Spark's MLlib
July 2, 2015 by Vincenzo Selvaggio in Engineering Blog
This is a guest blog from our friend Vincenzo Selvaggio who contributed this feature. He is a Senior Java Technical Architect and Project...