Introducing Redshift Data Source for Spark
October 19, 2015 by Sameer Wadkar and Josh Rosen in Engineering Blog
This is a guest blog from Sameer Wadkar, Big Data Architect/Data Scientist at Axiomine. The Spark SQL Data Sources API was introduced in...

Generalized Linear Models in SparkR and R Formula Support in MLlib
October 5, 2015 by Eric Liang in Engineering Blog
To get started with SparkR, download Apache Spark 1.5 or sign up for a 14-day free trial of Databricks today. Apache Spark...

Apache Spark 1.5.1 and What do Version Numbers Mean?
October 1, 2015 by Reynold Xin in Engineering Blog
The inaugural Spark Summit Europe will be held in Amsterdam on October 27 - 29. Check out the full agenda and get your...

Improved Frequent Pattern Mining in Apache Spark 1.5: Association Rules and Sequential Patterns
September 28, 2015 by Feynman Liang, Jiajin Zhang and Dandan Tu in Engineering Blog
We would like to thank Jiajin Zhang and Dandan Tu from Huawei for contributing to this blog. To get started mining patterns from...

Large Scale Topic Modeling: Improvements to LDA on Apache Spark
September 22, 2015 by Feynman Liang, Yuhao Yang and Joseph Bradley in Engineering Blog
This blog was written by Feynman Liang and Joseph Bradley from Databricks, and Yuhao Yang from Intel. To get started using LDA, download...

Apache Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs
September 16, 2015 by Michael Armbrust, Yin Huai, Davies Liu and Reynold Xin in Engineering Blog
To try new features highlighted in this blog post, download Spark 1.5 or sign up for a 14-day free trial of Databricks today...

Announcing Apache Spark 1.5
September 9, 2015 by Reynold Xin and Patrick Wendell in Engineering Blog
The inaugural Spark Summit Europe will be held in Amsterdam this October. Check out the full agenda and get your ticket before it...

From Pandas to Apache Spark's DataFrame
August 12, 2015 by Olivier Girardot in Engineering Blog
This is a cross-post from the blog of Olivier Girardot. Olivier is a software engineer and the co-founder of Lateral Thoughts, where he...

Diving into Apache Spark Streaming's Execution Model
July 30, 2015 by Tathagata Das, Matei Zaharia and Patrick Wendell in Engineering Blog
With so many distributed stream processing engines available, people often ask us about the unique benefits of Apache Spark Streaming. From early...

New Features in Machine Learning Pipelines in Apache Spark 1.4
July 29, 2015 by Joseph Bradley and Burak Yavuz in Engineering Blog
Apache Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark...