Databricks Blog


Guest blog: SequoiaDB Connector for Apache Spark

August 3, 2015 by Tao Wang in Company

This is a guest blog from Tao Wang at SequoiaDB. He is the co-founder and CTO of SequoiaDB, leading its long-term technology...

Diving into Apache Spark Streaming's Execution Model

July 30, 2015 by Tathagata Das, Matei Zaharia and Patrick Wendell in Engineering

With so many distributed stream processing engines available, people often ask us about the unique benefits of Apache Spark Streaming. From early...

New Features in Machine Learning Pipelines in Apache Spark 1.4

July 29, 2015 by Joseph Bradley and Burak Yavuz in Engineering

Apache Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark...

Using 3rd Party Libraries in Databricks: Apache Spark Packages and Maven Libraries

July 28, 2015 by Burak Yavuz in Company

In an earlier post, we described how you can easily integrate your favorite IDE with Databricks to speed up your application development. In...

Joint Blog Post: Bringing ORC Support into Apache Spark

July 16, 2015 by Zhan Zhang, Cheng Liang and Patrick Wendell in Engineering

This is a joint blog post with our partner Hortonworks. Zhan Zhang is a member of technical staff at Hortonworks, where he collaborated...

Introducing Window Functions in Spark SQL

July 15, 2015 by Yin Huai and Michael Armbrust in Engineering

In this blog post...

Introducing R Notebooks in Databricks

July 13, 2015 by Hossein Falaki in Product

Apache Spark 1.4 was released on June 11, and one of the exciting new features was SparkR. I am happy to announce...

Announcing SparkHub: A Community Site for Apache Spark

July 10, 2015 by Denny Lee in Announcements

Today, we are happy to announce SparkHub, a service for the Apache Spark community to easily find the most relevant Spark resources...

New Visualizations for Understanding Apache Spark Streaming Applications

July 8, 2015 by Tathagata Das, Shixiong Zhu and Andrew Or in Engineering

Earlier, we presented new visualizations introduced in Apache Spark 1.4.0 to understand the behavior of Spark applications. Continuing the theme, this blog highlights...

Guest blog: PMML Support in Apache Spark's MLlib

July 2, 2015 by Vincenzo Selvaggio in Engineering

This is a guest blog from our friend Vincenzo Selvaggio who contributed this feature. He is a Senior Java Technical Architect and Project...