Evolving the Databricks brand
Some brands start out as, well, brands. A lot of work goes into the concept and painting the picture before the business is ever launched. Databricks is different. It always has been and always will be an engineering-led company. Databricks’ model for innovation is inspired by the open-source community. This is where our roots run...
Apache Spark 2015 Year In Review
To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016. 2015 has been a year of tremendous growth for Apache Spark. The pace of development is the fastest ever. We went through 4 releases (Spark 1.3 to 1.6) in a single year, and each of them added hundreds of...
Announcing Apache Spark 1.6
Today we are happy to announce the availability of Apache Spark 1.6! With this release, Spark hit a major milestone in terms of community development: the number of people that have contributed code to Spark has crossed 1000, doubling...
Announcing an Apache Spark 1.6 Preview in Databricks
Today we are happy to announce the availability of an Apache Spark 1.6 preview package in Databricks. The Apache Spark 1.6.0 release is still a few weeks away - this package is intended to provide early access to the features in the upcoming Spark 1.6 release, based on the upstream source code. Using the preview...
Spark Survey 2015 Results are now available
We ran the Spark Survey 2015 this summer to gain insights on how organizations are using Apache Spark. The results of this year’s Spark Survey - reflecting the answers and opinions of 1,417 respondents representing 842 organizations - strongly indicate the rapid growth of the Spark community and offer valuable insight into the direction...
Announcing Apache Spark 1.5
The inaugural Spark Summit Europe will be held in Amsterdam this October. Check out the full agenda and get your ticket before it sells out! Today we are happy to announce the availability of Apache Spark’s 1.5 release! In this post, we outline the major development themes in Spark 1.5 and some of the new features...
Diving into Apache Spark Streaming’s Execution Model
With so many distributed stream processing engines available, people often ask us about the unique benefits of Apache Spark Streaming. From early on, Apache Spark has provided a unified engine that natively supports both batch and streaming workloads. This is different from other systems that either have a processing engine designed only for streaming, or...
Joint Blog Post: Bringing ORC Support into Apache Spark
This is a joint blog post with our partner Hortonworks. Zhan Zhang is a member of technical staff at Hortonworks, where he collaborated with the Databricks team on this new feature. In version 1.2.0, Apache Spark introduced a Data Source API (SPARK-3247) to enable deep platform integration with a larger number of data sources and sinks....
Announcing Apache Spark 1.4
Today I’m excited to announce the general availability of Apache Spark 1.4! Spark 1.4 introduces SparkR, an R API targeted towards data scientists. It also evolves Spark’s DataFrame API with a large number of new features. Spark’s ML pipelines API, first introduced in Spark 1.3, graduates from an alpha component. Finally, Spark Streaming and Core...
Announcing Apache Spark 1.3!
Today I’m excited to announce the general availability of Apache Spark 1.3! Apache Spark 1.3 introduces the widely anticipated DataFrame API, an evolution of Spark’s RDD abstraction designed to make crunching large datasets simple and fast. Apache Spark 1.3 also boasts a large number of improvements across the stack, from Streaming, to ML, to SQL....