Our goal with Apache Spark is very simple: provide the best platform for computation on big data. We do this through both a powerful core engine and rich libraries for useful analytics tasks. Today, we are excited to announce the release of Apache Spark 0.9.0. This major release extends Spark’s libraries and further improves its performance and usability. Apache Spark 0.9.0 is the largest release to date, with work from 83 contributors, who submitted over 300 patches.

Apache Spark 0.9 features significant extensions to the set of standard analytical libraries packaged with Spark. The release introduces GraphX, a library for graph computation that comes with implementations of several standard algorithms, such as PageRank. Spark’s machine learning library (MLlib) has been extended to support Python, using the NumPy numerical library. A Naive Bayes Classifier has also been added to MLlib. Finally, Spark Streaming, which supports near-real-time continuous computation, has added a simplified high-availability mode and several significant optimizations.
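For readers unfamiliar with PageRank, the idea can be sketched in a few lines of plain Python. This is only an illustration of the algorithm, not GraphX's API or implementation: the `pagerank` function, its `damping` and `iterations` parameters, and the toy edge list are all assumptions made for this example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iterative PageRank over a list of (src, dst) edges.

    Hypothetical helper for illustration only; it does not use Spark or GraphX.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread evenly over all nodes.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}

        # Each node passes an equal share of its damped rank to every node it links to.
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_ranks[dst] += damping * ranks[src] / len(targets)

        ranks = new_ranks

    return ranks


# Toy graph (assumed data): a three-node cycle plus one extra node pointing at "a".
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
ranks = pagerank(edges)
```

In this toy graph, node `a` ends up with the highest rank because it receives links from both `c` and `d`, while `d`, which nothing links to, keeps only the baseline share. A distributed library such as GraphX runs the same kind of iteration across a partitioned graph rather than an in-memory dictionary.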

In addition to higher-level libraries, Spark 0.9 features improvements to the core computation engine. Spark now automatically spills reduce output to disk, increasing the stability of workloads with very large aggregations. Support for Spark in YARN mode has been hardened and improved. The standalone mode has added automatic supervision of applications and better support for sharing clusters among several users. Finally, we've focused on stabilizing APIs ahead of Apache Spark's 1.0 release to make things easier for developers writing Spark applications. This includes upgrading to Scala 2.10, which allows applications written in Scala to use newer libraries.

Apache Spark 0.9.0 can be downloaded directly from the Apache Spark website. It will also be available to CDH users via a Cloudera parcel, which can automatically install Spark on existing CDH clusters. For a more detailed explanation of the features in this release, head on over to the official release notes. Enjoy the newest release of Spark!
