Apache Spark 0.9.0 Released

Published: February 3, 2014

Our goal with Apache Spark is very simple: provide the best platform for computation on big data. We do this through both a powerful core engine and rich libraries for useful analytics tasks. Today, we are excited to announce the release of Apache Spark 0.9.0. This major release extends Spark’s libraries and further improves its performance and usability. Apache Spark 0.9.0 is the largest release to date, with work from 83 contributors, who submitted over 300 patches.

Apache Spark 0.9 features significant extensions to the set of standard analytical libraries packaged with Spark. The release introduces GraphX, a library for graph computation that comes with implementations of several standard algorithms, such as PageRank. Spark’s machine learning library (MLlib) has been extended to support Python, using the NumPy numerical library. A Naive Bayes classifier has also been added to MLlib. Finally, Spark Streaming, which supports near-real-time continuous computation, has added a simplified high-availability mode and several significant optimizations.

In addition to higher-level libraries, Spark 0.9 features improvements to the core computation engine. Spark now automatically spills reduce output to disk, increasing the stability of workloads with very large aggregations. Support for Spark in YARN mode has been hardened and improved. The standalone mode has added automatic supervision of applications and better support for sharing clusters among several users. Finally, we’ve focused on stabilizing APIs ahead of Apache Spark’s 1.0 release to make things easy for developers writing Spark applications. This includes upgrading to Scala 2.10, allowing applications written in Scala to use newer libraries.

Apache Spark 0.9.0 can be downloaded directly from the Apache Spark website. It will also be available to CDH users via a Cloudera parcel, which can automatically install Spark on existing CDH clusters. For a more detailed explanation of the features in this release, head on over to the official release notes. Enjoy the newest release of Spark!
