Today, we are happy to announce Spark Packages (http://spark-packages.org), a community package index to track the growing number of open source packages and libraries that work with Apache Spark. Spark Packages makes it easy for users to find, discuss, rate, and install packages for any version of Spark, and for developers to contribute packages.

Spark Packages will feature integrations with various data sources, management tools, higher-level domain-specific libraries, machine learning algorithms, code samples, and other Spark content. Thanks to the package authors, the initial listing includes scientific computing libraries, a job execution server, a connector for importing Avro data, tools for launching Spark on Google Compute Engine, and many others. We expect this list to grow substantially in 2015, and to help fuel that growth we are continuing to invest in extension points to Spark such as the Spark SQL data sources API, the Spark Streaming Receiver API, and the Spark ML pipeline API. Package authors who submit a listing retain full rights to their code, including their choice of open-source license.
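
To make these extension points concrete, here is a minimal sketch of the Spark SQL data sources API in action, using the Avro connector mentioned above as the example. This assumes the spark-avro package is already on the application classpath; the file path and table name are illustrative:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    // A minimal sketch of the data sources API, assuming the
    // spark-avro package is on the classpath. The path and table
    // name below are placeholders.
    val sc = new SparkContext("local[*]", "avro-example")
    val sqlContext = new SQLContext(sc)

    // Register an Avro file as a temporary table; the USING clause
    // points Spark SQL at the package's data source implementation.
    sqlContext.sql(
      """CREATE TEMPORARY TABLE episodes
        |USING com.databricks.spark.avro
        |OPTIONS (path "episodes.avro")""".stripMargin)

    // Query it like any other table.
    sqlContext.sql("SELECT * FROM episodes").collect().foreach(println)

Because the connector plugs in through the data sources API rather than a bespoke reader, the same CREATE ... USING pattern works for any package that implements the interface.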

Please give Spark Packages a try and let us know if you have any questions when working with the site! We expect to extend the site in the coming months while also building mechanisms in Spark to make using packages even easier. We hope Spark Packages lets you find even more great ways to work with Spark.
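
As a hedged sketch of what one such mechanism could look like, a package that publishes Maven coordinates might be resolved automatically when launching a shell; the flag and coordinates below are illustrative assumptions, not a description of a shipped feature:

    $ spark-shell --packages com.databricks:spark-avro_2.10:1.0.0

Resolving a package's dependencies at launch time like this would spare users from building and managing assembly jars by hand.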