Apache Spark Usage in the Open Source Ecosystem

Download Slides

Apache Spark is an active member of the broad open source community beyond the Apache Foundation. Every day thousands of users combine capabilities of Spark with other open source software to get their job done. This is not by chance. Spark has been designed to behave well with existing ecosystems. For example, PySpark is designed to work well with Pandas, Numpy and other python packages. In this talk we will present an analysis of libraries and open source tools that are commonly used along with Spark in JVM, Python and R ecosystems. Our quantitative results are based on usage of thousands of Spark users. We will show the Spark Summit attendees what the rest of their community finds useful to complement the power of Spark and what parts of Spark API is used in conjunction with most popular open source libraries.