We ran the Spark Survey 2015 this summer to gain insights on how organizations are using Apache Spark.
The results of this year’s Spark Survey – reflecting the answers and opinions of over 1,417 respondents representing 842 organizations – strongly indicate the rapid growth of the Spark community and offer valuable insight into the direction Spark is moving. A key focus of Spark has been to make data processing easy and accessible. The results of this year’s survey suggest that this focus is resonating with Spark users across many industries.
To learn more, download the Spark Survey 2015 Report.
The three key takeaways from the Spark Survey are:
- Spark Adoption Is Growing Rapidly: Spark is the most active open source project in Big Data, with over 600 contributors in the last 12 months (up from 315 in the 12 months before that). Just as important, it is being used to create many types of products inside different organizations (69% of respondents are creating two or more data products with Spark). Spark is being embraced by companies far beyond the IT industry and by a growing variety of functional roles within these companies (e.g. data scientists and analysts).
- Spark Use Is Growing Beyond Hadoop: Spark usage in the public cloud (51%) and within Spark’s own cluster manager (48%) has surged within the last year. While some users run Spark in on-premises Hadoop clusters, they are no longer the majority. In addition, Spark integrates with many storage systems (e.g. Cassandra, HBase, S3). Spark is also pluggable, with dozens of third-party libraries and storage integrations.
- Spark Is Increasing Access to Big Data: Spark is unlocking the value of Big Data by making it easier for a wide range of people (e.g. 41% data engineers, 22% data scientists, etc.) to solve a growing variety of data problems. The ability to program in the language of their choice is an important reason Spark use is growing among data scientists writing in Python and R. Spark users are expanding into the areas of advanced analytics and real-time streaming while building foundations on data warehousing and BI.
In conclusion, the results of Spark Survey 2015 give us a better picture of who is using Spark, how they’re using it, and what they’re using it to build. These insights will guide major updates to the Spark platform as we move into Spark’s next phase of growth. Thank you to everyone who participated in Spark Survey 2015 for helping to shape Spark’s future!