To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016.
2015 has been a phenomenal year of growth for both Databricks and the Apache Spark project. In June, we launched general availability (GA) of our cloud platform, the first end-to-end enterprise data platform based on Spark. At the same time, we have continued our efforts to train Spark developers and, of course, to develop Spark itself. In this post, we want to share some updates about each of these efforts and let you know what we’ve been up to in 2015:
When our team first designed Spark at UC Berkeley, we wanted to make writing big data applications easier. However, we realized that much more was needed to make big data simple for an organization: big data projects spend most of their effort managing infrastructure, loading data, and keeping production jobs running. This is why we developed Databricks, an end-to-end, managed platform based on Spark. With Databricks, organizations can immediately start working on their data problems, in an environment accessible to data scientists, engineers, and business users alike.
Since the GA of this platform in June, Databricks has been adopted by over 200 paying customers, with applications ranging from data warehousing and reporting to real-time machine learning. We have also learned a lot from our customers’ use of the platform. In particular, we saw three interesting trends:
To give a sense of what organizations have been able to do with Databricks, some of our customer highlights in 2015 included:
With Databricks, Elsevier Labs – the advanced R&D group within Elsevier, a global provider of scientific information – completed advanced analytics projects faster (in days rather than weeks) and broadened access to data (15 people contributing instead of just two or three specialists).
MyFitnessPal’s legacy data pipeline was slow, did not scale, and lacked flexibility. Databricks helped them solve all of these challenges with automatically managed Spark clusters, an interactive workspace, and a production job scheduler that eases the transition from development to production.
Celtra expanded the number of people able to work with their data by a factor of four, allowing them to increase the amount of ad-hoc analysis six-fold. View the webinar “How Celtra Optimizes its Advertising Platform with Databricks” to see how users across the organization use this data.
As developers at heart, we have also made it a key part of our mission to empower other professionals to tackle big data problems. We are happy to note that in 2015 we trained over 20,000 developers on Spark, more than any other company.
Spark education was top of mind for us in 2015 with the launch of several key programs:
Our 2015 Spark Survey results validate that our work to make Spark data processing easy and accessible is resonating with Spark users across many industries. Key findings from the survey included:
As the Spark community expands at an amazing pace (with 650 contributors in 2015 alone), Databricks has continued to be the largest contributor to the Apache Spark project, providing 10x more code than any other company. We consider the success of Spark one of our key missions, and to this end we have contributed to all areas of Spark in 2015. Some of our major contributions this year were:
For a deep dive on the major additions in 2015, please read Reynold Xin’s blog post, “Spark 2015 Year In Review”.
But it’s not all about the code: nurturing the Spark community is also about bringing users together. To this end, we hosted 4,000 attendees across three Spark Summits, taking the conference to Europe and New York for the first time. We have also contributed to dozens of local meetup groups with our Meetup-in-the-box initiative. We plan to expand both of these initiatives in 2016.
Finally, 2015 was also a significant year for our partners. We are happy to see IBM, Hortonworks, Intel, Cloudera, and MapR, just to name a few, investing significantly in Spark. We look forward to continuing our collaboration with them in 2016, to build a stronger Spark community.
While 2015 was exciting, we believe it is still only the beginning for both Databricks and Apache Spark. Our overall mission is to make big data simple, allowing every enterprise to gain value from its data. Our experience with Databricks customers so far shows that this is indeed possible: with a fully managed end-to-end platform, customers are completing projects in a fraction of the time it took with previous tools, while making their data accessible to more users in their organization, well beyond the “big data experts”. In 2016, we will continue to work with our customers, partners, and the Spark community to make extracting value from data even easier.