UPDATE: Slides and videos from the Summit are now available! Check them out now!
We are delighted by the success of Spark Summit 2015 in San Francisco on June 15th and 16th, followed by three sold-out Spark Training sessions on June 17th. This was the largest Spark Summit to date, with more than 2,000 attendees! Databricks is proud to make all talk videos, slides, training talk videos, and training materials available online for free as a service to the Apache Spark community. Slides will be available on the Spark Summit 2015 agenda page and videos will be published there too as soon as we finish editing them.
Key Announcements
Matei Zaharia, the creator of Spark, and Patrick Wendell, both co-founders of Databricks, opened the summit with the Spark Community Update. They described how Apache Spark continues to grow quickly, with new features including DataFrames, R support, and machine learning pipelines added in the past few releases.
In the next keynote, Powering Data Science with Spark, Ion Stoica, CEO of Databricks, and Ali Ghodsi, VP Engineering and Product Management of Databricks, explained how Databricks makes big data simple by leveraging the power of Spark to help data professionals easily solve their data challenges. They also announced that Databricks is generally available!
In a great partnership milestone, Databricks and IBM also announced a joint effort to contribute key machine learning capabilities to the Apache Spark project. Over the next few months, Databricks and IBM will collaborate to expand Spark’s machine learning capabilities.
Keynotes
With Spark Summit being a community event focused on data science and data engineering at scale, some of our keynote highlights included:
- Software Above the Level of a Single Device: The Implications – Tim O'Reilly (O'Reilly Media)
- Spark at NASA/JPL – Chris Mattmann (NASA)
- Perspectives on Big Data & Analytics – Doug Wolfe (Central Intelligence Agency)
- Fireside chat with Ben Horowitz (Andreessen Horowitz)
- Field Notes from Expeditions in the Cloud – Matt Wood (Amazon Web Services)
- How Spark Fits into Baidu's Scale – James Peng (Baidu)
- A Tale of a Data-Driven Culture – Gloria Lau (Timeful/Google)
Community Talks
With more than 260 submissions, this year’s Spark Summit had one of the most amazing schedules to date, with some session highlights including:
- Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction – Hector Yee (Airbnb)
- Spark and Spark Streaming at Netflix – Kedar Sadekar (Netflix), Monal Daxini (Netflix)
- Dynamic Community Detection for Large-scale e-Commerce data with Spark Streaming and GraphX – Ming Huang (Taobao Inc, Alibaba Group)
- Lessons Learned with Spark at the US Patent & Trademark Office – Christopher Bradford (OpenSource Connections)
- Solving Low Latency Query Over Big Data with Spark SQL – Julien Pierre (Microsoft)
- SparkR: The Past, the Present and the Future – Shivaram Venkataraman (UC Berkeley AMPLAB), Rui Sun (Intel Asia Pacific R&D)
- Use of Spark MLlib for Predicting the Offlining of Digital Media – Christopher Burdorf (NBC Universal)
Training
Spark Training took place the day after Spark Summit, where we trained over 500 students to use Spark in three parallel classes.
Learn more about Spark training classes run by Databricks on the training portion of our website.
Learn More
For Spark enthusiasts abroad, the first Spark Summit Europe will be in Amsterdam from October 27th to 29th. by June 23 and register now to get a discount.
To keep up with Spark and Databricks news, don’t forget to sign up for our monthly newsletter.