Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spark Graph

Spark 3.0 introduces a new module: Spark Graph. Spark Graph adds the popular query language Cypher, its accompanying Property Graph Model, and graph algorithms to the data science toolbox. Graphs have many useful applications in recommendation, fraud detection, and research. This tutorial aims to help attendees understand when graphs should be used and how Spark Graph can extend analytical workflows. We will explore the concepts and motivations behind graph querying and graph algorithms, the components of the new Spark Graph module and their APIs, and how those APIs allow you to write your own graph applications and integrate them into your data science workflows.

The tutorial is a mixture of presentation, code examples, and notebooks. We will demonstrate how to write an end-to-end graph application that operates on different kinds of input data. We will also show how Spark Graph interacts with Spark SQL and with openCypher Morpheus, a Spark Graph extension that lets you manage multiple graphs and provides built-in Property Graph Data Sources for the Neo4j graph database as well as Cypher language extensions.

At the end of the tutorial, attendees will have a good understanding of when to apply graphs in their data science workflows, how to bring Spark Graph into an existing Spark workflow, and how to make the best use of the new APIs. The tutorial will be both led by the presenters and a hands-on, interactive session. The tutorial material will be made available during the presentation.

About Mats Rydberg


Mats has worked with Neo4j for more than four years with a focus on graph query language design and implementation. Mats leads the development of the Cypher for Apache Spark (CAPS) project, now called Morpheus, which has been accepted as a Spark 3.0 major feature under the name Spark Graph and will bring the leading graph query language Cypher to Apache Spark. Mats holds a Master's degree in Computer Science with a specialization in graph algorithms.

About Max Kießling


Max is a Software Engineer working at Neo4j. He holds a Master's degree from the University of Leipzig, where he worked on the distributed execution of declarative graph queries. At Neo4j he is part of the team responsible for the development of Cypher for Apache Spark / Morpheus as well as the graph algorithms library.