GraphFrames: Scaling Web-Scale Graph Analytics with Apache Spark

Graph analytics has a wide range of applications, from information propagation and network flow optimization to fraud and anomaly detection. The rise of social networks and the Internet of Things has produced complex web-scale graphs with billions of vertices and edges. To extract insight from those graphs, however, you need tools that analyze them easily and efficiently.
At Spark Summit 2016, Databricks introduced GraphFrames, which implements graph queries and pattern matching on top of Spark SQL to simplify graph analytics. In this talk, we’ll discuss the work that has made graph algorithms in GraphFrames faster and more scalable. For example, new implementations of connected components incorporate algorithmic improvements based on recent research, as well as performance improvements from Spark DataFrames. Discover lessons learned from scaling the implementation from millions to billions of nodes; see its performance in the context of other popular graph libraries; and hear about real-world applications.
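
For readers unfamiliar with connected components, the following is a minimal sketch of what the algorithm computes. It is plain Python, not the GraphFrames implementation, and it uses simple min-label propagation rather than the research-based algorithm the talk covers. The function name `connected_components` and the sample vertices and edges are hypothetical, chosen only for illustration.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Illustrative min-label propagation only; this is an assumption-based
    sketch, not the algorithm used by GraphFrames.
    """
    # Start with every vertex in its own component.
    label = {v: v for v in vertices}
    changed = True
    while changed:
        changed = False
        # One round: each edge pulls both endpoints to the smaller label.
        for src, dst in edges:
            low = min(label[src], label[dst])
            if label[src] != low:
                label[src] = low
                changed = True
            if label[dst] != low:
                label[dst] = low
                changed = True
    return label


# Hypothetical sample graph with two components: {a, b, c} and {d, e}.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = connected_components(vertices, edges)
```

Each round resembles a join of the edge list against the current labels followed by a minimum aggregation, which is roughly why the problem maps onto DataFrame operations. The number of rounds grows with the graph's diameter, one reason naive propagation struggles at billions of nodes.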

Session hashtag: #EUds6