GraphFrames: Graph Queries In Spark SQL - Databricks

Graph analysis is important in domains including commerce, social networks, and medicine. Graph analysis comes in two forms: pattern matching to find subgraphs of interest, and graph algorithms such as PageRank and triangle counting. GraphX and similar systems have made it possible to run graph algorithms within relational systems like Spark, but until recently, pattern queries required moving data manually to a specialized graph database. GraphFrames is a new effort to integrate pattern matching and graph algorithms with Spark SQL, simplifying the graph analytics pipeline and enabling optimizations across graph and relational queries. A key component of GraphFrames is our graph-aware query planner, which can speed up queries by an order of magnitude. We will describe the GraphFrame API, its query planning algorithm, and the latest performance results.
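To make the idea of pattern matching inside a relational engine concrete, here is a minimal sketch of how a two-edge pattern can be answered with a join over an edge table. It is an illustration in plain Python, not the GraphFrames implementation or its query planner: the edge list and the function names `two_hop_paths` and `directed_triangles` are hypothetical, and a dictionary lookup stands in for a relational hash join. In GraphFrames itself, a comparable pattern would be written as a motif string passed to `GraphFrame.find`, for example `"(a)-[e1]->(b); (b)-[e2]->(c)"`.

```python
# Illustrative only: plain-Python stand-in for joins over an edge table.
# This is NOT how GraphFrames is implemented; all names here are hypothetical.
from collections import defaultdict

# Hypothetical edge table: one (src, dst) row per directed edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]


def two_hop_paths(edges):
    """Match the pattern (x)->(y); (y)->(z) with a hash join on y."""
    # Build side of the join: index edges by their source vertex.
    by_src = defaultdict(list)
    for src, dst in edges:
        by_src[src].append(dst)
    # Probe side: for each edge (x, y), look up every edge leaving y.
    return [(x, y, z) for x, y in edges for z in by_src[y]]


def directed_triangles(edges):
    """Keep the two-hop matches that a third edge (z)->(x) closes into a cycle."""
    edge_set = set(edges)
    return [(x, y, z) for x, y, z in two_hop_paths(edges) if (z, x) in edge_set]


paths = two_hop_paths(edges)
triangles = directed_triangles(edges)
```

Each triangle appears once per starting vertex, because the pattern is matched as an ordered sequence of edges. The order in which the pattern's edges are joined does not change the result, but it can change how many intermediate rows are produced, which is the kind of decision a graph-aware planner is meant to make.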

Learn more:

  • Introducing GraphFrames
  • GraphX and GraphFrames
  • GraphFrames: Graph Queries In Spark SQL