Sören is a software engineer in the Neo4j Graph Analytics team concentrating on big data query execution and graph algorithms. His interests cover working with Cypher in big data environments such as Spark SQL. Prior to joining Neo4j, he was studying at Leipzig University.
Spark 3.0 introduces a new module: Spark Graph. Spark Graph adds the popular query language Cypher, its accompanying Property Graph Model and graph algorithms to the data science toolbox. Graphs have a plethora of useful applications in recommendation, fraud detection and research.
Morpheus is an open-source library that is API compatible with Spark Graph and extends its functionality by:
We will demonstrate how to explore data in Spark, use Morpheus to transform data into a Property Graph, and then build a Graph Solution in Neo4j.
Relationships are one of the most predictive indicators of behavior and preferences. Communities detection based on relationships is a powerful tool for inferring similar preferences in peer groups, anticipating future behavior, estimating group resiliency, finding hierarchies, and preparing data for other analysis. Centrality measures based on relationships identify the most important items in a network and help us understand group dynamics such as influence, accessibility, the speed at which things spread, and bridges between groups. Data scientists use graph algorithms to identify groups and estimate important entities based on their interactions. In this session, we'll cover the common uses of community detection and centrality measures and how some of the iconic graph algorithms compute values. We'll show examples of how to run community detection and centrality algorithms in Apache Spark including using the AggregateMessages function to add your own algorithms. You'll learn best practices and tips for tricky situations. For those that want to run graph algorithms in a graph platform, we'll also illustrate a few examples in Neo4j. Some of the Community Detection Algorithms included: * Triangle Count and Clustering Coefficient to estimate network cohesiveness * Strongly Connected Components and Connected Components to find clusters * Label Propagation to quickly infer groups and data cleans with semi-supervised learning * Louvain Modularity to uncover at group hierarchies Balanced Triad to identify unstable groups * PageRank to reveal influencers * Betweenness Centrality to predict bottlenecks and bridges