Graph data and graph analytics are increasingly important in data science and engineering. Cypher, an open language for querying and updating graph databases and analytics platforms, is now available in the Apache Spark environment. Neo4j Morpheus builds on the open source graph language project to integrate data from Neo4j operational graph databases with Hive and JDBC SQL data sources, using new Cypher features such as the Property Graph Catalog, named graphs, graph projection, parameterized graph view functions, and graph/table views. Input and output graphs can be loaded and stored as structured collections of DataFrames with strong graph schemas, which support data consistency and graph query optimization.
Property graphs can also be analyzed and transformed using graph algorithms such as those in the GraphFrames project. Besides describing and demonstrating these capabilities, this talk discusses the Spark Project Improvement Proposal to bring Cypher into Spark 3.0 and outlines current work to unify Cypher with other graph query languages into a new ISO standard Graph Query Language.
openCypher and SQL Property Graphs standards contributor
Lead, Query Languages Standards and Research, Neo4j Inc.
Product Manager, Neo4j Morpheus/Cypher for Apache Spark
Head of Enterprise Data Distribution Infrastructure, Barclays Investment Bank, 2011-2015
Co-author, OASIS Business Transaction Protocol 1.1
Martin Junghanns is part of the Cypher for Apache Spark engineering team at Neo4j. He is also the main developer of Gradoop, a system for graph analytics on distributed data flow systems. Martin holds an MSc degree in Computer Science from the University of Leipzig.