Benyue (Emma) Liu

Product Manager, TigerGraph

Emma has a mission to empower developers, data scientists, and enterprise data analysts with easier access to graph analytics and features. She is a product manager at TigerGraph, leading efforts in Spark integration, cloud, enterprise features, and ecosystem connectors. Emma has a background in data management, cloud computing, and complex system design. Prior to TigerGraph, she worked at Oracle and MarkLogic. Emma holds a Bachelor of Science degree from Harvey Mudd College and a Master of Science degree from MIT.

Past sessions

As data grows dramatically in size and connectedness, the potential for graph-enriched machine learning grows with it, but scalable technologies are needed both to build models and to apply them in real time. Real-time deep-link graph pattern matching and analytics provide new opportunities for enriching your machine learning models with graph features.

In addition to the real-time deep-link aspect, the ability to process large datasets in a production pipeline makes the two distributed, high-performance platforms, Spark and TigerGraph, work well together. The TigerGraph graph database provides scalable real-time deep-link graph analytics and augments Spark with graph analytics and predictions for a wide range of machine learning use cases.

In this session, we will explain the architecture and technical implementation of a TigerGraph+Spark graph-enhanced machine learning pipeline: TigerGraph is used before training to extract graph and non-graph features and after training to apply the model to streaming data, while Spark is used to train and tune machine learning models at scale. As an example, we will present a solution in production at China Mobile that detects and prevents phone-based scams using machine learning with TigerGraph.
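The three stages described above (extract features, train, apply to new data) can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not the TigerGraph or Spark API: the call records, the labels, the feature names, and the functions `extract_features`, `train`, and `is_suspicious` are all hypothetical, and a simple threshold stands in for a real model trained on Spark.

```python
# Hypothetical call records: (caller, callee, duration_seconds).
CALLS = [
    ("A", "B", 30),
    ("A", "C", 5),
    ("A", "D", 4),
    ("B", "A", 120),
    ("C", "B", 60),
]


def extract_features(calls, phone):
    """Stand-in for the graph database step: derive features for one phone."""
    callees = {dst for src, dst, _ in calls if src == phone}
    callers = {src for src, dst, _ in calls if dst == phone}
    durations = [d for src, _, d in calls if src == phone]
    return {
        # Graph features: depend on who is connected to whom.
        "out_degree": len(callees),
        "callback_ratio": len(callees & callers) / len(callees) if callees else 0.0,
        # Non-graph feature: depends only on this phone's own call records.
        "mean_duration": sum(durations) / len(durations) if durations else 0.0,
    }


def train(feature_rows, labels):
    """Stand-in for the training step: midpoint threshold on out_degree."""
    scam = [r["out_degree"] for r, y in zip(feature_rows, labels) if y == 1]
    normal = [r["out_degree"] for r, y in zip(feature_rows, labels) if y == 0]
    return (sum(scam) / len(scam) + sum(normal) / len(normal)) / 2


def is_suspicious(threshold, features):
    """Stand-in for the serving step: score freshly extracted features."""
    return features["out_degree"] > threshold


# Before training: extract features for labeled phones (labels are made up).
train_phones = ["A", "B", "C"]
labels = [1, 0, 0]  # 1 = scam, 0 = normal
rows = [extract_features(CALLS, p) for p in train_phones]

# Training: fit the toy model.
threshold = train(rows, labels)

# After training: new calls arrive, features are re-extracted, model is applied.
stream = CALLS + [("E", "A", 3), ("E", "B", 2), ("E", "C", 2)]
flagged = is_suspicious(threshold, extract_features(stream, "E"))
```

In a real deployment the feature extraction would run as graph queries and the training as a distributed job; the point of the sketch is only that the same feature definitions are used on both sides of training.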

Specifically, the solution generates 118 graph features for 600 million users to feed a machine learning system that detects three types of unwanted phone calls. TigerGraph then helps deploy the model by extracting these 118 features in real time for up to 10,000 calls per second, giving customers a real-time diagnosis of their incoming calls.
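To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes features are extracted for one phone number per call and that all 118 features are materialized for every user; neither assumption is stated in the abstract.

```python
FEATURES = 118                   # from the abstract
USERS = 600_000_000              # from the abstract
PEAK_CALLS_PER_SECOND = 10_000   # from the abstract
LOOKUPS_PER_CALL = 1             # assumption: one party scored per call

# Feature values produced per second while serving at peak load.
values_per_second = FEATURES * PEAK_CALLS_PER_SECOND * LOOKUPS_PER_CALL

# Feature values in the training set, if every feature is stored per user.
training_values = FEATURES * USERS

print(f"{values_per_second:,} feature values per second at peak")
print(f"{training_values:,} feature values in the training set")
```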