Massive-Scale Entity Resolution Using the Power of Apache Spark and Graph - Databricks

Spark’s graph capabilities are well suited to analyzing networks for use cases such as fraud detection, illicit network detection, and supply chain risk analysis. However, before a data scientist can run analytics on a network (e.g., PageRank or community detection), they often spend most of their time working through data integration challenges. This talk focuses on one such challenge: connecting entities in a network within and across data domains.

We will explore how you can use the Spark ecosystem’s graph capabilities to perform massive-scale entity resolution (ER). As a result, your data scientists will be able to perform graph analytics that drive business and mission value more quickly and effectively. Key takeaways:

  • The Spark ecosystem enables you to get started quickly with graph analytics use cases at scale
  • Complementing traditional ER techniques with the context of graph relationships allows you to connect entities that you could not easily connect before
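
To make the second takeaway concrete, here is a minimal sketch in plain Python, not the speaker's implementation. It assumes a hypothetical set of records and relationship edges (all names, phone numbers, and addresses are made up), a simple attribute rule standing in for "traditional ER", and a shared-neighbor rule standing in for "graph context". Matched pairs are merged into entities with a union-find connected-components pass; at Spark scale, the pairwise loop would be replaced by blocking plus a distributed connected-components step.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (record_id, name, phone). Values are illustrative only.
RECORDS = [
    ("a1", "Jon Smith", "555-0100"),
    ("a2", "John Smith", "555-0100"),
    ("b1", "J. Smith", None),
    ("c1", "Mary Jones", "555-0199"),
]

# Hypothetical relationship edges: record -> entities it is linked to in the graph.
RELATED = {
    "a1": {"addr:12 Oak St", "acct:777"},
    "a2": {"addr:12 Oak St"},
    "b1": {"addr:12 Oak St", "acct:777"},
    "c1": {"addr:9 Elm Rd"},
}


def last_name(name):
    return name.split()[-1].lower()


def attribute_match(r1, r2):
    """Assumed traditional rule: same last name and same non-empty phone."""
    return (last_name(r1[1]) == last_name(r2[1])
            and r1[2] is not None and r1[2] == r2[2])


def graph_match(r1, r2, min_shared=2):
    """Assumed graph-context rule: same last name and enough shared neighbors."""
    shared = RELATED.get(r1[0], set()) & RELATED.get(r2[0], set())
    return last_name(r1[1]) == last_name(r2[1]) and len(shared) >= min_shared


def resolve(records, use_graph):
    """Group records into entities: matched pairs become edges, and each
    connected component (found with union-find) is one resolved entity."""
    parent = {r[0]: r[0] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All-pairs comparison is fine for a toy example but does not scale.
    for r1, r2 in combinations(records, 2):
        if attribute_match(r1, r2) or (use_graph and graph_match(r1, r2)):
            root1, root2 = find(r1[0]), find(r2[0])
            if root1 != root2:
                parent[max(root1, root2)] = min(root1, root2)

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r[0])].append(r[0])
    return sorted(sorted(c) for c in clusters.values())


print(resolve(RECORDS, use_graph=False))
print(resolve(RECORDS, use_graph=True))
```

With attributes alone, record `b1` has no phone number and stays unlinked. Once shared relationships count as evidence, it joins the same entity as `a1` and `a2`, which is the kind of connection the takeaway describes.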
    See More Spark + AI Summit in San Francisco 2019 Videos


    About Max Melnick

    I am passionate about making a positive impact by creating technology products with my strong blend of technical and business skills. I have 8+ years of experience in the federal public service and consumer products industries, focusing on technology product development, big data/graph analytics, solution architecture, and machine learning. I also enjoy sports/exercise, travel, cooking/eating, reading, listening to podcasts, and learning new things! More details: http://maxmelnick.com/about/