Fuzzy matching with Spark

Business data comes with a lot of noise. To model and analyze ever-growing volumes of data effectively, we need tools that link and group similar entities. In this talk, we discuss how we used Spark’s machine learning, distributed computing, and in-memory capabilities to build a fuzzy matching engine that learns from samples of similar records and applies that knowledge to cleanse, deduplicate, and link records.
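
To make the idea of learning from sample pairs concrete, here is a minimal single-machine sketch in plain Python. It is not the engine from the talk and does not use Spark: it assumes string similarity from the standard library's `difflib` as a stand-in for learned features, and a single learned cutoff as a stand-in for a trained model. The function names (`similarity`, `learn_threshold`, `link_records`) and the sample company names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed stand-in feature)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled_pairs):
    """Pick the cutoff that classifies the most labeled pairs correctly.

    labeled_pairs: iterable of (record_a, record_b, is_match) tuples.
    """
    scored = [(similarity(a, b), is_match) for a, b, is_match in labeled_pairs]
    candidates = sorted({score for score, _ in scored})

    def correct(cutoff):
        # Count pairs where "score >= cutoff" agrees with the label.
        return sum((score >= cutoff) == is_match for score, is_match in scored)

    return max(candidates, key=correct)


def link_records(records, threshold):
    """Group records connected by a pair scoring at or above the threshold."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; a distributed engine would block or partition
    # candidates instead of doing this quadratic scan.
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return list(groups.values())


# Hypothetical labeled samples: True means "same entity".
samples = [
    ("Acme Corp", "ACME Corp.", True),
    ("Globex Inc", "Globex Incorporated", True),
    ("Acme Corp", "Globex Inc", False),
    ("Initech", "Globex Inc", False),
]
threshold = learn_threshold(samples)

records = ["Acme Corp", "ACME Corp.", "Globex Inc", "Globex Incorporated", "Initech"]
groups = link_records(records, threshold)
for group in groups:
    print(group)
```

The all-pairs comparison is quadratic in the number of records, which is the part a distributed engine would need to replace with candidate blocking or partitioning before scoring.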