Andrew Morgan - Databricks

Andrew Morgan

Founder, ByteSumo Ltd

Andrew Morgan is a specialist in data strategy and its execution, with deep experience in the supporting technologies, system architecture, and data science that bring it to life. With over 20 years of experience in the data industry, he has designed systems for some of its most prestigious players and their global clients, often on large, complex, international projects. In 2013 he founded ByteSumo Ltd, a data science and big data engineering consultancy, and he now works with clients in Europe and the USA.

PAST SESSIONS

Story Deduplication and Mutation

Summit Europe 2017

We demonstrate how to use Spark Streaming to build a global News Scanner that scrapes news in near real time and uses sophisticated text analysis, SimHash, Random Indexing and Streaming K-Means to produce a geopolitical monitoring tool that lets users track major world events as they unfold. We highlight advanced Spark techniques for scaling, including using Apache NiFi to deliver data to Spark Streaming, using the Goose library with Spark to build web scrapers, and de-duplicating streamed documents at scale with SimHash, Random Indexing and Streaming K-Means in order to detect, track and visualise "global media conversations" as they mutate over time. Session hashtag: #EUstr9
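
To give a feel for the SimHash step mentioned in the abstract, here is a minimal sketch in plain Python. It is not the code from the session: the tokenizer, the use of MD5 as the per-token hash, the 64-bit fingerprint width, and the `max_distance` threshold are all assumptions chosen for illustration, and a production pipeline would run the same idea inside Spark rather than in a single process.

```python
import hashlib
import re

BITS = 64  # fingerprint width (an assumption; 64 bits is a common choice)


def tokens(text):
    """Split text into lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def simhash(text):
    """Return a 64-bit SimHash fingerprint of the text.

    Each token is hashed to 64 bits. For every bit position we add +1 if the
    token's hash has that bit set and -1 otherwise; the fingerprint keeps a 1
    wherever the running total is positive.
    """
    counts = [0] * BITS
    for tok in tokens(text):
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 8 bytes = 64 bits
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, total in enumerate(counts):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Treat two documents as near-duplicates if their fingerprints are close.

    max_distance is a hypothetical threshold; it needs tuning on real data.
    """
    return hamming(simhash(text_a), simhash(text_b)) <= max_distance
```

Documents that share most of their tokens tend to produce fingerprints that differ in only a few bits, so a small Hamming distance is a cheap signal that two scraped stories are copies or light rewrites of each other. Short texts give noisy fingerprints, so the threshold should be checked against real articles before relying on it.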