Antoine Amend - Databricks

Antoine Amend

Data Scientist, Barclays

Antoine Amend is a data scientist passionate about big data engineering and scalable computing. He graduated in 2008 with an MSc in Astrophysics and worked for a large consultancy business in Switzerland before discovering big data in the early days of Hadoop. He has embraced big data technologies ever since and now works as Head of Data Science for cyber security at a large financial institution. Combining a scientific approach with core IT skills, Antoine qualified two years running for the finals of the Big Data World Championships, held in Austin, TX.



Story Deduplication and Mutation

Summit Europe 2017

We demonstrate how to use Spark Streaming to build a global News Scanner that scrapes news in near real time and applies sophisticated text analysis, SimHash, Random Indexing and Streaming K-Means to produce a geopolitical monitoring tool that lets users track major world events as they unfold. We highlight advanced Spark techniques for scaling, including using Apache NiFi to deliver data to Spark Streaming, using the Goose library with Spark to build web scrapers, and de-duplicating streamed documents at scale with SimHash, Random Indexing and Streaming K-Means in order to detect, track and visualise "global media conversations" as they mutate over time.

Session hashtag: #EUstr9
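The abstract names SimHash as the de-duplication technique without showing how it works. Below is a minimal, single-machine sketch of the general idea in Python, not the speaker's Spark implementation. Each token's 64-bit hash votes on every bit of a fingerprint, so documents with mostly the same words end up with fingerprints that differ in only a few bits. The tokenizer, the MD5-based token hash, the 3-bit Hamming threshold and the function names (`simhash`, `hamming`, `dedup_stream`) are all assumptions chosen for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width; matches the 8 bytes taken from each MD5 digest below


def tokens(text):
    # Assumption: simple lower-cased word tokens; real pipelines often use shingles
    return re.findall(r"\w+", text.lower())


def simhash(text):
    """SimHash fingerprint: each token's 64-bit hash votes +1/-1 on every bit."""
    votes = [0] * BITS
    for tok in tokens(text):
        # Stable hash (Python's built-in hash() is randomised per process)
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when a majority of token hashes set it
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming(a, b):
    # Number of differing bits between two fingerprints
    return bin(a ^ b).count("1")


def dedup_stream(docs, threshold=3):
    """Keep a document only if no previously kept one is within `threshold` bits."""
    kept, seen = [], []
    for doc in docs:
        fp = simhash(doc)
        if all(hamming(fp, s) > threshold for s in seen):
            kept.append(doc)
            seen.append(fp)
    return kept


if __name__ == "__main__":
    articles = [
        "Central bank raises interest rates",
        "Central bank raises interest rates!",  # same tokens, so identical fingerprint
        "Storm forces airport closures",
    ]
    print(dedup_stream(articles))
```

The linear scan over every kept fingerprint keeps the sketch short but grows with the stream. At scale, one would typically index fingerprints (for example, by splitting each one into bit blocks so that near-duplicates must share at least one exact block) or distribute the comparison across Spark partitions.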