Swisscom is the leading mobile-service provider in Switzerland, with a market share high enough to let us model and understand collective mobility in every area of the country. To that end, we built an urban planning tool that helps cities better manage their infrastructure using data-driven insights, produced with Apache Spark, YARN, Kafka and a good dose of machine learning. In this talk, we will explain how building such a tool involves mining a massive volume of raw network traces (1.5 billion records/day) to extract fine-grained mobility features. These features are obtained using different machine learning algorithms. For example, we built an algorithm that segments a trajectory into mobile and static periods, and trained classifiers that distinguish between different means of transport. As we sketch the different algorithmic components, we will present our approach to running and testing them continuously, which involves complex pipelines managed with Oozie and fed with ground-truth data. Finally, we will delve into the streaming part of our analytics and see how network events allow Swisscom to understand the characteristics of the flow of people on roads and paths of interest. This requires linking network coverage information to geographical positions within milliseconds, and using Spark Streaming with libraries that were originally designed for batch processing. We will conclude with the advantages and pitfalls of running this kind of Spark pipeline on a multi-tenant cluster. Attendees should come away from this talk with an overall picture of how Apache Spark and related components of its ecosystem are used in the field of trajectory mining.
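To give a flavour of what segmenting a trajectory into mobile and static periods can look like, here is a minimal sketch of a classic stay-point heuristic. It is not the algorithm presented in the talk. The function name `segment_trajectory`, the `Segment` type, the thresholds `radius_m` and `min_stay_s`, and the assumption that positions are already projected to planar coordinates in metres are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

# A point is (timestamp in seconds, x in metres, y in metres).
# Planar coordinates are an assumption made for this sketch.
Point = Tuple[float, float, float]


@dataclass
class Segment:
    kind: str     # "static" or "mobile"
    start: float  # start time in seconds
    end: float    # end time in seconds


def segment_trajectory(points: List[Point],
                       radius_m: float = 200.0,
                       min_stay_s: float = 600.0) -> List[Segment]:
    """Split a time-ordered trajectory into static and mobile segments.

    A run of consecutive points that all lie within radius_m of the first
    point of the run, and that spans at least min_stay_s, is labelled
    "static". The time between two static runs is labelled "mobile".
    """
    segments: List[Segment] = []
    if not points:
        return segments

    cursor_t = points[0][0]  # end time of the last emitted segment
    i, n = 0, len(points)
    while i < n:
        t0, x0, y0 = points[i]
        # Extend the run while points stay close to the anchor point i.
        j = i + 1
        while j < n and hypot(points[j][1] - x0, points[j][2] - y0) <= radius_m:
            j += 1
        t_end = points[j - 1][0]
        if t_end - t0 >= min_stay_s:
            # Everything between the previous segment and this stay is movement.
            if t0 > cursor_t:
                segments.append(Segment("mobile", cursor_t, t0))
            segments.append(Segment("static", t0, t_end))
            cursor_t = t_end
            i = j
        else:
            i += 1

    # Trailing points after the last stay count as movement.
    last_t = points[-1][0]
    if last_t > cursor_t:
        segments.append(Segment("mobile", cursor_t, last_t))
    return segments


# Hypothetical trace: a 15-minute stay, a short trip, then another stay.
trace: List[Point] = [
    (0, 0, 0), (300, 10, 0), (600, 0, 10), (900, 5, 5),
    (1000, 1000, 0), (1100, 2000, 0),
    (1200, 3000, 0), (1500, 3010, 0), (1800, 3000, 5), (2100, 3005, 0),
]
for segment in segment_trajectory(trace):
    print(segment)
```

Real network traces are far noisier than this toy trace, since a device can bounce between neighbouring cells while stationary, so a production approach would likely need smoothing or a probabilistic model rather than a fixed radius.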
François Garillot joined Swisscom in 2015 and has since worked on curating and understanding telecommunications data with big data tools. Previously, he worked on Apache Spark Streaming's reliability at Lightbend (formerly Typesafe). His interests span machine learning, especially online models, approximation and hashing techniques, control theory, and unsupervised time series analysis. In his free time, he enjoys skiing, sailing and hunting for good cheese.
Mohamed Kafsi is a data scientist in the Mobility Insights team at Swisscom. He holds a Ph.D. in Machine Learning from EPFL. His passions are modelling and predicting human behaviours by mining large-scale datasets, and his expertise includes probabilistic graphical models, machine learning and information theory. Prior to working at Swisscom, he studied at EPFL in Switzerland and Carnegie Mellon University in the US, with internships at Deutsche Telekom T-labs in Berlin, Nokia Research Center in Helsinki and Yahoo! Labs in San Francisco.