Paul is a Senior Data Scientist in the Amadeus Airlines Data Unit. He holds a Ph.D. in particle physics from Fermi National Accelerator Laboratory in Chicago, IL (USA). Before joining Amadeus, he worked for nine years on scientific projects for the European Space Agency. In recent years, Paul has focused on providing value for airlines. He is a machine learning expert, highly skilled in identifying anomalies in data with unsupervised learning techniques and in building predictive models with supervised algorithms, including ensembles of decision trees.
Airlines now understand that traditional customer segmentation by booking class does not reflect the complexity of passenger behavior. As one of the main providers of IT solutions for the airline industry, Amadeus has the resources and infrastructure to manage ticketing and booking data, and it understands airline needs and market particularities. By combining data sources produced by the different airline systems, we have applied unsupervised machine learning techniques to improve our understanding of customer behavior. For this product development, feature engineering was applied to diverse variables, including demographic information, ancillary purchases, customer RFM (recency, frequency, monetary value), and purchase behavior. The entire ETL process was implemented with the Spark API in Scala (using both Spark 1.6 and 2.1), and the SparkML library was used for the clustering. Simulations were performed on our own cluster. We will present the results of a customer segmentation analysis for airlines and show how they differ from traditional rules based on business experience or intuition. More importantly, we want to show how Spark can be used as the main tool for machine learning analysis on big data to create relevant business insight for airlines.
Keywords: Airline industry; Segmentation; SparkML; Business insight
Session hashtag: #SAISExp17
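The feature engineering and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the Amadeus pipeline: it uses plain Python in place of Spark and Scala, a hand-written k-means in place of the SparkML clustering (the abstract does not say which algorithm was used), and invented booking records. The names `BOOKINGS`, `SNAPSHOT`, `rfm_features`, `standardize`, and `kmeans` are all hypothetical.

```python
from datetime import date
import random

# Hypothetical booking records: (customer_id, purchase_date, amount_spent).
BOOKINGS = [
    ("c1", date(2017, 1, 5), 120.0),
    ("c1", date(2017, 3, 2), 80.0),
    ("c1", date(2017, 5, 20), 150.0),
    ("c2", date(2016, 6, 11), 900.0),
    ("c3", date(2017, 5, 28), 60.0),
    ("c3", date(2017, 4, 30), 75.0),
    ("c4", date(2016, 7, 1), 1100.0),
]
SNAPSHOT = date(2017, 6, 1)  # assumed reference date for measuring recency


def rfm_features(bookings, snapshot):
    """Return {customer: [recency_days, frequency, monetary]}."""
    feats = {}
    for cust, day, amount in bookings:
        recency = (snapshot - day).days
        if cust not in feats:
            feats[cust] = [recency, 0, 0.0]
        row = feats[cust]
        row[0] = min(row[0], recency)  # days since the most recent purchase
        row[1] += 1                    # number of purchases
        row[2] += amount               # total spend
    return feats


def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


features = rfm_features(BOOKINGS, SNAPSHOT)
customers = list(features)
labels = kmeans(standardize([features[c] for c in customers]), k=2)
for cust, lab in zip(customers, labels):
    print(cust, features[cust], "segment", lab)
```

On this toy data, the frequent low-spend customers (`c1`, `c3`) end up in one segment and the infrequent high-spend customers (`c2`, `c4`) in the other, a split that booking class alone would not necessarily reveal. Standardizing first keeps the monetary column from dominating the distance computation.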