Dr Peter Knight is a senior data scientist in the GE Aviation UK data science team. He has over fifteen years' experience in developing analytics for aviation datasets. He holds a PhD in aerospace and a first-class master's degree from the University of Southampton, and has four papers and five patents to his name. He received the Derek George Astridge Safety in Aerospace Award in 2005. He has also won a number of GE awards, including the ‘Global Customer Champion Award’ and the ‘Most Inspirational Employee Award’, and has been an open innovation challenge winner.
GE Aviation has hundreds of data scientists and engineers developing algorithms. Most of them do not have time to learn Apache Spark, and they continue to develop in Python or R on their local machines. We also have a large body of legacy code that was not written for Spark. However, the business wanted to deploy to a Spark environment as quickly as possible, for scalability. So how did we bridge the gap? A data scientist and a software engineer will co-present to share how we approached the problem of building, unifying and scaling these algorithms.
GE is a world leader in the manufacture of commercial jet engines, offering products for many of the best-selling commercial airframes. With more than 33,000 engines in service, GE Aviation has a long history of developing analytics to monitor its commercial engine fleets. In recent years, GE Aviation Digital has developed advanced analytic solutions for engine monitoring, with the aim of improving detection and reducing false alerts compared with conventional analytic approaches. These advanced analytics are implemented in a real-time monitoring system that notifies GE's Fleet Support team on a per-flight basis. The analytics are developed and validated using large historical datasets. Until recently this was done with tools such as SQL Server and MATLAB, but GE's data has now moved to an Apache Spark environment. Consequently, our advanced analytics are being migrated to Spark, where we also expect performance gains on larger datasets. In this talk we will share our experience of converting our advanced algorithms to custom Spark ML pipelines and outline several case studies. Session hashtag: #SAISExp12