Matthew Proetsch

Data Scientist, U.S. Navy

Lead Data Scientist on the Naval Enterprise Research and Data Science (N.E.R.D.S.) team at NAWCTSD, working on predictive maintenance for aviation platforms. MS in Information and Data Science from UC Berkeley.



Using Apache Spark for Predicting Degrading and Failing Parts in Aviation
Summit 2020

Throughout naval aviation, data lakes provide the raw material for generating insights into predictive maintenance and increasing readiness across many platforms. Leveraging these data lakes can be technically challenging because civilian and military aviation datasets are extremely large and heterogeneous. However, the data they hold can inform maintenance decisions and help fleets improve readiness by revealing detectable conditions prior to component degradation and failure. The authors are successfully using Spark to help overcome these challenges within ETL pipelines. Spark also facilitates ad-hoc and recurring reporting at scale for aircraft component health checks, which are created in collaboration with in-house engineering departments to flag recorded flights for known issues. Spark ML is used to flag anomalous data by fitting regression models to historical data and comparing model outputs to observed flights. For each new flight, the deviation of each feature from the model output is measured, and flights that fall anomalously outside expected ranges are flagged for human review.
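The flagging logic described above can be illustrated with a minimal single-machine sketch. It is not the authors' pipeline and does not use Spark ML: it is a plain-Python stand-in that fits a one-feature least-squares line to historical flights, then flags new flights whose residual exceeds a multiple of the historical residual spread. The feature names (engine RPM percentage, oil temperature), the sample values, and the 3-sigma threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_flights(history, new_flights, threshold=3.0):
    """Return IDs of new flights whose deviation from the fitted model
    exceeds `threshold` times the spread of the historical residuals."""
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    slope, intercept = fit_line(xs, ys)

    # Spread of the model's errors on the historical data it was fit to.
    residuals = [y - (slope * x + intercept) for x, y in history]
    sigma = pstdev(residuals)

    flagged = []
    for flight_id, x, y in new_flights:
        deviation = abs(y - (slope * x + intercept))
        if deviation > threshold * sigma:
            flagged.append(flight_id)  # candidate for human review
    return flagged


# Hypothetical historical data: (engine_rpm_pct, oil_temp_c) per flight.
history = [(60, 80.5), (70, 84.6), (80, 90.4), (90, 94.7),
           (100, 100.3), (65, 82.4), (85, 92.6)]

# Hypothetical new flights: (flight_id, engine_rpm_pct, oil_temp_c).
new_flights = [("flight_a", 75, 87.6), ("flight_b", 95, 110.0)]

print(flag_flights(history, new_flights))
```

In this sketch only the flight whose oil temperature sits far above the fitted trend is returned for review. At the scale the abstract describes, the same fit, predict, and compare steps would run over distributed data rather than Python lists.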

Apache Spark has enabled a small team to handle a large volume of data spanning hundreds of schemas. The team has used Spark to parallelize aircraft component health scoring algorithms, reducing model running times from days or weeks to hours. Because of Spark's speed and versatility, it has become a major component of an official reporting architecture and has successfully flagged parts prior to failure. A few shortcomings of Spark have also been encountered, including data visualization, which is still performed in pandas. The authors will discuss their team's use of these tools and future directions.

Key Takeaways:
- Civilian and military aviation data is difficult to work with due to volume and variety
- Spark is specifically designed to tackle these issues
- Spark is playing a major role in a small specialized team's aviation reporting and analysis architecture