Time Series Analysis with Spark in the Automotive R&D Process

The growing number of sensors and other electrical devices in cars has drastically increased the amount of data cars produce in recent years. This increase is especially prominent in the field of advanced driver assistance systems (ADAS) and autonomous driving (AD), where a single car can generate data at a rate of 2 GB/s. During the research and development of cars and ADAS/AD algorithms, this data has to be recorded, stored, analyzed and repeatedly simulated. We present the algorithmic challenges of distributed processing and analysis of time-series data, along with the solutions we have developed in recent years while working with the R&D departments of the major car manufacturers in Germany. One such challenge is finding pre-defined failure events in the data logged from devices during a drive and performing statistical or root-cause analysis on them. The failure events are defined as combinations of conditions on the signals coming from the devices. We present a higher-level language on top of Spark, which we developed to help specify those conditions as well as the analysis. Adopting this language significantly increased the efficiency of our data science efforts. Furthermore, special care was taken to optimize the storage schemas for both efficient storage and fast computation. Finally, communication between the devices in the car happens on a vehicle bus and is governed by the principle of synchronized state machines. State machines are inherently sequential, with each state depending on the previous one, which poses a particular problem when trying to analyze and test them in parallel. We present an algorithm that handles the parallel analysis of state machine outputs using Spark.
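
To illustrate why sequential state machines can still be analyzed in chunks, here is a minimal sketch in plain Python. It is not the algorithm from the talk, which the abstract does not describe. It shows one generic technique: summarize each chunk of bus messages as a mapping from every possible entry state to the resulting exit state, then compose the summaries in order. The states, messages and transition table are hypothetical and chosen only for illustration. In Spark, the per-chunk step would plausibly map onto per-partition processing, but that is an assumption.

```python
from functools import reduce

# Hypothetical toy state machine; states and transitions are assumptions.
STATES = ("IDLE", "ACTIVE", "ERROR")
TRANSITIONS = {
    ("IDLE", "start"): "ACTIVE",
    ("ACTIVE", "stop"): "IDLE",
    ("ACTIVE", "fault"): "ERROR",
    ("ERROR", "reset"): "IDLE",
}


def step(state, message):
    """Advance by one message; unknown messages leave the state unchanged."""
    return TRANSITIONS.get((state, message), state)


def run_sequential(messages, initial="IDLE"):
    """Reference implementation: replay all messages in order."""
    return reduce(step, messages, initial)


def summarize(chunk):
    """Map every possible entry state to the exit state after this chunk."""
    return {start: run_sequential(chunk, start) for start in STATES}


def compose(first, second):
    """Combine the summaries of two adjacent chunks (first, then second)."""
    return {start: second[mid] for start, mid in first.items()}


def final_state(chunks, initial="IDLE"):
    """Summarize chunks independently, then compose the summaries in order."""
    identity = {state: state for state in STATES}
    summaries = map(summarize, chunks)  # each chunk is independent of the others
    return reduce(compose, summaries, identity)[initial]


messages = ["start", "fault", "reset", "start", "stop", "start", "fault"]
chunks = [messages[0:3], messages[3:5], messages[5:7]]
print(final_state(chunks), run_sequential(messages))
```

Because composing summaries is associative, the chunks can be processed in any grouping as long as their order is preserved. The cost is that each chunk is replayed once per possible entry state, so this sketch only suits state machines with a small number of states.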

About Miha Pelko

Miha holds a PhD in computational neuroscience from the University of Edinburgh. He first worked as a data scientist in the field of sports prediction and more recently as a data scientist at NorCom IT AG in automotive R&D, working with several German car manufacturers to introduce Hadoop and Spark into their data processing workflows, including data storage, analysis, modelling and simulations.

About Tilmann Piffl

Til currently works as a data scientist and data engineer at NorCom IT AG in Munich, where he explores the potential of Spark and Hadoop in automotive R&D processes. After gaining his PhD in Astrophysics in Potsdam, he worked as a post-doc in Theoretical Physics at the University of Oxford, where he extracted the shape, mass and other properties of our Galaxy from spectroscopic stellar