Christoph Kreibich studied statistics at the University of Bamberg and then worked at Daimler, analyzing data in the research and development department for high-voltage battery systems. Later he joined the Volkswagen Data:Lab, the Volkswagen Group's competence center for big data, where he worked on big data projects along the entire supply chain of an automotive OEM. Today he works in the IT department for production and logistics at Audi. There, he is responsible for analytics projects that optimize production processes and for setting up the central data lake for the production and logistics departments.
Painting a car is a highly automated, highly complex process that depends on many external variables, some of which are difficult to control. Quality standards for paint and finish are extremely high at Audi, as these are the features of a car most visible to the customer. Today, it takes years of experience to identify the main drivers of paint failures and to keep standards correspondingly high. For example, different types of paint require different process values and application techniques. To track the level of quality, every single car is inspected by quality assurance and every failure is documented, using more than 200 predefined failure types for standardized documentation.
While a car is being painted, data is collected from 2,500 sensors. The measured parameters include temperatures, humidity, air flow of the application robots, energy consumption, state of filters, etc. Any of these variables may influence quality positively or negatively. To support process experts with valuable insights into this data, the sensor data is stored in the data lake and processed with Apache Spark and Scala on an HDFS cluster. To identify the most important drivers of paint quality for each failure and each layer of paint, 20 random forest models are trained daily with MLlib. The results are stored in HDFS and visualized with Tableau.
This session will give insights into the challenges of big data at an automotive OEM and show how production at Audi benefits from new big data technologies to make its processes more efficient and raise quality standards even higher. To achieve business benefits, Spark is used along the whole process chain for data ingestion, transformation and training in a fully automated production environment.
Session hashtag: #SFexp13