Daeyoung Kim

Senior Data Scientist,

Daeyoung Kim is currently a senior data scientist in BISTel Research team from 2015. He is working on product design & algorithm research for smart manufacturing solutions. Especially, he is interested in machine learning, deep learning & any other AI(Artificial Intelligence) algorithms for manufacturing solutions. He mainly focuses on developing time-series data analysis and image analysis solution based-on deep learning. He received all his degrees in industrial engineering from Seoul National University, Seoul, Korea.



Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Case of Apache Spark for Semiconductor Wafers from Real IndustrySummit 2019

As the development of semiconductor devices, manufacturing system leads to improve productivity and efficiency for wafer fabrication. Owing to such improvement, the number of wafers yielded from the fabrication process has been rapidly increasing. However, current software systems for semiconductor wafers are not aimed at processing large number of wafers. To resolve this issue, the BISTel (a world-class provider of manufacturing intelligence solutions and services for manufacturers) tries to build several products for big data such as Trace Analyzer (TA) and Map Analyzer (MA) using Apache Spark.

TA is to analyze raw trace data from a manufacturing process. It captures details on all variable changes, big and small and give the traces' statistical summary (i.e.: min, max, slope, average, etc.). Several BISTel's customers, which are the top-tier semiconductor company in the world use the TA to analyze the massive raw trace data from their manufacturing process.

Especially, TA is able to manage terabytes of data by applying Apache Spark's APIs. MA is an advanced pattern recognition tool that sorts wafer yield maps and automatically identify common yield loss patterns. Also, some semiconductor companies use MA to identify clustering patterns for more than 100,000 wafers, which can be considered as big data in the semiconductor area. This talk will introduce these two products which are developed based on the Apache Spark and present how to handle the large-scale semiconductor data in the aspects of software techniques.