Hongchan Roh - Databricks

Hongchan Roh

Software Engineer, SK Telecom

Ph.D. and software engineer working on data engineering for big data and machine learning systems. Has published papers at top-tier conferences and journals, including VLDB, IEEE TKDE, and Information Systems. Project leader of FlashBase (a distributed in-memory DBMS optimized for DRAM/SSDs) at the SKT Software R&D Center.

UPCOMING SESSIONS

Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction with Geospatial Visualization

Summit Europe 2019

In this talk, we present how we analyze, predict, and visualize network quality data as a Spark AI use case in a telecommunications company. SK Telecom is the largest wireless telecommunications provider in South Korea, with 300,000 cells and 27 million subscribers. These 300,000 cells generate data every 10 seconds, totaling 60 TB and 120 billion records per day. To address the problems we previously had with Spark on HDFS, we developed a new data store for SparkSQL, consisting of Redis and RocksDB, that allows us to distribute and store this data in real time and analyze it right away. Not satisfied with only analyzing network quality in real time, we also set out to predict network quality in the near future so that we can quickly detect and recover from network device failures. To do so, we designed a network-signal-pattern-aware DNN model and a new in-memory data pipeline from Spark to TensorFlow. In addition, by integrating Apache Livy and MapboxGL with SparkSQL and our new store, we built a geospatial visualization system that shows the current population and signal strength of 300,000 cells on a map in real time.

Topics:
- How we utilize Redis & RocksDB to store a tremendous amount of data efficiently.
- The architecture of the Spark Data Source for Redis: filtering out irrelevant Redis keys using filter pushdown.
- How we reduce the memory usage of the Spark driver and prevent its OutOfMemoryError.
- A better prediction model for network quality than an RNN.
- How we train a prediction model for the network quality of 300,000 cells, each of which has a different signal pattern.
- How we visualize geospatial data: a customized logical plan for spatial query aggregation and pushdown.
- How we optimize spatial queries: aggregation pushdown and vectorized aggregation using SIMD.
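The filter pushdown topic in this abstract can be illustrated with a minimal Python sketch. It is not the speakers' implementation: the key layout (`<cell_id>:<minute>`), the function names `key_pattern` and `scan_with_pushdown`, and the sample values are all hypothetical, and a plain dictionary stands in for Redis. The idea shown is only that equality filters from a query can be turned into a key pattern, so that keys for irrelevant cells are never read.

```python
from fnmatch import fnmatchcase

# Hypothetical key layout, assumed for illustration only: "<cell_id>:<minute>".
def key_pattern(filters):
    """Turn equality filters into a glob pattern; unfiltered parts become '*'."""
    cell = filters.get("cell_id", "*")
    minute = filters.get("minute", "*")
    return f"{cell}:{minute}"

def scan_with_pushdown(store, filters):
    """Return only the entries whose key matches the pattern built from the filters."""
    pattern = key_pattern(filters)
    return {k: v for k, v in store.items() if fnmatchcase(k, pattern)}

# A dictionary standing in for the key-value store (made-up sample values).
store = {
    "cell42:2019-10-15T09:00": {"rsrp": -95},
    "cell42:2019-10-15T09:01": {"rsrp": -97},
    "cell77:2019-10-15T09:00": {"rsrp": -88},
}

# A query such as "WHERE cell_id = 'cell42'" touches only the two cell42 keys.
hits = scan_with_pushdown(store, {"cell_id": "cell42"})
```

Against a real Redis server, a pattern like this could be passed to the `SCAN` command's `MATCH` option. How the data source described in the talk actually selects keys is not stated in the abstract.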

PAST SESSIONS