Junjie Chen

Senior Software Engineer, Tencent

Senior Software Engineer at Tencent. Focus on big data area years, PPMC of TubeMQ, contributor of Hadoop, Spark, Hive, and Parquet.

Past sessions

Summit 2020 A Thorough Comparison of Delta Lake, Iceberg and Hudi

June 23, 2020 05:00 PM PT

Recently, a set of modern table formats such as Delta Lake, Hudi, Iceberg spring out. Along with Hive Metastore these table formats are trying to solve problems that stand in traditional data lake for a long time with their declared features like ACID, schema evolution, upsert, time travel, incremental consumption etc. This talk will share the research that we did for the comparison about the key features and design these table format holds, the maturity of features, such as APIs expose to end user, how to work with compute engines and finally a comprehensive benchmark about transaction, upsert and mass partitions will be shared as references to audiences.

Kubernetes As of Spark 2.3, Spark can run on clusters managed by Kubernetes. we will describes the best practices about running Spark SQL on Kubernetes upon Tencent cloud includes how to deploy Kubernetes against public cloud platform to maximum resource utilization and how to tune configurations of Spark to take advantage of Kubernetes resource manager to achieve best performance. To evaluate performance, the TPC-DS benchmarking tool will be used to analysis performance impact of queries between configurations set.