A Thorough Comparison of Delta Lake, Iceberg and Hudi

Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with Hive Metastore, these table formats try to solve long-standing problems in traditional data lakes with their declared features such as ACID transactions, schema evolution, upsert, time travel, and incremental consumption. This talk shares our research comparing the key features and designs of these table formats and the maturity of those features, such as the APIs exposed to end users and how each format works with compute engines. Finally, a comprehensive benchmark covering transactions, upserts, and large numbers of partitions is shared as a reference for the audience.


 
About Junjie Chen

Tencent

Senior Software Engineer at Tencent. He has focused on the big data area for years and is a PPMC member of TubeMQ and a contributor to Hadoop, Spark, Hive, and Parquet.

About Junping Du

Tencent

Junping Du is chief architect of the Tencent Cloud Big Data Department and is responsible for the cloud data warehouse engineering team. As an Apache Hadoop Committer and PMC member, he serves as release manager of Hadoop 2.6.x and 2.8.x for the community. Junping has more than 10 years of industry experience in big data and cloud. Before joining Tencent, he was YARN team lead at Hortonworks. Prior to Hortonworks, he worked as tech lead for vHadoop and Big Data Extension at VMware.