Incremental Iceberg Table Replication at Scale
Overview
Experience | In Person |
---|---|
Type | Breakout |
Track | Data Lakehouse Architecture and Implementation |
Industry | Enterprise Technology |
Technologies | Apache Spark, Apache Iceberg |
Skill Level | Intermediate |
Duration | 40 min |
Apache Iceberg is a popular table format for managing large analytical datasets. But replicating Iceberg tables at scale can be a daunting task, especially because of the format's hierarchical metadata. In this talk, we present an end-to-end workflow for replicating Apache Iceberg tables that uses Apache Spark to keep backup tables identical to their source counterparts. We have also contributed the libraries behind this workflow back to the open-source community.
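To make concrete why hierarchical metadata matters for incremental replication, here is a minimal Python sketch under simplifying assumptions. It models a table as snapshots that each point to a manifest list, which lists manifests, which in turn list data files. It then computes which files a backup must copy when it already holds some of the snapshots. The classes `Manifest` and `Snapshot` and the functions `reachable_files`, `files_to_copy`, `rewrite_prefix`, and `copy_plan` are hypothetical. They are not the API of Apache Iceberg or of the libraries presented in this talk. The sketch also ignores the table metadata JSON, delete files, and statistics files. It does not rewrite the absolute paths stored inside metadata files, which a real replication workflow must also handle.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified model of Iceberg's metadata tree (not the Iceberg API):
# snapshot -> manifest list -> manifests -> data files.
@dataclass(frozen=True)
class Manifest:
    path: str
    data_files: tuple[str, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: str
    manifests: tuple[Manifest, ...]


def reachable_files(snapshot: Snapshot) -> set[str]:
    """Walk one snapshot's metadata tree top-down and collect every file path it references."""
    files = {snapshot.manifest_list}
    for manifest in snapshot.manifests:
        files.add(manifest.path)
        files.update(manifest.data_files)
    return files


def files_to_copy(snapshots: list[Snapshot], replicated_ids: set[int]) -> set[str]:
    """Files referenced by not-yet-replicated snapshots that the target does not already hold."""
    already_on_target: set[str] = set()
    needed: set[str] = set()
    for snapshot in snapshots:
        if snapshot.snapshot_id in replicated_ids:
            already_on_target |= reachable_files(snapshot)
        else:
            needed |= reachable_files(snapshot)
    return needed - already_on_target


def rewrite_prefix(path: str, source_prefix: str, target_prefix: str) -> str:
    """Map a source location to its backup location by swapping the root prefix."""
    if not path.startswith(source_prefix):
        raise ValueError(f"{path!r} is not under {source_prefix!r}")
    return target_prefix + path[len(source_prefix):]


def copy_plan(
    snapshots: list[Snapshot], replicated_ids: set[int], source_prefix: str, target_prefix: str
) -> list[tuple[str, str]]:
    """Sorted (source, target) pairs for one incremental replication run."""
    return sorted(
        (path, rewrite_prefix(path, source_prefix, target_prefix))
        for path in files_to_copy(snapshots, replicated_ids)
    )


# Example: snapshot 2 appends one data file and reuses manifest m1 from snapshot 1.
m1 = Manifest("s3://src/tbl/metadata/m1.avro", ("s3://src/tbl/data/a.parquet",))
m2 = Manifest("s3://src/tbl/metadata/m2.avro", ("s3://src/tbl/data/b.parquet",))
s1 = Snapshot(1, "s3://src/tbl/metadata/snap-1.avro", (m1,))
s2 = Snapshot(2, "s3://src/tbl/metadata/snap-2.avro", (m1, m2))

# The backup already holds snapshot 1, so only files new in snapshot 2 are copied.
for src, dst in copy_plan([s1, s2], {1}, "s3://src/", "s3://dst/"):
    print(src, "->", dst)
```

In this toy example, snapshot 2 shares manifest `m1` and its data file with snapshot 1, so the plan contains only the new manifest list, the new manifest, and the new data file. Reusing unchanged metadata across snapshots is what lets an incremental run copy far less than a full table copy.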
Attendees will gain a thorough understanding of how to set up replication workflows for Iceberg tables, along with practical guidance on managing and maintaining replicated datasets at scale. This talk is aimed at data engineers, platform architects, and practitioners looking to implement replication and disaster recovery for Apache Iceberg in complex data ecosystems.
Session Speakers
Szehon Ho, Software Engineer, Databricks
Hongyue Zhang, Software Engineer, Apple