Session

Incremental Iceberg Table Replication at Scale

Overview

Experience: In Person
Type: Breakout
Track: Data Lakehouse Architecture and Implementation
Industry: Enterprise Technology
Technologies: Apache Spark, Apache Iceberg
Skill Level: Intermediate
Duration: 40 min

Apache Iceberg is a popular table format for managing large analytical datasets. Replicating Iceberg tables at scale can be daunting, however, especially because of the format's hierarchical metadata. In this talk, we present an end-to-end workflow for replicating Apache Iceberg tables that uses Apache Spark to ensure backup tables remain identical to their source counterparts. We have also contributed the libraries behind this workflow back to the open-source community.

Attendees will learn how to set up replication workflows for Iceberg tables and will get practical guidance on managing and maintaining replicated datasets at scale. This talk is ideal for data engineers, platform architects and practitioners looking to implement replication and disaster recovery for Apache Iceberg in complex data ecosystems.

Session Speakers

Szehon Ho

Software Engineer
Databricks

Hongyue Zhang

Software Engineer
Apple