eBay has used an Analytical DBMS (ADBMS) data warehouse solution for over a decade. Millions of batch queries run every day against 6000+ key DW tables, which contain over 22PB of data (compressed) and keep growing every year. The data services and products built on this warehouse drive eBay business decisions and site features, so it has to be always available and accurate.
Apache Spark provides an open source and more scalable solution for this volume of data. Since the beginning of this year, eBay has been migrating its ADBMS batch workload to Spark, with about 90% of it migrated automatically. Our team leads the work on the automation tools and pipeline and is committed to completing the migration within this year.
In today’s session, we will introduce:
1. The tool sets that enable the auto migration engine, including metadata services, SQL converter, table/view generator, data mover, optimizer, pipeline generator, data validator, and workflow controller. Many of these tools not only contribute to auto migration but also support the development work of individual engineers.
2. The end-to-end auto migration steps up to cutover in production: initialization in the dev environment, unit test, data validation, integration test, release, parallel run, monitoring, and cutover.
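To make the data validation step more concrete, here is a minimal sketch of how the output of a migrated job could be compared with the output of the original ADBMS job. The abstract does not describe how eBay's data validator works, so everything here is an assumption for illustration: the function names `row_fingerprint` and `validate` are hypothetical, and rows are modeled as plain Python tuples rather than real ADBMS or Spark tables.

```python
import hashlib
from collections import Counter

# Hypothetical sentinel so that a real NULL and the string "NULL" hash differently.
_NULL = "\x00NULL"
# Unit separator between columns so ("ab", "c") and ("a", "bc") hash differently.
_SEP = "\x1f"


def row_fingerprint(row):
    """Return a SHA-256 hex digest of one row (a tuple of column values)."""
    joined = _SEP.join(_NULL if value is None else str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def validate(source_rows, target_rows):
    """Compare two result sets as multisets of rows, ignoring row order."""
    source = Counter(row_fingerprint(r) for r in source_rows)
    target = Counter(row_fingerprint(r) for r in target_rows)
    return {
        "row_count_match": sum(source.values()) == sum(target.values()),
        # Rows present in the source output but absent from the target output.
        "missing_in_target": sum((source - target).values()),
        # Rows present in the target output but absent from the source output.
        "unexpected_in_target": sum((target - source).values()),
    }


if __name__ == "__main__":
    adbms_output = [(1, "a", None), (2, "b", 3.5)]
    spark_output = [(2, "b", 3.5), (1, "a", None)]  # same rows, different order
    print(validate(adbms_output, spark_output))
```

Comparing fingerprints as a multiset keeps the check independent of row order, which typically differs between engines, while still catching duplicated or dropped rows. At warehouse scale, a check like this would more plausibly run as aggregate queries on both platforms than on rows collected in memory.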
Session hashtag: #SAISDD7
Edward is a data engineering manager in the Data Services and Solutions (DSS) group at eBay. He has worked in the big data industry for a decade. His team focuses on big data processing and data application development. Apache Spark is now the team's main solution for feature engineering and ETL.
Lipeng Zhu is a software engineer in eBay's Data Services and Solutions (DSS) group, focusing on automatic data warehouse migration from ADBMS to Spark. Most of his work at eBay has been on earlier research and migration efforts from ADBMS to open source platforms (MapReduce, Streaming, Spark).