Information Solution Architect with over 20 years of experience. Extensive background in Data Management, Big Data, Information Systems, and Data Governance, as well as process and project management. Implementation of numerous solutions across a host of different architectures, including IBM, Oracle, open source, and data warehouse appliances. Experience in database design, DBA, data integration, security, Big Data, business analytics, and advanced analytics. Implementation of open source software encompassing Hadoop (and peripheral components), Spark, R, Python, RDBMS, and NoSQL technologies. Brings a breadth of industry experience to each engagement, with specific background in government, power, financial, manufacturing, technology, healthcare, and insurance. Long track record of success and delivery within time and budget. Managed up to 12 team members in various positions. Agnostic perspective on each assignment, providing the best overall solution to the challenge at hand.
A recovery process is needed for Delta tables in a disaster recovery (DR) scenario. Cloud multi-region sync is asynchronous, and this type of replication does not guarantee that files arrive at the target (DR) region in chronological order. In some cases, large files can be expected to arrive later than small files. With Delta Lake, this can leave an incomplete version at the DR site at the point where replication stopped: a commit in the transaction log may reference data files that have not arrived. The assumption is that the primary (Prod) site is not reachable, so the incomplete version of the Delta Lake table must be identified and fixed at the DR site. Similar scenarios occur with RDBMS replication, where the database relies on its logs to restore itself to a stable version before running the recovery or reload process. This document addresses this need and looks for a solution that can be shared with customers.
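To make the problem concrete, here is a minimal Python sketch of how an incomplete version might be detected at the DR site. It is an illustration under stated assumptions, not the recovery procedure itself. It assumes the replicated table is readable through a local or mounted filesystem path, that the JSON commit files under `_delta_log` are still present for the versions being checked, and that no data files in that range have been removed by `VACUUM`. The function names (`list_commit_versions`, `missing_files_for_version`, `last_complete_version`) are hypothetical helpers, not part of any Delta Lake API.

```python
import json
import os
import re
from urllib.parse import unquote

# Delta commit files are named with a zero-padded 20-digit version number.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def list_commit_versions(table_path):
    """Return the sorted version numbers of the JSON commits that arrived."""
    log_dir = os.path.join(table_path, "_delta_log")
    versions = []
    for name in os.listdir(log_dir):
        match = COMMIT_RE.match(name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)


def missing_files_for_version(table_path, version):
    """Return data files added by this commit that are absent on disk."""
    commit = os.path.join(table_path, "_delta_log", f"{version:020d}.json")
    missing = []
    with open(commit, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            add = json.loads(line).get("add")
            if add is None:
                continue  # not an "add" action (remove, metaData, commitInfo, ...)
            # Assumption: paths are relative to the table root and URL-encoded.
            rel_path = unquote(add["path"])
            if not os.path.exists(os.path.join(table_path, rel_path)):
                missing.append(rel_path)
    return missing


def last_complete_version(table_path):
    """Return the highest version whose commits are contiguous and whose
    added data files are all present, or None if no version qualifies."""
    versions = list_commit_versions(table_path)
    last_good = None
    expected = versions[0] if versions else None
    for version in versions:
        if version != expected:
            break  # a commit file is missing, so later versions are unusable
        if missing_files_for_version(table_path, version):
            break  # commit arrived before some of its data files
        last_good = version
        expected = version + 1
    return last_good
```

Calling `last_complete_version("/mnt/dr/my_table")` (a hypothetical path) would return the newest version that appears safe to recover to. On cloud object storage, the `os.listdir` and `os.path.exists` calls would need to be replaced with the equivalent listing and existence checks for that storage service.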