Information Solution Architect with over 20 years of experience. Extensive background in data management, big data, information systems and data governance, as well as process and project management. Has implemented numerous solutions across a range of architectures, including IBM, Oracle, open source and data warehouse appliances. Experience in database design, database administration, data integration, security, business analytics and advanced analytics. Has implemented open source software encompassing Hadoop (and peripheral components), Spark, R, Python, RDBMS and NoSQL technologies. Brings a breadth of industry experience to each engagement, with specific background in government, power, financial, manufacturing, technology, healthcare and insurance. Long track record of successful delivery on time and within budget. Has managed up to 12 team members in various positions. Takes a vendor-agnostic perspective on each assignment, providing the best overall solution to the challenge at hand.
May 28, 2021 11:05 AM PT
Change Data Feed is a new feature of Delta Lake on Databricks that has been available in public preview since DBR 8.2. This feature enables a new class of ETL workloads, such as incremental table/view maintenance and change auditing, that were not possible before. In short, users can now query row-level changes across different versions of a Delta table.
In this talk, we will dive into how Change Data Feed works under the hood, show how to use it to make existing ETL jobs more efficient, and go over some of the new workloads it enables.
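As a rough illustration of querying row-level changes between table versions, the sketch below wraps the Delta `readChangeFeed` reader option in a small helper. The helper name `read_table_changes` and the table name in the usage note are hypothetical, and the sketch assumes an active SparkSession on a runtime that supports Change Data Feed, with `delta.enableChangeDataFeed = true` already set on the table.

```python
def read_table_changes(spark, table_name, starting_version, ending_version=None):
    """Return a DataFrame of row-level changes for a Delta table.

    `spark` is an active SparkSession. `table_name` is assumed to be a Delta
    table that already has the delta.enableChangeDataFeed property enabled.
    """
    reader = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")              # ask for changes, not a snapshot
        .option("startingVersion", starting_version)   # first table version to include
    )
    if ending_version is not None:
        # Without an ending version the read runs up to the latest version.
        reader = reader.option("endingVersion", ending_version)
    return reader.table(table_name)
```

A call such as `read_table_changes(spark, "sales.orders", 2, 5)` would return the changed rows along with the metadata columns `_change_type` (`insert`, `update_preimage`, `update_postimage` or `delete`), `_commit_version` and `_commit_timestamp`, which an incremental ETL job could filter on instead of rescanning the whole table.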
November 17, 2020 04:00 PM PT
The future of finance goes hand in hand with social responsibility, environmental stewardship and corporate ethics. In order to stay competitive, businesses are increasingly disclosing more information about their environmental, social and governance (ESG) performance.
In this free demo, we’ll show ways to use machine learning to extract the key ESG initiatives communicated in yearly PDF reports and compare them with actual media coverage from news analytics data. Afterwards, FinServe Technical Director Antoine Amend will be available to answer questions about this solution or any other financial services analytics use case.
Speakers: Antoine Amend and Itai Weiss
June 25, 2020 05:00 PM PT
There is a need for a recovery process for Delta tables in a disaster recovery (DR) scenario. Cloud multi-region sync is asynchronous, and this type of replication does not guarantee that files arrive at the target (DR) region in chronological order. In some cases, large files can be expected to arrive later than small files. With Delta Lake, this can leave an incomplete table version at the DR site at the point where replication stopped. The assumption is that the Primary (Prod) site is not reachable, so the incomplete version of the Delta Lake table has to be identified and fixed at the DR site. Similar scenarios occur with RDBMS replication, where the database logs are used to restore the database to a stable version before running the recovery or reload process. This document addresses this need and looks for a solution that can be shared with customers.
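One way to identify an incomplete version is to replay the table's transaction log at the DR site and check that every data file a version depends on has actually arrived. The following is a minimal sketch of that idea, not the solution described in the session. It assumes the table is visible through a local or mounted file path, that the JSON commits under `_delta_log` start at version 0 (no log cleanup, checkpoints ignored), and that file paths in `add` and `remove` actions are plain relative paths. The function name `last_complete_version` is hypothetical.

```python
import json
from pathlib import Path


def last_complete_version(table_root):
    """Return the highest version whose commit and data files all exist, or None."""
    root = Path(table_root)
    log_dir = root / "_delta_log"
    active = set()    # data files that make up the table at the replayed version
    last_good = None
    version = 0
    while True:
        # Commit files are named with a zero-padded 20-digit version number.
        commit = log_dir / f"{version:020d}.json"
        if not commit.exists():
            break  # gap in the log: later versions cannot be reconstructed
        try:
            lines = commit.read_text().splitlines()
            actions = [json.loads(line) for line in lines if line.strip()]
        except json.JSONDecodeError:
            break  # commit file only partially replicated
        for action in actions:
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
        # A version is readable only if every active data file has arrived.
        if all((root / path).exists() for path in active):
            last_good = version
        version += 1
    return last_good
```

The returned version is a candidate stable point for the DR copy of the table. How to bring the table back to it, for example by restoring to that version or reloading the data written after it, is left outside this sketch.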