Nitin Raj Soundararajan is a technical consultant in artificial intelligence, big data, and analytics, where he focuses on deploying AI and ML solutions to solve real business problems across multiple domains. Nitin Raj holds an undergraduate degree in Computer Science and Engineering. He enjoys speaking at academic and industry conferences to share his knowledge and passion for AI, machine learning, and big data analytics. He currently works on providing artificial intelligence and big data analytics solutions to the utility and energy sector.
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.

Key takeaways:
- What Delta Lake is and how it can save time and computational power.
- The various features supported by Delta Lake.
- How to implement Delta Lake in your environment to gain its benefits.
- A clear picture of how Delta Lake works, with an example.

Below are the features that will be covered in the presentation.

ACID Transactions: Data lakes typically have multiple data pipelines reading and writing data concurrently, and because transactions are lacking, data engineers have to go through a tedious process to ensure data integrity. Delta Lake brings ACID transactions to your data lakes and provides serializability, the strongest isolation level.

Scalable Metadata Handling: In the big data world, even the metadata itself can be "big data". Delta Lake treats metadata just like data, leveraging Spark's distributed processing power to handle it. As a result, Delta Lake can handle petabyte-scale tables with billions of partitions and files with ease.

Time Travel (Data Versioning): Delta Lake provides snapshots of data, enabling developers to access and revert to earlier versions of data for audits, rollbacks, or to reproduce experiments.

Open Format: All data in Delta Lake is stored in the Apache Parquet format, enabling Delta Lake to leverage the efficient compression and encoding schemes that are native to Parquet.

100% Compatible with the Apache Spark API: Developers can use Delta Lake with their existing data pipelines with minimal change, as it is fully compatible with Spark, the commonly used big data processing engine.