Looking to learn more about Delta Lake? Want to see the latest developments in the project? Want to engage with other community members? If so, we invite you to attend this year’s Data + AI Summit! This global event brings together thousands of practitioners, industry leaders, and visionaries to engage in thought-provoking dialogue and share the latest innovations in data and AI.
At this year’s summit, we are very excited to have visionaries from the data and AI community, including Andrew Ng, Zhamak Dehghani, Christopher Manning, Matei Zaharia, Tarika Barrett, Peter Norvig, Daphne Koller, and Ali Ghodsi, as well as companies that are building innovative products, like Nasdaq, Scribd, and Apple. They will share how they are leveraging Delta Lake to solve high-impact, data-driven use cases that can benefit any organization. From learning how to deliver interactive analytics at massive scale, to solving healthcare price transparency, to modernizing big data for finance, and more – this conference will provide high-value insights for all technical and business-focused stakeholders.
Events for the Delta Lake Community!
Whether you are an active contributor, a regular user of Delta Lake, or just curious about the fast-growing Delta Lake community, we invite you to these community-focused events at the Data + AI Summit. This is a great opportunity for the community to come together, celebrate, and engage with the project maintainers and leading contributors. Don’t forget to tune into the Day 1 Opening Keynote on Delta Lake by Michael Armbrust, scheduled for Tuesday, June 28 at 8:30 AM PDT. Then head over to the following sessions:
Bring any and all questions regarding Delta Lake. Are you curious about the Delta Lake roadmap, upcoming features, and recent releases by the community? This is an in-person version of our Delta Lake Community Office Hours. In this rapid-fire question format, our panel will field your toughest questions! That’s right, ask them anything about Delta Lake and how to engage with the community!
This is your chance to get your questions answered and learn about what others are asking. In this AMA-style session, we are bringing together a panel of Delta Lake committers – and Rust programming experts too! That’s right! So join them and ask your questions!
Delta Lake Contributor Meetup with Delta Lake Birthday Party
Wednesday, June 29 @6:30 PM PDT
Featured Guests: Dominique Brezinski (Apple) and Michael Armbrust (Databricks)
It’s a Delta Lake birthday party, so come meet and greet Delta Lake contributors and committers on all things data engineering, data architecture, and Delta Lake. But we’re not here just to enjoy the festivities: come with your technical questions, as we will have multiple panels to answer them. You will also have the opportunity to learn more about how Delta Lake started in a fireside chat with Dominique Brezinski from Apple and Michael Armbrust from Databricks.
Can’t-Miss Sessions Featuring Delta Lake
The volumes of data that are being collected and stored for analysis and to drive decisions are reaching levels that make it difficult for even the most seasoned data engineering and data science teams to manage and extract insights. With the advent of Delta Lake, data that was previously locked inside data lakes or proprietary data warehouses can be processed and operationalized, turning data into insights quickly and reliably.
Here are five sessions that put Delta Lake front and center and are sure to capture the attention of data scientists and ML engineers interested in maximizing the value of their data lake.
Diving into Delta Lake Integrations, Features, and Roadmap
Thursday, June 30 @9:15 AM
- Tathagata Das, Databricks
- Denny Lee, Databricks
The Delta ecosystem rapidly expanded with the release of Delta Lake 1.2, which included a variety of integrations and feature updates. Join this session to learn how the wider Delta community collaborated to bring these features and integrations together, as well as the current roadmap. This will be an interactive session, so come prepared with your questions – we should have answers!
Delta Lake, the Foundation of Your Lakehouse
Tuesday, June 28 @2:05 PM
- Himanshu Raja, Databricks
- Hagai Attias, Akamai Technologies
Delta Lake is the open source storage layer that makes the Databricks Lakehouse Platform possible by adding reliability, performance, and scalability to your data, wherever it is located. Join this session for an inside look at what is under the hood of Databricks – see how Delta Lake, by adding ACID transactions and versioning to Parquet files together with the Photon engine, provides customers with huge performance gains and the ability to address new challenges. This session will include a demo and overview of customer use cases unlocked by Delta Lake and the benefits of running Delta Lake workloads on Databricks.
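The transaction-log idea behind those ACID and versioning guarantees can be sketched in a few lines. The following is a toy illustration only (the `commit` and `snapshot` helpers and the file layout are simplified stand-ins of my own, not Delta Lake’s actual protocol): each table version is an atomically written JSON commit file of add/remove actions, and replaying the log up to a given version reconstructs the set of data files that made up the table at that version.

```python
# Illustrative sketch only: Delta Lake's real protocol lives in the
# open-source project; this toy model just shows the core idea of a
# transaction log that adds versioning on top of immutable data files.
import json
import os
import tempfile

def commit(log_dir, version, actions):
    """Write one commit file, e.g. 00000000000000000000.json."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")

def snapshot(log_dir, as_of_version):
    """Replay the log up to a version to get the live set of data files."""
    live = set()
    for v in range(as_of_version + 1):
        path = os.path.join(log_dir, f"{v:020d}.json")
        with open(path) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"])
                elif "remove" in action:
                    live.discard(action["remove"])
    return live

log_dir = tempfile.mkdtemp()
commit(log_dir, 0, [{"add": "part-0000.parquet"}])
commit(log_dir, 1, [{"add": "part-0001.parquet"}])
commit(log_dir, 2, [{"remove": "part-0000.parquet"},
                    {"add": "part-0002.parquet"}])  # e.g. a compaction

print(sorted(snapshot(log_dir, 2)))  # current table state
print(sorted(snapshot(log_dir, 1)))  # "time travel" to version 1
```

Replaying to an earlier version is essentially the “time travel” capability Delta Lake exposes, and the append-only log is what keeps concurrent readers consistent.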
Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification
Tuesday, June 28 @4:00 PM PDT
- QP Hou, Neuralink
Rust ensures memory safety, but bugs can still make it into implementations, so what can be done to avoid this? In this session, we will review common formal verification methods used in distributed system design and implementation, as well as the use of TLA+ and stateright to formally model the delta-rs multi-writer S3 backend implementation. Learn how combining Rust with formal verification in this way results in an efficient, native Delta Lake implementation that is free of both memory and logic bugs!
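The safety property such a model checks can be illustrated with a toy sketch. This is my own simplification using an atomic create-if-absent on a local filesystem (the real delta-rs S3 backend must coordinate writers differently, since plain S3 writes do not offer this primitive): two concurrent writers may both target the same next version, but exactly one can win it, and the loser retries at the following version.

```python
# Toy sketch of optimistic concurrency for a multi-writer commit log
# (an illustration of the property being verified, not delta-rs itself).
import os
import tempfile

def try_commit(log_dir, version, payload):
    """Atomically create the commit file; fail if another writer won."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # lost the race; caller must re-read and retry
    with os.fdopen(fd, "w") as f:
        f.write(payload)
    return True

def commit_with_retry(log_dir, start_version, payload, max_attempts=10):
    """Retry at successive versions until a commit succeeds."""
    version = start_version
    for _ in range(max_attempts):
        if try_commit(log_dir, version, payload):
            return version
        version += 1  # someone else took this version; try the next one
    raise RuntimeError("too much contention")

log_dir = tempfile.mkdtemp()
# Two writers both believe the next version is 0; only one can win it.
v_a = commit_with_retry(log_dir, 0, '{"writer": "A"}')
v_b = commit_with_retry(log_dir, 0, '{"writer": "B"}')
print(v_a, v_b)  # the two writers end up on distinct versions
```

A model checker like stateright explores every interleaving of such writers to verify that no two commits can ever claim the same version.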
A Modern Approach to Big Data for Finance
Wednesday, June 29 @2:05 PM PDT
- Bill Dague, Nasdaq
- Leonid Rosenfeld, Nasdaq
There are unique challenges associated with working with big data for applications in finance including the impact of high data volumes, disparate storage, variable sharing protocols, and more. By leveraging open source technologies such as Databricks’ Delta Sharing, in combination with a flexible data management stack, organizations can be more nimble and innovative with their analytics strategies. See this in action with a live demonstration of Delta Sharing in combination with Nasdaq Data Fabric.
Near Real-Time Analytics with Event Streaming, Live Tables, and Delta Sharing
Thursday, June 30 @10:45 AM PDT
- Christina Taylor, Carvana
While microservice architectures are embraced by application teams because they enable services to be developed and scaled independently and in a distributed fashion, data teams must take a consolidated approach, with centralized repositories where data from these services comes together to be joined and aggregated for analysis. Learn how these approaches can work together through a streaming architecture that leverages Delta Live Tables and Delta Sharing to enable near real-time analytics and business intelligence – even across clouds.
Additional Sessions Featuring Delta Lake
Journey to Solving Healthcare Price Transparency with Databricks and Delta Lake
Tuesday, June 28 @10:45 AM PDT
- Ross Silberquit, Cigna
- Narayanan Hariharasubramanian, Cigna
The US government’s price transparency mandate requires the healthcare industry to generate Machine-Readable Files (MRFs) of different types for different procedures, which involves ingesting, transforming, aggregating, and hosting terabytes of data on a public domain. Join this session and learn how Cigna was able to deliver on this mandate with the help of Databricks running on AWS, Delta Lake, and Apache Spark.
Streaming ML Enrichment Framework Using Advanced Delta Table Features
Tuesday, June 28 @10:45 AM PDT
- Peter Vasko, Socialbakers
The challenge for Socialbakers’ marketing SaaS platform was how to build a scalable framework for data scientists and ML engineers that could accommodate hundreds of generic or customer-specific ML models, running both in streaming and batch, capable of processing 100+ million records per day from customer social media networks. The goal was achieved using Apache Spark™, Delta Lake, and clever usage of Delta Table features.
In this session we will share the ideas behind the framework and how to efficiently combine Spark structured streaming and Delta Tables.
The Road to a Robust Data Lake: Utilizing Delta Lake and Databricks to Map 150 Million Miles of Roads a Month
Tuesday, June 28 @11:30 AM PDT
- Itai Yaffe, Databricks
- Ofir Kerker, Nexar
In the past, stream processing over data lakes required a lot of development effort from data engineering teams. Today, with Delta Lake and Databricks Auto Loader, this becomes a few minutes’ work and unlocks new ways to efficiently leverage your data.
In this talk, learn how Nexar, a leading provider of dynamic mapping solutions, utilizes Delta Lake and advanced features such as Auto Loader to map 150 million miles of roads a month and provide meaningful insights to cities, mobility companies, driving apps, and insurers.
Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
Tuesday, June 28 @2:05 PM PDT
- Serge Smertin, Databricks
We all live in exciting times, amid the hype of the Distributed Data Mesh (or just mess!). In this session, we will discuss a couple of architectural and organizational approaches for achieving a Distributed Data Mesh, which is essentially a combination of mindset, fully automated infrastructure, continuous integration for data pipelines, dedicated team collaborative environments, and security enforcement. This should appeal to data leaders, data scientists, and anyone interested in DevOps.
Self-Serve, Automated, and Robust CDC Pipeline Using AWS DMS, DynamoDB Streams, and Delta Lake
Tuesday, June 28 @2:05 PM PDT
- Dibyendu Karmakar, Swiggy
In this session, learn how the team at Swiggy designed and developed a CDC-based system to solve the challenges of ingesting transactional data into Delta Lake: handling late-arriving updates and deletes, enabling near real-time availability of data, eliminating bulk ingestion, and optimizing costs.
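The tricky part of such pipelines is the merge logic itself. Here is a hypothetical, much-simplified sketch of my own (not Swiggy’s implementation, which relies on Delta Lake merge semantics at scale): keep, per key, only the change event with the greatest timestamp, so a late-arriving update cannot resurrect a row that a newer event already deleted.

```python
# Toy illustration of merging possibly late-arriving CDC events into a
# table keyed by id; a delete is just an event whose row is a tombstone.

def apply_cdc(state, events):
    """state: {key: (ts, row_or_None)}; None marks a delete tombstone."""
    for e in events:
        key, ts = e["key"], e["ts"]
        current = state.get(key)
        if current is None or ts > current[0]:
            # Keep the newest event per key; older (late) events are ignored
            row = None if e["op"] == "delete" else e["row"]
            state[key] = (ts, row)
    return state

state = {}
apply_cdc(state, [
    {"op": "upsert", "key": 1, "ts": 10, "row": {"total": 5}},
    {"op": "delete", "key": 1, "ts": 30, "row": None},
    {"op": "upsert", "key": 1, "ts": 20, "row": {"total": 7}},  # arrives late
    {"op": "upsert", "key": 2, "ts": 15, "row": {"total": 9}},
])
# Key 1's late update at ts=20 must NOT undo the delete at ts=30.
live = {k: v[1] for k, v in state.items() if v[1] is not None}
print(live)  # {2: {'total': 9}}
```

The same timestamp-based ordering is what lets a batch or streaming job replay out-of-order change feeds and still converge on the correct table state.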
Productionizing Ethical Credit Scoring Systems with Delta Lake, Feature Store and MLflow
Tuesday, June 28 @4:00 PM PDT
- Jeanne Choo, Databricks
Although Fairness, Ethics, Accountability, and Transparency (FEAT) are must-haves for high-stakes machine learning models (e.g., credit scoring systems), a lack of concrete guidelines, common standards, and technical templates makes productionizing responsible AI systems challenging. In this talk, we demonstrate how an open-source code example of a responsible credit scoring application, developed by the Monetary Authority of Singapore’s Veritas Consortium, might be put into production using tools such as Delta Lake and MLflow.
Enabling BI in a Lakehouse Environment: How Spark and Delta Can Help With Automating a DWH Development
Wednesday, June 29 @10:45 AM PDT
- Ivana Pejeva, element61
- Yoshi Coppens, element61
Traditional data warehouses typically struggle to handle large volumes of data and traffic, particularly unstructured data. By contrast, data lakes overcome such issues and have become the central hub for storing data. In this session, learn how the team at element61 uses a framework that includes Apache Spark™ and Delta Lake to bridge BI with modern use cases, such as machine learning and real-time analytics. The session outlines the original challenges, the steps taken, and the technical hurdles that were faced.
Streaming Data into Delta Lake with Rust and Kafka
Wednesday, June 29 @11:30 AM PDT
- Christian Williams, Scribd
The future of Scribd’s data platform is trending toward real-time. A notable challenge has been streaming data into Delta Lake in a fast, reliable, and efficient manner. To help address this problem, Scribd developed two foundational open source projects: delta-rs and kafka-delta-ingest. Join this session for a closer look at the architecture of kafka-delta-ingest, and how it fits into a larger, real-time data ecosystem at Scribd.
Evolution of Data Architectures and How to Build a Lakehouse
Thursday, June 30 @8:30 AM PDT
- Vini Jaiswal, Databricks
A lakehouse architecture combines the data management capabilities of the data warehouse (reliability, integrity, and quality) with the low cost and open approach of data lakes, and supports workloads from different data domains, including advanced analytics and artificial intelligence. Data practitioners will learn the core concepts of building an efficient lakehouse with Delta Lake.
DBA Perspective—Optimizing Performance Table-by-Table
Thursday, June 30 @9:15 AM PDT
- Douglas Moore, Databricks
As a DBA for your organization’s lakehouse, it’s your job to stay on top of performance & cost optimization techniques. In this session, learn how to use the available Delta Lake tools to tune your jobs and optimize your tables.
Discover Data Lakehouse With End-to-End Lineage
Thursday, June 30 @10:00 AM PDT
- Tao Feng, Databricks
Data lineage is key for managing change, ensuring data quality, and implementing data governance in an organization. In this talk, we will discuss how to capture table and column lineage for Spark, Delta Lake, and Unity Catalog, and how users can leverage data lineage to serve various use cases.
GIS Pipeline Acceleration with Apache Sedona
Thursday, June 30 @10:00 AM PDT
- Alihan Zihna, CKDelta
- Fernando Ayuso Palacios, CKDelta (Hutchison Group)
CKDelta ingests and processes a massive amount of geospatial data to deliver insights for its customers. Using Apache Sedona together with Databricks, they have been able to accelerate their data pipelines many times over. In this session, learn how the CKDelta data team migrated their existing data pipelines to Sedona and PySpark, and the pitfalls they encountered along the way.
Expert Training featuring Delta Lake
Take your understanding of Delta Lake to the next level. Check out the following training session designed to broaden your experience with and usage of Delta Lake features and functionality.
Training: Lakehouse with Delta Lake Deep Dive
Monday, June 27 @8:00 AM PDT and @1:00 PM PDT
Thursday, June 30 @8:00 AM PDT and @1:00 PM PDT
- Audience: All Audiences
- Duration: 1 half-day
- Hands-on labs: No
Want to develop your expertise on building end-to-end data pipelines using Delta Lake?
In this course, you will learn about applying software engineering principles with Databricks and how to build end-to-end OLAP data pipelines using Delta Lake for batch and streaming data. The course also discusses serving data to end-users through aggregate tables and Databricks SQL Analytics.
Prerequisites:
- Familiarity with data engineering concepts
- Basic knowledge of Delta Lake core features and use cases
Sign up for Delta Lake Talks at Summit!
Make sure to register for the Data + AI Summit to take advantage of all the amazing sessions and training featuring Delta Lake. Registration is free!
And… stay engaged with the Delta Lake community beyond the Summit!
Your active participation doesn’t have to be limited to the Summit. If you want to stay connected beyond the summit, we have active channels on GitHub, Slack, Google Groups, the Linux Foundation, YouTube, Twitter, and LinkedIn, as well as Community Office Hours, where you can connect with the community, participate in discussions, get help getting started with Delta Lake, or start contributing to the project with a good first issue. We hope to see you on one of these channels!