
                                                                                                                                            Using Apache Flink With Delta Lake

                                                                                                                                            Incorporating Flink datastreams into your Lakehouse Architecture


                                                                                                                                            Published: February 10, 2022

Open Source | 7 min read

                                                                                                                                            by Max Fisher, Dylan Gessner and Vini Jaiswal


                                                                                                                                            As with all parts of our platform, we are constantly raising the bar and adding new features to enhance developers’ abilities to build the applications that will make their Lakehouse a reality. Building real-time applications on Databricks is no exception. Features like asynchronous checkpointing, session windows, and Delta Live Tables allow organizations to build even more powerful, real-time pipelines on Databricks using Delta Lake as the foundation for all the data that flows through the lakehouse.

However, organizations that use Flink for real-time transformations might assume they cannot take advantage of Delta Lake and Databricks features. That is not the case. In this blog, we explore how Flink developers can build pipelines that integrate their Flink applications into the broader Lakehouse architecture.

                                                                                                                                            A stateful Flink application

                                                                                                                                            Let’s use a credit card company to explore how we can do this.

For credit card companies, preventing fraudulent transactions is table stakes for a successful business. Credit card fraud poses both reputational and revenue risk to a financial institution, so credit card companies must have systems in place that constantly watch for fraudulent transactions. These organizations may implement monitoring systems using Apache Flink, a distributed event-at-a-time processing engine with fine-grained control over streaming application state and time.

Consider a simple fraud detection application in Flink. It monitors transaction amounts over time and sends an alert if, for any given credit card account, a small transaction is immediately followed by a large transaction within one minute. By combining Flink’s ValueState with a KeyedProcessFunction, developers can keep per-account state, register timers, and trigger downstream alerts from that state and those timers.
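The per-account logic described above can be sketched in plain Python. This is not Flink code: a dictionary keyed by account ID stands in for keyed ValueState, and a stored expiry time stands in for the one-minute timer, evaluated against event timestamps rather than Flink's processing-time timers. The thresholds `SMALL_AMOUNT` and `LARGE_AMOUNT`, and all class and field names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical thresholds; the post does not state the values it used.
SMALL_AMOUNT = 1.00
LARGE_AMOUNT = 500.00
ONE_MINUTE = 60.0  # seconds


@dataclass
class Transaction:
    account_id: str
    amount: float
    timestamp: float  # event time in seconds (assumption for this sketch)


@dataclass
class Alert:
    account_id: str
    timestamp: float


class FraudDetector:
    """Plain-Python stand-in for a KeyedProcessFunction keyed by account."""

    def __init__(self) -> None:
        # Per-key state: when the "last transaction was small" flag expires.
        self._flag_expires_at: Dict[str, float] = {}

    def process_element(self, txn: Transaction) -> List[Alert]:
        alerts: List[Alert] = []
        # Popping clears the state, so only the *immediately* following
        # transaction is ever compared against a small one.
        expires_at = self._flag_expires_at.pop(txn.account_id, None)
        if expires_at is not None:
            # Previous transaction was small: alert if this one is large and
            # arrives before the one-minute timer would have cleared the flag.
            if txn.timestamp < expires_at and txn.amount > LARGE_AMOUNT:
                alerts.append(Alert(txn.account_id, txn.timestamp))
        if txn.amount < SMALL_AMOUNT:
            # Set the flag and "register" a timer one minute in the future.
            self._flag_expires_at[txn.account_id] = txn.timestamp + ONE_MINUTE
        return alerts


def detect(transactions: Iterable[Transaction]) -> List[Alert]:
    """Run all transactions through one detector and collect the alerts."""
    detector = FraudDetector()
    alerts: List[Alert] = []
    for txn in transactions:
        alerts.extend(detector.process_element(txn))
    return alerts
```

Because the state is cleared on every transaction, a small transaction followed by a medium one and then a large one does not raise an alert, which matches the "immediately followed" rule.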

                                                                                                                                            Integrating Flink applications using cloud object store sinks with Delta Lake

There is a tradeoff between very low-latency operational use cases and running performant OLAP on big datasets. To meet operational SLAs and prevent fraudulent transactions, Flink must write records out nearly as quickly as events arrive, which produces small files (on the order of a few KB) in the Flink application’s sink. This “small file problem” can lead to very poor performance in downstream queries, because execution engines spend more time listing directories and pulling files from cloud storage than actually processing the data in those files. Consider the same fraud detection application writing its transactions to cloud storage as Parquet files.

Fortunately, Databricks Auto Loader makes it easy to incrementally ingest the files that Flink applications land in object storage into Delta Lake tables, ready for downstream ML and BI.
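Auto Loader is used through Structured Streaming's `cloudFiles` source (for example `spark.readStream.format("cloudFiles").option("cloudFiles.format", "parquet")`), and it records which files it has already ingested in the stream's checkpoint so that each new file is processed once. The toy class below only illustrates that incremental-discovery idea with the standard library; it is not Auto Loader's implementation, and `IncrementalFileSource` and the `.parquet` suffix filter are hypothetical.

```python
import os
from typing import List, Set


class IncrementalFileSource:
    """Toy incremental file discovery: each poll returns only unseen files."""

    def __init__(self, directory: str, suffix: str = ".parquet") -> None:
        self.directory = directory
        self.suffix = suffix
        # Auto Loader persists this bookkeeping in the stream checkpoint;
        # here it lives in memory only.
        self._seen: Set[str] = set()

    def poll(self) -> List[str]:
        """List the landing directory and return new data files in name order."""
        new_files = sorted(
            name
            for name in os.listdir(self.directory)
            if name.endswith(self.suffix) and name not in self._seen
        )
        self._seen.update(new_files)
        return new_files
```

Each call to `poll` plays the role of one trigger: files that Flink lands between triggers are picked up exactly once, and marker files such as `_SUCCESS` are ignored by the suffix filter.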

                                                                                                                                            Delta Lake tables automatically optimize the physical layout of data in cloud storage through compaction and indexing to mitigate the small file problem and enable performant downstream analytics.
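To see why compaction helps, the sketch below greedily groups small files into bins no larger than a target size, roughly the idea behind bin-packing compaction such as Delta Lake's `OPTIMIZE`. It is a simplification, not Delta Lake's algorithm; `plan_compaction`, the 8 KiB file size and the 1 MiB target are hypothetical numbers chosen for illustration.

```python
from typing import List


def plan_compaction(file_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Greedily pack file sizes (in bytes) into bins of at most target_bytes.

    A single file larger than the target ends up alone in its own bin.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes:
        # Close the current bin if adding this file would overflow it.
        if current and current_size + size > target_bytes:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins
```

With 10,000 files of 8 KiB each and a 1 MiB target, each bin holds 128 files, so the plan rewrites them as 79 files. A downstream query then lists and opens 79 objects instead of 10,000.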

Much like Auto Loader can turn a static source such as cloud storage into a streaming data source, Delta Lake tables also function as streaming sources despite being stored in object storage. This means that organizations using Flink for operational use cases can use this architectural pattern for streaming analytics without sacrificing their real-time requirements.
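A Delta table can act as a streaming source (in Spark, `spark.readStream.format("delta")`) because its transaction log is an ordered sequence of commits: a reader can remember the last version it processed and read only the files added afterwards. The function below is a simplified model of that idea over an in-memory log whose commits mimic the JSON `add` actions in `_delta_log`. It skips adds marked `dataChange: false`, which compaction writes, so rewritten files are not delivered again. `new_files_since` and the sample log are hypothetical, and real Delta streaming reads involve much more (log checkpoints, schema tracking, rate limits).

```python
import json
from typing import Dict, List, Tuple


def new_files_since(log: Dict[int, str], last_version: int) -> Tuple[List[str], int]:
    """Return data files added after `last_version` and the new high-water mark.

    `log` maps a commit version to that commit's newline-delimited JSON actions.
    """
    added: List[str] = []
    latest = last_version
    for version in sorted(v for v in log if v > last_version):
        for line in log[version].splitlines():
            action = json.loads(line)
            add = action.get("add")
            # Compaction rewrites files with dataChange=false; a streaming
            # reader skips them so the same rows are not emitted twice.
            if add is not None and add.get("dataChange", True):
                added.append(add["path"])
        latest = version
    return added, latest
```

The returned version acts like the reader's checkpoint: passing it back on the next call yields only commits written since then.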

                                                                                                                                            Integrating Flink applications using Apache Kafka and Delta Lake

Suppose the credit card company wants to use a fraud detection model it built in Databricks to score the data in real time. Pushing files to cloud storage might not be fast enough for some fraud detection SLAs, so the Flink application can instead write data to a message bus such as Kafka, Amazon Kinesis, or Azure Event Hubs. Once the data is written to Kafka, a Databricks job can read from Kafka and write to Delta Lake.

For Flink developers, there is a Kafka connector that can be added to Flink projects so that DataStream API and Table API streaming jobs can write their results to an organization’s Kafka cluster. Note that as of this writing, Flink does not come packaged with this connector, so you need to add the Kafka connector as a dependency in your project’s build file (e.g., pom.xml or build.sbt).

Writing the results of a DataStream in Flink to a topic on the Kafka cluster comes down to attaching a Kafka sink that serializes each result into a Kafka record before sending it to the topic.
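Whatever sink API is used, each result has to become a Kafka key and value. A common choice is to key records by account ID: with Kafka's default partitioner, records that share a key land in the same partition, which keeps each account's events in order for downstream consumers. The Python function below only illustrates that serialization step; it is not Flink's sink API, and `serialize_result` and the field names are hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def serialize_result(record: Dict[str, Any]) -> Tuple[bytes, bytes]:
    """Turn one (hypothetical) fraud-check result into Kafka key/value bytes."""
    # Key: the account ID, so one account's records share a partition.
    key = str(record["account_id"]).encode("utf-8")
    # Value: compact JSON with sorted keys for deterministic output.
    value = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return key, value
```

Plain JSON keeps the example simple; a schema-managed format such as Avro would serve the same role.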

Now you can use Databricks to write a Structured Streaming application that reads from the Kafka topic the Flink DataStream wrote its results to. To establish the read from Kafka...
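The Kafka source in Structured Streaming exposes each record's `value` as binary, so a typical next step is to cast it to a string and parse it against a known schema (for example with `from_json`), deciding what to do with records that do not fit. The sketch below mirrors that step in plain Python for a single message; `TransactionRecord`, its fields and `parse_value` are hypothetical and assume the JSON layout produced on the Flink side.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransactionRecord:
    account_id: str
    amount: float
    timestamp: int


def parse_value(value: bytes) -> Optional[TransactionRecord]:
    """Parse one Kafka message value; return None if it does not fit the schema."""
    try:
        payload = json.loads(value.decode("utf-8"))
        return TransactionRecord(
            account_id=str(payload["account_id"]),
            amount=float(payload["amount"]),
            timestamp=int(payload["timestamp"]),
        )
    except (ValueError, KeyError, TypeError):
        # ValueError also covers JSONDecodeError and UnicodeDecodeError;
        # TypeError covers non-object payloads and null fields.
        return None
```

Returning `None` instead of raising lets a streaming job route malformed messages aside rather than failing the whole micro-batch.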

Once the data has been parsed into a schema, we can load our model and score each micro-batch that Spark processes on every trigger. For a more detailed example of using machine learning models with Structured Streaming, see this article in our documentation.
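In Structured Streaming this step is often written with `foreachBatch`, or by wrapping an MLflow model as a Spark UDF with `mlflow.pyfunc.spark_udf`. The plain-Python sketch below shows only the shape of the computation: every micro-batch is scored by the same model, loaded once, and each output row is tagged with its batch ID. `score_micro_batches`, the 0.5 threshold and `toy_model` are hypothetical stand-ins for a real model.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence

Record = Dict[str, Any]


def score_micro_batches(
    batches: Iterable[Sequence[Record]],
    model: Callable[[Record], float],
    threshold: float = 0.5,
) -> List[List[Record]]:
    """Score each micro-batch with a model that was loaded once, up front."""
    scored_batches: List[List[Record]] = []
    for batch_id, batch in enumerate(batches):
        scored: List[Record] = []
        for record in batch:
            probability = model(record)
            scored.append({
                **record,
                "batch_id": batch_id,
                "fraud_probability": probability,
                "predicted_fraud": probability >= threshold,
            })
        scored_batches.append(scored)
    return scored_batches


def toy_model(record: Record) -> float:
    """Hypothetical model: large amounts look suspicious."""
    return 0.9 if record["amount"] > 500 else 0.1
```

Loading the model outside the per-batch loop matters in practice: reloading it on every trigger would add latency to each micro-batch.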

Now we can write to Delta by configuring the writeStream and pointing it to our fraud_predictions Delta Lake table. This allows us to build important reports on how we track and handle fraudulent transactions for our customers; we can even use the outputs to track how the model performs over time, for example how many false positives it produces compared with accurate assessments.
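Once predictions accumulate in the fraud_predictions table and are later matched with confirmed outcomes (for example, resolved disputes; this is an assumption, since the post does not say where ground-truth labels come from), tracking the model over time reduces to counting outcomes per period. The sketch computes daily true/false positives and negatives plus the false positive rate and precision; `daily_metrics` and the row layout `(day, predicted_fraud, actual_fraud)` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_metrics(rows: Iterable[Tuple[str, bool, bool]]) -> Dict[str, Dict[str, float]]:
    """rows are (day, predicted_fraud, actual_fraud) tuples."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    )
    for day, predicted, actual in rows:
        if predicted and actual:
            counts[day]["tp"] += 1
        elif predicted:
            counts[day]["fp"] += 1  # flagged a legitimate transaction
        elif actual:
            counts[day]["fn"] += 1  # missed real fraud
        else:
            counts[day]["tn"] += 1

    metrics: Dict[str, Dict[str, float]] = {}
    for day, c in sorted(counts.items()):
        negatives = c["fp"] + c["tn"]
        flagged = c["tp"] + c["fp"]
        metrics[day] = {
            **c,
            # Share of legitimate transactions that were wrongly flagged.
            "false_positive_rate": c["fp"] / negatives if negatives else 0.0,
            # Share of flagged transactions that really were fraud.
            "precision": c["tp"] / flagged if flagged else 0.0,
        }
    return metrics
```

In practice the same aggregation would run as a query over the Delta table; the function just makes the arithmetic explicit.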

                                                                                                                                            Conclusion

With either option, Flink with Auto Loader or Flink with Kafka, organizations can still take advantage of Delta Lake features and integrate their Flink applications into their broader Lakehouse architecture. Databricks has also been working with the Flink community to build a direct Flink to Delta Lake connector.

                                                                                                                                            • Data Engineering
                                                                                                                                            • Data Science
                                                                                                                                            Pricing
                                                                                                                                            • Pricing Overview
                                                                                                                                            • Pricing Calculator
                                                                                                                                            Open Source
                                                                                                                                            Integrations and Data
                                                                                                                                            • Marketplace
                                                                                                                                            • IDE Integrations
                                                                                                                                            • Partner Connect
Databricks Inc.
160 Spear Street, 15th Floor
San Francisco, CA 94105
1-866-330-0121

© Databricks 2025. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the Apache Software Foundation.
