Introducing Databricks LakeFlow: A Unified, Intelligent Solution for Data Engineering
June 12, 2024
Ingest data from databases, enterprise apps and cloud sources, transform it in batch and near real-time using SQL and Python, and confidently deploy and operate it in production
San Francisco, CA – June 12, 2024 — Databricks, the Data and AI company, today announced the launch of Databricks LakeFlow, a new solution that unifies and simplifies all aspects of data engineering, from data ingestion to transformation and orchestration. With LakeFlow, data teams can now simply and efficiently ingest data at scale from databases such as MySQL, Postgres and Oracle, and enterprise applications such as Salesforce, Dynamics, SharePoint, Workday, NetSuite and Google Analytics. Databricks is also introducing Real Time Mode for Apache Spark™, which allows stream processing at ultra-low latency.
LakeFlow automates deploying, operating and monitoring pipelines at scale in production, with built-in support for CI/CD and advanced workflows that support triggering, branching and conditional execution. Data quality checks and health monitoring are built in and integrated with alerting systems such as PagerDuty. LakeFlow makes building and operating production-grade data pipelines simple and efficient while still addressing the most complex data engineering use cases, enabling even the busiest data teams to meet the growing demand for reliable data and AI.
Addressing Challenges in Building and Operating Reliable Data Pipelines
Data engineering is essential for democratizing data and AI within businesses, yet it remains a challenging and complex field. Data teams must ingest data from siloed and often proprietary systems, including databases and enterprise applications, a task that often requires building complex and fragile connectors. Additionally, data preparation involves maintaining intricate logic, and failures and latency spikes can lead to operational disruptions and unhappy customers. Deploying pipelines and monitoring data quality typically requires additional, disparate tools, further complicating the process. Existing solutions are fragmented and incomplete, leading to low data quality, reliability issues, high costs and an increasing backlog of work.
LakeFlow addresses these challenges by simplifying all aspects of data engineering through a single, unified experience built on the Databricks Data Intelligence Platform, with deep integration with Unity Catalog for end-to-end governance and serverless compute for highly efficient, scalable execution.
Key Features of LakeFlow
LakeFlow Connect: Simple and scalable data ingestion from every data source. LakeFlow Connect provides a breadth of native, scalable connectors for databases such as MySQL, Postgres, SQL Server and Oracle, as well as enterprise applications like Salesforce, Dynamics, SharePoint, Workday and NetSuite. These connectors are fully integrated with Unity Catalog, providing robust data governance. LakeFlow Connect incorporates the low-latency, highly efficient change data capture capabilities of Arcion, which Databricks acquired in November 2023. LakeFlow Connect makes all data, regardless of size, format or location, available for batch and real-time analysis.
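LakeFlow Connect's own configuration surface was still entering preview at announcement time, so the sketch below illustrates only the end state this paragraph describes: ingested data landing in Unity Catalog-governed tables that any batch or streaming consumer can read. The catalog, schema and table names are hypothetical.

```python
from pyspark.sql import SparkSession

# On Databricks, `spark` is already provided; getOrCreate() keeps the sketch self-contained.
spark = SparkSession.builder.getOrCreate()

# Batch read of a (hypothetical) Salesforce table ingested by LakeFlow Connect
# and governed by Unity Catalog.
accounts = spark.table("main.crm.salesforce_accounts")
accounts.groupBy("industry").count().show()

# The same governed table can also feed low-latency streaming consumers.
accounts_stream = spark.readStream.table("main.crm.salesforce_accounts")
```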
LakeFlow Pipelines: Simplifying and automating real-time data pipelines. Built on Databricks’ highly scalable Delta Live Tables technology, LakeFlow Pipelines allows data teams to implement data transformation and ETL in SQL or Python. Customers can now enable Real Time Mode for low-latency streaming without any code changes. LakeFlow Pipelines eliminates the need for manual orchestration and unifies batch and stream processing, offering incremental data processing for optimal price/performance. LakeFlow Pipelines makes even the most complex streaming and batch data transformations simple to build and easy to operate.
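Because LakeFlow Pipelines is built on Delta Live Tables, the existing Delta Live Tables Python API is a reasonable way to illustrate the declarative programming model. Below is a minimal sketch, assuming a JSON landing path and table and column names that are purely illustrative; Real Time Mode is a pipeline-level setting rather than something expressed in the code itself.

```python
import dlt  # Delta Live Tables Python API, which LakeFlow Pipelines builds on
from pyspark.sql import functions as F

# `spark` is the ambient SparkSession provided inside a pipeline.

@dlt.table(comment="Raw orders ingested incrementally from cloud storage.")
def orders_raw():
    # Auto Loader discovers and reads only new files on each update.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/sales/orders_landing")  # hypothetical path
    )

@dlt.table(comment="Cleaned orders with a built-in data quality expectation.")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # rows failing the check are dropped
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .withColumn("order_date", F.to_date("order_ts"))
        .select("order_id", "customer_id", "amount", "order_date")
    )
```

The same definitions run in batch or streaming; in this model, moving to lower latency is a configuration choice on the pipeline, consistent with the "no code changes" claim above.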
LakeFlow Jobs: Orchestrating workflows across the Data Intelligence Platform. LakeFlow Jobs provides automated orchestration, data health monitoring and data delivery, spanning everything from scheduling notebooks and SQL queries to ML training and automatic dashboard updates. It provides enhanced control flow capabilities and full observability to help detect, diagnose and mitigate data issues for increased pipeline reliability. LakeFlow Jobs automates deploying, orchestrating and monitoring data pipelines in a single place, making it easier for data teams to meet their data delivery promises.
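Orchestration of this kind is exposed through the Databricks Jobs API; as a rough illustration, the sketch below uses the Databricks SDK for Python to define a two-task workflow in which the second task runs only after the first succeeds. The job name and notebook paths are hypothetical.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Credentials are picked up from the environment or ~/.databrickscfg.
w = WorkspaceClient()

job = w.jobs.create(
    name="nightly-orders-refresh",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/ingest_orders"),
        ),
        jobs.Task(
            task_key="refresh_dashboard",
            # Control flow: run only after `ingest` completes successfully.
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/refresh_dashboard"),
        ),
    ],
)
print(f"Created job {job.job_id}")
```

This sketch assumes serverless compute, so no cluster specification is attached to the tasks; richer control flow such as branching and conditional execution layers onto the same task graph.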
Availability
With LakeFlow, the future of data engineering is unified and intelligent. LakeFlow is entering preview soon, starting with LakeFlow Connect. Customers can join the waitlist here.
About Databricks
Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Block, Comcast, Condé Nast, Rivian, Shell and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of the lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on LinkedIn, X and Facebook.
Contact: [email protected]