Data Engineering

Tens of millions of production workloads run daily on Databricks


Easily ingest and transform batch and streaming data on the Databricks Lakehouse Platform. Orchestrate reliable production workflows while Databricks automatically manages your infrastructure at scale. Increase the productivity of your teams with built-in data quality testing and support for software development best practices.

Unify batch and streaming

Eliminate silos on one platform with a single and unified API to ingest, transform and incrementally process batch and streaming data at scale.

Focus on getting value from data

Databricks automatically manages your infrastructure and the operational components of your production workflows so you can focus on value, not on tooling.

Connect your tools of choice

An open Lakehouse Platform to connect and use your preferred data engineering tools for data ingestion, ETL/ELT and orchestration.

Build on the Lakehouse Platform

The Lakehouse Platform provides the best foundation to build and share trusted data assets that are centrally governed, reliable and lightning-fast.


How does it work?

Simplified data ingestion

Automated ETL processing

Reliable workflow orchestration

End-to-end observability and monitoring

Next-generation data processing engine

Foundation of governance, reliability and performance

Simplified data ingestion

Ingest data into your Lakehouse Platform and power your analytics, AI and streaming applications from one place. Auto Loader incrementally and automatically processes files landing in cloud storage — without the need to manage state information — in scheduled or continuous jobs. It efficiently tracks new files (scaling to billions) without having to list them in a directory, and can also automatically infer the schema from the source data and evolve it as it changes over time. The COPY INTO command makes it easy for analysts to perform batch file ingestion into Delta Lake via SQL.

Learn more
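As an illustrative sketch of the ingestion pattern described above, a notebook might combine Auto Loader for incremental file pickup with COPY INTO for batch loads; the bucket paths, table name and file format below are placeholder assumptions, not values from this page.

    # Auto Loader: incrementally pick up new files from cloud storage with schema inference.
    df = (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")                       # assumed source file format
            .option("cloudFiles.schemaLocation", "/tmp/_schemas/raw")  # where the inferred schema is tracked and evolved
            .load("s3://example-bucket/landing/")                      # placeholder landing zone
    )

    (
        df.writeStream
            .option("checkpointLocation", "/tmp/_checkpoints/raw")     # state for exactly-once incremental processing
            .trigger(availableNow=True)                                # process what has landed, then stop (scheduled-job style)
            .toTable("raw_events")
    )

    # COPY INTO: SQL-friendly batch ingestion of files into a Delta table.
    spark.sql("""
        COPY INTO raw_events
        FROM 's3://example-bucket/landing/'
        FILEFORMAT = JSON
    """)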


Automated ETL processing

Once ingested, raw data needs transforming so that it’s ready for analytics and AI. Databricks provides powerful ETL capabilities for data engineers, data scientists and analysts with Delta Live Tables (DLT). DLT is the first framework that uses a simple declarative approach to build ETL and ML pipelines on batch or streaming data, while automating operational complexities such as infrastructure management, task orchestration, error handling and recovery, and performance optimization. With DLT, engineers can also treat their data as code and apply software engineering best practices like testing, monitoring and documentation to deploy reliable pipelines at scale.

Learn more
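For illustration only, a minimal Delta Live Tables pipeline written in Python could look like the sketch below; the dataset names, source path and expectation rule are assumptions, not part of this page.

    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw orders ingested incrementally from cloud storage")
    def orders_raw():
        return (
            spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")
                .load("s3://example-bucket/orders/")     # placeholder source path
        )

    @dlt.table(comment="Cleaned orders ready for analytics and ML")
    @dlt.expect_or_drop("valid_amount", "amount > 0")    # declarative data quality rule: drop failing rows
    def orders_clean():
        return (
            dlt.read_stream("orders_raw")
                .withColumn("ingested_at", F.current_timestamp())
        )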

Reliable workflow orchestration

Databricks Workflows is the fully managed orchestration service for all your data, analytics and AI, native to the Lakehouse Platform. Orchestrate diverse workloads for the full lifecycle, including Delta Live Tables and Jobs for SQL, Spark, notebooks, dbt, ML models and more. Deep integration with the underlying Lakehouse Platform ensures you can create and run reliable production workloads on any cloud, with deep, centralized monitoring that stays simple for end users.

Learn more
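As a rough sketch of creating such a workflow programmatically (assuming the databricks-sdk Python package; the notebook paths, cluster ID and job name are placeholders), a two-task job with a dependency might be defined like this:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()   # credentials resolved from the environment or a config profile

    created = w.jobs.create(
        name="nightly_orders_etl",
        tasks=[
            jobs.Task(
                task_key="ingest",
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/ingest"),
                existing_cluster_id="1234-567890-abcde123",            # placeholder cluster ID
            ),
            jobs.Task(
                task_key="transform",
                depends_on=[jobs.TaskDependency(task_key="ingest")],   # runs only after "ingest" succeeds
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/transform"),
                existing_cluster_id="1234-567890-abcde123",
            ),
        ],
    )
    print(created.job_id)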


End-to-end observability and monitoring

The Lakehouse Platform gives you visibility across the entire data and AI lifecycle so data engineers and operations teams can see the health of their production workflows in real time, manage data quality and understand historical trends. In Databricks Workflows you can access dataflow graphs and dashboards tracking the health and performance of your production jobs and Delta Live Tables pipelines. Event logs are also exposed as Delta Lake tables so you can monitor and visualize performance, data quality and reliability metrics from any angle.
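As one illustration, Delta Live Tables event logs can be queried like any other Delta data; the storage path below is a placeholder, and the column names reflect the commonly documented event log schema rather than anything stated on this page.

    from pyspark.sql import functions as F

    # Read the pipeline's event log from its storage location (placeholder path).
    events = spark.read.format("delta").load("dbfs:/pipelines/orders/system/events")

    # Data quality and throughput metrics are carried in flow_progress events.
    (
        events
            .filter(F.col("event_type") == "flow_progress")
            .select("timestamp", "details")
            .show(truncate=False)
    )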

Next-generation data processing engine

Databricks data engineering is powered by Photon, the next-generation engine compatible with Apache Spark APIs delivering record-breaking price/performance while automatically scaling to thousands of nodes. Spark Structured Streaming provides a single and unified API for batch and stream processing, making it easy to adopt streaming on the Lakehouse without changing code or learning new skills.

Learn more
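To illustrate the unified API mentioned above, the same transformation can be applied to a batch read and a streaming read; the table names and checkpoint path below are placeholders.

    from pyspark.sql import functions as F

    def events_by_country(df):
        # Identical transformation logic for batch and streaming inputs
        return df.groupBy("country").agg(F.count("*").alias("events"))

    # Batch: read the whole Delta table once.
    batch_result = events_by_country(spark.read.table("raw_events"))

    # Streaming: the same function over an incremental reader.
    (
        events_by_country(spark.readStream.table("raw_events"))
            .writeStream
            .option("checkpointLocation", "/tmp/_checkpoints/by_country")
            .outputMode("complete")            # streaming aggregations need complete/update output mode
            .toTable("events_by_country")
    )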


State-of-the-art data governance, reliability and performance

Data engineering on Databricks means you benefit from the foundational components of the Lakehouse Platform — Unity Catalog and Delta Lake. Your raw data is optimized with Delta Lake, an open source storage format providing reliability through ACID transactions, and scalable metadata handling with lightning-fast performance. This combines with Unity Catalog to give you fine-grained governance for all your data and AI assets, simplifying how you govern, with one consistent model to discover, access and share data across clouds. Unity Catalog also provides native support for Delta Sharing, the industry’s first open protocol for simple and secure data sharing with other organizations.
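As a hedged sketch of what that governance looks like in practice (the catalog, schema, table, group, share and recipient names are placeholders), access control and Delta Sharing are expressed in SQL:

    # Fine-grained access control with Unity Catalog.
    spark.sql("GRANT SELECT ON TABLE main.sales.orders_clean TO `analysts`")

    # Delta Sharing: publish a table to another organization.
    spark.sql("CREATE SHARE IF NOT EXISTS quarterly_sales")
    spark.sql("ALTER SHARE quarterly_sales ADD TABLE main.sales.orders_clean")
    spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_org")
    spark.sql("GRANT SELECT ON SHARE quarterly_sales TO RECIPIENT partner_org")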

Delta Live Tables: Modern software engineering for ETL processing

Learn more

Integrations

Provide maximum flexibility to your data teams — leverage Partner Connect and an ecosystem of technology partners to seamlessly integrate with popular data engineering tools. For example, you can ingest business-critical data with Fivetran, transform it in place with dbt, and orchestrate your pipelines with Apache Airflow.
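As an illustrative sketch of the Airflow side of that example (assuming the apache-airflow-providers-databricks package; the connection ID and job ID are placeholders), an existing Databricks job can be triggered from a DAG:

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

    with DAG(
        dag_id="orchestrate_databricks_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_etl = DatabricksRunNowOperator(
            task_id="run_nightly_etl",
            databricks_conn_id="databricks_default",   # Airflow connection to the workspace
            job_id=123,                                # placeholder Databricks Workflows job ID
        )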

Data ingestion and ETL

+ any other Apache Spark™-compatible clients

Customer stories

ADP

Customer story

"ADP에서는 인적 자원 관리 데이터를 레이크하우스의 통합 데이터 스토어로 마이그레이션하고 있습니다. 우리 팀은 Delta Live Tables를 통해 품질 관리를 구축하는 데 도움을 받았습니다. SQL만 사용해서 배치와 실시간 스트리밍을 지원하는 선언적 API 덕분에 데이터 관리에 들어가는 시간과 노력을 절약할 수 있었습니다."

— Jack Berkowitz, CDO, ADP

YipitData

Customer story

“Databricks Workflows allows our analysts to easily create, run, monitor and repair data pipelines without managing any infrastructure. This enables them to have full autonomy in designing and improving ETL processes that produce must-have insights for our clients. We are excited to move our Airflow pipelines over to Databricks Workflows.”

— Anup Segu, Senior Software Engineer, YipitData

Ready to get started?

Getting started guides

AWS · Azure · GCP