Open Source

Some of the world's most popular open source data technologies were originally created by Databricks engineers

Attend a meetup

Built on open data and AI projects trusted by millions of developers

Apache Spark™

Apache Spark is a unified engine for executing data engineering, data science and ML workloads.

What is Apache Spark?

Comparing Spark and Databricks

Visit spark.apache.org

Delta Lake

Delta Lake lets you build a lakehouse architecture on top of storage systems such as AWS S3, ADLS, GCS and HDFS.

Learn more about Delta Lake

Visit delta.io

Tech Talks: Getting Started With Delta Lake

Apache Iceberg™

Apache Iceberg is an open table format that lets you build a lakehouse architecture on top of storage systems such as AWS S3, ADLS, GCS and HDFS.

Visit iceberg.apache.org

Unity Catalog

Unity Catalog is the industry’s only universal catalog for data and AI.

Learn more about Unity Catalog

Visit unitycatalog.io

MLflow

MLflow manages the ML lifecycle, including experimentation, reproducibility, deployment and a central model registry.

Managed MLflow on Databricks

Visit mlflow.org

Tech Talks: Managing the ML Lifecycle

Delta Sharing

Delta Sharing is the industry’s first open protocol for secure data sharing, making it simple to share data with other organizations.

Visit Delta Sharing

Redash

Redash enables anyone to leverage SQL to explore, query, visualize and share data from both big and small data sources.

Visit Redash on GitHub

Databricks supports these additional popular open source technologies

TensorFlow

Databricks supports TensorFlow, a library for deep learning and general computation on clusters.

TensorFlow on Databricks

PyTorch™

Facebook, the creator of PyTorch, and Databricks have collaborated on integrations.

PyTorch on Databricks

Keras™

Deep learning API written in Python, running on top of TensorFlow. Available in Databricks Runtime for ML.

Keras on Databricks

RStudio

An open source suite of tools for collaborative data science using R.

R programming on big data

scikit-learn

Widely used Python package for machine learning built on top of NumPy, SciPy and Matplotlib.

Scikit-learn on Databricks

XGBoost

A distributed gradient boosting library that has bindings in languages such as Python, R and C++.

XGBoost on Databricks

Terraform

HashiCorp Terraform is a popular open source tool for creating safe and predictable cloud infrastructure across several cloud providers. The Databricks Terraform provider lets customers manage their entire Databricks workspaces along with the rest of their infrastructure using a single flexible, powerful tool. Using Terraform also encourages customers to adopt infrastructure as code (IaC) best practices.

Terraform on Databricks
