
Companies across all industries want to share data with each other to enable collaboration and accelerate innovation. However, these organizations often use different data or cloud platforms, which creates friction or blocks collaboration. Databricks and the Linux Foundation developed Delta Sharing, marking a significant milestone in the democratization of data exchange with the first open source approach to data sharing across platforms, clouds, and regions. With Delta Sharing, customers are no longer limited to collaborating within their own platform and customer base but can instead go beyond and share data with all of their customers, partners, and any other collaborators. 

Since announcing general availability of Delta Sharing in 2022, we have seen many enterprises adopt it to maximize their reach and collaborate with their customers and partners, regardless of cloud or platform. Databricks customers use the managed Delta Sharing service offered natively, which supports both Databricks-to-Databricks (D2D) sharing and Databricks-to-Open (D2O) sharing for recipients who are not on Databricks. Thanks to its open reach, D2O is very popular with customers, with 40% of active shares using open connectors. Databricks customers Atlassian and Nasdaq use D2O to deliver data to all their partners and customers on any computing platform, anywhere. Data and software platforms such as Oracle have also adopted Delta Sharing for Oracle-to-Open sharing to help enable their customers.

Databricks-to-Open (D2O) Delta Sharing lets organizations share data managed in a Unity Catalog-enabled workspace with any user on any computing platform, anywhere. This approach enables Databricks customers to collaborate with all of their partners, customers, and suppliers, regardless of which data or cloud platform they use.

This blog will showcase the pivotal role of D2O in modern data sharing strategies with real-world applications. We will explore D2O scenarios that empower organizations to extend their data sharing capabilities, enabling interoperability with external partners’ systems, and reaching customers anywhere. 

In addition, we will highlight the most commonly used Delta Sharing open source connectors, such as Python, Apache Spark™, Excel, Tableau, and Power BI, which are part of the growing, open Delta Sharing ecosystem. We will also showcase how Databricks customers leverage D2O combined with the Delta Sharing REST API to build a cohesive data fabric architecture, customizing their data sharing experiences across their entire customer base.

Finally, we will review Databricks Marketplace's recent support for D2O, which now enables recipients to access Marketplace listings via the Delta Sharing open connectors. For example, we will explain how the Python connector or Spark connector can be used to consume a Delta Sharing listing in systems where there is no native connector, such as Amazon EMR, Google BigQuery, and Snowflake.

Increasingly, enterprises are implementing a D2O workflow to simplify collaboration externally across multiple platforms to unlock the potential of their data to drive innovation, ensure robust governance, and accelerate growth. 

Open Ecosystem of Connectors

Consuming data shared using the Delta Sharing open sharing protocol requires an OSS connector, authenticated using a credential file that is typically obtained when a provider shares an activation token with a recipient.
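As a rough illustration of that flow, the sketch below writes a credential (profile) file in the shape used by the open protocol's profile format and builds the table address the Python connector expects. The endpoint, token, file name, and share/schema/table names are placeholders rather than real values, and `list_shared_tables` assumes the `delta-sharing` package is installed and the endpoint is reachable.

```python
import json
import tempfile
from pathlib import Path

# Placeholder credential contents. A real file comes from the provider's
# activation link; every value below is hypothetical.
EXAMPLE_PROFILE = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<token>",
    "expirationTime": "2030-01-01T00:00:00.0Z",
}


def write_profile(directory: str, profile: dict = EXAMPLE_PROFILE) -> Path:
    """Save the credential to disk, as a recipient would after downloading it."""
    path = Path(directory) / "config.share"
    path.write_text(json.dumps(profile, indent=2))
    return path


def table_url(profile_path, share: str, schema: str, table: str) -> str:
    """Build the '<profile-path>#<share>.<schema>.<table>' address."""
    return f"{profile_path}#{share}.{schema}.{table}"


def list_shared_tables(profile_path):
    """List every table the credential can see (needs network access)."""
    import delta_sharing  # assumed installed: pip install delta-sharing

    client = delta_sharing.SharingClient(str(profile_path))
    return client.list_all_tables()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profile_path = write_profile(tmp)
        print(table_url(profile_path, "my_share", "my_schema", "my_table"))
```

Keeping the address builder separate from the network call makes it easy to check the profile and table names before any data is requested.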

The table below summarizes the OSS connectors that Delta Sharing currently supports, with links for download and major features for each. For example, the Python Connector offers robust capabilities for querying metadata, accessing snapshots, supporting Change Data Feed (CDF), and supporting Pandas. Another one is the Apache Spark Connector which provides similar capabilities to the Python connector, ensuring seamless integration into Spark users' workflows. These connectors are part of the broader OSS Delta Sharing project, aimed at simplifying data sharing and consumption through familiar APIs and promoting open and accessible data sharing. All of these connectors also help read data from the Unity Catalog (UC) for recipients not yet on UC.

| Connector | Description | Download | Major Features |
| --- | --- | --- | --- |
| Python | Python / PySpark sharing client | GitHub | Query metadata; Query version; Get latest snapshot; Change Data Feed (CDF) |
| Apache Spark | Apache Spark sharing client | GitHub | Query metadata; Query version; Get latest snapshot; CDF; Streaming |
| Microsoft Power BI | Power BI uses Power Query to connect to data sources. Read documentation. | Power BI Delta Sharing Connector | Get latest snapshot |
| Microsoft Excel | Excel add-in for Delta Sharing and writing Delta tables | Exponam Excel Add-in | Query metadata; Query version; Get latest snapshot |
| Tableau (from Salesforce) | Tableau Delta Sharing connector provides joint integration. Read the blog post. | Tableau Delta Sharing Connector | Query metadata; Query version; Get latest snapshot |
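
To make the snapshot, CDF, and streaming features more concrete, here is a minimal sketch using the Python connector and the Spark connector from Python. It assumes the `delta-sharing` package is installed, that the provider has enabled the Change Data Feed on the shared table, and that `spark` is a session started with the `delta-sharing-spark` package. The `next_change_window` helper is a hypothetical convenience for incremental reads, not part of the connector.

```python
def next_change_window(last_processed: int, latest: int):
    """Return the (start, end) table versions still to read, or None if up to date."""
    if latest <= last_processed:
        return None
    return last_processed + 1, latest


def preview_snapshot(url: str, rows: int = 10):
    """Get the latest snapshot, limited to a few rows, as a pandas DataFrame."""
    import delta_sharing  # assumed installed: pip install delta-sharing

    return delta_sharing.load_as_pandas(url, limit=rows)


def read_changes(url: str, starting_version: int, ending_version: int):
    """Read Change Data Feed rows between two table versions."""
    import delta_sharing

    return delta_sharing.load_table_changes_as_pandas(
        url,
        starting_version=starting_version,
        ending_version=ending_version,
    )


def stream_with_spark(spark, url: str):
    """Streaming read through the Apache Spark connector."""
    return spark.readStream.format("deltaSharing").load(url)


if __name__ == "__main__":
    # Hypothetical version numbers; nothing is fetched in this demo.
    print(next_change_window(last_processed=3, latest=5))
```

In each reader, `url` is the `<profile-path>#<share>.<schema>.<table>` address of the shared table.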

Earlier this year, a new Tableau Delta Sharing connector was announced to support seamless data sharing between Tableau and Databricks. 

Meet Your Customers Wherever They Are: BigQuery and Snowflake Examples

When integrating Delta Sharing with systems that lack native connectors, such as BigQuery and Snowflake, the Python Delta Sharing connector provides a versatile way to bridge the gap. For BigQuery users, PySpark can authenticate and access shared data via the ‘delta_sharing’ library, load it into a DataFrame, and write it directly to BigQuery. This process uses Google Cloud Dataproc for scalable data processing, so that data handling is both efficient and secure. To learn more about how to use Delta Sharing with BigQuery, read the Medium blog post from Databricks experts.
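
A minimal sketch of the BigQuery path might look as follows. It assumes a Dataproc cluster whose Spark session has both the `delta-sharing-spark` package and the spark-bigquery connector available. The staging bucket, the target table name, and the overwrite mode are hypothetical choices for illustration.

```python
def bigquery_write_options(temp_bucket: str) -> dict:
    """Options for the spark-bigquery connector's indirect write path.

    The connector stages data in a Cloud Storage bucket before loading it
    into BigQuery, so the caller supplies that bucket's name.
    """
    return {"temporaryGcsBucket": temp_bucket}


def copy_share_to_bigquery(url: str, target_table: str, temp_bucket: str) -> None:
    """Load a shared table as a Spark DataFrame and write it to BigQuery."""
    import delta_sharing  # assumed installed alongside delta-sharing-spark

    df = delta_sharing.load_as_spark(url)  # relies on the active SparkSession
    (
        df.write.format("bigquery")
        .options(**bigquery_write_options(temp_bucket))
        .mode("overwrite")
        .save(target_table)  # for example "my_dataset.my_table"
    )


if __name__ == "__main__":
    # Hypothetical bucket name. Call copy_share_to_bigquery(...) only on a
    # cluster that meets the assumptions described above.
    print(bigquery_write_options("my-staging-bucket"))
```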

Similarly, for Snowflake integration, recipients can utilize the Python connector with the Pandas library to import data into a DataFrame. Following the data import, Snowflake’s Snowpark Python API facilitates the connection to Snowflake databases, allowing for seamless data writing from the Pandas DataFrame into Snowflake tables. 

Code example: 

<span class="subtle"># Install the dependencies first (shell):
#   pip install delta-sharing snowflake-snowpark-python pandas
import delta_sharing
import pandas as pd
from snowflake.snowpark import Session

# Path to the Delta Sharing profile JSON file
profile_file = "path/to/your/profile.delta-sharing.json"
# Load the profile; the client can list what has been shared,
# for example with client.list_all_tables()
client = delta_sharing.SharingClient(profile_file)
# Load a specific table into a pandas DataFrame.
# The address format is <profile-path>#<share>.<schema>.<table>
table_url = f"{profile_file}#share_name.schema_name.table_name"
df = delta_sharing.load_as_pandas(table_url)

# Snowflake Snowpark session setup; supply your own connection settings
connection_parameters = { …}
# Create a Snowflake session
session = Session.builder.configs(connection_parameters).create()
# Write the pandas DataFrame directly to a Snowflake table
session.write_pandas(df, "your_snowflake_table_name", auto_create_table=True)</span>

This method offers significant advantages because it eliminates the need for providers to replicate data in a separate system simply for sharing purposes, which would otherwise require additional computing, storage, and technical effort. By using Delta Sharing, data providers can directly share from their Databricks environment, enabling recipients to access the live data across various platforms, without the need for replication. This approach not only demonstrates the flexibility and cost-effectiveness of Delta Sharing but also enhances efficiency by consolidating data in a single system.

Delta Sharing: an open cross-platform sharing ecosystem

Enhance Your Data Services with the Delta Sharing API

Many customers build their own products and interfaces on top of Databricks. These customers use Databricks Delta Sharing’s REST API to create tailored data sharing applications for their customers. Such applications are designed not only to enhance user experience but also to fit seamlessly into a comprehensive data fabric strategy.

Clients are leveraging these custom-built applications to control their data exchange environments, enabling them to share data hosted on Databricks with their customers who may not be using the same platform.

By customizing user interfaces to external partners' needs, organizations enhance collaboration and drive innovation, transforming data exchange into a strategic asset that improves business relationships and customer engagement. This approach strengthens their competitive edge in a data-driven market. The emphasis on flexibility and adaptability in these customized interfaces marks a new era of strategic data exchange. 
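
To give a sense of what such an application does under the hood, the sketch below prepares authenticated calls against a Delta Sharing endpoint using only the Python standard library. The paths follow the open Delta Sharing protocol (listing tables in a schema, querying a table). The endpoint, token, and share/schema/table names are hypothetical, and `list_tables` reads only the first page of results.

```python
import json
import urllib.request
from typing import Optional


def build_request(profile: dict, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Prepare an authenticated call against a Delta Sharing endpoint.

    The request is a GET when there is no body and a POST (used for table
    queries) when a JSON body is supplied.
    """
    url = profile["endpoint"].rstrip("/") + "/" + path.lstrip("/")
    headers = {"Authorization": f"Bearer {profile['bearerToken']}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json; charset=utf-8"
    return urllib.request.Request(url, data=data, headers=headers)


def list_tables(profile: dict, share: str, schema: str) -> list:
    """Return the table entries in one schema of a share (needs network access)."""
    request = build_request(profile, f"shares/{share}/schemas/{schema}/tables")
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("items", [])


if __name__ == "__main__":
    # Hypothetical credential; a real one comes from the data provider.
    profile = {
        "endpoint": "https://sharing.example.com/delta-sharing/",
        "bearerToken": "<token>",
    }
    query = build_request(
        profile,
        "shares/my_share/schemas/my_schema/tables/my_table/query",
        body={"limitHint": 100},
    )
    print(query.get_method(), query.full_url)
```

A custom portal would typically wrap calls like these behind its own user interface, so recipients never handle the bearer token directly.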

For example, Atlassian integrates with Delta Sharing to help their customers drive insights with a flexible, open ecosystem. Atlassian Analytics’ latest feature, Data Shares, is powered by Databricks Delta Sharing’s open-source protocol. Data Shares allows you to access Atlassian data in your environments and in any BI tool. Watch Atlassian’s 2024 Data + AI Summit session, “Empowering Enterprise Grade Customers with Delta Sharing - an Atlassian Analytics Story.”

"Atlassian Analytics recently launched Data Shares, leveraging Delta Sharing from Databricks, to boost flexibility and accelerate customers' time-to-insight. Whether users choose to work within Atlassian Analytics or continue using dashboards they're already familiar with, Delta Sharing's open ecosystem of connectors, including Tableau, PowerBI, and Spark, enables customers to easily power their environments with data directly from the Atlassian Data Lake."
— Ben Jackson, Senior Group Product Manager, Data & Analytics, Atlassian  

Another Databricks customer, Nasdaq, has been using Delta Sharing for its Data Link platform, which delivers market data, alternative data, and partner data to its users. As its data sets grew, Nasdaq needed a scalable solution to deliver terabytes of data securely and efficiently while reducing egress costs. Nasdaq uses Delta Sharing, customized for its specific needs, in a scalable way that includes built-in governance from Databricks. To learn more about how Nasdaq uses D2O sharing, hear from them in the 2024 Data + AI Summit session, “Delta Sharing unlocks the value of your data to partners and customers.”

Oracle announced Delta Sharing integration for their Oracle Autonomous Database users last year to connect with Databricks across clouds. Customers no longer have to deal with having their data locked in one platform or copy their data to share it with another platform. Now, with Delta Sharing, these platforms can read each other’s data without the need for copying. This helps avoid issues with outdated data, unnecessary compute usage, and extra work. Read Oracle’s blog post to learn more about this integration. You can also learn more from Oracle in the 2024 Data + AI Summit session, “Delta Sharing: Open Protocol for Secure Data Sharing (OSS).”

Databricks Marketplace D2O

Databricks Marketplace is an open marketplace for all your data and AI assets, such as AI models, tabular data, file-based data, as well as industry-based Solution Accelerators. 

The Databricks Marketplace D2O (Databricks-to-Open) feature extends the capabilities of Marketplace to support recipients across non-Databricks platforms, leveraging the power of Delta Sharing. This extension enables a broader range of data sharing possibilities beyond the conventional Databricks-to-Databricks (D2D) interactions, by implementing a unique credential system for recipient identification. Unlike the standard procedure that relies on mutual authentication between Databricks account metastores, D2O facilitates the sharing of data through an open protocol, allowing recipients to access shared assets without the necessity of a Databricks account. Furthermore, after the listing is installed, the feature offers the functionality for users to download and renew the credential token needed to access the shared data. This enhances the Databricks Marketplace's utility by enabling integration with external tools such as Spark, PowerBI, Excel, and non-UC Databricks accounts, thus broadening the scope of data accessibility and collaboration. 
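
Because the downloaded credential can carry an expiration time, a recipient may want to check it before a scheduled job runs. The sketch below is one hypothetical way to do that. It assumes the credential file includes the optional `expirationTime` field as a UTC timestamp ending in `Z`, and the seven-day margin is an arbitrary choice. The renewal itself still happens through the Marketplace as described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiration(value: str) -> datetime:
    """Parse a UTC timestamp such as '2030-01-01T00:00:00.0Z' (fraction ignored)."""
    whole_seconds = value.rstrip("Z").split(".")[0]
    parsed = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def needs_renewal(profile: dict, now: Optional[datetime] = None,
                  margin: timedelta = timedelta(days=7)) -> bool:
    """True when the credential expires within `margin` of `now`."""
    expiration = profile.get("expirationTime")
    if expiration is None:
        return False  # the field is optional; without it there is nothing to check
    now = now or datetime.now(timezone.utc)
    return parse_expiration(expiration) - now <= margin


if __name__ == "__main__":
    # Hypothetical credential contents, for illustration only.
    profile = {"expirationTime": "2030-01-01T00:00:00.0Z"}
    print(needs_renewal(profile))
```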

Advancing Data Collaboration through D2O

Our exploration of D2O Delta Sharing highlights its pivotal role in facilitating data exchange across Databricks and non-Databricks platforms. By deploying connectors, D2O enhances data accessibility and ensures seamless integration with various platforms, including Spark, PowerBI, Tableau, and Excel. This strategic interoperability fosters a more inclusive data ecosystem, improving the utility and applicability of data in diverse analytical and operational scenarios.

D2O's approach to data sharing marks a significant advancement in data democratization, empowering organizations to spread insights and foster collaboration beyond traditional boundaries. The impact of this feature is substantial, simplifying data operations, sparking innovation, and opening new avenues for growth and efficiency.

Reflecting on the capabilities and potential of D2O Delta Sharing, it is clear that this innovation is more than just technological progress; it is a commitment to open, accessible, and collaborative data exchange. With the advancements made by D2O, the future of data sharing looks promising, cementing data's role as a crucial element in decision-making and innovation in today's digital world.

Getting Started with Delta Sharing

To learn more about how to implement Delta Sharing within your organization, check out the latest resources, including new eBooks and related blog posts, or dive deep into the Delta Sharing technical documentation.

If you are already a Delta Sharing customer, you can also reach out to the team with questions or to provide feedback at datasharing[at]databricks.com.
