Announcing Public Preview of Delta Sharing with Cloudflare R2 Integration
Special thanks to Phillip Jones, Senior Product Manager, and Harshal Brahmbhatt, Systems Engineer, both from Cloudflare, for their contributions to this blog.
Organizations across industries want to share their data and AI assets in a single, unified way, regardless of clouds or regions. However, many organizations still struggle to share data with customers, teams and partners, facing platform compatibility issues and limitations, high egress costs, and a lack of governance and security. Databricks and the Linux Foundation developed Delta Sharing as the first open approach for secure data sharing. Customers have been using Delta Sharing to easily and securely share data across platforms, clouds and regions, without the need for replication.
Today, we're excited to announce that Delta Sharing with Cloudflare R2 integration is in Public Preview, helping customers who share data across clouds and regions save on egress costs. Databricks now supports Delta Sharing from Cloudflare R2, Cloudflare's zero-egress, distributed object storage offering. Joint customers can now take advantage of zero egress fees, with no costly replication across regions and no vendor lock-in.
Strategic partnership with Cloudflare
Databricks partnered with Cloudflare to help organizations share their data with customers and partners in a single unified way, regardless of cloud or region. Cloudflare R2 is Cloudflare's zero-egress distributed object storage, which enables customers to share the most up-to-date datasets with their partners, suppliers, and lines of business without compromising security and privacy.
Matthew Prince, co-founder and CEO of Cloudflare, explained the value of the partnership, "The combination of Cloudflare's massive global network and zero egress storage, along with Databricks' powerful sharing and processing capabilities, will give our joint customers the fastest, most secure, and most affordable data sharing capabilities across the globe."
Using Delta Sharing with Cloudflare R2, customers are now in control of where to move and use their data and AI (live datasets, models, and notebooks), sharing the latest across platforms, clouds and regions with no need for replication, zero egress costs, no vendor lock-in, and without compromising on security and governance.
“Delta Sharing provides the first open protocol for sharing data across diverse computing platforms, clouds and regions. We are excited about how this will push open interchange forward and help all of our customers collaborate more easily,” explained Matei Zaharia, Co-Founder and CTO at Databricks, about the partnership with Cloudflare.
Allium saves up to $645K per year using Delta Sharing and Cloudflare R2
In the last 15 years, the financial industry has been transformed with the introduction of blockchain technology and the usage of cryptocurrency across industries. This evolution has generated an ever-increasing amount of transactional data from public blockchains, available for investors and traders to gain crucial, real-time insights.
Allium is a Databricks customer that provides a simple data platform with fast and accurate blockchain data. They help customers ranging from financial institutions to crypto-native firms unlock the full power of their data. Allium offers a dedicated data infrastructure and products including managed blockchain databases, enriched data schemas, and real-time notification capabilities. They are a leader in this space, serving 15 blockchains (including EVMs and Bitcoin), 100+ schemas, and 250+ TB of data to power all kinds of crypto applications, from accounting and auditing for traders to wash trading filtering for NFT marketplaces. Allium meets its customers wherever they are, in their own data environments. This resulted in more than 1 PB of data transfer per month in the last quarter, and the volume continues to surge following the recent crypto recovery fueled by ETF optimism.
While the massive increase in data transfer volumes has contributed to Allium's rapid business growth, it has also created a significant challenge for its bottom line: how to build a cost-efficient data storage and sharing solution that meets its customers' needs. Specifically, how can Allium share data with customers in any location, across clouds and regions, while minimizing expensive data egress charges from cloud vendors?
Before adopting the joint solution of Delta Sharing with Cloudflare R2, Allium had implemented other platforms but found them prohibitively expensive, with estimated costs reaching $53.8K per month for 1 PB of data egress, or approximately $645K annually.
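The arithmetic behind that annual figure can be checked with a short sketch. The monthly cost and volume come from the paragraph above; the per-GB rate is only what those two numbers imply, assuming decimal units (1 PB = 1,000,000 GB), and is not a price quoted anywhere in this post.

```python
# Figures quoted in the post.
MONTHLY_EGRESS_COST_USD = 53_800  # estimated monthly cost for 1 PB of egress
MONTHLY_EGRESS_VOLUME_PB = 1

# Assumption (not from the post): decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def annual_cost(monthly_cost_usd: float) -> float:
    """Annualize a flat monthly cost."""
    return monthly_cost_usd * 12


def implied_rate_per_gb(monthly_cost_usd: float, volume_pb: float) -> float:
    """Per-GB egress rate implied by a monthly bill and a monthly volume."""
    return monthly_cost_usd / (volume_pb * GB_PER_PB)


if __name__ == "__main__":
    yearly = annual_cost(MONTHLY_EGRESS_COST_USD)
    rate = implied_rate_per_gb(MONTHLY_EGRESS_COST_USD, MONTHLY_EGRESS_VOLUME_PB)
    print(f"Annual egress cost: ${yearly:,.0f}")
    print(f"Implied blended rate: ${rate:.4f} per GB")
```

Twelve months at $53.8K comes to $645.6K, which is consistent with the "approximately $645K" figure.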
“We initially leveraged Snowflake’s replication system but it lacked control and was expensive. In Snowflake, serving data to different regions requires us to replicate data to that region, so it automatically incurs a lot of storage costs as well as some egress costs. This expense increases exponentially for any operational schema change, which happens frequently at our scale,” explains Ethan Chan, Co-Founder and CEO of Allium.
The combination of Delta Sharing with Cloudflare R2 has provided Allium with a cost-effective and secure data sharing solution, with no need for costly and complex replications or vendor lock-in. Allium is now in control of where they move and use their data with Delta Sharing's multicloud support and has consolidated its cloud storage with Cloudflare R2 to build its next-generation data sharing platform.
Chan explains, "Combining both Delta Sharing and Cloudflare R2 together allows us to deliver data to our customers reliably and cost-effectively. We deliver the highest quality blockchain data to our customers in their preferred environment, while minimizing our storage and egress costs, saving up to $645K per year. Plus, this gives us both the control and security to scale our offerings sustainably."
Allium uses this integration to maximize its cost savings (see diagram below) by persisting the blockchain data using Delta UniForm (Delta Lake Universal Format), which lets a single copy of the underlying Parquet data be read through more than one table format without creating additional copies. Allium enables Apache Iceberg and Delta connectors that read the data stored in Cloudflare R2. They also implement Delta Sharing to seamlessly and securely share their data across regions and platforms, all with zero egress costs for outbound transfers.
Allium also recently expanded its product line to share its Ethereum Realtime Data, now listed on Databricks Marketplace. This dataset supports users in the cryptocurrency space by sharing valuable insights about Ethereum's dynamics. Available for purchase, it covers many aspects of the Ethereum blockchain, including smart contracts, NFT and decentralized finance (DeFi) markets, and more.
Key industry use cases
Data aggregators that follow the common 'hub and spoke' architectural pattern are another type of customer that can benefit from Delta Sharing and Cloudflare R2. A data aggregator specializes in collecting and merging data from diverse sources into a unified, cohesive dataset, then sharing it with clients across different regions, clouds, and platforms. A 'hub and spoke' data sharing scenario is one-to-many: one organization shares with many clients. These organizations face a common challenge: how to scale data sharing in a cost-effective and predictable way. Ideally, they benefit from economies of scale, so that sharing costs increase only marginally as the number of clients grows. They also don't want to depend on their clients adopting data replication to save costs; they want to control costs themselves, with a predictable approach.
Industries that typically use data aggregators include financial services, healthcare and life sciences, and media and entertainment. Sharing data helps drive critical business needs such as decision-making, market analysis, research, and supporting overall business operations. For example, data aggregators play a crucial role in powering various financial applications and services, such as budgeting apps, investment platforms, lending solutions, and more by securely accessing and analyzing users' financial information. See the table below for some industry-specific use cases.
Industry | Data Aggregator Use Case | Use Case Details |
---|---|---|
Media and Entertainment | Content Archiving | Aggregators can be used to archive content systematically, making it easier for media companies to share historical content with partners and customers, who can then access and repurpose it for new audiences or platforms. |
Financial Services | Credit Scoring and Risk Assessment | Data aggregators provide insights into users' financial behavior, such as spending patterns, income levels, and debt obligations. This information is shared and can be used by lenders and financial institutions to assess credit risk and help them make lending decisions based on overall credit ratings. |
Healthcare and Life Sciences | Commercial Effectiveness | Healthcare data aggregators can provide clinical prescription data to hospitals, healthcare providers, pharmaceutical companies, and research institutions for analysis and usage in many different ways. This could include identifying new markets to enter, measuring sales channel dynamics, or analyzing buying patterns in retail pharmacies or hospitals. |
Calculate savings and when to implement a joint solution
Cloud egress costs generally scale proportionally with the volume of data queried from the data share. The diagram below shows that as the number of queries (and volume of data) increases, so does the egress cost. Customers can use this approach to compare different storage solutions and quantify the cost-benefit of using Cloudflare R2's solution, which doesn't introduce any egress cost. As the diagram below highlights, Cloudflare R2's solution can lead to significant savings relative to other cloud storage solutions.
For example, based on standard pricing assumptions, the analysis below indicates that data assets whose monthly data transfer activity exceeds 26% across different clouds or 85% across regions can benefit from significant monthly savings on both storage and egress costs (see footnote 1).
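One way to reason about thresholds like these is a simple break-even model. The sketch below is an illustration under stated assumptions, not the analysis behind the figures above: it reads the percentages as the share of a data asset's size that is transferred out each month, takes the 10% monthly refresh from footnote 1, and uses illustrative per-GB prices that do not appear in this post (R2 storage at $0.015 per GB-month, cross-cloud egress at $0.09 per GB, cross-region transfer at $0.02 per GB). Storage of the original copy is paid either way, so it is left out.

```python
def break_even_fraction(egress_usd_per_gb: float,
                        r2_storage_usd_per_gb_month: float,
                        refresh_fraction: float = 0.10) -> float:
    """Monthly transfer volume (as a fraction of dataset size) above which
    keeping a shared copy in R2 costs less than paying egress directly.

    Direct sharing cost per GB of dataset: f * egress
    R2 copy cost per GB of dataset:        r2_storage + refresh * egress
    Setting the two equal and solving for f gives the expression below.
    """
    if egress_usd_per_gb <= 0:
        raise ValueError("egress price must be positive")
    return r2_storage_usd_per_gb_month / egress_usd_per_gb + refresh_fraction


def monthly_saving(dataset_gb: float, transfer_fraction: float,
                   egress_usd_per_gb: float,
                   r2_storage_usd_per_gb_month: float,
                   refresh_fraction: float = 0.10) -> float:
    """Direct egress cost minus the cost of serving from an R2 copy."""
    direct = dataset_gb * transfer_fraction * egress_usd_per_gb
    via_r2 = dataset_gb * (r2_storage_usd_per_gb_month
                           + refresh_fraction * egress_usd_per_gb)
    return direct - via_r2


if __name__ == "__main__":
    # Illustrative prices only; substitute your own contract rates.
    R2_STORAGE = 0.015
    for label, egress in [("cross-cloud", 0.09), ("cross-region", 0.02)]:
        f = break_even_fraction(egress, R2_STORAGE)
        print(f"{label}: break-even at {f:.1%} of the data per month")
```

With these assumed prices the model gives break-even points of roughly 27% for cross-cloud and 85% for cross-region transfers, close to the thresholds quoted above. Different prices or refresh rates will move the break-even point, so treat the function as a template rather than a quote.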
Test drive Delta Sharing and Cloudflare R2
Delta Sharing and Cloudflare R2 are now available in Public Preview. To implement the joint solution, you don't have to migrate all your data to Cloudflare R2 (see related blog, Architecting Global Data Collaboration with Delta Sharing). You only need to replicate the shared data once to R2, in three easy steps (see the diagram below):
- Add Cloudflare R2 as an external storage location
- Create new tables, volumes, or ML models in Cloudflare R2, and sync data incrementally using Deep Clone
- Create a Delta Share on the R2 table, as usual
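The three steps above can be sketched as a handful of SQL statements. The helper below only builds the statement strings; it is a rough illustration, assuming a storage credential for R2 already exists and that the syntax matches your workspace version, so check it against the technical documentation before use. Every name in it (bucket, account ID, credential, catalog, schema, table, and share names) is hypothetical.

```python
def r2_url(bucket: str, account_id: str, path: str = "") -> str:
    """Build an R2 location URL in the r2://bucket@account form."""
    base = f"r2://{bucket}@{account_id}.r2.cloudflarestorage.com"
    return f"{base}/{path.strip('/')}" if path else base


def build_statements(bucket: str, account_id: str, credential_name: str,
                     location_name: str, source_table: str,
                     target_table: str, table_path: str,
                     share_name: str) -> list[str]:
    """Return SQL for: external location, Deep Clone into R2, Delta Share."""
    return [
        # Step 1: register the R2 bucket as an external location.
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location_name} "
        f"URL '{r2_url(bucket, account_id)}' "
        f"WITH (STORAGE CREDENTIAL {credential_name})",
        # Step 2: copy the table into R2; re-running syncs it incrementally.
        f"CREATE OR REPLACE TABLE {target_table} "
        f"DEEP CLONE {source_table} "
        f"LOCATION '{r2_url(bucket, account_id, table_path)}'",
        # Step 3: create a share and add the R2-backed table to it.
        f"CREATE SHARE IF NOT EXISTS {share_name}",
        f"ALTER SHARE {share_name} ADD TABLE {target_table}",
    ]


if __name__ == "__main__":
    # All identifiers below are placeholders for illustration.
    statements = build_statements(
        bucket="shared-data", account_id="my-account-id",
        credential_name="r2_credential", location_name="r2_location",
        source_table="main.sales.orders", target_table="main.sales.orders_r2",
        table_path="sales/orders", share_name="orders_share",
    )
    for stmt in statements:
        print(stmt + ";")
```

In a Databricks notebook, each string could then be passed to `spark.sql(...)`. Scheduling the Deep Clone statement as a recurring job is one way to keep the R2 copy current.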
Refer to the technical documentation for more details. You can also provide feedback to our team at [email protected].
Using Delta Sharing with Cloudflare R2, you can now benefit from a new approach to share data and AI across platforms, clouds and regions, with zero egress costs, no vendor lock-in, and without compromising on security and governance.
Learn more about how to integrate Delta Sharing into your data collaboration strategy with the latest resources:
- Read the O'Reilly technical guide, Data Sharing and Collaboration with Delta Sharing (early release)
- Dive deeper into the Databricks Delta Sharing documentation.
- Read more about Delta Sharing: an open standard for secure data sharing
- Watch the video announcement for Delta Sharing with Matei Zaharia (Keynote, Data + AI Summit 2021)
1 The cost savings calculation assumes that 10% of the data is refreshed monthly and that data is replicated to Cloudflare R2 for sharing purposes while the original copy is kept in S3.