
We are pleased to share the strategic partnership between Dell and Databricks, announced during the Dell Technologies World 2023 opening keynote last week. Our joint customers now have easy access to data stored within Dell's Elastic Cloud Storage (ECS) from the Databricks Lakehouse Platform, whether in the public cloud, on-premises, or in a private cloud. Additionally, Delta Sharing streamlines data sharing, data movement, and egress management.

Harnessing Your Data to Drive Business Insights with Versatile Options and Capabilities

Organizations can combine the power of the cloud through the Databricks Lakehouse Platform with the control and cost-efficiency of data stored in Dell ECS object storage on-premises or in a colocation facility. Databricks can now perform data engineering, data science, and data warehousing analytics directly on the ECS object store, as detailed in this whitepaper.
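As a rough illustration of what "directly on the ECS object store" can look like from the Spark side, here is a minimal sketch. It assumes ECS is reached through its S3-compatible API using the Hadoop S3A connector; the post itself does not spell out the connection method, so treat the whitepaper as the authority for a real setup. The property names are standard Hadoop S3A settings, while the endpoint hostname, the credentials, and the helper name `s3a_spark_conf` are hypothetical placeholders.

```python
def s3a_spark_conf(endpoint: str, access_key: str, secret_key: str) -> dict[str, str]:
    """Build Spark settings that point the Hadoop S3A connector at a custom endpoint.

    Assumption: the object store speaks the S3 protocol at `endpoint`.
    """
    use_tls = endpoint.startswith("https://")
    return {
        # Send S3A requests to the custom endpoint instead of AWS.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style URLs (endpoint/bucket/key) avoid needing per-bucket DNS names.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_tls).lower(),
    }


# Hypothetical endpoint and placeholder credentials, for illustration only.
conf = s3a_spark_conf("https://ecs.example.internal", "ACCESS_KEY", "SECRET_KEY")

for key, value in conf.items():
    # Never print credentials; show only the non-secret settings.
    if "key" not in key:
        print(f"{key}={value}")
```

With settings like these applied to a cluster, a path such as `s3a://<bucket>/<prefix>` would resolve against the configured endpoint rather than a public cloud region. In practice the credentials would come from a secret store, not from literals in code.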

With Databricks, customers get an open, unified platform for data, analytics and AI that enables organizations to share, process and analyze large amounts of data quickly and easily. Dell ECS offers a robust, secure object storage platform tailored for enterprise-level organizations that handle large volumes of unstructured data. It can be deployed on-premises, in a private cloud, or in a hybrid environment. It's built on a distributed architecture that scales horizontally across multiple nodes, allowing it to accommodate large amounts of data and support high-performance workloads.

Dell ECS compatibility with Databricks Lakehouse Platform and all leading cloud providers (AWS, GCP and Azure) enhances flexibility, enabling organizations to analyze data in the cloud, on-premises, and in private cloud environments, providing versatility and paving the way for new, innovative options to drive business insights.

Breaking Down the Architecture

Extending Your Data Workloads to On-premises

Many organizations run numerous large, enterprise-grade workloads that require large-scale distributed compute to meet SLAs. The ETL, AI, and warehousing demands of these workloads can saturate the bandwidth of a network connection, whether the data sits in cloud-adjacent storage or in Dell ECS storage on-premises. For these types of workloads, simply land your raw data in cloud storage to take advantage of the massively scalable compute in Databricks, curate data products, and then use Delta Sharing to share the curated data products. This allows for egress on demand, meaning you pull only the data needed for the use case. The result is a simple way to prevent data duplication and unnecessary data movement while taking advantage of data locality to meet SLAs in a cost-effective manner.

Organizations also use tried-and-true Delta Sharing patterns to build multi-cloud and cross-region data solutions, and these patterns can now be applied on-premises as well with Dell ECS storage.
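To make "egress on demand" concrete, the following is a minimal sketch of a recipient pulling a bounded slice of a shared table with the open source `delta-sharing` Python connector. It is an illustration rather than the setup described in this post: the profile path and the share, schema, and table names are hypothetical, and `table_url` and `load_sample` are helper names introduced here. The `<profile>#<share>.<schema>.<table>` address format and the `limit` argument of `load_as_pandas` come from the connector itself.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address Delta Sharing clients use."""
    for part in (profile_path, share, schema, table):
        if not part:
            raise ValueError("profile path, share, schema, and table must be non-empty")
    return f"{profile_path}#{share}.{schema}.{table}"


def load_sample(profile_path: str, share: str, schema: str, table: str, limit: int = 1000):
    """Fetch at most `limit` rows of a shared table as a pandas DataFrame."""
    # Imported lazily so the URL helper works without the connector installed
    # (pip install delta-sharing).
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table), limit=limit
    )


# Hypothetical names, for illustration only.
url = table_url("/path/to/config.share", "curated_products", "sales", "daily_summary")
print(url)
```

Calling `load_sample(...)` against a real share would transfer only the requested rows of the curated table, which is the point of sharing curated data products instead of copying raw data.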

Figure 1: Extending your data workloads to on-premises

Cloud Adjacent Technologies

Technologies like private cloud and colocated hardware allow for high-bandwidth, low-latency connectivity to Databricks. Furthermore, private clouds enable connectivity to multiple clouds. Organizations can therefore use Databricks to perform ETL, data science, and data warehousing analytics directly on the ECS object store across one or multiple clouds simultaneously, getting the best of both worlds.
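Whether a given link is fast enough for direct access comes down to simple arithmetic, sketched below. The data sizes and the 10 Gbit/s link speed are made-up figures for illustration, not measurements from this post, and `transfer_seconds` is a hypothetical helper that ignores latency and protocol overhead unless an `efficiency` factor is supplied.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float, efficiency: float = 1.0) -> float:
    """Idealized time to move `size_bytes` over a link, scaled by a 0-1 efficiency factor."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


TB = 10**12    # decimal terabyte, in bytes
GBPS = 10**9   # gigabit per second, in bits per second

# Illustrative figures only: a 10 TB raw dataset versus a 0.5 TB curated product.
raw_seconds = transfer_seconds(10 * TB, 10 * GBPS)
curated_seconds = transfer_seconds(0.5 * TB, 10 * GBPS)

print(f"raw:     {raw_seconds / 3600:.2f} h")
print(f"curated: {curated_seconds / 60:.1f} min")
```

Under these assumed numbers, moving the raw dataset occupies the link for hours while the curated product takes minutes, which is why link bandwidth and the size of the data actually read drive the choice between direct access and sharing curated data.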

Figure 2: Connecting to multiple clouds

Art of the Possible

Unlocking on-premises data storage opens the door to new opportunities, such as:

  • Data Resiliency - Because Dell ECS is directly accessible from Databricks in AWS, Azure, or GCP, one copy of the data can be used across multiple cloud environments. When there is a low-latency, high-bandwidth pipe between Databricks and the Dell ECS instance, such as with Faction, Databricks compute can perform ETL, data science, and data warehousing activities directly on Dell ECS. For high availability and disaster recovery planning, ECS enables a simple, cost-effective strategy for moving between Databricks instances in different clouds.
  • Remote and Air-Gapped Solutions - Local analytics on big data workloads can provide on-site insights using Dell ECS storage. Bring the lakehouse directly to remote sites for low latency analytics. For large scale aggregate analytics, data from remote sites can be fed directly to a central hub without transformation or modification.
  • Cost Optimization - The partnership between Dell and Databricks provides organizations more freedom to meet basic demand with inexpensive upfront infrastructure and scale past base demand with scalable cloud computing. This helps to meet the requirements for more workloads in a cost-effective manner.

Conclusion

Businesses commonly store large amounts of data on-premises for various purposes, and historically this data was only accessible to on-site applications and infrastructure. With the combination of Databricks and Dell ECS you gain hybrid lakehouse functionality, enabling all your data to be accessed for business insights and long-term research. This marks a significant change in how businesses can utilize their data and provides a valuable benefit for those considering multi-cloud architectures to simplify workflows, optimize data investments, and deliver critical business insights.
