In the previous blog, "Databricks Lakehouse and Data Mesh," we introduced the concept of a Data Mesh built on the Databricks Lakehouse. This blog explores how the capabilities of the Databricks Lakehouse support a Data Mesh from an architectural point of view.
Data Mesh is an architectural and organizational paradigm, not a technology or solution you buy. However, to implement a Data Mesh effectively, you need a flexible platform that ensures collaboration between data personas, delivers data quality, and facilitates interoperability and productivity across all data and AI workloads.
Let's look at how the capabilities of Databricks Lakehouse Platform address these needs.
The basic building block of a data mesh is the data domain, which usually comprises the following components:
This is depicted in the figure below:
To facilitate cross-domain collaboration and self-service analytics, common services around access control mechanisms and data cataloging are often centrally provided. For example, Databricks Unity Catalog provides not only informational cataloging capabilities such as data discovery and lineage, but also the enforcement of fine-grained access controls and auditing desired by many organizations today.
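To make this concrete, the sketch below shows how a domain team might grant fine-grained access through Unity Catalog's SQL interface from a Databricks notebook, where a `spark` session is already available. The catalog, schema, table, and group names are hypothetical and only illustrate the three-level namespace and privilege model.

```python
# Minimal sketch of Unity Catalog governance from a Databricks notebook,
# where a `spark` session is already provided. All object and group names
# below are hypothetical.

# Grant a domain team read access to one table using the
# three-level namespace (catalog.schema.table).
spark.sql("GRANT USE CATALOG ON CATALOG sales_domain TO `sales-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA sales_domain.curated TO `sales-analysts`")
spark.sql("GRANT SELECT ON TABLE sales_domain.curated.orders TO `sales-analysts`")

# Row-level restriction via a dynamic view: members outside the EU analyst
# group only see non-EU rows.
spark.sql("""
  CREATE OR REPLACE VIEW sales_domain.curated.orders_restricted AS
  SELECT * FROM sales_domain.curated.orders
  WHERE is_account_group_member('eu-analysts') OR region <> 'EU'
""")
```

Because grants and views like these are defined centrally in Unity Catalog, access decisions and lineage remain auditable even as individual domains manage their own data products.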
Data Mesh can be deployed in a variety of topologies. Outside of modern digital-native companies, a highly decentralized Data Mesh with fully independent domains is usually not recommended, as it creates complexity and overhead for domain teams rather than allowing them to focus on business logic and high-quality data. Two popular examples often seen in enterprises are the Harmonized Data Mesh and the Hub & Spoke Data Mesh.
A harmonized data mesh emphasizes autonomy within domains:
The implications of a harmonized approach may include:
This approach may be challenging in global organizations where different teams have different breadth and depth in skills and may find it difficult to stay fully in sync with the latest practices and policies.
A Hub & Spoke Data Mesh incorporates a centralized location for managing shareable data assets and data that does not sit logically within any single domain:
The implications of a Hub & Spoke Data Mesh include:
In both of these approaches, domains may also have common and repeatable needs such as:
Having a centralized pool of skills and expertise, such as a center of excellence, can be beneficial both for repeatable activities common across domains as well as for infrequent activities requiring niche expertise that may not be available in each domain.
It is also perfectly feasible to adopt a model that falls somewhere between a fully harmonized Data Mesh and a hub-and-spoke topology. For example, an organization might maintain a minimal global data hub that hosts only those data assets that do not logically sit within a single domain and that manages externally acquired data used across multiple domains. Unity Catalog plays the pivotal role of providing authenticated data discovery wherever data is managed within a Databricks deployment.
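As a sketch of what that cross-domain experience could look like, the example below uses Unity Catalog's three-level namespace to discover accessible objects and to join a locally owned data product with assets published by another domain and by a minimal global hub. It assumes a Databricks notebook `spark` session; the catalog, schema, and table names are hypothetical.

```python
# Minimal sketch of cross-domain discovery and access through Unity Catalog,
# assuming a Databricks notebook `spark` session. Domain catalogs and table
# names are hypothetical.

# Discover which catalogs and schemas the current user can see;
# Unity Catalog only returns objects the user is authorized to access.
spark.sql("SHOW CATALOGS").show()
spark.sql("SHOW SCHEMAS IN logistics_domain").show()

# Join a locally owned data product with one published by another domain
# and with a reference asset hosted in a minimal global hub catalog.
df = spark.sql("""
  SELECT o.order_id, o.amount, s.carrier, r.fx_rate
  FROM sales_domain.curated.orders o
  JOIN logistics_domain.products.shipments s ON o.order_id = s.order_id
  JOIN global_hub.reference.exchange_rates r ON o.currency = r.currency
""")
df.show(5)
```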
Independent of the type of Data Mesh logical architecture deployed, many organizations will face the challenge of creating an operating model that spans cloud regions, cloud providers, and even legal entities. Furthermore, as organizations evolve towards the productization (and potentially even monetization) of data assets, enterprise-grade interoperable data sharing remains paramount for collaboration not only between internal domains but also across companies.
Delta Sharing offers a solution to this problem with the following benefits:
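To illustrate the consumer side, here is a minimal sketch using the open-source delta-sharing Python connector. The profile file and the share, schema, and table names are hypothetical; in practice the provider issues the profile file to the recipient.

```python
# Minimal sketch of consuming a Delta Share with the open-source
# delta-sharing Python connector (pip install delta-sharing).
# Profile file and share/schema/table names are hypothetical.
import delta_sharing

# The profile file is issued by the data provider and contains the
# sharing server endpoint plus a bearer token.
profile = "partner_share.share"

# Discover the tables this recipient has been granted access to.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table directly into pandas; no Databricks account
# is required on the consuming side.
table_url = f"{profile}#sales_share.curated.orders"
orders = delta_sharing.load_as_pandas(table_url)
print(orders.head())
```

The same table URL can be loaded into Spark with `delta_sharing.load_as_spark`, which keeps the sharing protocol open and client-agnostic rather than tied to a particular vendor or platform.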
Data Mesh and Lakehouse both arose due to common pain points and shortcomings of enterprise data warehouses and traditional data lakes[1][2]. Data Mesh comprehensively articulates the business vision and needs for improving productivity and value from data, whereas the Databricks Lakehouse provides an open and scalable foundation to meet those needs with maximum interoperability, cost-effectiveness, and simplicity.
In this article, we emphasized two example capabilities of the Databricks Lakehouse platform that improve collaboration and productivity while supporting federated governance, namely Unity Catalog and Delta Sharing.
However, there are a plethora of other Databricks features that serve as great enablers in the Data Mesh journey for different personas. For example:
To find out more about Lakehouse for Data Mesh: