Data is critical to enterprises, serving as the raw material for innovation and advancement. Its importance has grown as organizations become more data- and decision-centric, and keeping up with that demand has created major challenges. Legacy data lakes and warehouses add to the problem by creating silos, reducing data visibility and making data processing slow and complicated. These barriers and bottlenecks hamper collaboration and leave valuable data resources underused. Enterprises need a new data architecture to make the most of their data, and Data Mesh is a modern data architecture designed to address these problems.
A data mesh is a decentralized data architecture that organizes data by business domain—such as marketing, sales or customer service—enabling domain teams to treat their data as a product and making it easier for business users to find, understand and use data from across the organization.
Decentralization is key to Data Mesh. Data is owned and managed independently by multiple business domains rather than centrally by one team for the whole organization, although central governance rules keep data interoperable, secure and semantically consistent.
Domain data managers are responsible for providing high-quality data products and for protecting their data. Because they are responsible only for their own domain's business data, not the data for the entire organization, they can provide more relevant data faster and more efficiently while maintaining strong data governance.
Data Mesh principles balance business autonomy with global interoperability. The architecture reduces the reliance on centralized teams and avoids data silos while promoting a collaborative environment for teams to cocreate and share data products that generate business value for the organization.
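To make the balance between domain autonomy and global rules more concrete, here is a minimal Python sketch. It is a conceptual model, not a Databricks API: `DataProduct`, `DataProductRegistry`, `governance_violations` and the specific rules it checks (owner, description, a `contains_pii` tag) are all hypothetical. Each domain publishes only the products it owns, while one shared set of rules is checked for every product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    # Hypothetical fields; a real data product would carry far richer metadata.
    name: str
    domain: str
    owner: str
    description: str = ""
    tags: dict = field(default_factory=dict)


def governance_violations(product: DataProduct) -> list[str]:
    """Organization-wide rules every domain must satisfy (illustrative only)."""
    problems = []
    if not product.owner:
        problems.append("missing owner")
    if not product.description:
        problems.append("missing description")
    if "contains_pii" not in product.tags:
        problems.append("missing contains_pii tag")
    return problems


class DataProductRegistry:
    """Domains publish their own products; central rules are checked on publish."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct, publishing_domain: str) -> None:
        # Autonomy has limits: a domain may only publish data it owns.
        if product.domain != publishing_domain:
            raise PermissionError(
                f"{publishing_domain!r} cannot publish {product.domain!r} data"
            )
        # Global interoperability: every product must pass the shared rules.
        problems = governance_violations(product)
        if problems:
            raise ValueError(f"{product.name}: " + ", ".join(problems))
        self._products[f"{product.domain}.{product.name}"] = product

    def find(self, domain: str | None = None) -> list[str]:
        # Users can discover products across all domains by qualified name.
        return sorted(
            key for key, p in self._products.items()
            if domain is None or p.domain == domain
        )


registry = DataProductRegistry()
registry.publish(
    DataProduct("orders_daily", "sales", "sales-data-team",
                "Daily order totals", {"contains_pii": False}),
    publishing_domain="sales",
)
print(registry.find())  # ['sales.orders_daily']
```

The design choice worth noting is that the rules live in one shared function rather than in each domain's code, which is a simple way to picture "central rules, decentralized ownership."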
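Four principles provide the foundation for a logical Data Mesh architecture: domain-oriented decentralized data ownership, data as a product, self-serve data infrastructure as a platform and federated computational governance.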
Traditionally, organizations use a centralized data team to manage data, including storage, formatting, processing and analysis, throughout the business. This ensures consistent data management and governance, but it also creates bottlenecks. Teams often work around this centralization to speed up their data decisions, inadvertently creating silos. Those silos, in turn, prevent other data users from getting relevant, accurate data in a timely manner. In addition, centralized data and AI teams often have limited understanding of the unique context of domain datasets, so they miss opportunities to build meaningful data products.
As the volume and value of data continue to grow, centralized data and AI teams are often unable to keep up with demand. The central team becomes overwhelmed, business users struggle to access and use the data they need, and the organization cannot realize the full value of its data.
In a Data Mesh, data management is decentralized and placed in the hands of domain experts who understand the data they’re working with. This results in several benefits:
To create a Data Mesh, organizations must have certain elements in place, including:
The Databricks Data Intelligence Platform offers a technological foundation for organizations to adopt a Data Mesh architecture and modernize their data management approach. Databricks is a cloud-native data, analytics and AI platform that combines the performance and features of a data warehouse with the low-cost flexibility and scalability of a modern data lake. Its open architecture offers flexibility in how data is organized and structured, while providing a unified management infrastructure across data and analytics workloads.
The Databricks Platform is organized into units called workspaces, which support a domain-centered Data Mesh. Databricks supports multiple workspaces, each corresponding to one or more domains. Each workspace is owned and managed locally by its domain team and serves as that team's home for collaboration. Within a workspace, domains manage their data products using a self-service infrastructure shared across the organization.
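A workspace layout like this can be sanity-checked with a few lines of code. The Python sketch below is hypothetical: the workspace and domain names are invented, and it assumes each domain has exactly one home workspace while a single workspace may host several domains. It is not a Databricks configuration format.

```python
from __future__ import annotations

# Hypothetical layout: workspace and domain names are made up for illustration.
WORKSPACES: dict[str, set[str]] = {
    "ws-sales": {"sales"},
    "ws-customer": {"marketing", "customer_service"},  # one workspace, two domains
}


def domain_conflicts(workspaces: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return domains claimed by more than one workspace (should be empty)."""
    claims: dict[str, list[str]] = {}
    for ws, domains in workspaces.items():
        for domain in domains:
            claims.setdefault(domain, []).append(ws)
    return {d: sorted(wss) for d, wss in claims.items() if len(wss) > 1}


def home_workspace(domain: str, workspaces: dict[str, set[str]] = WORKSPACES) -> str:
    """Find the single workspace where a domain manages its data products."""
    matches = [ws for ws, domains in workspaces.items() if domain in domains]
    if len(matches) != 1:
        raise LookupError(f"{domain!r} maps to {len(matches)} workspaces, expected 1")
    return matches[0]


assert domain_conflicts(WORKSPACES) == {}
print(home_workspace("marketing"))  # ws-customer
```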
Databricks provides tools for data management and processing across the entire data lifecycle. It supports both batch and streaming data processing, enabling users to create and manage data products more efficiently. It can also unify table storage formats, so each domain can use its preferred format while the organization maintains a unified approach to data storage and metadata management.
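One way to picture why supporting both batch and streaming matters for a data product is to write a transformation once and run it both ways. The plain-Python sketch below is conceptual and does not use Spark APIs. It loosely mirrors the micro-batch model of Apache Spark Structured Streaming, on which Databricks streaming is built. The function names and the orders-per-region example are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def count_by_region(rows: Iterable[dict]) -> Counter:
    """The transformation, written once (hypothetical example: orders per region)."""
    return Counter(row["region"] for row in rows)


def run_batch(rows: list[dict]) -> dict:
    # Batch: process the full dataset in one pass.
    return dict(count_by_region(rows))


def run_streaming(micro_batches: Iterable[list[dict]]) -> Iterator[dict]:
    # Streaming: process each micro-batch as it arrives, merge it into
    # running state and emit an updated snapshot after each one.
    state: Counter = Counter()
    for batch in micro_batches:
        state.update(count_by_region(batch))  # Counter.update adds counts
        yield dict(state)


orders = [{"region": r} for r in ["emea", "amer", "emea", "apac", "amer", "emea"]]
micro_batches = [orders[0:2], orders[2:4], orders[4:6]]

snapshots = list(run_streaming(micro_batches))
print(snapshots[-1] == run_batch(orders))  # True
```

Once all micro-batches have arrived, the streaming result matches the batch result, so a domain can serve the same data product at either latency.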
Databricks’ Unity Catalog, the industry’s only unified and open data governance solution for data and AI, is critical for a Data Mesh. Unity Catalog enables centralized management by integrating governance, security, user management and metadata across workspaces. It provides data cataloging capabilities such as discoverability and lineage, as well as enforcement of fine-grained access controls and audit logging. Security and access controls are defined once and applied across workspaces, which simplifies data governance. Unity Catalog organizes data into catalogs, so each domain can manage its own data products.
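Unity Catalog names tables with a three-level namespace (`catalog.schema.table`). Reading a table requires `USE CATALOG` on its catalog, `USE SCHEMA` on its schema and `SELECT` on the table, and a privilege granted on a parent object is inherited by its children. In Unity Catalog itself, these grants are written in SQL, for example ``GRANT USE CATALOG ON CATALOG sales TO `analysts` ``. The Python sketch below is a simplified, hypothetical model of that check with an audit trail. `ToyCatalog` and the catalog, schema and group names are invented and do not reflect Unity Catalog's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def ancestors(name: str) -> list[str]:
    """'sales.orders.daily' -> ['sales', 'sales.orders', 'sales.orders.daily']"""
    parts = name.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


@dataclass
class ToyCatalog:
    """Simplified, hypothetical model of centrally managed grants."""

    grants: set[tuple[str, str, str]] = field(default_factory=set)  # (principal, privilege, object)
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, principal: str, privilege: str, securable: str) -> None:
        self.grants.add((principal, privilege, securable))

    def has_privilege(self, principal: str, privilege: str, securable: str) -> bool:
        # A privilege granted on a parent object is inherited by its children.
        return any((principal, privilege, obj) in self.grants for obj in ancestors(securable))

    def can_select(self, principal: str, table: str) -> bool:
        catalog, schema, _ = table.split(".")  # three-level name: catalog.schema.table
        allowed = (
            self.has_privilege(principal, "USE CATALOG", catalog)
            and self.has_privilege(principal, "USE SCHEMA", f"{catalog}.{schema}")
            and self.has_privilege(principal, "SELECT", table)
        )
        self.audit_log.append((principal, "SELECT", table, allowed))  # every check is recorded
        return allowed


uc = ToyCatalog()
uc.grant("analysts", "USE CATALOG", "sales")
uc.grant("analysts", "USE SCHEMA", "sales")    # inherited by every schema in `sales`
uc.grant("analysts", "SELECT", "sales.orders")  # inherited by every table in `sales.orders`

print(uc.can_select("analysts", "sales.orders.daily"))   # True
print(uc.can_select("analysts", "sales.refunds.daily"))  # False: no SELECT on that schema
print(uc.can_select("marketing", "sales.orders.daily"))  # False: no grants at all
```

Because grants live in one place and are inherited down the namespace, a domain can open up a whole schema with a single grant. Every access decision is still checked against the same central rules and recorded.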
Databricks also provides enterprise-grade interoperable data sharing to support collaboration among internal and external domains. Delta Sharing enables organizations to securely share data with zero copy, regardless of computing platform or cloud region. Delta Sharing provides the foundation for a broad range of external data-sharing activities, including publishing or acquiring data via a data marketplace.
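The core idea of zero-copy sharing is that a provider exposes selected tables to a recipient, and the recipient reads the provider's stored data instead of receiving a duplicate. The provider can also withdraw that access later. The Python sketch below models only that idea. `ToySharingProvider`, the share, recipient and table names, and the sample rows are hypothetical, and this is not the Delta Sharing protocol. A recipient of a real share would typically read it with the open-source `delta-sharing` connector, for example `delta_sharing.load_as_pandas(table_url)`, where `table_url` has the form `<profile-file>#<share>.<schema>.<table>`.

```python
from __future__ import annotations


class ToySharingProvider:
    """Hypothetical model of provider-side sharing; not the Delta Sharing API."""

    def __init__(self) -> None:
        self.tables: dict[str, tuple] = {}         # the single stored copy of each table
        self.shares: dict[str, set[str]] = {}      # share name -> tables it exposes
        self.recipients: dict[str, set[str]] = {}  # recipient -> shares granted to it

    def create_share(self, share: str, tables: list[str]) -> None:
        self.shares[share] = set(tables)

    def grant(self, share: str, recipient: str) -> None:
        self.recipients.setdefault(recipient, set()).add(share)

    def revoke(self, share: str, recipient: str) -> None:
        self.recipients.get(recipient, set()).discard(share)

    def read(self, recipient: str, share: str, table: str) -> tuple:
        if share not in self.recipients.get(recipient, set()):
            raise PermissionError(f"{recipient!r} has no access to share {share!r}")
        if table not in self.shares.get(share, set()):
            raise PermissionError(f"table {table!r} is not in share {share!r}")
        # Zero copy: the recipient reads the provider's stored data directly.
        return self.tables[table]


provider = ToySharingProvider()
provider.tables["sales.orders.daily"] = (("day-1", 120), ("day-2", 95))  # sample rows
provider.create_share("partner_share", ["sales.orders.daily"])
provider.grant("partner_share", "acme_corp")

rows = provider.read("acme_corp", "partner_share", "sales.orders.daily")
print(rows is provider.tables["sales.orders.daily"])  # True: same object, no copy
provider.revoke("partner_share", "acme_corp")  # access can be withdrawn centrally
```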
With Unity Catalog and Delta Sharing, Databricks offers organizations flexibility to organize and manage data and analytics at scale. Data can be organized in a Data Mesh or a multi-tenant architecture, supporting both centralized and distributed data management solutions.
Data Mesh architecture offers enterprises a new way to approach data and fully leverage its value. Databricks provides an open, scalable foundation to make this vision a reality, with guaranteed interoperability, cost-effectiveness, governance and simplicity.
