Data Mesh
Data is critical to enterprises, serving as the raw material for innovation and advancement. Its importance has grown as organizations become more data- and decision-centric, which makes it harder for them to keep up. Legacy data lakes and warehouses contribute to this problem by creating silos, reducing data visibility and making data processing slow and complicated. These barriers and bottlenecks hamper collaboration and leave valuable data resources unused. Enterprises need a new data architecture to make the most of their data. Data Mesh is a modern data architecture designed to address this problem.
What is a Data Mesh?
Data Mesh is an organizational data architecture for managing data at scale and deriving more value from that data.
Decentralization is key to Data Mesh. Data is owned and managed independently by multiple business domains, rather than managed centrally by one team for the whole organization — although central rules for governance keep data interoperable, safe and semantically consistent.
Domain data managers are responsible for providing high-quality data products as well as protecting their data. Because they’re only responsible for their domain business data — not the data for the entire organization — they can provide more relevant data faster and more efficiently while maintaining strong data governance.
Data Mesh principles balance business autonomy with global interoperability. The architecture reduces reliance on centralized teams and avoids data silos, while promoting a collaborative environment in which teams co-create and share data products that generate business value for the organization.
Data Mesh architecture principles
Four principles provide the foundation for a logical Data Mesh architecture:
- Domain ownership: Data Mesh uses a distributed architecture in which domain teams retain full responsibility and autonomy for their data throughout its lifecycle. These domain teams correspond to different departments or functions within an organization, such as sales or accounting, each producing its own data. Domain ownership ensures data is owned by the users most familiar with it.
- Data as a product: Data is treated as a product and the teams and departments within an organization are treated as customers. The organization applies product management principles to the data analytics lifecycle, ensuring quality data is provided to data consumers. Data products need to be discoverable, trustworthy, self-describing, addressable and interoperable. Besides data and metadata, they can contain code, dashboards, features, models and other assets needed to create and maintain the data product.
- Self-service infrastructure platform: While domain teams manage their own data products, the organization uses a harmonized, automated platform to build, run and maintain interoperable data products. Providing standard tools within the framework of a self-service platform enables scalability of the Data Mesh architecture.
- Federated governance: This principle ensures central, consistent data governance across domains. Compliance is tracked and managed centrally via a data catalog, data governance tools and automated policy enforcement. This ensures a data ecosystem that adheres to organizational rules and industry regulations.
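The principles above can be made concrete with a small sketch. The Python below is illustrative only: the `DataProduct` fields, the `GLOBAL_POLICIES` rules and the `check_policies` helper are hypothetical names chosen for this example, not part of any Data Mesh standard or Databricks API. It shows a domain-owned data product that describes itself and has a stable address, plus a set of centrally defined rules that every domain's product is checked against automatically.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    domain: str                                    # owning business domain, e.g. "sales"
    name: str                                      # product name, unique within the domain
    owner: str                                     # accountable contact in the domain team
    description: str = ""                          # makes the product self-describing
    columns: dict = field(default_factory=dict)    # column name -> type name
    tags: set = field(default_factory=set)         # free-form labels, e.g. {"pii"}

    @property
    def address(self) -> str:
        """Stable address other domains can use to find the product."""
        return f"{self.domain}.{self.name}"


# Central (federated) rules: defined once, applied to every domain.
# Each rule is a (message, predicate) pair; the predicate returns True on success.
GLOBAL_POLICIES = [
    ("owner is required", lambda p: bool(p.owner)),
    ("description is required", lambda p: bool(p.description.strip())),
    ("schema must list at least one column", lambda p: len(p.columns) > 0),
]


def check_policies(product: DataProduct) -> list:
    """Return the messages of all global policies the product violates."""
    return [msg for msg, rule in GLOBAL_POLICIES if not rule(product)]


orders = DataProduct(
    domain="sales",
    name="orders_daily",
    owner="sales-data@example.com",
    description="One row per order, refreshed daily.",
    columns={"order_id": "string", "amount": "double"},
)
draft = DataProduct(domain="accounting", name="ledger", owner="")

print(orders.address, check_policies(orders))
print(draft.address, check_policies(draft))
```

In this sketch the domain team owns the descriptor, while the rule list lives with the central governance function. A real platform would typically enforce such rules at publish time rather than in ad hoc scripts.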
Data Mesh benefits
Traditionally, organizations use a centralized data team to manage data throughout the business, including storing, formatting, processing and analysis. This ensures consistent data management and governance, but it also creates bottlenecks. Teams often try to escape this centralization with workarounds that speed up their own data decisions but inadvertently create silos. Centralization also prevents data users from getting relevant, accurate data in a timely manner. In addition, centralized data and AI teams often have limited understanding of the unique context for domain datasets, so they miss out on opportunities for meaningful data products.
As the volume and value of data continue to grow, centralized data and AI teams are often unable to keep up with demand. This leads to an overwhelmed team, hinders business users from accessing and using the data they need and prevents the organization from realizing the full value of its data.
In a Data Mesh, data management is decentralized and placed in the hands of domain experts who understand the data they’re working with. This results in several benefits:
- Speed and simplicity: Users can access the right data faster by directly contacting domain managers for requests, changes and approvals.
- High-quality data products: Domain data managers create more relevant and higher-quality products that bring value to business users.
- Improved discovery: While management and access are decentralized, all data is recorded and governed centrally, preventing silos and making data easier to find.
- Cost and performance efficiency: Distributed data architecture fosters adoption of real-time data streaming and improves visibility into resource allocation and storage, resulting in more efficiency, better financial planning and lower costs.
- Stronger governance: Federated security and compliance policies are enforced within domains as well as between them. Monitoring and auditing are centralized to ensure consistent adherence.
Data Mesh building blocks
To create a Data Mesh, organizations must have certain elements in place, including:
- A comprehensive data products strategy that sets common standards and processes, such as a global blueprint for data product contracts, a publishing platform for data discovery, and centralized governance processes and authority, while providing a self-service experience to users.
- A harmonized platform where all data resides and is ready for all different kinds of analytics workloads, such as a data intelligence platform.
- A flexible platform that ensures collaboration between different data personas, delivers data quality and facilitates interoperability and productivity across all data and AI workloads.
- Centrally managed data governance services around access control and data cataloging to facilitate cross-domain collaboration and self-service analytics.
- A federated sharing layer that allows seamless sharing of data across domains.
- For many organizations, a way to share data securely with external parties.
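A data product contract, the first building block above, can start as an agreed list of fields and types that consumers may rely on. The sketch below is a minimal illustration under that assumption; `ORDERS_CONTRACT` and `contract_violations` are hypothetical names, and real contracts usually also cover freshness, quality thresholds and ownership.

```python
# Hypothetical contract for an "orders" data product: field name -> expected Python type.
ORDERS_CONTRACT = {"order_id": str, "amount": float, "currency": str}


def contract_violations(record: dict, contract: dict) -> list:
    """List the ways a record breaks the contract (an empty list means it conforms)."""
    problems = []
    for name, expected in contract.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


good = {"order_id": "A-1", "amount": 19.99, "currency": "EUR"}
bad = {"order_id": 42, "amount": 19.99}  # wrong type for order_id, currency missing

print(contract_violations(good, ORDERS_CONTRACT))
print(contract_violations(bad, ORDERS_CONTRACT))
```

Because the contract is plain data, the same definition could be published alongside the data product for discovery and checked automatically before each release.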
Adopting a Data Mesh with Databricks Data Intelligence Platform
Databricks Data Intelligence Platform offers a technological foundation for organizations to adopt a Data Mesh architecture and modernize their data management approach. Databricks is a cloud-native data, analytics and AI platform that combines the performance and features of a data warehouse with the low-cost flexibility and scalability of a modern data lake. Its open architecture offers flexibility in how data is organized and structured, while providing a unified management infrastructure across data and analytics workloads.
The Databricks Platform is organized into units called workspaces that support a domain-centered Data Mesh. Databricks supports multiple workspaces, each corresponding to one or more domains. Each workspace is owned and managed locally and serves as the home for collaboration. Within the workspace, the domain(s) can manage data products using an organization-wide self-service infrastructure.
Databricks provides tools for data management and processing across the lifecycle. It allows both batch and streaming data processing, enabling users to create and manage data products more efficiently. It can also unify table storage formats so that each domain can use its preferred format while maintaining a unified approach to data storage and metadata management.
Databricks’ Unity Catalog, which Databricks describes as the industry’s only unified and open data governance solution for data and AI, is critical for a Data Mesh. Unity Catalog enables centralized management by integrating governance, security, user management and metadata across workspaces. It provides data cataloging capabilities such as discoverability and lineage, as well as enforcement of fine-grained access controls and audit logging. Security and access controls are managed only once, which simplifies data governance. Unity Catalog organizes data into catalogs, allowing domain-specific management of data products.
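As a rough illustration of how catalogs can map to domains, the sketch below assumes one catalog per domain and uses Unity Catalog's three-level `catalog.schema.table` naming. The helper functions and the `sales`, `orders` and `marketing-analysts` names are hypothetical. The code only builds SQL `GRANT` statements as strings; it does not connect to Databricks, and the exact privileges needed should be checked against the Unity Catalog documentation for your environment.

```python
def full_name(catalog: str, schema: str, table: str) -> str:
    """Three-level Unity Catalog name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def read_grants(catalog: str, schema: str, table: str, group: str) -> list:
    """Build the statements that let a group read one table.

    Identifiers are interpolated as-is, so pass only trusted names.
    """
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`",
        f"GRANT SELECT ON TABLE {full_name(catalog, schema, table)} TO `{group}`",
    ]


# Assumed layout: the sales domain owns the "sales" catalog and
# exposes the orders_daily table to a consuming group in another domain.
statements = read_grants("sales", "orders", "orders_daily", "marketing-analysts")
for stmt in statements:
    print(stmt)
```

Each statement could then be run by the owning domain in a SQL editor or notebook, which keeps access decisions with the domain while the catalog records them centrally.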
Databricks also provides enterprise-grade interoperable data sharing to support collaboration among internal and external domains. Delta Sharing enables organizations to securely share data with zero copy, regardless of computing platform or cloud region. Delta Sharing provides the foundation for a broad range of external data-sharing activities, including publishing or acquiring data via a data marketplace.
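On the consuming side, a recipient might read a shared table with the open source `delta-sharing` Python connector. The sketch below is an assumption-laden example: the profile file name, share, schema and table names are hypothetical, and `load_shared_table` needs the connector installed (`pip install delta-sharing`) plus a valid profile file issued by the data provider. Only the URL helper is exercised here.

```python
def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address used by the connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table as a pandas DataFrame.

    Not called in this sketch: it requires the delta-sharing package
    and a real profile file from the provider.
    """
    import delta_sharing  # imported lazily so the URL helper works without it

    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table)
    )


# Hypothetical names for a share published by the sales domain.
url = shared_table_url("config.share", "sales_share", "orders", "orders_daily")
print(url)
```

The recipient reads the provider's table in place through the sharing server, which is what allows sharing without maintaining a separate copy.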
With Unity Catalog and Delta Sharing, Databricks offers organizations flexibility to organize and manage data and analytics at scale. Data can be organized in a Data Mesh or a multi-tenant architecture, supporting both centralized and distributed data management solutions.
Data Mesh architecture offers enterprises a new way to approach data and fully leverage its value. Databricks provides an open, scalable foundation to make this vision a reality, with guaranteed interoperability, cost-effectiveness, governance and simplicity.