
Introducing the MeshaVerse: Next-Gen Data Mesh 2.0

Henning Kropp
Ioannis Papadopoulos
Ryan Simpson
Tahir Fayyaz

At Databricks, we have a (healthy) obsession with finding new ways to address our customers' biggest pain points so that they can unlock new value across all of their data, regardless of their role within an organization. A Data Mesh helps solve some of these challenges by giving teams complete control of their data's lifecycle while enabling more self-service. The data lakehouse architecture helps organizations drive their Data Mesh journey by enabling a decentralized approach to storing and processing data, while still centralizing security, governance, and discovery.

That’s why today, we’re thrilled to introduce MeshaVerse, a Lakehouse-powered data mesh that gives you full, interactive control over your data via a VR-driven experience. MeshaVerse introduces a new augmented reality layer on top of your data in Delta Lake via rentable rooms in your Virtual Lakehouse. To get started, all you need to do is create a virtual clone of your Delta Lake data using:

CREATE ROOM sales_data
VIRTUAL CLONE source_table_name
LOCATION MeshaVerse/room

MeshaVerse completely abstracts your data from your cloud-based Lakehouse. No data or metadata is actually stored within the MeshaVerse – no more data security challenges or compliance nightmares.

Virtual domain data as a product

On a path to the Data Mesh, we find that many data teams still struggle with discovering and consuming siloed data. To address this, we are shifting to virtual data in a virtual distributed domain-driven architecture.

With the development of a MeshaVerse connector, our engineers built virtual abstracted data rooms of the Lakehouse via an augmented data reality experience across the architectural quantum. This enables data teams to build full abstractions of their data into a virtual data set, creating virtual domain data that can then be consumed using vendor-agnostic VR headsets or smartglasses.

Virtual domain data as a product helps data teams apply rigor to data sets by meeting the following requirements:

  • Discoverable: In a virtual room, data can be discovered using the MeshaVerse VR smartglasses. Via an interactive experience, data scientists, data engineers, and developers can explore virtual data sets with their hands.
  • Addressable: Users can rent rooms in the Lakehouse, making the data directly addressable by their room number.
  • Shareable: Collaboration is core to Databricks. With MeshaVerse, data practitioners can meet in the rooms to explore and share polyglot delta products.
  • Secure: With no data accessible or usable within the MeshaVerse - even with role-based room key cards - security is impenetrable. Minimize security threats while also streamlining regulatory compliance.

How it works

When designing MeshaVerse, our primary focus was on preserving decentralization while ensuring data reliability, data quality, and scale. Our novel approach includes implementing Dymlink, a symlink in the data lakehouse, and a new SlinkSync (Symbolic link Sync), a symlink that links Dymlinks together – similar to a linked list.

By establishing which symlinks can be composed as a set (using either a direct probable or an indirect inverse probable match), we are able to infer the convergence criteria of a nondivergent series (i.e., the compressed representation of the data) while always ensuring we stay within the gradient of the curve. As a result, we’re able to prevent an infinite recursion that could potentially stall all data retrieval from the Data Mesh. Stay tuned for a future blog, where we’ll dive deeper into this approach.

The integrity of this virtual data is ensured in real-time and at scale using a more recent implementation of Databricks Brickchain, taking advantage of all global compute power and therefore offering the potential to store the entire planet’s data with a fraction of the footprint.

MeshaVerse principles

Data Mesh’s utility is largely due to its core operating principles. In alignment with this approach, we’ve developed our own set of MeshaVerse principles designed to empower data teams and simplify virtual data use cases:

Augmented data ownership and architecture
Domain data that resides in the MeshaVerse is enhanced by MeshaVerse AR-generated perceptual information, sometimes across multiple sensory modalities, including visual, auditory, haptic, somatosensory and olfactory. MeshaVerse AR can be defined as a system that incorporates three basic features: a combination of real data and virtual data, real-time analytics, and accurate 3D registration of virtual data.

Data as a shortcut
Data explosion is real. As businesses accrue exponentially more data, they face data swamps and scaling challenges. The MeshaVerse is systematically designed to reduce the need for data streaming and pipeline building. Via our VR goggles, see eye to eye with your data, even when not all of it is readily available. No coding required.

Self-serve experience
Rent a room inside the MeshaVerse with an Airbnb-like experience, either solo or with your entire data team for even more streamlined collaboration. Choose from a selection of pre-designed Lakehouse settings.

Federated computational governance
Symlink representations of your data are stored as cryptographic hashes in a brick within the Brickchain, making it possible for any participating party to validate that the data is completely secure. In the Brickchain's peer-to-peer network of distributed ledgers, metadata is governed in a federated architecture.

What’s next

The MeshaVerse is the next evolution of the Databricks Lakehouse and accelerates our vision to make Databricks simple, open and multi-reality. That’s why we’ll be launching a new research and development office dedicated to the MeshaVerse. Stay tuned for more details!

Just kidding! Happy April Fool's Day from all of us at Databricks!
