Engineering blog

As we step into an increasingly digital world, data will become a central product for many organizations, which will convey their knowledge and assets through data and analytics. During the Data + AI Summit 2021, Databricks announced Delta Sharing, the world's first open protocol for secure and scalable real-time data sharing. This simple, REST-based protocol for secure data sharing can become a differentiating factor for your data consumers and for the ecosystem you are building around your data products.

Since the preview launch, we have seen tremendous engagement from customers across industries who want to collaborate on a data-sharing solution that is fit for all purposes and open to all. Customers have already shared petabytes of data using the Delta Sharing REST APIs. In our customer conversations, we hear a lot of anticipation about how Delta Sharing can be extended to non-tabular assets, such as machine learning experiments and models.

Arcuate - a Databricks Labs project that extends Delta Sharing for ML

Platforms like MLflow have emerged as a go-to option for many data scientists, providing a smooth experience when managing the machine learning lifecycle. MLflow is an open-source platform developed by Databricks to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Because MLflow is so widely used, Arcuate combines MLflow with Delta Lake and leverages Delta Sharing to enable the exchange of machine learning models.

Using Delta Sharing also allows Arcuate to share other relevant metadata such as training parameters, model accuracy, artifacts, etc.
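To make the idea of sharing metadata alongside a model more concrete, here is a minimal sketch of how one training run's parameters, metrics, and binary artifacts could be flattened into string columns that fit in a table. The `RunRecord` type and the column layout are hypothetical, chosen only for illustration; they are not Arcuate's or MLflow's actual schema.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical container for one training run (not an Arcuate or MLflow type)."""
    run_id: str
    params: dict = field(default_factory=dict)     # e.g. {"max_depth": "5"}
    metrics: dict = field(default_factory=dict)    # e.g. {"accuracy": 0.93}
    artifacts: dict = field(default_factory=dict)  # file name -> raw bytes


def to_row(record: RunRecord) -> dict:
    """Flatten a run into string columns that a tabular format can hold."""
    encoded_artifacts = {
        # Binary files are base64-encoded so they survive as plain text.
        name: base64.b64encode(blob).decode("ascii")
        for name, blob in record.artifacts.items()
    }
    return {
        "run_id": record.run_id,
        "params": json.dumps(record.params, sort_keys=True),
        "metrics": json.dumps(record.metrics, sort_keys=True),
        "artifacts": json.dumps(encoded_artifacts, sort_keys=True),
    }


def from_row(row: dict) -> RunRecord:
    """Rebuild the run from its flattened row (inverse of to_row)."""
    encoded_artifacts = json.loads(row["artifacts"])
    return RunRecord(
        run_id=row["run_id"],
        params=json.loads(row["params"]),
        metrics=json.loads(row["metrics"]),
        artifacts={
            name: base64.b64decode(text)
            for name, text in encoded_artifacts.items()
        },
    )
```

Because every column is a plain string, a row like this can travel through any table-oriented sharing mechanism and be decoded on the other side without loss.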

The project name takes inspiration from the term arcuate delta, the wide, fan-shaped river delta. We believe that enabling model exchange will have a wide impact on many digitally connected industries.

How Arcuate works with Delta Sharing to share machine learning models

How it works

Arcuate is provided as a Python library that can be installed on a Databricks cluster or on your local machine. It integrates directly with MLflow and can extract either an MLflow experiment or an MLflow model into a Delta table. These tables are then shared via Delta Sharing, allowing recipients to load them into their own MLflow server.
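On the Delta Sharing side, a recipient typically receives a small JSON profile file containing the server endpoint and a bearer token, and the client queries a shared table over REST. The sketch below builds (but does not send) the table query request described in the open Delta Sharing protocol. The helper names `parse_profile` and `table_query_request` are hypothetical, and the endpoint, token, share, and table names in the usage note are placeholders.

```python
import json
from urllib.parse import quote

# Fields of a Delta Sharing profile file, per the open protocol.
REQUIRED_PROFILE_KEYS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def parse_profile(text: str) -> dict:
    """Parse the JSON content of a profile file and check its required fields."""
    profile = json.loads(text)
    missing = [key for key in REQUIRED_PROFILE_KEYS if key not in profile]
    if missing:
        raise ValueError(f"profile is missing fields: {missing}")
    return profile


def table_query_request(profile: dict, share: str, schema: str, table: str):
    """Return the (url, headers) for a POST that queries one shared table.

    Nothing is sent over the network; this only assembles the request.
    """
    base = profile["endpoint"].rstrip("/")
    # Percent-encode each name so it is safe inside a URL path segment.
    share_q, schema_q, table_q = (quote(p, safe="") for p in (share, schema, table))
    url = f"{base}/shares/{share_q}/schemas/{schema_q}/tables/{table_q}/query"
    headers = {
        "Authorization": "Bearer " + profile["bearerToken"],
        "Content-Type": "application/json; charset=utf-8",
    }
    return url, headers
```

For example, with a profile whose endpoint is `https://example.com/delta-sharing/`, calling `table_query_request(profile, "ml_share", "default", "my_experiment")` yields a URL ending in `/shares/ml_share/schemas/default/tables/my_experiment/query`. In practice the `delta_sharing` Python client handles these requests for you.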

For simplicity, Arcuate comes with two sets of APIs, each covering both providers and recipients:

  • Python APIs that can be used in any Python program.
  • An IPython magic, %arcuate, that provides SQL syntax in a notebook.

The end-to-end workflow would look like this:

  • Experiment with or train models in any environment (including Databricks) and store them in MLflow.
  • Add an MLflow experiment to a Delta Sharing share:
    # export the experiment experiment_name to table_name, and add it to share_name
    export_experiments(experiment_name, table_name, share_name)
  • Add an MLflow model to a Delta Sharing share:
    # export the model model_name to table_name, and add it to share_name
    export_models(model_name, table_name, share_name)
  • Recipients can then load MLflow models/experiments seamlessly:
    import delta_sharing

    # load the shared experiment table into a pandas DataFrame
    df = delta_sharing.load_as_pandas(delta_sharing_coordinate)

    # import the shared table as experiment_name
    import_experiments(df, experiment_name)

    # load the shared model table into a pandas DataFrame
    df = delta_sharing.load_as_pandas(delta_sharing_coordinate)

    # import the model
    import_models(df, model_name)
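The `delta_sharing_coordinate` passed to `load_as_pandas` is a string of the form `<profile-file-path>#<share>.<schema>.<table>`. As a minimal sketch, the helpers below build and validate such a coordinate. The names `Coordinate`, `make_coordinate`, and `parse_coordinate` are hypothetical and are not part of the `delta_sharing` package or of Arcuate.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    """Hypothetical parsed form of a Delta Sharing table coordinate."""
    profile_path: str
    share: str
    schema: str
    table: str


def make_coordinate(profile_path: str, share: str, schema: str, table: str) -> str:
    """Join the parts into '<profile-file-path>#<share>.<schema>.<table>'."""
    return f"{profile_path}#{share}.{schema}.{table}"


def parse_coordinate(coordinate: str) -> Coordinate:
    """Split a coordinate back into its parts, raising ValueError if malformed."""
    # Split on the last '#', since the profile path may itself contain '#'.
    profile_path, separator, fragment = coordinate.rpartition("#")
    if not separator or not profile_path:
        raise ValueError(f"missing '<profile>#' prefix in {coordinate!r}")
    parts = fragment.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected '<share>.<schema>.<table>' in {coordinate!r}")
    return Coordinate(profile_path, *parts)
```

A coordinate built this way, for example `make_coordinate("/tmp/config.share", "ml_share", "default", "my_model")`, can be passed directly to `delta_sharing.load_as_pandas`; the path and names here are placeholders.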

Roadmap

This first version of Arcuate is just a start. As we develop the project, we can extend the implementation to sharing other objects, such as dashboards or arbitrary files. We believe that the future of data sharing is open, and we are thrilled to bring this approach to other sharing workflows.

Getting started with Arcuate

With Delta Sharing, for the first time ever, we have a data sharing protocol that is truly open. Now, with Arcuate, we also have an open protocol for sharing ML models.

We will soon release Arcuate as a Databricks Labs project, so please keep an eye out for it. To try out the open source Delta Sharing release, follow the instructions at delta.io/sharing. Or, if you are a Databricks customer, sign up for updates on our service. We are very excited to hear your feedback!
