Skip to main content

With its rich open source ecosystem and approachable syntax, Python has become the main programming language for data engineering and machine learning. Data and ML engineers already use Databricks to orchestrate pipelines using Python notebooks and scripts. Today, we are proud to announce that Databricks can now run Python wheels, making it easy to develop, package and deploy more complex Python data and ML pipeline code.

Python wheel tasks can be executed on both interactive clusters and on job clusters as part of jobs with multiple tasks. All the output is captured and logged as part of the task execution so that it is easy to understand what happened without having to go into cluster logs.

The wheel package format allows Python developers to package a project’s components so they can be easily and reliably installed in another system. Just like the JAR format in the JVM world, a wheel is a compressed, single-file build artifact, typically the output of a CI/CD system. Similar to a JAR, a wheel contains not only your source code but references to all of its dependencies as well.

To run a Job with a wheel, first build the Python wheel locally or in a CI/CD pipeline, then upload it to cloud storage. Specify the path of the wheel in the task and choose the method that needs to be executed as the entrypoint. Task parameters are passed to your main method via *args or **kwargs.

Python Wheel tasks in Databricks Jobs are now Generally Available.

Python Wheel tasks in Databricks Jobs are now Generally Available. We would love for you to try out this capability and tell us how we can better support Python data engineers.

Try Databricks for free

Related posts

Need for Data-centric ML Platforms

This blog is the first in a series on MLOps and Model Governance. The next blog will be by Joseph Bradley and will...

Using MLOps with MLflow and Azure

October 13, 2020 by Oliver Koernig in
The blog contains code examples in Azure Databricks, Azure DevOps and plain Python. Please note that much of the code depends on being...

Automate continuous integration and continuous delivery on Databricks using Databricks Labs CI/CD Templates

CONTENTS Overview Why do we need yet another deployment framework? Simplifying CI/CD on Databricks via reusable templates Development lifecycle using Databricks Deployments How...
See all Platform Blog posts