CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks

Data & ML projects bring complexities beyond the traditional software development lifecycle. Unlike conventional software projects, they cannot be left alone once they have been delivered and deployed: they must be monitored continuously to verify that model performance still satisfies all requirements. New data with new statistical characteristics can arrive at any time and break our pipelines or degrade model performance. These properties of data & ML projects make continuous testing and monitoring of models and pipelines a necessity.
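As a rough illustration of what monitoring for "new statistical characteristics" can mean in practice, the sketch below flags a new batch whose mean has moved away from a reference sample. It is a minimal example using only the Python standard library; the function name `mean_shift_detected` and the `threshold` default are assumptions made for this illustration and are not part of CI/CD Templates or Databricks.

```python
from statistics import mean, stdev


def mean_shift_detected(reference, new_batch, threshold=3.0):
    """Return True when the new batch mean drifts away from the reference mean.

    The shift is measured in standard errors, estimated from the reference
    sample's standard deviation and the new batch size. `threshold` is an
    illustrative cutoff, not a recommended value.
    """
    if len(reference) < 2 or not new_batch:
        raise ValueError("need at least two reference values and one new value")
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # A constant reference: any change in the mean counts as a shift.
        return mean(new_batch) != ref_mean
    standard_error = ref_std / len(new_batch) ** 0.5
    return abs(mean(new_batch) - ref_mean) / standard_error > threshold


# Hypothetical feature values seen at training time and in two later batches.
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
similar_batch = [10, 11, 9, 10]
shifted_batch = [20, 21, 19, 20]
```

A check like this could run on a schedule against each incoming batch and fail loudly when it returns True; a production setup would usually look at more than the mean.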

In this talk we will show how CI/CD Templates can simplify these tasks: bootstrapping a new data project within a minute, setting up a CI/CD pipeline using GitHub Actions, and implementing integration tests on Databricks. All of this is possible because of the conventions introduced by CI/CD Templates, which help automate the deployment and testing of abstract data pipelines and ML models.
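To make the idea of an integration test for a data pipeline more concrete, here is a minimal sketch in plain Python. It does not use the CI/CD Templates API or any Databricks library: `run_pipeline` and `test_pipeline_end_to_end` are hypothetical names, and the pipeline is a stand-in that runs locally on a list of dictionaries. The point is only the shape of the test: feed a small, known input through the whole pipeline and assert on the output.

```python
def run_pipeline(rows):
    """Hypothetical stand-in for a data pipeline.

    Drops rows with a missing amount, then adds a derived `is_large` flag.
    """
    cleaned = [row for row in rows if row.get("amount") is not None]
    return [{**row, "is_large": row["amount"] > 100} for row in cleaned]


def test_pipeline_end_to_end():
    """Run the whole pipeline on a tiny fixed input and check the result."""
    rows = [
        {"id": 1, "amount": 50},
        {"id": 2, "amount": None},  # incomplete row, expected to be dropped
        {"id": 3, "amount": 250},
    ]
    result = run_pipeline(rows)
    assert [row["id"] for row in result] == [1, 3]
    assert [row["is_large"] for row in result] == [False, True]
```

A test written in this style can be collected by a runner such as pytest, which a CI workflow could then invoke on every push.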

Speakers: Michael Shtelma and Ivan Trusov


 
About Michael Shtelma

Databricks

Databricks Senior Solutions Architect and ex-Teradata Data Engineer with a focus on operationalizing machine learning workloads in the cloud.

About Ivan Trusov

Databricks

I'm a Solutions Architect at Databricks, helping our customers solve their toughest data problems using the Unified Data Analytics Platform. My main areas of expertise are Apache Spark, machine learning, and data processing applications.