Data & ML projects bring many complexities beyond the traditional software development lifecycle. Unlike traditional software projects, they cannot be left alone once they have been successfully delivered and deployed: they must be continuously monitored to verify that model performance still satisfies all requirements. New data with different statistical characteristics can arrive at any time and break our pipelines or degrade model performance. These properties of data & ML projects make continuous testing and monitoring of our models and pipelines a necessity.
In this talk we will show how CI/CD Templates can simplify these tasks: bootstrapping a new data project within a minute, setting up a CI/CD pipeline using GitHub Actions, and implementing integration tests on Databricks. All this is possible because of the conventions introduced by CI/CD Templates, which help automate the deployment and testing of abstract data pipelines and ML models.
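As a rough illustration of the kind of check that continuous testing of a data pipeline might include, the sketch below flags a new batch whose mean has shifted away from a reference sample. It is not part of CI/CD Templates: the function name `mean_shift_detected`, the z-score approach, and the default threshold of 3.0 are all assumptions chosen for this example, and a real project would likely use a more robust drift test.

```python
from statistics import mean, stdev


def mean_shift_detected(reference, new_batch, threshold=3.0):
    """Hypothetical drift check: return True if the mean of `new_batch`
    differs from the mean of `reference` by more than `threshold`
    standard errors (standard error estimated from the reference sample)."""
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # Degenerate reference sample: any change in the mean counts as a shift.
        return mean(new_batch) != ref_mean
    standard_error = ref_std / len(new_batch) ** 0.5
    z_score = abs(mean(new_batch) - ref_mean) / standard_error
    return z_score > threshold


# Example values are made up for illustration only.
reference_sample = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
similar_batch = [10.1, 9.9, 10.3, 9.7]
shifted_batch = [15.0, 16.0, 14.0, 15.5]

print(mean_shift_detected(reference_sample, similar_batch))
print(mean_shift_detected(reference_sample, shifted_batch))
```

A check like this could run as an assertion inside an integration test, so that a CI run fails when incoming data no longer resembles the data the model was validated on.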
Speakers: Michael Shtelma and Ivan Trusov
Databricks Senior Solutions Architect and ex-Teradata Data Engineer with a focus on operationalizing machine learning workloads in the cloud.
I'm a Solutions Architect at Databricks, helping our customers solve their toughest data problems using the Unified Data Analytics Platform. My main areas of expertise are Apache Spark, machine learning, and data processing applications.