Erni Durdevic is a Machine Learning Engineer at Quby, the creator and provider of Toon, a leading European smart home platform. In this role, he is responsible for bringing data science algorithms to production. He enjoys pairing with Data Scientists and Data Engineers to transform proofs of concept into products running at scale. Erni has a Master's degree in Computer Science Engineering and has spent the last three years in the Data Science field.
Energy waste in residential buildings is a significant contributor to total worldwide energy consumption. Quby, an Amsterdam-based technology company, offers solutions that empower homeowners to stay in control of their electricity, gas and water usage. Using Europe's largest energy dataset, consisting of petabytes of IoT data, the company has developed AI-powered products that hundreds of thousands of users rely on daily to maintain a comfortable climate in their homes and reduce their environmental footprint. In this talk, Erni and Stephen will take you on a tour of how Quby leverages the full Databricks stack to quickly prototype, validate, launch and scale data science products. We will explore the technical workflow of a data science project from end to end. Starting from developing a notebook prototype and tracking model performance with MLflow, we move towards production-grade Databricks jobs with a CI/CD pipeline and monitoring system in place. We will see how Quby manages more than 1 million models in production, how Delta Lake allows batch and streaming workloads on the same IoT data, and the impact these tools have had on the team itself.
Quby is the creator and provider of Toon, a leading European smart home platform. We enable Toon users to control and monitor their homes using both an in-home display and an app. As a data-driven company, we use machine learning algorithms to generate actionable insights for our end users. We have developed data-driven services that help users avoid wasting energy and send them real-time alerts about problems with their heating system. In this talk, Erni will describe our journey of productionizing data science algorithms. We'll take a deep dive into our pipeline and describe our streamlined development and deployment workflow. We'll explain how we define and manage dependencies between jobs in multiple environments (test, acceptance and production) and how we schedule the pipeline computation. We'll delve into scaling challenges, metrics, monitoring and data quality. We will also reflect on the lessons learned while building high-volume infrastructure that offers multiple data-driven services to hundreds of thousands of users. Session hashtag: #SAISML4