Leandro Almeida

Data Scientist, WhyLabs

Leandro Almeida leads the data science team at WhyLabs, the AI Observability company on a mission to build the interface between AI and human operators. Prior to WhyLabs, Leandro helped build one of the first web-based deep learning tools for the analysis of histological and cytological slides, and built and deployed multi-modal recommendation models for retail. With over 15 years of ML/AI experience, Leandro combines technical and executive leadership and has helped organizations build AI and data science capabilities while establishing their ML foundation. Leandro received his PhD in Theoretical Physics from Stony Brook University. He has over 14 publications and has given numerous talks on topics ranging from physics to neuroscience to machine learning.

Past sessions

As organizations launch complex multi-modal models into human-facing applications, data governance becomes both more important and more difficult. In particular, monitoring the underlying ML models for accuracy and reliability becomes a critical component of any data governance system. When complex data such as images, text, and video is involved, monitoring model performance is particularly problematic given the lack of semantic information. In industries such as health care and automotive, fail-safes are needed for compliant performance and safety, but validation data is in short supply or, in some cases, completely absent. To date, however, there have been no widely accessible approaches for monitoring semantic information in a performant manner.
In this talk, we will provide an overview of approximate statistical methods and show how they can be used for monitoring and for debugging data pipelines, including detecting concept drift and out-of-distribution data in semantic-full data such as images. We will walk through an open-source library, whylogs, which combines Apache Spark and novel approaches to semantic data sketching. We will conclude with practical examples that equip ML practitioners with monitoring tools for computer vision and semantic-full models.

