Session

Anomaly Detection at Apple for Large-Scale Data Using Apache Spark and Flink

Overview

Experience: In Person
Type: Breakout
Track: Artificial Intelligence
Industry: Enterprise Technology, Media and Entertainment, Financial Services
Technologies: Apache Spark, AI/BI, Apache Iceberg
Skill Level: Beginner
Duration: 40 min

Anomaly detection in time series data is crucial for identifying unusual patterns and trends, enabling better alerting and action when data deviates from normal. Most anomaly detection algorithms perform adequately on a single-node machine with public datasets, but do not scale well with the distributed processing frameworks used in modern big data environments. This talk focuses on how we scaled anomaly detection to large datasets using Apache Spark and Flink for both batch and near-real-time use cases. We will also discuss how we used Apache Spark to parallelize common anomaly detection algorithms so that they can support large-scale data processing. We'll highlight some of the challenges we faced and how we resolved them to make the framework useful for massive datasets with varying degrees of anomalies. Finally, we will demonstrate how our anomaly detection framework works in batch mode on petabytes of data and in streaming mode at hundreds of thousands of transactions per second.

Session Speakers

Anupam Panwar

Senior Machine Learning Engineer
Apple Inc

Himadri Pal

Principal Software Engineer
Apple Inc