Anomaly Detection at Apple for Large-Scale Data Using Apache Spark and Flink
Overview
| Experience | In Person |
| --- | --- |
| Type | Breakout |
| Track | Artificial Intelligence |
| Industry | Enterprise Technology, Media and Entertainment, Financial Services |
| Technologies | Apache Spark, AI/BI, Apache Iceberg |
| Skill Level | Beginner |
| Duration | 40 min |
Anomaly detection in time series data is crucial for identifying unusual patterns and trends, enabling better alerting and faster action when data deviates from the norm. Most anomaly detection algorithms perform adequately on a single-node machine with public datasets, but they do not scale well to the distributed processing frameworks used in modern big data environments. This talk will focus on how we scaled anomaly detection for large-scale datasets using Apache Spark and Flink for both batch and near-real-time use cases. We will also discuss how we leveraged Apache Spark to parallelize and scale common anomaly detection algorithms, enabling support for large-scale data processing. We'll highlight some of the challenges we faced and how we resolved them to make the framework useful for massive datasets with varying degrees of anomalies. Finally, we will demonstrate how our anomaly detection framework runs in batch mode over petabytes of data and in streaming mode at hundreds of thousands of transactions per second.
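To give a flavor of the parallelization approach the abstract describes, here is a minimal, hypothetical sketch of scoring many independent time series in parallel with PySpark. The table names, schema, and the simple 3-sigma rule are illustrative assumptions for this sketch, not the speakers' actual framework or algorithms.

```python
# Hypothetical sketch: fan a per-series anomaly detector out across a Spark cluster.
# Table names, schema, and the 3-sigma rule are assumptions for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               TimestampType, DoubleType, BooleanType)
import pandas as pd

spark = SparkSession.builder.appName("anomaly-detection-sketch").getOrCreate()

# Input: one row per (series_id, event_time, value) observation.
events = spark.read.table("metrics.timeseries")  # assumed source table

result_schema = StructType([
    StructField("series_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("value", DoubleType()),
    StructField("is_anomaly", BooleanType()),
])

def detect_anomalies(pdf: pd.DataFrame) -> pd.DataFrame:
    """Flag points more than 3 standard deviations from the series mean."""
    pdf = pdf.sort_values("event_time")
    mean, std = pdf["value"].mean(), pdf["value"].std()
    pdf["is_anomaly"] = (pdf["value"] - mean).abs() > 3 * (std if std else 1.0)
    return pdf[["series_id", "event_time", "value", "is_anomaly"]]

# Each series is scored independently, so Spark can distribute the work
# across executors, one pandas group per task.
anomalies = (events
             .groupBy("series_id")
             .applyInPandas(detect_anomalies, schema=result_schema))

anomalies.write.mode("overwrite").saveAsTable("metrics.anomalies")
```

In this pattern the detector itself stays a plain single-node function; scaling comes from grouping the data by series and letting Spark schedule one group per task, which is one common way to parallelize otherwise single-machine algorithms.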
Session Speakers
Anupam Panwar
Senior Machine Learning Engineer
Apple Inc
Himadri Pal
Principal Software Engineer
Apple Inc