Anomaly Detection at Apple for Large-Scale Data Using Apache Spark and Flink
Overview
| Experience | In Person |
| --- | --- |
| Type | Breakout |
| Track | Artificial Intelligence |
| Industry | Enterprise Technology, Media and Entertainment, Financial Services |
| Technologies | Apache Spark, AI/BI, Apache Iceberg |
| Skill Level | Beginner |
| Duration | 40 min |
Anomaly detection in time series data is crucial for identifying unusual patterns and trends, enabling better alerting and faster action when data deviates from the norm. Most anomaly detection algorithms perform adequately on a single-node machine with public datasets, but they do not scale well to the distributed processing frameworks used in modern big data environments. This talk will focus on how we scaled anomaly detection for large-scale datasets using Apache Spark and Flink for both batch and near-real-time use cases. We will also discuss how we leveraged Apache Spark to parallelize and scale common anomaly detection algorithms, enabling support for large-scale data processing. We'll highlight some of the challenges we faced and how we resolved them to make the framework useful for massive datasets with varying degrees of anomalies. Finally, we will demonstrate how our anomaly detection framework runs in batch mode over petabytes of data and in streaming mode at hundreds of thousands of transactions per second.
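To give a flavor of the parallelization approach the abstract describes, here is a minimal, hypothetical sketch of scoring many independent time series in parallel with PySpark. The table names, schema, and the simple 3-sigma rule are illustrative assumptions for this sketch, not the speakers' actual framework or algorithms.

```python
# Hypothetical sketch: fan a per-series anomaly detector out across a Spark cluster.
# Table names, schema, and the 3-sigma rule are assumptions for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               TimestampType, DoubleType, BooleanType)
import pandas as pd

spark = SparkSession.builder.appName("anomaly-detection-sketch").getOrCreate()

# Input: one row per (series_id, event_time, value) observation.
events = spark.read.table("metrics.timeseries")  # assumed source table

result_schema = StructType([
    StructField("series_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("value", DoubleType()),
    StructField("is_anomaly", BooleanType()),
])

def detect_anomalies(pdf: pd.DataFrame) -> pd.DataFrame:
    """Flag points more than 3 standard deviations from the series mean."""
    pdf = pdf.sort_values("event_time")
    mean, std = pdf["value"].mean(), pdf["value"].std()
    pdf["is_anomaly"] = (pdf["value"] - mean).abs() > 3 * (std if std else 1.0)
    return pdf[["series_id", "event_time", "value", "is_anomaly"]]

# Each series is scored independently, so Spark can distribute the work
# across executors, one pandas group per task.
anomalies = (events
             .groupBy("series_id")
             .applyInPandas(detect_anomalies, schema=result_schema))

anomalies.write.mode("overwrite").saveAsTable("metrics.anomalies")
```

In this pattern the detector itself stays a plain single-node function; scaling comes from grouping the data by series and letting Spark schedule one group per task, which is one common way to parallelize otherwise single-machine algorithms.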
Session Speakers
Anupam Panwar
Senior Machine Learning Engineer
Apple Inc
Himadri Pal
Principal Software Engineer
Apple Inc