Ankur Pathela is a software engineer on the Data Platform team at Facebook, where he works on supporting batch workloads on Apache Spark. Ankur drives threat model analysis and vulnerability fixes for Spark applications at Facebook. Ankur loves working on distributed and large-scale systems, and previously supported data ingestion on the team building Experience Platform at Adobe. Ankur received his Bachelor's degree from the Indian Institute of Technology.
June 23, 2020 05:00 PM PT
At Facebook, Apache Spark handles large batch workloads that at times deal with sensitive data requiring protection and isolation across authentication, authorization, and encryption. With jobs from multiple teams running across data centers and geo-distributed regions, Spark actors (driver, executors, shuffle service) need to communicate securely over networks spanning large geographical areas. Spark at Facebook also operates in a multi-tenant environment with strict access control policies that must be enforced to guarantee data protection and job isolation. Operating at this scale presents several scalability challenges, and we'll share our approach to solving a few of them in this talk.
More specifically, we'll share how we deployed TLS encryption for Spark jobs to secure data in transit over an untrusted network, and discuss the implications and overhead of doing so. We'll also cover how tenant isolation, security, and fine-grained access control (i.e., row- and column-level security) are designed and implemented, along with our work on scaling the generation and validation of signed access tokens and the distribution of job resources (files, archives, and jars).
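For readers unfamiliar with signed access tokens, the general idea can be sketched as follows. This is only a generic illustration, not the scheme used at Facebook, which the abstract does not describe: it assumes an HMAC-SHA256 signature over a JSON payload, and the function names (`issue_token`, `validate_token`) and claim fields (`job`, `cols`, `exp`) are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    """URL-safe base64 encoding as ASCII text."""
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(secret, job_id, columns, ttl_seconds, now=None):
    """Return 'payload.signature' granting job_id access to the given columns.

    Hypothetical claim layout: job id, allowed columns, expiry timestamp.
    """
    now = time.time() if now is None else now
    claims = {"job": job_id, "cols": sorted(columns), "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def validate_token(secret, token, now=None):
    """Return the claims if the signature is valid and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # malformed token or bad base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None
    return claims


if __name__ == "__main__":
    key = b"demo-secret"  # illustrative only; real keys come from a key service
    token = issue_token(key, "job-1", ["user_id", "country"], ttl_seconds=60)
    print(validate_token(key, token))
```

Because validation needs only the shared key and the token itself, a check like this can run locally wherever data is served without a call back to a central service, which is one common reason such tokens are used at scale.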