Kerberizing Spark - Databricks

Spark has become, deservedly, the leading massively parallel processing framework, and HDFS is one of the most popular Big Data storage technologies. Their combination is therefore one of the most common Big Data use cases. But what about security? Can these two technologies coexist in a secure environment? Furthermore, with the proliferation of BI technologies adapted to Big Data environments, which require several users to interact with the same cluster concurrently, can we still ensure that our Big Data environments remain secure? In this talk, Abel and Jorge explain the adaptations to Spark’s core that they had to make in order to guarantee security for multiple concurrent users sharing a single Spark cluster, running on any of its cluster managers, without degrading Spark’s outstanding performance.

About Jorge Lopez-Malla

Jorge has been involved in the inception and implementation of projects in several fields, including digital media, telcos, banks and insurance companies. He is in charge of Stratio’s Big Data training and was one of the first engineers to become Spark certified.

About Abel Rincon

Abel has wide experience developing high-concurrency systems. He has been involved in several projects, such as Sparta (a real-time aggregation engine based on Spark). He now focuses on security in Big Data environments and is developing a unified authentication, authorization and audit system.