Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark

At CERN, the biggest physics laboratory in the world, large volumes of data are generated every hour, which poses serious challenges for storing and processing it all. An important part of this responsibility falls to the database group, which provides services not only for RDBMS but also for scalable systems such as Hadoop, Spark and HBase. Since databases are critical, they need to be monitored. For that purpose we have built a highly scalable, secure, central repository that stores consolidated audit data together with the listener, alert and OS log events generated by the databases. This central platform is used for reporting, alerting and security policy management. The database group wants to further exploit the information available in this central repository to build an intrusion detection system that enhances the security of the database infrastructure. In addition, it aims to build pattern detection models that flush out anomalies using the monitoring and performance metrics available in the central repository. Finally, this platform also helps us with capacity planning for the database deployment. The audience will get a first-hand account of how to build a real-time Apache Spark application that is deployed in production. They will hear about the challenges faced and the decisions taken while developing the application, and about troubleshooting Apache Spark and Spark Streaming applications in production.
Session hashtag: #EUde13

About Daniel Lanza

At CERN, the organization that does fundamental particle physics research, Daniel works on developing and providing Big Data solutions that involve data analytics and machine learning techniques. During his two degrees and two master's degrees, in which he studied computer science, telecommunications, and Big Data, he was also interested in Evolutionary Computation, a field in which he has several publications. His responsibilities range from the deployment of Big Data tools to the development of machine learning algorithms.

About Prasanth Kothuri

Prasanth Kothuri is currently working as a Senior Big Data Engineer at CERN, defining and architecting the next generation of its data analytics platform based on Hadoop and Spark. For the past 3 years he has been working with various user communities at CERN to build data analytics solutions around Apache Hadoop, Apache Spark, and Apache Kudu. Before this, he was an Oracle Database specialist for a decade, covering all areas from performance tuning and database upgrades to disaster recovery and database security.