Scalable Monitoring Using Apache Spark and Friends - Databricks

This session shows a different way to use Apache Spark: combining it with other open source projects to build a scalable, real-time monitoring system. Apache Spark plays the central role in this solution, since without Spark Streaming we would not be able to process millions of events in real time. The approach offers lessons for DevOps and infrastructure teams on how to build a scalable, automated logging and monitoring solution using Apache Spark, Apache Kafka, Grafana and other open source technologies.
Sony PlayStation’s monitoring pipeline processes about 40 billion events every day and generates metrics in near real time (within 30 seconds). All the components used alongside Apache Spark are horizontally scalable with any auto-scaling technique, which improves the reliability of this efficient, highly available monitoring solution. Sony Interactive Entertainment has been using Apache Spark, and specifically Spark Streaming, for the last three years. Hear about some important lessons they have learned. For example, they still use Spark Streaming’s receiver-based method instead of Direct Streaming in certain use cases, and they will share how they apply both methods with the community.
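To put the quoted figures in perspective, here is a minimal back-of-the-envelope sketch in plain Python. It is not code from the talk and it does not use Spark. It only converts 40 billion events per day into an average rate (roughly 463,000 events per second) and shows the kind of 30-second tumbling-window counting that a streaming job would run at scale. All function names and the sample events are hypothetical, and real traffic will peak well above the daily average.

```python
from collections import Counter

EVENTS_PER_DAY = 40_000_000_000  # figure quoted in the abstract
WINDOW_SECONDS = 30              # "within 30 seconds" from the abstract
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day):
    """Average event rate, assuming traffic is spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def average_events_per_window(events_per_day, window_seconds):
    """Average number of events that fall into one metrics window."""
    return average_events_per_second(events_per_day) * window_seconds


def count_by_window(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window_start, metric_name) using tumbling windows.

    `events` is an iterable of (epoch_seconds, metric_name) pairs.
    This is a single-process stand-in for what a streaming engine
    would do in parallel across many partitions.
    """
    counts = Counter()
    for timestamp, name in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, name)] += 1
    return counts


if __name__ == "__main__":
    print("avg events/sec:", round(average_events_per_second(EVENTS_PER_DAY)))
    print("avg events/window:",
          round(average_events_per_window(EVENTS_PER_DAY, WINDOW_SECONDS)))

    # Hypothetical sample events: (seconds since start, metric name).
    sample = [(0, "login"), (12, "login"), (29.9, "error"),
              (30, "login"), (61, "error")]
    for key, count in sorted(count_by_window(sample).items()):
        print(key, count)
```

At the average rate, each 30-second window covers close to 14 million events, which is why every stage of such a pipeline has to scale horizontally.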

Session hashtag: #SFexp12

About Utkarsh Bhatnagar

Utkarsh currently works as a senior engineer at Tinder. He worked at Sony PlayStation for the past three years, where he designed and implemented a scalable monitoring solution that can process billions of events per day in near real time. He is passionate about building scalable and automated solutions. He is also an active open source contributor to Grafana and maintains a project named wizzy (a CLI tool for Grafana). He likes to listen to audiobooks and play poker occasionally.