Spark Query Service (Job Server) at Ooyala - Databricks

We would like to share with you the innovative ways that we use Spark at Ooyala, together with Apache Cassandra, to tackle interactive analytics and OLAP applications.

In particular, we are turning Spark into a service with our Spark Job Server. The job server has been a big help to our development efforts, providing a single REST API for:

  • enabling interactive query jobs in long-running SparkContexts with shared RDD data
  • submitting and managing Spark Jobs on both standalone and Mesos clusters
  • tracking and serializing job status, progress, and results
  • providing a programmatic API for job management scripts and query servers
  • cancelling problematic jobs

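As a sketch of how a REST API like the one above can be driven programmatically, the snippet below builds the request URLs a client script might use. The host, port, and exact route shapes here are illustrative assumptions, not the definitive job server API; the actual routes may differ by version.

```python
# Minimal sketch of a client for a Spark Job Server-style REST API.
# Host, port, and route details are illustrative assumptions; consult
# the job server's own documentation for the exact routes.
from urllib.parse import urlencode


class JobServerClient:
    def __init__(self, host="localhost", port=8090):
        self.base = f"http://{host}:{port}"

    def upload_jar_url(self, app_name):
        # POST the application jar under a name so jobs can reference it.
        return f"{self.base}/jars/{app_name}"

    def submit_job_url(self, app_name, class_path, context=None):
        # POST here submits a job; naming an existing context targets a
        # long-running SparkContext with shared RDD data.
        params = {"appName": app_name, "classPath": class_path}
        if context:
            params["context"] = context
        return f"{self.base}/jobs?{urlencode(params)}"

    def job_status_url(self, job_id):
        # GET here returns serialized job status, progress, and results.
        return f"{self.base}/jobs/{job_id}"

    def cancel_job_url(self, job_id):
        # DELETE here cancels a problematic job.
        return f"{self.base}/jobs/{job_id}"


client = JobServerClient()
print(client.submit_job_url("my-app", "com.example.WordCount", context="shared"))
```

A management script would issue these requests with any HTTP client, polling the status URL until the job completes or cancelling it via DELETE.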
We believe the job server could be a significant help to Spark developer productivity everywhere.

About Evan Chan

Evan loves to design, build, and improve bleeding-edge distributed data and backend systems using the latest in open source technologies. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including a columnar real-time distributed query engine. He is an active contributor to the Apache Spark project, a DataStax Cassandra MVP, and co-creator and maintainer of the open-source Spark Job Server. He is a big believer in GitHub, open source, and meetups, and has given talks at various conferences including Spark Summit, Cassandra Summit, FOSS4G, and Scala Days.

About Kelvin Chu

Kelvin is a founding member of the Hadoop team at Uber. He is creating tools and services on top of Spark to support multi-tenancy and large-scale, computation-intensive applications. He is the creator and lead engineer of the Spark Uber Development Kit, Paricon, and SparkPlug services, which are the main initiatives of Spark Compute at Uber. At Ooyala, he was co-creator of the Spark Job Server, an open-source RESTful server for submitting, running, and managing Spark jobs, jars, and contexts. He implemented real-time video analytics engines on top of it using datacube materialization via RDDs.