Romain Rigaux - Databricks

Romain Rigaux

Software Engineer, Cloudera

Romain is an engineer at Cloudera and the lead of Hue. Before that, he worked on distributed systems at Yahoo! and Google, and he has been building Web apps since the early days.


A Web application for interactive data analysis with Spark

How to build and use a Web application for interactive data analysis with Spark. A Hue Spark application was recently created. It lets users execute and monitor Spark jobs directly from their browser and be more productive. The Spark application is based on the Spark Job Server contributed by Ooyala at Spark Summit 2013. This new server enables real interactivity with Spark and is closer to the community. This talk will describe the architecture of the application and demo several business use cases that it makes easy.

Building a REST Job Server for interactive Spark as a service

Livy is a new open source Spark REST server for submitting and interacting with your Spark jobs from anywhere. Livy is conceptually based on the incredibly popular IPython/Jupyter, but implemented to integrate better into the Hadoop ecosystem with multiple users. Spark can now be offered as a service to anyone in a simple way: Spark shells in Python or Scala can be run by Livy in the cluster while the end user manipulates them at their own convenience through a REST API. Regular non-interactive applications can also be submitted. The output of the jobs can be introspected and returned in a tabular format, which makes it easy to visualize in charts. Livy can point to a single Spark cluster and create several contexts per user. With YARN impersonation, jobs are executed with the actual permissions of the users submitting them. Livy also enables the development of Spark Notebook applications, which are ideal for quick interactive Spark visualizations and collaboration from a Web browser. This talk is technical and details the architecture and design decisions taken in developing this server, as well as its internals. It also describes the alternatives we tried and the challenges we faced. The capabilities of Livy will then be demoed live in Hue's Notebook application through a real-life scenario.
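To make the REST workflow concrete, here is a minimal sketch of how a client might start a remote Python shell and send it code. It is an illustration, not material from the talk. The endpoint paths (`/sessions`, `/sessions/{id}/statements`) follow Livy's public REST API; the host, the port 8998, and all helper names (`build_request`, `create_session`, `run_statement`, `get_statement`, `send`) are assumptions made for this example.

```python
import json
from urllib import request

# Assumption: a Livy server reachable locally on its usual default port.
LIVY_URL = "http://localhost:8998"


def build_request(path, payload=None, base_url=LIVY_URL):
    """Build (but do not send) a JSON request for a Livy endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )


def create_session(kind="pyspark"):
    # POST /sessions asks Livy to start an interactive shell of this kind.
    return build_request("/sessions", {"kind": kind})


def run_statement(session_id, code):
    # POST /sessions/{id}/statements submits a code snippet to that shell.
    return build_request(f"/sessions/{session_id}/statements", {"code": code})


def get_statement(session_id, statement_id):
    # GET the statement resource later to poll for its state and output.
    return build_request(f"/sessions/{session_id}/statements/{statement_id}")


def send(req):
    """Send a prepared request and decode the JSON reply.

    Requires a running Livy server; nothing above this function touches the network.
    """
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, a client would call `send(create_session())`, read the session id from the reply, then call `send(run_statement(session_id, "1 + 1"))` and poll with `get_statement` until the statement has finished. Building requests separately from sending them keeps the sketch inspectable without a cluster.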
