Interactive Graph Analytics with Spark


The Spark community has extensive experience using Spark for offline batch analysis across a broad range of use cases. Building an interactive web application that aims for sub-second response times with Spark as the computation backend, however, is still somewhat unexplored territory. We at Lynx Analytics wandered into this territory when we built Kite, our big graph analysis tool. Kite lets users interactively explore graphs with hundreds of millions of vertices and billions of edges. Exploration includes global and local views of the graph, with visualizations of attributes, connections, and distributions. This talk covers the technical challenges, both general and domain-specific, that we faced while building this software, along with our solutions. We will discuss problems such as scheduler delay, GC pauses, and interoperability with other Akka-based libraries, and solutions such as sorted RDDs, prefix sampling, and column-based attribute representation.