Luc has been working on the JVM since 2002, first at IBM on the Eclipse project, in the Debugger team, where he wrote the expression evaluation engine. After a few other Eclipse projects, he moved to TomTom to recreate their data distribution platform for over-the-air services. He joined Typesafe in 2011 to work on the Eclipse plugin for Scala. Luc then switched to the Fast Data team, with a focus on deployment and interaction with other frameworks.
Mesos is a general-purpose cluster manager that can scale to tens of thousands of nodes and handle mixed data loads and general applications. Mesos is used in large deployments such as Twitter and Airbnb, and its versatility makes it particularly appealing to organizations that run a mixed workload and want to maximize their cluster utilization. But how exactly does it work when the workload is a long-running Spark Streaming job? In particular, how does one deal with the failures that are bound to happen at this scale, without data loss or service disruption? In this talk we'll discuss how Spark integrates with Mesos, explain the differences between client and cluster deployments, and compare and contrast Mesos with YARN and standalone mode. Then we'll look at deploying a Spark Streaming application that should run 24/7, and show how to deploy, configure and tune a Mesos cluster such that:
- the application runs efficiently and uses only the resources it needs
- if any of the nodes fails (including the driver), the application recovers without data loss
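As a sketch of what a resilient cluster-mode submission can look like, here is a hypothetical spark-submit invocation. The flags and property names are real Spark options, but the dispatcher host, class name, jar name and values are placeholders, not recommendations:

```shell
# Submit through the MesosClusterDispatcher so the driver itself runs on
# the cluster; --supervise asks Spark to restart the driver if it fails.
# Host, class and jar names below are placeholders.
spark-submit \
  --master mesos://dispatcher-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --conf spark.cores.max=8 \
  --conf spark.streaming.receiver.writeAheadLog.enable=true \
  --class com.example.StreamingApp \
  my-streaming-app.jar
```

Combined with checkpointing in the application itself, the write-ahead log lets received data survive a driver restart rather than being lost with the failed process.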
The "Reactive Manifesto":http://www.reactivemanifesto.org/ describes the four characteristics that define a reactive application: responsive, resilient, elastic and message driven. "Reactive Streams":http://www.reactive-streams.org/ is one of the tools used to create reactive applications: a small API for the JVM defining the interfaces needed to connect a stream of data, with back pressure, between the parts of a reactive application. And with the addition of back pressure support in Spark Streaming in Spark 1.5, it is simpler than before to use these technologies together. This talk will define what communication with back pressure is, describe its implementation in Reactive Streams, and show how it can be used to integrate Spark Streaming into reactive applications.
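The core of back pressure is demand signaling: the consumer tells the producer how many elements it is ready to accept, and the producer never emits more than that. A minimal, self-contained sketch of that idea on the JVM might look like this (the `Source` and `Sink` names are illustrative stand-ins; the actual Reactive Streams API defines `Publisher`, `Subscriber`, `Subscription` and `Processor`):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative consumer: collects values and records completion.
class Sink {
    final List<Integer> received = new ArrayList<>();
    boolean done = false;

    void onNext(int value) { received.add(value); }
    void onComplete() { done = true; }
}

// Illustrative producer that honors the sink's demand.
class Source {
    private final Queue<Integer> data = new ArrayDeque<>();
    private final Sink sink;
    private long demand = 0; // elements requested but not yet delivered

    Source(Sink sink, List<Integer> items) {
        this.sink = sink;
        this.data.addAll(items);
    }

    // The sink signals how many more elements it can process,
    // mirroring Subscription.request(n) in Reactive Streams.
    void request(long n) {
        demand += n;
        drain();
    }

    private void drain() {
        // Emit only what was requested: the producer can never
        // overwhelm a slow consumer, which is the essence of back pressure.
        while (demand > 0 && !data.isEmpty()) {
            sink.onNext(data.poll());
            demand--;
        }
        if (data.isEmpty()) sink.onComplete();
    }
}

public class BackPressureDemo {
    public static void main(String[] args) {
        Sink sink = new Sink();
        Source source = new Source(sink, List.of(1, 2, 3, 4, 5));
        source.request(2);                 // slow consumer: ask for 2 only
        System.out.println(sink.received); // [1, 2]
        source.request(10);                // ready for more
        System.out.println(sink.received); // [1, 2, 3, 4, 5]
        System.out.println(sink.done);     // true
    }
}
```

Spark 1.5 applies the same principle internally: when `spark.streaming.backpressure.enabled` is set, the receiving rate is adjusted to what downstream processing has shown it can sustain.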
Spark allows you to configure your job to claim and release processing resources as its needs evolve. This can allow you to run more computation on the same cluster, as executors do not stay idle for too long. In this presentation, I will go through the configuration needed to enable Dynamic Resource Allocation, describe the parameters available and how they affect the life cycle of the executors. We will run a few tests on a real cluster to see dynamic allocation in action, observe the effects of the parameters, and look at cases where using dynamic allocation is not a great idea.
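The parameters involved can be sketched as follows. The property names are Spark's own dynamic allocation settings, but the values here are illustrative examples, not recommendations; note that dynamic allocation also requires the external shuffle service on every worker:

```shell
# Example dynamic allocation settings; class and jar names are placeholders.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
  --class com.example.MyJob \
  my-app.jar
```

The timeouts drive the life cycle: executors are requested when tasks have been backlogged longer than the scheduler backlog timeout, and released after sitting idle longer than the idle timeout, bounded by the min/max executor counts.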