Ethan Jackson - Databricks

Ethan Jackson

PhD Student, UC Berkeley

Ethan J. Jackson is a Computer Science PhD student at UC Berkeley advised by Scott Shenker. His primary focus is on Distributed Systems and Computer Networks. He is also a major contributor to the Open vSwitch project, focusing on Software-Defined Networking, Network Virtualization, and high-performance software switching.



Automated Spark Deployment With Declarative Infrastructure

Summit 2016

With the rise of the cloud, distributed systems have grown in both scale and complexity. To implement the functionality users have come to expect, applications now require a multitude of distributed systems, including serving infrastructure, key-value stores, in-memory caching, stream processing, and batch processing systems. These systems have complex dependency and trust relationships, subtle requirements with respect to network, CPU, memory, and disk, and differing expectations in the event of failure.

These systems are difficult to manage largely because of the complex configuration policy required in distributed environments. Unlike centralized applications packaged as simple binaries, modern systems need extensive configuration around network, security, and placement to operate effectively (or at all). Worse, this policy is traditionally baked into systems implicitly through disparate mechanisms such as firewall configurations, network topology, and server placement. Modern clouds improve the situation slightly with virtual machines, virtual network topology, and security groups. However, they don't solve the problem: achieving a particular policy goal still requires carefully executed API calls.

We propose an alternative approach: rather than manipulating infrastructure so that it conforms to implicit policy, we make policy explicit through a declarative language. Thus, infrastructure is made to conform to the policy specification, not the other way around. These specifications can be actualized in a variety of environments, including diverse cloud providers, physical data centers, and developer laptops. The policy outlines exactly what an application requires, allowing the environment to be tailored to meet its needs.

We present a new system designed to automate deployment and management of distributed systems like Spark. We show how Spark can be deployed on any cloud provider or physical infrastructure, instantly, without requiring an understanding of its internal architecture. We will describe the architecture of the system, its typical usage, and present a brief demo.
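
To make the idea of explicit, declarative policy concrete, the following is a minimal Python sketch of the general pattern: a specification states what should exist, and a reconciliation step derives the actions needed to make the environment match it. This is not the system presented in the talk or its language. The `Spec` type, the `reconcile` function, the action names, and the container names are all hypothetical, chosen only for illustration; port 7077 is the default port of a Spark standalone master.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical declarative policy: states what should exist, not how to create it."""
    containers: frozenset  # names of services that must be running
    allowed: frozenset     # (source, destination, port) tuples permitted to communicate


def reconcile(desired: Spec, actual: Spec) -> list:
    """Return the actions that would move the `actual` state toward `desired`."""
    actions = []
    for name in sorted(desired.containers - actual.containers):
        actions.append(("start", name))
    for name in sorted(actual.containers - desired.containers):
        actions.append(("stop", name))
    for rule in sorted(desired.allowed - actual.allowed):
        actions.append(("allow", rule))
    for rule in sorted(actual.allowed - desired.allowed):
        actions.append(("deny", rule))
    return actions


# Desired policy: one Spark master, two workers, workers may reach the master.
desired = Spec(
    containers=frozenset({"spark-master", "spark-worker-1", "spark-worker-2"}),
    allowed=frozenset({
        ("spark-worker-1", "spark-master", 7077),
        ("spark-worker-2", "spark-master", 7077),
    }),
)

# Assumed current state of some environment (cloud, data center, or laptop).
actual = Spec(
    containers=frozenset({"spark-master", "old-cache"}),
    allowed=frozenset(),
)

plan = reconcile(desired, actual)
for action in plan:
    print(action)
```

Because the specification describes only the end state, the same `desired` value could in principle be reconciled against any environment that can report its current state, and reconciling a state against itself yields no actions.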