Spark Streaming in a Multitenant World

SaaS businesses face the challenges of managing multitenant data, and Spark Streaming is no exception. Using resources efficiently while treating every tenant fairly is difficult. Join us to learn about our solutions to these issues. We will discuss optimal resource usage, dynamic multitenancy, and stream control through sequencing of RDDs within a tenant.
• Spark Streaming in a multitenant world
• Multitenant Kafka DStream
• Why we needed it
• Dynamic tenant provisioning
• Efficient resource sharing
• Kafka and Spark integration limitations
• Fairness and brownout protection across tenants
• Architecture
• Custom DStream implementation
• ZooKeeper as global tenant directory
• Services and clusters
• Fairness
• Pluggable strategies
• Management of multitenancy
• Upscaling and downscaling
• Migration of tenants from one job to another

About Neelesh Shastry

Neelesh is an architect at Marketo, a leading marketing automation SaaS platform. At Marketo, he is focused on transitioning the platform to a scalable, multitenant-aware, Hadoop-based infrastructure. He has 18 years of experience designing and building scalable applications.

About Shaun Klopfenstein

Shaun is CTO at Marketo, a leading marketing automation SaaS platform, and Lab Director of Marketo’s office in Portland, Oregon, where he is based. At Marketo, he is focused on the scale and performance of the marketing platform. Before Marketo, Shaun was CTO and a founder of Crowd Factory, a social marketing company that Marketo acquired in 2012.