Matt is a Couchbase co-founder and Engineering Director who leads SDK and Connector development at Couchbase. He has a deep software development background, with extensive experience scaling Java, Ruby on Rails and AMP web applications. He is a contributor to the memcached project, one of the maintainers of the Java spymemcached client and a core developer on Couchbase. He heads up Couchbase's efforts to help developers be as effective as possible and continues to lead the team in delivering the right bits for Node.js, Java, .NET and PHP developers, among others.
For an operational database, Spark is like Batman's utility belt: it handles a variety of important tasks, from data cleanup and migration to analytics and machine learning, that make the operational database much more powerful than it would be on its own. In this talk, we describe the Couchbase Spark Connector, which lets you easily integrate Spark with Couchbase Server, an open source, distributed NoSQL document database that provides low-latency data management for large-scale, interactive online applications. We'll start with common use cases for Spark and Couchbase, then cover the basics of creating, persisting and consuming RDDs and DataFrames from Couchbase's key/value and SQL interfaces. Advanced topics include:
• Best practices and gotchas when working with DataFrames, especially related to schema inference in Spark and the latest Couchbase N1QL describe / infer
• How the Couchbase Spark Connector optimizes work with key/value RDDs and Couchbase's key/value interfaces
• How and why to create Spark Streams from Couchbase Database Change Protocol (DCP) streams (memory-to-memory streams that are used to replicate data between nodes and services)
• Performance tuning: topology awareness in Couchbase and locality in Spark
• Spark SQL, predicate pushdown, and in-memory indexing