Succinct Spark: Fast Interactive Queries on Compressed RDDs - Databricks

Search is becoming an increasingly powerful primitive in big data analytics. Spark currently supports search only through full-RDD scans, which can be too slow for many applications. I will talk about Succinct Spark, a recently released Spark package that enables search, random access, range queries, and even regular expression (regex) matching on Spark without full-RDD scans. Succinct Spark has the additional benefit of very low storage overhead. The main technique is to store a compressed representation of each RDD and to execute search and regex queries directly on that compressed representation. Succinct Spark lets users treat Spark as a document store (with search on documents) similar to Elasticsearch, as a key-value store (with search on values) similar to HyperDex, or, experimentally, through a DataFrame interface (with search along the columns of a table). I will discuss how Succinct integrates all of these data models using a simple, unified flat-file interface. I will also discuss the many use cases that have come up since the release.
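
To make the flat-file idea concrete, here is a minimal sketch in plain Python. It is not Succinct Spark's API or its data structure: the class name `FlatFileIndex`, its methods, and the newline delimiter are all hypothetical, and an ordinary uncompressed suffix array stands in for the compressed representation. The sketch only shows how records concatenated into one flat file can answer search and random-access queries through an index rather than by scanning every record.

```python
from bisect import bisect_right


class FlatFileIndex:
    """Toy flat-file index (hypothetical; uncompressed, illustration only)."""

    SEP = "\n"  # assumed record delimiter; records and patterns must not contain it

    def __init__(self, records):
        # Concatenate all records into one flat string and remember where each starts.
        self.offsets = []
        pos = 0
        for rec in records:
            self.offsets.append(pos)
            pos += len(rec) + len(self.SEP)
        self.text = self.SEP.join(records) + self.SEP
        # Naive suffix array: start positions of all suffixes, in sorted order.
        # Fine for a toy input; real systems build and compress this far more cleverly.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])

    def _bound(self, pattern, upper):
        # Binary search over the suffix array, comparing only the first
        # len(pattern) characters of each suffix.
        lo, hi = 0, len(self.sa)
        m = len(pattern)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = self.text[self.sa[mid]:self.sa[mid] + m]
            if prefix < pattern or (upper and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def search(self, pattern):
        """Return the sorted ids of records that contain `pattern`."""
        first = self._bound(pattern, upper=False)
        last = self._bound(pattern, upper=True)
        # Map each matching flat-file position back to the record that owns it.
        hits = {bisect_right(self.offsets, self.sa[i]) - 1 for i in range(first, last)}
        return sorted(hits)

    def extract(self, offset, length):
        """Random access: return `length` characters starting at a flat-file offset."""
        return self.text[offset:offset + length]


index = FlatFileIndex([
    "spark makes scans easy",
    "succinct avoids full scans",
    "compressed rdd",
])
print(index.search("scans"))                  # [0, 1]
print(index.search("hadoop"))                 # []
print(index.extract(index.offsets[2], 10))    # compressed
```

In this toy model the record id plays the role of a document id or key, so the same flat file could back a document-store or key-value view; how Succinct actually layers those interfaces is not specified in this abstract.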
