BI-style analytics on Spark (without Shark) using SparkSQL & SchemaRDD


“Spark allows for extremely fast analytics and joins across huge amounts of data, and the SparkSQL and SchemaRDD extensions in Spark 1.0 provide new, easier interoperability with existing Hadoop-based data resources and schematized data.

We will share our work at Zoomdata implementing real-time and historical BI-style slice-and-dice analytics and dashboarding directly on top of Spark (without Shark, due to performance issues that we will discuss). We will highlight our early lessons on data scalability, loading, context sharing, real-time RDD appending/coalescing, and concurrent query handling.

We will also discuss the new SparkSQL and SchemaRDD features in Spark 1.0 that allow direct access to Parquet and other schematized data, along with partitioning strategies that enable in-application partition elimination to speed up large analytical queries.”

About Justin Langseth

Justin Langseth is the Founder & CEO of Zoomdata. He previously founded Claraview, Clarabridge, and Augaroo. A graduate of MIT, Justin is an expert in big data, business intelligence, text analytics, sentiment analytics, and real-time data processing, and holds 14 technology patents.

About Farzad Aref

Farzad is the Head of Product at Zoomdata and one of its founding employees. He is responsible for Zoomdata’s roadmap, UX, and quality. He has over 12 years of experience building and delivering complex analytics solutions to Fortune 500 companies through his tenures at Clarabridge, IBM, Deloitte, and now Zoomdata.