Farzad is the head of Product at Zoomdata and one of its founding employees. He is responsible for Zoomdata’s Roadmap, UX, and Quality. He has over 12 years of experience in building and delivering complex Analytics solutions to Fortune 500 companies through his tenures at Clarabridge, IBM, Deloitte, and now Zoomdata.
June 30, 2014 05:00 PM PT
“Spark allows for extremely fast analytics and joins across huge amounts of data, and the SparkSQL and SchemaRDD extensions in Spark 1.0 provide new, easier interoperability with existing Hadoop-based data resources and schematized data.
We will share our work at Zoomdata implementing real-time and historical BI-style slice and dice analytics and dashboarding directly on top of Spark (without Shark, due to performance issues that we will discuss). We will highlight our early lessons learned related to data scalability, loading, context sharing, real-time RDD appending/coalescing, and concurrent query handling.
We will also discuss the new SparkSQL and SchemaRDD features available in Spark 1.0 that allow direct access to Parquet and other schematized data, along with partitioning strategies that allow in-application partition elimination to speed up large analytical queries.”
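As a rough illustration of what in-application partition elimination can look like, here is a minimal sketch in plain Python. It assumes a hypothetical layout with one Parquet directory per day (`events/day=YYYY-MM-DD`); the path scheme and the `partition_path` and `prune_partitions` helpers are invented for this example and are not taken from the talk or from Zoomdata's implementation. The idea is simply that the application decides which partitions a query can touch before anything is handed to Spark.

```python
from datetime import date, timedelta


def partition_path(root: str, day: date) -> str:
    # Hypothetical naming scheme: one directory per day under the root.
    return f"{root}/day={day.isoformat()}"


def prune_partitions(available, start, end):
    """Keep only the daily partitions that fall inside the query's [start, end] range."""
    return sorted(d for d in available if start <= d <= end)


# Thirty daily partitions covering June 1 to June 30, 2014.
available = [date(2014, 6, 1) + timedelta(days=i) for i in range(30)]

# A query restricted to the last three days of June only needs three partitions.
selected = prune_partitions(available, date(2014, 6, 28), date(2014, 6, 30))
paths = [partition_path("events", d) for d in selected]
print(paths)
```

Only the paths in `paths` would then be loaded and queried, so the remaining 27 partitions are never read.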
June 30, 2014 05:00 PM PT
The Application Spotlight will highlight selected “Certified on Spark” applications that leverage Spark to help their users derive greater value from their data. For each application there will be a brief demo of key functionality, followed by a fireside chat discussing the developers’ experience with Spark, lessons learned, and wish list for the future.
March 17, 2015 05:00 PM PT
As one of Spark’s early adopters, we will share lessons learned from our work at Zoomdata while making Spark a key architecture component of our application and bringing production-level comfort to our user community. This session will focus on the specific challenges we faced, the alternatives we evaluated, and the decisions we ultimately made, including …
– Building a decoupled Spark proxy that runs as a separate process (YARN-enabled) to provide a sharable Spark context with sharable RDDs across load-balanced Zoomdata servers, which enables high user concurrency and fault tolerance.
– Simulating various stress loads in Ganglia to measure and optimize for high user concurrency on large numbers of datasets while writing to and reading from Spark in parallel.
– Properly tuning our Spark application to handle garbage collection and serialization/deserialization of RDDs, using columnar-storage RDDs rather than object RDDs, and eventually using partitioned RDDs.
– Managing the lifecycle of large numbers of RDDs.
– Overcoming binary incompatibilities when supporting Spark on top of different Hadoop distributions and versions.