After earning his Ph.D. in bioinformatics from UCSF, Russell Spitzer took his love of big data to DataStax. There he has worked on all aspects of integrating Cassandra with other Apache technologies like Spark, Hadoop and Solr. His main focus is now the integration of Cassandra with Apache Spark via the Spark Cassandra Connector.
Data Source V2 has arrived for the Spark Cassandra Connector, but what does this mean for you? Speed, flexibility and usability improvements abound, and we'll walk you through some of the biggest highlights and how you can take advantage of them today. Highlights include Spark's ability to understand Cassandra's internal clustering, previously available only through the RDD API; manipulating the Cassandra catalog directly from Spark; and much more! Have Cassandra and Spark? Then this talk is for you!
Learn from someone who has made just about every basic Apache Spark mistake possible so you don't have to! We'll go over some of the most common things users end up doing that cause unnecessary pain, and explain how to avoid them. Confused about serialization? Not sure what is meant by "use a singleton to share connections"? Together we will walk through concrete examples of how to handle these situations. Learn how to do all your work remotely, avoid breaking your Catalyst optimizations, use all your resources, and much more! Together, let's learn how to make our Spark applications better! Session hashtag: #DevSAIS13