After completing his PhD work at the University of California, San Francisco, Russell joined DataStax to fulfill his deep longing to work with distributed systems. Since then, he has worked with Cassandra, Spark, TinkerPop, Hadoop, and a myriad of other big data technologies. His favorite hobby is finding new ways of bringing these technologies together so that everyone can benefit from the new information age.
Data Source V2 has arrived for the Spark Cassandra Connector, but what does this mean for you? Speed, flexibility, and usability improvements abound, and we'll walk you through some of the biggest highlights and how you can take advantage of them today. Learn about such highlights as Spark's ability to understand Cassandra's internal clustering, previously only available through the RDD API; manipulating the Cassandra catalogue directly from Spark; and much more! Have Cassandra and Spark? Then this talk is for you!
Learn from someone who has made just about every basic Apache Spark mistake possible so you don't have to! We'll go over some of the most common things users do that cause unnecessary pain, and explain how to avoid them. Confused about serialization? Not sure what is meant by "use a singleton to share connections"? Together we will walk through concrete examples of how to handle these situations. Learn how to do all your work remotely, not break your Catalyst optimizations, use all your resources, and much more! Together, let's learn how to make our Spark applications better! Session hashtag: #DevSAIS13