Spark And Cassandra: 2 Fast, 2 Furious - Databricks

Not since peanut butter and jelly has there been such an epic combo. Spark is the world’s foremost distributed analytics platform, delivering in-memory analytics with a speed and ease of use unheard of in Hadoop. Cassandra is the lightning-fast distributed database powering such IT giants as Outbrain and Netflix. Did you know you can combine them with free open source technology? Integrate them easily with the DataStax Open Source Spark Cassandra Connector. This feature-rich integration allows Spark to take full advantage of Cassandra and to use Cassandra-specific Spark optimizations. Increase the efficiency of your application with the insider knowledge delivered by one of the main authors of the connector. In this session we’ll go over some of the most common use cases of the Spark Cassandra Connector and highlight how to avoid the most common pitfalls. We will walk through:

Spark Cassandra Basic Features:
- How the Spark Cassandra Connector reads and writes data to C*
- How Spark DataFrames are integrated with Cassandra
- How to use Cassandra data locality to your advantage
- How Cassandra predicate pushdown works in SparkSQL

Building and Tuning Spark Streaming Applications with Cassandra:
- Tuning standard RDD operations for maximum throughput
- Using the internal C* driver pool for flexibility and efficient access
- Understanding how receivers work and interact with Cassandra locality

Using Spark to Perform Common Cassandra Maintenance:
- Migrating data from RDBMS sources directly into Cassandra
- Using Spark to migrate information between different Cassandra clusters
- Bulk loading Cassandra using Spark and DataFrames
- Rebuilding Cassandra tables with different indexes using Spark