Today, DataStax and Databricks announced a partnership in which Apache Spark becomes an integral part of the DataStax offering, tightly integrated with Cassandra. We're very excited to be embarking on this journey with DataStax for several reasons:
Integrating operational systems with analytics
One of the use cases that Spark users increasingly ask us about is the ability to create a closed-loop system: perform advanced analytics directly on operational data, then feed the results back into the operational system to drive the necessary adaptation. The tight integration of Cassandra and Spark will enable users to achieve this goal. Cassandra serves as the high-performance transactional database that powers online applications, Spark serves as a next-generation processing engine that delivers deeper insights faster, and data moves seamlessly between the two.
Spark beyond Hadoop
The most talked-about usage model for Spark to date has been within Hadoop deployments: Spark can operate directly over data in HDFS (without needing to move the data first) and natively supports YARN and Mesos, resource managers that are popular in Hadoop deployments. However, Spark's applicability is much broader. It is designed to be a general Big Data processing engine, and the Spark / Cassandra integration is a prime example of this: native processing without first moving data in batch to Hadoop (or even requiring a Hadoop cluster). Furthermore, the recently announced Spark SQL will help optimize this integration further. Not only will Spark be able to directly access data stored in Cassandra, but it will also be able to execute selected parts of a query in Cassandra itself. It can then pull the resulting data set into Spark for machine learning and other advanced analytics.
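To make the idea of executing part of a query inside the data store more concrete, here is a minimal, purely illustrative sketch in plain Python. It does not use the Spark or Cassandra APIs: `ToyStore`, `scan`, `rows_shipped`, and the `ORDERS` sample data are all hypothetical names invented for this example. It only shows why pushing a filter and a column selection down to the store reduces how much data has to travel to the processing engine.

```python
class ToyStore:
    """In-memory stand-in for an operational store (hypothetical, NOT the Cassandra API)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_shipped = 0  # counts rows handed over to the processing engine

    def scan(self, predicate=None, columns=None):
        """Return rows, optionally filtering and projecting inside the store."""
        out = []
        for row in self._rows:
            if predicate is not None and not predicate(row):
                continue  # filtered out in the store, never shipped
            out.append({c: row[c] for c in columns} if columns else dict(row))
        self.rows_shipped += len(out)
        return out


def mean_amount_without_pushdown(store, country):
    # The engine pulls every row, then filters on its own side.
    rows = store.scan()
    amounts = [r["amount"] for r in rows if r["country"] == country]
    return sum(amounts) / len(amounts)


def mean_amount_with_pushdown(store, country):
    # The filter and the column selection run inside the store;
    # only the matching "amount" values reach the engine.
    rows = store.scan(predicate=lambda r: r["country"] == country,
                      columns=["amount"])
    amounts = [r["amount"] for r in rows]
    return sum(amounts) / len(amounts)


# Hypothetical sample data for the sketch.
ORDERS = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 20.0},
    {"id": 3, "country": "US", "amount": 30.0},
    {"id": 4, "country": "FR", "amount": 40.0},
]
```

Both functions compute the same answer, but the pushdown variant ships only the rows that match the filter. In a real deployment the store and the engine sit on different processes or machines, so the savings show up as less network traffic and less work for the engine.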
Innovation in the Open
This partnership also brings together two groups with very strong open source commitments and heritage. Databricks is focused on keeping Apache Spark 100% open source, and DataStax has invested heavily in growing the Apache Cassandra community. It should be no surprise, then, that a key tenet of this partnership is delivering joint innovation back to the open source community to drive greater integration between the Spark and Cassandra communities over time. Look for significant contributions as we move forward on this journey.
Please join us at the upcoming Spark Summit, where Martin Van Ryswyk, DataStax's VP of Engineering, will give a keynote talk on the value of using Spark and Cassandra together and on additional innovations on the horizon.