Apache Spark – The Next Big Innovation
Once every few years, the big data open-source community experiences a major innovation that advances the capabilities of data processing frameworks. For many years, MapReduce and the Hadoop open-source platform served as an effective foundation for the distributed processing of large data sets. Then last year, the introduction of YARN provided the resource manager needed to enable interactive workloads, bringing data processing performance to another level. However, as organizations entrust big data platforms with more of their critical business information, the volume and variety of data will continue to grow rapidly, as will the need for speed to insight and action on that data. Like much of the community, we believe that Apache Spark is the next big innovation and the platform that will help take on the data challenges of tomorrow.
The decision to support Spark was easy – it was largely driven by our customers. Spark’s usefulness as a powerful all-around big data platform for interactive queries and data processing has made it one of the most frequently requested data sources in the last couple of months. As a company, we strive to make the data sources that are important to our customers universally accessible. The decision was also prompted by the strong momentum of the Apache Spark project and its broad community support. Within the last 8 months, 10 Hadoop distributors, including Cloudera, Hortonworks and MapR, have committed to shipping Spark as part of their distributions and to accelerating the development of the project. Lastly, Tableau was inspired to integrate with Spark because the technology was architected intelligently from the very beginning, as some of the early performance results demonstrate. Our co-founders like to say that we are just getting started at Tableau, and we believe the same is true of Spark.
Tableau Software is “Certified on Spark”
Today, we are delighted to announce that Tableau Software is now “Certified on Spark.” Tableau sought qualification in the program so that our customers can feel confident that the two technologies work together seamlessly and delightfully. We also want to help maintain the compatibility of Spark SQL across different distributions, as this helps to foster a vibrant open-source community – one of collaboration and integration. Tableau is committed to supporting Spark and ensuring compatibility with future releases.
In conjunction with our certification and our mission to “help people see and understand their data,” Tableau is launching a new native Spark SQL connector for both Windows and Mac (currently in beta). We are excited to work with Databricks to bring the performance and versatility of the Spark data processing engine to the masses through visual analysis.
Tableau + Spark = Better Together
Tableau’s integration with Spark brings tremendous value to the Spark community: users can visually analyze their data without writing a single line of Spark SQL code. That’s a big deal because a visual interface to your data extends the reach of Spark beyond data scientists and data engineers to all business users. The Spark connector takes advantage of Tableau’s flexible connection architecture, which gives customers the option to connect live and issue interactive queries or to use Tableau’s fast in-memory database engine. Tableau also lets users blend Spark data with data from any of our other 40+ direct connectors, so they can leverage their existing data assets wherever they are.
To show Tableau and Spark SQL in action, we have created a short video demonstrating how users can connect to a Spark cluster and interact with data in Tableau.
To Learn More:
Read: For more about Tableau’s integration with Spark SQL, please check out our post on the Tableau blog.
Join the beta: To use Tableau directly against Spark, you’ll need to be part of the beta program. If you’re interested in joining, please send us an email.