October 31, 2014

Hortonworks: A shared vision for Apache Spark on Hadoop

This post is guest authored by our friends at Hortonworks announcing a broader partnership with Databricks around Apache Spark.

At Hortonworks we are very excited by the emerging use cases and potential of Apache Spark and Apache Hadoop. Spark represents just one of the shifts underway in the data landscape toward memory-optimized processing that, when combined with Hadoop, can enable a new generation of applications.

We are excited to announce that Hortonworks and Databricks have extended our partnership beyond providing a Certified Spark Distribution to include a shared vision of advancing Apache Spark as an enterprise-ready component of the Hortonworks Data Platform. We are closely aligned on a strategy and vision of bringing 100% open source software to market for the enterprise and supporting customer use cases.

Having two leaders in our respective communities come together makes sense for the community and for customers. By combining Databricks’ expertise in Apache Spark with Hortonworks’ expertise in building a complete enterprise Hadoop data platform, we are better able to engineer solutions that meet enterprise requirements for big data processing.

From the Hortonworks perspective, our view has been very consistent: enable a wide range of batch, interactive, and real-time data processing applications to run simultaneously within a single enterprise Hadoop data platform against shared datasets. We believe applications built on Spark can benefit greatly when it runs as a natively integrated engine within the Hortonworks Data Platform: integrated with YARN and supported by a common set of services for Security, Operations, and Governance.

In June of 2014 we endorsed the standard set of open APIs for application development for Spark on the Hortonworks Data Platform, making it a Certified Spark Distribution. This allows developers to build applications on this new engine while enabling operators to leverage a common data platform (Hadoop).

We are extending our partnership to include a commitment to invest in the following areas with Databricks:

Engineering: Spark optimized on YARN enables Spark-based applications to share resources and operate alongside other workloads, whether batch or streaming. Additionally, integrating Spark with the Security, Operations, and Governance components of the Hortonworks Data Platform/Apache Hadoop provides a fully tested, enterprise-ready modern data platform.
Customers: Hortonworks and Databricks will collaborate to support our customers' use of Spark and the Hortonworks Data Platform.
Open Source Foundation: We share a common vision for working with the open source community and delivering innovation that lands in the upstream projects and is then delivered as enterprise-ready software.

We look forward to working with the Databricks team to further enable Spark on Hadoop.
