
At Hortonworks, we are very excited by the emerging use cases and potential of Apache Spark and Apache Hadoop. Spark represents just one of the shifts underway in the data landscape toward memory-optimized processing that, when combined with Hadoop, can enable a new generation of applications.

We are excited to announce that Hortonworks and Databricks have extended the focus of our partnership from providing a Certified Spark Distribution to include a shared vision of furthering Apache Spark as an enterprise-ready component of the Hortonworks Data Platform. We are closely aligned on a strategy and vision of bringing 100% open source software to market for the enterprise and supporting customer use cases.

Having two leaders in our respective communities come together makes sense for the community and for customers. By combining Databricks’ expertise in Apache Spark with Hortonworks’ expertise in building a complete enterprise Hadoop data platform, we are better able to engineer solutions that meet enterprise requirements for big data processing.

From the Hortonworks perspective, our view has been very consistent: enable a wide range of batch, interactive, and real-time data processing applications to run simultaneously within a single enterprise Hadoop data platform against shared datasets. We believe applications that use Spark can benefit greatly when it runs as a natively integrated engine within the Hortonworks Data Platform: integrated with YARN and supported by a common set of services for Security, Operations, and Governance.

In June of 2014, we endorsed the standard set of open APIs for application development for Spark on the Hortonworks Data Platform, making it a Certified Spark Distribution. This allows developers to build applications on this new engine while enabling operators to leverage a common data platform (Hadoop).

We are extending our partnership to include a commitment to invest in the following areas with Databricks:

  • Engineering: Spark optimized on YARN enables Spark-based applications to share resources and operate alongside other workloads, whether batch or streaming. Additionally, integrating Spark with the Security, Operations, and Governance components of the Hortonworks Data Platform/Apache Hadoop provides a fully tested, enterprise-ready modern data platform.
  • Customers: Hortonworks and Databricks will collaborate to support the use of Spark and the Hortonworks Data Platform by our customers.
  • Open Source Foundation: We share a common vision for working with the open source community and delivering innovation that lands in the upstream projects and is then delivered as enterprise-ready software.

We look forward to working with the Databricks team to further enable Spark on Hadoop.
