
As the move to the next generation of integration platforms gains momentum, implementing a proven and scalable technology becomes critical. Databricks and Apache Spark, delivered on the major Hadoop distributions, are one such case, where massively scalable technology combined with low-risk implementation is key.

At Talend, we see a wide array of batch processes moving toward operational, real-time delivery, driven by the consumers of the data. In this vein, the growing adoption and community of Apache Spark, the powerful open-source processing engine, have been hard to miss. In a relatively short time, it has become part of every major Hadoop vendor’s offering, has become the most active open source project in the Big Data space, and has been deployed in production across a number of verticals.

Traditional ETL and enterprise integration impose limitations on the timeliness of extraction, the speed of loading and, importantly, the supportability of the integration infrastructure itself. Spark delivers a new execution topology, allowing your Big Data platform to deliver much more than just ‘traditional’ Hadoop MapReduce tasks.

The power of integrating Spark and Talend Studio is that Talend users can immediately harness Spark, including its 80+ operations and sophisticated analytics, directly from the familiar and easy-to-use Talend Studio interface. With Talend and its unique approach to code generation, the code required to load data and execute a query in Spark is managed for you. Designers simply need to identify the data; Talend then provides the tools to deliver it at the right time, in the right format and into the desired Hadoop environment.

Additionally, we’re thrilled to announce, in conjunction with Databricks, that Talend Studio is now officially “Certified on Spark”. With the certification of Talend 5.5, the interoperability of integration jobs created with Talend and their execution on any Certified Spark Distribution are guaranteed. It also means that, as a technology user, you can benefit from the power of the platforms without having to maintain your own detailed roadmap of component upgrades, updates and continued refactoring of jobs; Talend manages this for you. More broadly, Talend also supports the open and transparent nature of the certification process, which is designed to maintain compatibility within the Spark ecosystem while encouraging innovation at the platform and application layers.

Beyond the technical integration, Talend Labs worked closely with the R&D team, based in Paris, to create an end-to-end scenario that showcases the key features and functions of the integrated Spark solution. This gives users a fully functional starting point, available from Talend and proven with Spark, for beginning their own journey.

Figure 1: Talend Studio and Spark

Talend is always evolving its certification in line with its key partners and the Big Data ecosystem, and Spark is no exception. With such a fast-moving project, significant features and improvements are being rolled out rapidly. Talend is committed to supporting Spark and will move quickly to certify and ensure compatibility with future Spark releases.