Back in May, we announced our partnership with Informatica to build out a rich set of integrations between our two platforms.
It’s been exciting work for the team because of what we can do for joint customers who combine our Managed Delta Lake with Informatica’s Big Data Management and Enterprise Data Catalog. This vision led us to coin the term “Intelligent Data Pipelines,” which we outlined in our first blog post. Customers get a solution that enables data engineers to quickly ingest high volumes of data from multiple hybrid sources into the cloud, stream that data into an optimized data lake, and ensure the data is properly governed, accurate, and ready for downstream analytics and ML.
Migrating Big Data Workloads from On-premises Hadoop to the Cloud
Most recently, we focused specifically on organizations looking to migrate their big data workloads from on-premises Hadoop to the cloud. Those data teams still spend much of their time on data preparation and ingestion rather than on higher-value advanced analytics and machine learning. Core Hadoop services such as YARN and HDFS are complex to manage, which results in a high total cost of ownership (TCO). Users have to manually configure and optimize clusters for scale-up and scale-down, which is time-consuming and directly impacts the reliability and performance of Hadoop-based data lakes.
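By contrast, a cloud-native platform lets you declare scaling policy instead of hand-tuning it. As one illustrative sketch (node type, Spark version, and worker counts are placeholder values, not recommendations), a Databricks cluster can be created with autoscaling bounds in the Clusters API request payload:

```json
{
  "cluster_name": "migration-etl-example",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "i3.xlarge",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  }
}
```

With a payload like this, the platform adds and removes workers between the configured bounds based on load, rather than requiring an administrator to resize YARN queues and nodes by hand.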
Key Questions Concerning a Hadoop to Cloud Migration
Does migrating from Hadoop to the cloud relieve the operational burden of managing shared clusters? How do you manage compute and storage when migrating to the cloud? What are the key benefits of migrating to a cloud-native platform like Databricks? How does Databricks compare to YARN and HDFS?
Those questions are exactly what a new blog post co-authored by Informatica and Databricks addresses. It offers a detailed review of the architecture changes involved in migrating from Hadoop to Databricks, and for good measure it covers best practices for Hadoop migration that fully leverage the Databricks and Informatica data engineering integration. Check it out!