Back in May, we announced our partnership with Informatica to build out a rich set of integrations between our two platforms.
It’s been exciting work for the team because of what we can do for joint customers who combine our Managed Delta Lake with Informatica’s Big Data Management and Enterprise Data Catalog. This vision led us to coin the term “Intelligent Data Pipelines,” which we outlined in our first blog post. Customers get a solution that enables data engineers to quickly ingest high volumes of data from multiple hybrid sources into the cloud, stream that data into an optimized data lake, and ensure the data is properly governed, accurate, and ready for downstream analytics and ML.
Migrating Big Data Workloads from On-premises Hadoop to the Cloud
Most recently, we focused specifically on organizations looking to migrate their big data workloads from on-premises Hadoop to the cloud. Those data teams still spend much of their time on data preparation and ingestion rather than on higher-value advanced analytics and machine learning. Core Hadoop services such as YARN and HDFS are complex to manage, which results in a high total cost of ownership (TCO). Users have to manually configure and optimize clusters for scale-up and scale-down, which is time-consuming and directly impacts the reliability and performance of Hadoop-based data lakes.
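By contrast, a cloud-native platform lets you declare scaling policy instead of hand-tuning it. As one illustrative sketch (node type, Spark version, and worker counts are placeholder values, not recommendations), a Databricks cluster can be created with autoscaling bounds in the Clusters API request payload:

```json
{
  "cluster_name": "migration-etl-example",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "i3.xlarge",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  }
}
```

With a payload like this, the platform adds and removes workers between the configured bounds based on load, rather than requiring an administrator to resize YARN queues and nodes by hand.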
Key Questions Concerning a Hadoop to Cloud Migration
Does migrating from Hadoop to the cloud relieve the operational burden of managing shared clusters? How do you manage compute and storage when migrating to the cloud? What are the key benefits of migrating to a cloud-native platform like Databricks? How does Databricks compare to YARN and HDFS?
Those questions are exactly what a new blog post co-authored by Informatica and Databricks addresses. It offers a detailed review of the architecture changes involved in migrating from Hadoop to Databricks, and for good measure it covers best practices for Hadoop migration that fully leverage the Databricks and Informatica data engineering integration. Check it out!