by Anand Venugopal and James Nguyen
Companies rely on their big data and analytics platforms to support innovation and digital transformation strategies. However, many Hadoop users struggle with complexity, unscalable infrastructure, excessive maintenance overhead, and, ultimately, unrealized value. We help customers navigate their Hadoop migrations to modern cloud platforms such as Databricks and our partner products and solutions, and in this post we'll share what we've learned.
Teams migrate from Hadoop for a variety of reasons. It's often a combination of "push" and "pull": limitations with existing Hadoop systems push teams to explore alternatives, while the new possibilities enabled by modern cloud data architectures pull them forward. While architecture requirements vary from team to team, we've seen a number of common factors among customers looking to leave Hadoop.
Beyond the technical challenges, we've also had customers raise concerns about the long-term viability of the technology and the business stability of its vendors. Google, whose seminal 2004 paper on MapReduce underpinned the open-source development of Apache Hadoop, has stopped using MapReduce altogether, as tweeted by Google SVP Urs Hölzle: “... R.I.P. MapReduce. After having served us well since 2003, today we removed the remaining internal codebase for good…” These technology shifts are reflected in the consolidation and acquisition activity among Hadoop-focused vendors. This collection of concerns has inspired many companies to re-evaluate their Hadoop investments to see if the technology still meets their needs.
Data platforms built for cloud-native use can deliver significant gains compared to legacy Hadoop environments, a key "pull" factor in cloud adoption. This applies even to customers that have tried to run Hadoop in the cloud. Here are some results from a customer that migrated to Databricks from a cloud-based Hadoop service.
Hadoop was not designed to run natively in cloud environments. While cloud-based Hadoop services are certainly an improvement over their on-premises counterparts, both still lag behind modern data platforms architected for the cloud, in terms of both performance and the ability to address more sophisticated data use cases. On-premises Hadoop customers that we've worked with have seen improvements even greater than those noted above.
While migrating to a modern cloud data platform can be daunting, the customers we've worked with often found the pain of staying with their existing solutions to be significantly worse than the costs of migrating. We've worked hard to streamline the migration process across several dimensions.
Cloud migration decisions are as much about business as they are about technology. They force companies to take a hard look at what their current systems deliver and evaluate what they need to achieve their goals, whether those are measured in petabytes of data processed, customer insights uncovered, or business financial targets.
With clarity on these goals come important technical details, such as mapping technology components from on-premises models to cloud models, evaluating cloud resource utilization and cost-to-performance, and structuring a migration project to minimize errors and risks. If you want to learn more, check out my on-demand webinar to explore cloud migration concepts, data modernization best practices, and migration product demos.