In our previous blog post, we explored the methodology recommended by our Professional Services teams for executing complex data warehouse migrations to Databricks. We highlighted the challenges that can arise during such projects and emphasized the importance of the pivotal decisions made during the migration strategy and design phase. These choices significantly influence both the migration's execution and the architecture of your target data platform. In this post, we dive into these decisions and outline the key data points needed to make informed, effective choices throughout the migration process.
Once you’ve established your migration strategy and designed a high-level target data architecture, the next decision is determining which workloads to migrate first. Two dominant approaches are ETL-first migration and BI-first migration.
The ETL-first migration begins by creating a comprehensive Lakehouse Data Model, progressing through the Bronze, Silver, and Gold layers. This approach involves setting up data governance with Unity Catalog, ingesting data with tools like LakeFlow Connect, applying techniques such as change data capture (CDC), and converting legacy ETL workflows and stored procedures into Databricks ETL. After rigorous testing, BI reports are repointed, and the AI/ML ecosystem is built on the Databricks Platform.
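To make the CDC step concrete, here is a minimal, platform-independent sketch in plain Python of how a stream of change events landed in a Bronze layer could be applied to a keyed Silver table. The `ChangeEvent` type, the `apply_cdc` function, and the sample rows are hypothetical names invented for illustration; they are not part of LakeFlow Connect or any Databricks API. On Databricks this logic would more typically be expressed as a merge against a Delta table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a source table (hypothetical shape)."""
    key: int                              # primary key of the affected row
    op: str                               # "insert", "update", or "delete"
    sequence: int                         # ordering column, e.g. a log sequence number
    row: Optional[Dict[str, Any]] = None  # full row image; None for deletes


def apply_cdc(
    target: Dict[int, Dict[str, Any]],
    events: Iterable[ChangeEvent],
) -> Dict[int, Dict[str, Any]]:
    """Return a new keyed table with the events applied in sequence order."""
    result = dict(target)  # leave the caller's table untouched
    # Events can arrive out of order, so replay them by sequence number.
    for event in sorted(events, key=lambda e: e.sequence):
        if event.op == "delete":
            result.pop(event.key, None)
        elif event.op in ("insert", "update"):
            # Treat inserts and updates alike: upsert the latest row image.
            result[event.key] = dict(event.row or {})
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return result


# Raw change events as they might land in Bronze (note the out-of-order arrival).
bronze_events = [
    ChangeEvent(key=1, op="insert", sequence=1, row={"name": "Ada", "plan": "basic"}),
    ChangeEvent(key=2, op="insert", sequence=2, row={"name": "Bob", "plan": "basic"}),
    ChangeEvent(key=2, op="delete", sequence=4),
    ChangeEvent(key=1, op="update", sequence=3, row={"name": "Ada", "plan": "premium"}),
]

# Silver holds the current state of each key after all changes are applied.
silver = apply_cdc({}, bronze_events)
print(silver)
```

Replaying events by sequence number rather than arrival order is the design choice that matters here: it keeps the Silver table correct even when the ingestion layer delivers changes late or out of order.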
This strategy mirrors the natural flow of data: producing and onboarding it, then transforming it to meet use case requirements. It allows for a phased rollout of reliable pipelines and optimized Bronze and Silver layers, minimizing inconsistencies and improving the quality of data for BI. This is particularly useful for designing new Lakehouse data models from scratch, implementing Data Mesh, or redesigning data domains.
However, this approach often delays visible results for business users, whose budgets typically fund these initiatives. Migrating BI last means that improvements in performance, insights, and support for predictive analytics and GenAI projects may not materialize for months. Changing business requirements during migration can also create moving goalposts, affecting project momentum and organizational buy-in. The full benefits are only realized once the entire pipeline is completed and key subject areas in the Silver and Gold layers are built.
The BI-first migration prioritizes the consumption layer. This approach gives users early access to the new data platform, showcasing its capabilities while migrating workflows that populate the consumption layer in a phased manner, either by use case or domain.
Two standout features of the Databricks Platform make the BI-first migration approach highly practical and impactful: Lakehouse Federation and LakeFlow Connect. These capabilities streamline the process of modernizing BI systems while ensuring agility, security, and scalability in your migration efforts.
By leveraging Lakehouse Federation and LakeFlow Connect, organizations can implement two distinct patterns for BI-first migration:
Both patterns can be implemented use case by use case in an agile, phased approach. This ensures early business value, aligns with organizational priorities, and sets a blueprint for future projects. Legacy ETL can be migrated later, transitioning data sources to their true origins and retiring legacy EDW systems.
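The phased, use-case-by-use-case rollout described above can be sketched as a small tracking structure. This is only an illustration in plain Python: `UseCase`, `next_step`, and `can_retire_legacy_edw` are hypothetical names, and the two-flag model (BI repointed, then legacy ETL migrated) is a simplifying assumption rather than a prescribed Databricks workflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    """Migration status of one BI use case or domain (hypothetical model)."""
    name: str
    bi_repointed: bool = False   # reports already query the new platform
    etl_migrated: bool = False   # pipelines read from the true source systems


def next_step(use_case: UseCase) -> str:
    """BI-first ordering: repoint consumption first, migrate legacy ETL later."""
    if not use_case.bi_repointed:
        return "repoint BI"
    if not use_case.etl_migrated:
        return "migrate legacy ETL"
    return "done"


def can_retire_legacy_edw(use_cases: List[UseCase]) -> bool:
    """The legacy EDW can be retired only when nothing depends on it anymore."""
    return all(uc.bi_repointed and uc.etl_migrated for uc in use_cases)


portfolio = [
    UseCase("sales dashboards", bi_repointed=True, etl_migrated=True),
    UseCase("finance reporting", bi_repointed=True),
    UseCase("supply chain analytics"),
]

for uc in portfolio:
    print(f"{uc.name}: {next_step(uc)}")
print("retire legacy EDW:", can_retire_legacy_edw(portfolio))
```

Tracking the two milestones separately reflects the point made above: business users see value as soon as BI is repointed, while the legacy EDW stays in service until the ETL behind every use case has moved.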
These migration strategies provide a clear path to modernizing your data platform with Databricks. By leveraging tools like Unity Catalog, Lakehouse Federation, and LakeFlow Connect, you can align your architecture and strategy with business goals while enabling advanced analytics capabilities. Whether you prioritize ETL-first or BI-first migration, the key is delivering incremental value and maintaining momentum throughout the transformation journey.