Data Migration

Businesses rely on data more than ever before. To get the most value from your data, you need the best possible data platform, and moving to that platform may require a data migration.

If you have questions about data migration and how to achieve it successfully, we have the answers.

What is data migration?

Data migration is the process of moving digital information from one platform to another. The destination might be a storage system, computing environment, database, data center or other application. Migration can also involve converting data between file formats.

The migration process involves selecting, preparing and extracting data before the transfer and, in some cases, cleaning or transforming the data. The data needs to be validated during and after the transfer to ensure it works in the target system.
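To make the validation step concrete, here is a minimal Python sketch of one common check: comparing the row count and an order-independent checksum of the source and target data. It assumes both sides have already been read into Python tuples with matching types (a `Decimal` on one side and a `float` on the other would be reported as a mismatch). The names `table_fingerprint` and `transfer_is_valid` are illustrative, not part of any product API.

```python
import hashlib
from typing import Iterable, Sequence

_MODULUS = 1 << 256  # keep the running sum the same width as a SHA-256 digest


def table_fingerprint(rows: Iterable[Sequence[object]]) -> tuple[int, str]:
    """Return (row_count, order-independent digest) for a collection of rows."""
    count = 0
    total = 0
    for row in rows:
        # Hash a canonical text form of the row; repr() keeps 1 and "1" distinct.
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Summing (rather than XOR-ing) digests is order-independent and does
        # not let a pair of duplicate rows cancel each other out.
        total = (total + int.from_bytes(digest, "big")) % _MODULUS
        count += 1
    return count, f"{total:064x}"


def transfer_is_valid(source_rows, target_rows) -> bool:
    """True when both sides have the same row count and the same row
    contents (up to the negligible chance of a hash collision)."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


source = [(1, "Ada"), (2, "Grace")]
target = [(2, "Grace"), (1, "Ada")]  # same rows, different order
print(transfer_is_valid(source, target))      # True
print(transfer_is_valid(source, target[:1]))  # False: a row went missing
```

A checksum like this tells you *that* something differs, not *which* rows; a key-by-key comparison is the usual next step once a mismatch is found.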

Now that we have a definition of data migration, let’s explore why companies undertake it and how to do it well.

Why do companies perform data migration?

Companies typically perform data migration when they want to replace legacy software and hardware or consolidate their applications into one system. For example, you might choose to simplify your data platform by migrating to the Databricks Data Intelligence Platform from an enterprise data warehouse or legacy data lake.

Here are some common data migration examples:

  • To replace, upgrade and expand existing storage systems
  • To integrate new and existing systems that share a dataset
  • To reorganize the business for a merger or acquisition
  • To consolidate information systems
  • To prepare data for analysis
  • To centralize databases and business data
  • To archive legacy data
  • To reduce storage and operational costs
  • To relocate to a more secure data center
  • To improve data-handling compliance
  • To reduce energy use and environmental footprint

What types of data migration are there?

There are several types of data migration, and companies often undertake more than one, depending on their business needs. Let’s examine the main types.

Storage

Storage migration is when you transfer data from one storage location to another, such as hardware-based to cloud-based storage or hard disk drives to solid-state drives. The new storage device may be in the same building or in a remote data center. This type of migration doesn’t typically involve altering the data’s content or format.

Database

This means moving your database files to a new platform, usually a new database management system (DBMS). You might also transfer data from the current version of your DBMS to an upgraded version. The process often requires data conversion, making it more complex than storage migration.

Application

This is when you transfer an application or program from one computing environment to another, such as from outdated infrastructure to a more modern, streamlined environment. This can involve both database and storage migrations. It usually happens when the existing software platform changes or when a company chooses to change its software or vendor.

Cloud

Cloud migration means moving data (or apps) from an on-premises location to the cloud or between different cloud environments. You might choose to move all data, applications and services or just some of them. Companies usually do this to reduce costs and centralize their data.

Business process

In this type of migration, you transfer business applications and any data regarding business processes — such as customer, product and operational information — to a new environment. This is typically done to optimize processes and streamline an organization’s management.

The two data migration strategies

What is a data migration strategy? It’s the overarching plan for how you’ll conduct the migration process, and it starts with choosing one of the following approaches.

Big bang

This is where you move all the data to the target environment in one go, within a set time frame. The advantage is that it doesn’t take as long and, therefore, costs less (as long as everything goes according to plan). However, the systems being migrated must be shut down and will be unavailable for the duration of the migration. Small companies with small amounts of data may be able to do this by migrating the data over a weekend or public holiday.

Trickle

This is a phased or iterative migration. It involves splitting the migration into subprocesses, each with its own scope and time frame. Data is transferred in small increments, and the old system continues to operate during the process. This means little or no downtime and less risk, but it’s more complex, time-consuming and costly, because both systems run in parallel and must be kept in sync so that users can switch between them.
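The core loop of a trickle migration can be sketched in Python with the standard `sqlite3` module standing in for the old and new databases. This is a minimal sketch under stated assumptions: the `customers` table and its `id`/`name` columns are hypothetical, and it only copies new rows in primary-key order. Keeping the two systems in sync after rows have been copied (updates and deletes in the still-running old system) would need an additional mechanism such as change data capture, which is not shown.

```python
import sqlite3


def migrate_in_batches(src: sqlite3.Connection, dst: sqlite3.Connection,
                       batch_size: int = 1000, last_id: int = 0) -> int:
    """Copy rows from src.customers to dst.customers in small increments.

    Rows are read in primary-key order, starting after `last_id`. The return
    value is the highest id copied, which can be stored as a checkpoint and
    passed back in to resume the migration later.
    """
    while True:
        # Keyset pagination: fetch the next batch after the last copied key.
        rows = src.execute(
            "SELECT id, name FROM customers WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return last_id
        # `with dst:` commits the batch on success and rolls it back on error,
        # so a failure only affects the current increment.
        with dst:
            dst.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
        last_id = rows[-1][0]
```

Because the function returns a checkpoint rather than assuming it runs to completion, each increment can be scheduled, validated and, if necessary, retried on its own, which is what keeps the risk of a trickle migration low.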

What are the most common data migration challenges?

While data migration brings many benefits, there are also some challenges that you need to be aware of.

Data corruption or loss

One of the most common data migration risks is data loss. Information can go astray due to automatic truncation, format incompatibility, unknown validation settings and network interference. If you don’t prepare and format the source data properly — and take data dependencies and semantics into account — you may end up with gaps, errors or duplication in your data once it’s in the new system.
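Automatic truncation is one of the easier risks to catch before the transfer. The following Python sketch assumes you know the maximum length of each text column in the target system (the column names and limits here are hypothetical) and flags every value that would be cut off. It counts characters; if the target enforces limits in bytes, you would measure `len(value.encode("utf-8"))` instead.

```python
from typing import Iterable, Mapping


def find_truncation_risks(
    rows: Iterable[Mapping[str, object]],
    max_lengths: Mapping[str, int],
) -> list[tuple[int, str, int]]:
    """Return (row_index, column, length) for every text value that is longer
    than the target column allows and would therefore be cut off."""
    risks = []
    for index, row in enumerate(rows):
        for column, limit in max_lengths.items():
            value = row.get(column)
            # Only strings are checked; None and non-text values are skipped.
            if isinstance(value, str) and len(value) > limit:
                risks.append((index, column, len(value)))
    return risks


rows = [{"name": "Ada", "city": "London"}, {"name": "Bartholomew", "city": None}]
print(find_truncation_risks(rows, {"name": 8, "city": 10}))  # [(1, 'name', 11)]
```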

Business continuity and unexpected costs

If you take the big bang approach, your systems will be down for a period of time, which obviously impacts your business. If the migration process takes longer than expected, it affects business continuity and budget. And if the migration fails, this can also prove expensive.

Data governance and security

Migration presents a risk to data governance and security, especially if you haven’t thoroughly tested the security permissions of the target system beforehand. Without putting protocols in place — for instance, encrypting data and creating virtual private networks for the transfer process — you may face data migration issues such as exposing or losing sensitive information.

Data and system integrations

Your data stack probably has multiple tools that work together, but ensuring they’ll still integrate seamlessly in a new environment is challenging. If they don’t, you’ll have problems with productivity. You may find that data integration doesn’t work unless you change the structure, attributes or format to fit the new data storage solution.

Planning a successful data migration process

It’s essential that you produce a clear plan for data migration, including setting a budget and assessing the risks. There are five critical data migration steps to include in a successful plan.

1. Discover target systems

First, you need to know where the data is going. You can then assess the destination system’s requirements and specifications and map the structure of your existing data to the new data system. This will allow you to ensure that it aligns with the new structure and format and to set up the target environment, including any necessary security permissions.
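Mapping the existing structure to the new system often starts as a simple source-to-target column map. The Python sketch below, with hypothetical column names, checks such a map against both schemas and reports the gaps you would need to resolve before setting up the target environment. It is an illustration of the idea, not a substitute for a full schema-comparison tool.

```python
from typing import Iterable, Mapping


def check_mapping(
    source_columns: Iterable[str],
    target_columns: Iterable[str],
    mapping: Mapping[str, str],
) -> dict[str, list[str]]:
    """Compare a source-to-target column mapping against both schemas."""
    source, target = set(source_columns), set(target_columns)
    return {
        # Source columns with no destination: data that would be silently dropped.
        "unmapped_source": sorted(source - set(mapping)),
        # Target columns nothing maps to: they will need defaults or new data.
        "unfilled_target": sorted(target - set(mapping.values())),
        # Mapping entries whose source column does not exist.
        "unknown_source": sorted(set(mapping) - source),
        # Mapping entries that point at a column the target doesn't have.
        "unknown_target": sorted(set(mapping.values()) - target),
    }


report = check_mapping(
    source_columns=["cust_id", "cust_name", "fax"],
    target_columns=["customer_id", "full_name", "email"],
    mapping={"cust_id": "customer_id", "cust_name": "full_name"},
)
print(report["unmapped_source"], report["unfilled_target"])  # ['fax'] ['email']
```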

2. Assess existing data

Now you need to assess your data for volume, quality and stability. Look out for any potential conflicts or duplications and set data standards to mitigate these. You can clean the data if necessary to ensure that only valid, high-quality data gets migrated. It’s a good idea to use profilers to automate discovery and analyzers to provide a detailed assessment of code complexity and estimate migration project costs.
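Dedicated profilers do this at scale, but the basic idea can be shown in a few lines of Python. This minimal sketch assumes the rows are already loaded as dictionaries and that `key` names the column meant to be unique (both are assumptions for illustration); it reports the row count, key values that appear more than once, and the number of missing values per column.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def profile(rows: Sequence[Mapping[str, Any]], key: str) -> dict[str, Any]:
    """Summarize volume, duplicate keys and null counts for a set of rows."""
    columns = sorted({column for row in rows for column in row})
    key_counts = Counter(row.get(key) for row in rows)
    return {
        "row_count": len(rows),
        # Keys seen more than once, in first-seen order: candidates for dedup.
        "duplicate_keys": [k for k, n in key_counts.items() if n > 1],
        # Missing (None or absent) values per column: candidates for cleaning.
        "null_counts": {c: sum(1 for row in rows if row.get(c) is None) for c in columns},
    }


rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": None},
    {"id": 1, "email": "a@x.com"},
]
print(profile(rows, "id")["duplicate_keys"])  # [1]
```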

3. Design a strategy

Select your migration approach and create a roadmap to make it happen. You’ll need to list the systems and tools required, any data transformation processes, migration testing procedures and security protocols. Estimate the costs and set a realistic budget and timeline for completion. Don’t forget to specify how you’ll communicate with stakeholders, and build contingency plans into the strategy.

4. Run a pilot

Before you begin the full migration, it’s essential that you test it to check that it’ll work in reality. You would typically do this using a mirror of the production environment, but you can also test with smaller sets of data, dummy data or a copy of the live system data. Once you’ve completed the test, you should be able to see if any improvements are needed before the actual migration begins.

5. Execute migration

Now you can go ahead with the migration, following the guidance outlined in your strategy. This is when the extraction, transformation and loading (ETL) processes will also go live. Once you’ve validated the data in its new environment and you’re confident that the migration was successful, you can shut down your old system.
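The extract, transform and load stages can be sketched with Python’s standard `sqlite3` module. In this minimal example, the `legacy_customers` source table, the `customers` target table and the cleaning rules (trimming names, lowercasing emails) are all hypothetical; a real pipeline would apply the transformations defined in your strategy.

```python
import sqlite3
from typing import Iterator, Optional


def extract(src: sqlite3.Connection) -> Iterator[tuple]:
    """Read raw rows from the (hypothetical) legacy table."""
    yield from src.execute("SELECT id, name, email FROM legacy_customers")


def transform(row: tuple) -> tuple:
    """Clean one row: trim whitespace and normalize email case."""
    cust_id, name, email = row
    clean_name: Optional[str] = name.strip() if name is not None else None
    clean_email: Optional[str] = email.strip().lower() if email is not None else None
    return (cust_id, clean_name, clean_email)


def load(dst: sqlite3.Connection, rows: list) -> int:
    """Write all rows in one transaction, rolling back entirely on any error."""
    with dst:
        dst.executemany("INSERT INTO customers (id, name, email) VALUES (?, ?, ?)", rows)
    return len(rows)


def run_etl(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Run the three stages end to end and return the number of rows loaded."""
    return load(dst, [transform(row) for row in extract(src)])
```

Loading everything in a single transaction fits a big bang cutover; for a trickle migration you would combine these stages with batching so that each increment commits on its own.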

Five data migration best practices

Here are some best practices you can follow to help your data migration run smoothly.

1. Back up data

The surest way to avoid losing valuable data during the migration is to back it up thoroughly. That way, if something goes wrong and the data is lost or corrupted, you can retrieve and restore it from the backup. It’s best to have more than one backup in place, including a local backup and an offsite cloud backup that protects the data even if your own servers are compromised.
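A backup is only useful if it is a faithful copy. This Python sketch covers the local-backup case for a file export: it copies the file and then compares SHA-256 checksums of the original and the copy. The function names are illustrative assumptions, and pushing a second copy to offsite or cloud storage would be a separate step not shown here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large exports don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and confirm the copy is byte-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file metadata such as mtime
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target
```

Keeping the checksum alongside the backup also lets you verify it again later, for example just before you need to restore from it.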

2. Define business use case and requirements

Before deciding on a migration approach or planning your strategy, make sure you’ve clearly defined the business case for the migration. Why do you need to do this? How will it improve your business? What will you use this data for? Align the project with broader business goals and consider the business requirements of a new system. It’s also important to outline data permissions and document them in your plan.

3. Set up a dedicated team

Data migrations can be complex, so you need specialists to help you manage the project. Ideally, your team will include at least one person with significant experience in data migration.

Once you’ve assembled the right team, assign responsibilities to ensure accountability. If you don’t have the necessary skills internally, it may be worth hiring external consultants to help with the process — you can get help with migration execution from Databricks Professional Services.

4. Stick to the strategy

You’ve spent a lot of time and effort choosing a data migration approach and developing a plan, so make sure you stick to it! Deviating from the process, or skipping a critical step such as putting data security protocols in place, may lead to a failed migration. It’s also helpful to document the migration as you go along, as this will highlight important lessons and issues to avoid in any future migrations.

5. Continue testing and validation

As you migrate your data, you must continue to monitor and test it. This will help ensure it’s being transferred correctly, without quality issues, gaps or duplications. If you’re using the trickle method, each increment acts as a checkpoint, so you’ll be able to identify problems quickly, including any downtime in the old system. Check that the migration has been executed according to the guidelines, and audit the data in its new home to validate that it’s ready for use.
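Ongoing validation usually goes down to the level of individual records. The following Python sketch assumes each row carries a primary key at a known position (`key_index`, a hypothetical parameter) and reconciles source against target, reporting the three problems named above: gaps (missing keys), duplications (keys loaded more than once) and quality issues (rows whose values changed in transit), plus any unexpected extra rows.

```python
from collections import Counter
from typing import Any, Sequence


def reconcile(source_rows: Sequence[Sequence[Any]],
              target_rows: Sequence[Sequence[Any]],
              key_index: int = 0) -> dict[str, list]:
    """Compare source and target rows by primary key."""
    # Keys that appear more than once in the target point to duplicated loads.
    key_counts = Counter(row[key_index] for row in target_rows)
    duplicated = [key for key, n in key_counts.items() if n > 1]
    # Index both sides by key (for duplicated keys, the last row wins).
    source = {row[key_index]: tuple(row) for row in source_rows}
    target = {row[key_index]: tuple(row) for row in target_rows}
    return {
        "missing": [k for k in source if k not in target],      # gaps
        "unexpected": [k for k in target if k not in source],   # extra rows
        "changed": [k for k in source if k in target and source[k] != target[k]],
        "duplicated": duplicated,
    }


src = [(1, "Ada"), (2, "Grace"), (3, "Alan")]
tgt = [(1, "Ada"), (2, "Grace Hopper"), (2, "Grace Hopper"), (4, "Edsger")]
print(reconcile(src, tgt)["missing"])  # [3]
```

Holding both sides in memory only works for modest volumes; for large tables, the same comparison is typically pushed down into the databases as a join on the key.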

Confidently perform data migration with Databricks

Data migration is a big undertaking, so you need to feel confident that it’ll work as intended — and not cause business disruption or blow the budget.

With expert help from Databricks, you can ensure your data is transferred successfully and securely. Built on lakehouse architecture, the Databricks Data Intelligence Platform helps customers migrate from legacy data platforms using a phased end-to-end process.

Whether you’re moving data from an application, storage system or the cloud, this process provides a predictable model to help you understand the costs. Databricks offers automated tools, technical guidance, partner solutions and professional services to help you eliminate risk and realize value faster.

If you migrate data into Databricks from an enterprise data warehouse, you’ll be able to run all your data, analytics and AI workloads on a single unified data platform and quickly scale as your business evolves.

Data migration FAQs

What are the two types of data migration?

The two main approaches to data migration are big bang and trickle. The first transfers all data in a single operation, which saves time and money but involves system downtime and higher risk. The second is a phased approach, in which you move data in smaller chunks over a period of time while the old system continues running in parallel. It’s more complex and expensive, but there’s less risk of failure.

How does data migration work?

Data migration can involve moving data between storage locations, databases, applications or cloud environments. In some cases, business processes are also transferred. Whichever type of migration you undertake, you’ll start by identifying the target system the data will move into and assessing the quality of the data itself.

The next stage is to draw up a plan for the migration, choosing either the big bang or trickle approach and setting a budget and time frame. Before the actual migration takes place, perform a test run to check for any potential issues. You should continue testing and validation throughout the process and check that the data has arrived safely in the target environment before decommissioning the old system.

What is the difference between data migration and data conversion?

The terms are often conflated, but they’re two different things. Data migration means transferring digital information from one place to another. It can include data conversion, but it doesn’t have to.

Data conversion means transforming data into a new format. The converted data may be moved into a new application, but it doesn’t necessarily move to a new data center, system or environment. Conversion is an optional part of data migration.
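To illustrate conversion without migration, this Python sketch converts CSV text into JSON Lines (one JSON object per line) entirely in place, using only the standard `csv` and `json` modules; nothing moves to a new system. The function name is illustrative. Note that CSV carries no type information, so every value stays a string; casting types would be a further conversion step.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text with a header row into JSON Lines text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object keyed by the header names.
    return "\n".join(json.dumps(row) for row in reader)


print(csv_to_jsonl("id,name\n1,Ada\n2,Grace\n"))
# {"id": "1", "name": "Ada"}
# {"id": "2", "name": "Grace"}
```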