The Executive’s Guide to Data, Analytics and AI Transformation, Part 1: A blueprint for modernization

Chris D’Agostino
Mimi Park
Usman Zubair

Now more than ever, organizations need to respond quickly to market opportunities and emerging risks so that they are better positioned to adapt, innovate and thrive in a modern, dynamic economy. Business leaders see digital transformation as an opportunity to build a new technology foundation from which to run their business, while lowering costs and increasing business value.

However, conflicting organizational priorities, legacy-based information systems, and disparate data environments make that difficult to achieve. To that end, data, analytics and AI executives need to develop and execute a comprehensive strategy that enables them to easily deploy and transition to a modern data architecture. This blog series will share key insights and tactics that you should consider as you embark on your own journey.

To begin, we propose six tactics that should serve as the foundation of every data and platform modernization initiative.
 

  1. Secure executive buy-in and support
    Large organizations are difficult to change — but it's not impossible. In order to be successful, you need to have unwavering buy-in and support from the highest levels of management — including the CEO and board of directors. With this support, you have the leverage you need to develop the strategy, decide on an architecture and implement a solution that can truly change the way your business is run. Without it, you have a very expensive science project that has little hope of succeeding. The added work to support the initiative must be offset by a clear articulation of the resulting benefits — not only for the business but for the personnel within it. The transformation should result in a positive change to how people do their jobs on a daily basis.
     
  2. Go "all in" on multi-cloud
    The COVID-19 pandemic has caused rapid adoption of cloud-based solutions for digital business acceleration, and organizations are now using this time to reevaluate their use of on-premises and cloud-based services. Cloud vendors offer many options to organizations, including Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) solutions. These offerings, especially when combined with the use of open source software, increase the speed at which organizations can adopt the latest technologies while also reducing their capex in these budget-conscious times.

    At the same time, companies are well aware of vendor lock-in and want to abstract their applications so they can be moved across clouds if there is a compelling business reason. Enter multi-cloud: it gives the organization more sovereignty over its data, flexibility to run workloads anywhere, ease of integration when acquiring businesses that run on different cloud providers and simplified compliance with emerging regulations that may require companies to diversify their data footprint to reduce risk to consumers' personal information. As a result, data portability and the ability to run workloads on different cloud providers are becoming increasingly important.
     
  3. Modernize business applications
    As organizations begin to accelerate the adoption of the cloud, they should avoid a simple "lift and shift" approach. The majority of on-premises applications were not built with the cloud in mind. They usually differ from cloud applications in the way that they handle security, resiliency, scalability and failover. Their designs often store data in ways that make it difficult to adhere to regulations such as GDPR and CCPA. Finally, the features and capabilities of the application may be monolithic in nature and, therefore, tightly coupled. In contrast, modern cloud applications are modular in design and use RESTful web services and APIs to provide easy access to an application's functionality.

    As a first step, organizations should inventory their business-critical applications, prioritize them based on business impact and modernize them in a consistent manner for cloud-based deployments. It is these applications that generate and store a significant amount of the data consumed within an organization. Using a consistent approach to cloud-based application design makes it easier to extract data when it is needed.
     
  4. Land all data in a data lake
    Data-, analytics- and AI-driven organizations need to be able to store and process all their data as quickly as possible, regardless of its size, shape or speed. Data is often siloed in various business applications and is hard or slow to access. Likewise, organizations can no longer afford to wait for data to be loaded into data stores, like a data warehouse, with predefined schemas that only support very specific questions about that data. To further complicate matters, how do you handle new data sets that cannot easily be manipulated to fit into your predefined data stores? How do you find new insights as quickly as possible?
     
  5. Minimize time in the "seam"
    As you begin your data transformation, it is important to know that the longer it takes, the more risk and cost you introduce into your organization. The stepwise approach to migrating your existing data ecosystem to a modern data stack will require you to operate in two environments simultaneously, the old and the new, for some period of time. This will have a series of temporary adverse effects on your business. It will:
    • increase your operational costs substantially, as you will run two sets of infrastructure
    • increase your data governance risk, since you will have multiple copies of your data sitting in two very different ecosystems
    • increase your cyberattack footprint and attack vectors, as the platforms will likely have very different security models and cyber defenses
    • cause strain on your IT workforce due to the challenges of running multiple environments
    • require precise communications to ensure that your business partners know which environment to use and for what data workloads
    To mitigate some of the strain on the IT workforce, some organizations hire staff augmentation firms to "keep the lights on" for the legacy systems while the new systems are being implemented and rolled out. It's important to remember that this arrangement is critical for business continuity but should be short-lived.
     
  6. Shut down legacy platforms
    In keeping with the goal of minimizing time in the seam, the project plan and timeline must include the steps and sequencing for shutting down legacy platforms. For example, many companies migrate their on-premises Hadoop data lake to a cloud-based object store.

    Taking an on-premises Hadoop system as an example, the approach for shutting down legacy systems is generally as follows:
    1. Identify the stakeholders (business and IT) who own the jobs that run in the Hadoop environment.
    2. Declare that no changes can be made to the Hadoop environment — with the exception of emergency fixes or absolutely critical new business use cases.
    3. Inventory the source systems and respective data flow paths that feed data into the Hadoop environment.
    4. Inventory the data that is currently stored in the Hadoop environment and understand the rate of change.
    5. Inventory the software processes (aka jobs) that handle the data and understand the consumers and output of the jobs.
    6. Prioritize the jobs to move to the modern data architecture. One by one, port the data input, job execution, job output and downstream consumers to the new architecture.
    7. Run legacy and new jobs in parallel for a set amount of time — in order to validate that things are working smoothly.
    8. Shut down the legacy data feeds, job execution and consumption. Wait. Look for smoke.
    9. Rinse and repeat — until all jobs are migrated.
    10. Shut down the Hadoop cluster(s).
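
The parallel-run check in step 7 can be sketched in a few lines. This is only a minimal illustration, assuming that the legacy and new jobs each produce a set of rows small enough to compare in memory; the names `summarize` and `outputs_match` and the sample rows are hypothetical and do not come from any particular platform. It compares the row count and an order-independent checksum of the two outputs.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSummary:
    """Row count plus an order-independent checksum of a job's output."""
    row_count: int
    checksum: str


def summarize(rows):
    # Hash each row on its own, then sort the digests so that row order
    # does not affect the combined checksum.
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return OutputSummary(row_count=len(digests), checksum=combined)


def outputs_match(legacy_rows, new_rows):
    # True only when both outputs contain the same rows (in any order).
    return summarize(legacy_rows) == summarize(new_rows)


# Hypothetical sample output from one job, run on both platforms.
legacy_output = [("2021-08-01", "store-17", 1250), ("2021-08-01", "store-42", 980)]
new_output = [("2021-08-01", "store-42", 980), ("2021-08-01", "store-17", 1250)]

print(outputs_match(legacy_output, new_output))
```

Because the comparison relies on `repr`, values must have the same types on both sides (for example, `1250` and `1250.0` would count as a mismatch). For large outputs, the same idea would more likely be pushed down into the processing engine than run on a single machine.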

To facilitate a multi-cloud strategy, you can follow the same process to migrate off cloud-native big data systems such as EMR and Dataproc. It is important, however, to make sure that the organization has the fortitude to hold the line when there is pressure to make changes to the legacy environments or extend their lifespan. Setting firm dates for when these legacy systems will be retired serves as a forcing function for teams as they onboard to the modern data architecture. Executive buy-in plays a crucial role in seeing the shutdown of legacy platforms through.

Conclusion
Whether you are just getting started or already in the process of a multi-year strategy, this template can be applied at any time and repeated as you tackle your organization's vast portfolio one project at a time. Going "all-in" on the multi-cloud, consolidating all data into the data lake and securing executive sponsorship will provide a solid foundation for your data modernization strategy through pivots in an unpredictable and dynamic business environment.

In fact, thousands of customers have already revolutionized their data and AI capabilities by adopting the latest innovation in big (and small) data: the lakehouse, unifying their data warehousing, data lake architecture and use cases on Databricks. Managing distinct and redundant data environments, along with their respective security and governance controls, and paying for each of them is a story of yesterday for our customers. To learn more, please contact us.

Want to learn more? Check out our eBook Transform and Scale Your Organization With Data and AI.
