Effective data governance is crucial for organizations to harness their data assets. Learn how bp uses Databricks Unity Catalog to enhance their data governance framework, highlighting challenges, strategies, and benefits.
The benefits and impact that Unity Catalog has already delivered and those that it unlocks by enabling the next generation of features in the Databricks Platform cannot be understated.— Liam Donohoe, Principal Architect, bp
In today's data-driven world, effective data governance is crucial for organizations to harness the full potential of their data assets. bp, a global leader in the energy sector, has embarked on a transformative journey by implementing Databricks Unity Catalog to streamline and enhance their data governance framework. This program of work was successfully completed in close partnership between bp and Databricks. This blog post delves into bp’s experience, highlighting the challenges, strategies, and benefits of this significant migration.
bp faced several challenges typical of large enterprises dealing with vast amounts of data spread across various systems and platforms. These challenges included:
To address these challenges, bp turned to Databricks Unity Catalog, a unified governance solution that provides a comprehensive framework for managing data assets.
Key features of Unity Catalog that benefited bp include:
The scale of bp’s Unity Catalog migration was impressive. The company successfully migrated over 270 Databricks workspaces from two separate data hub platforms and 15 standalone instances. These workspaces served over 10,000 people in key business entities, including Finance ERP, Customer and Products, Production and Operations, and Enterprise.
The migration was executed in phases; each quarter, bp migrated an order of magnitude more workspaces than in the quarter before.
This structured approach allowed bp to learn and adapt throughout the process, resulting in an accelerated pace of migration in the later stages.
bp's implementation strategy for Unity Catalog was methodical and strategic. Tight engagement with Databricks, partners, and internal teams was key to addressing challenges quickly and effectively.
Our Unity Catalog Migration project, involving over 200+ Databricks workspaces, has achieved outstanding results in cloud cost savings, governance, and operational efficiency. By consolidating workspaces under Unity Catalog, we have significantly reduced operational costs, centralized governance, enhanced security, and streamlined data sharing and compliance. The close collaboration with Databricks engineers ensured this migration exceeded our expectations in terms of speed and efficiency. This milestone not only drives cost efficiencies but also advances secure and efficient data management at bp, enabling greater value for our business stakeholders.— Srinivas Chandolu, Staff Platform Engineer, bp
The implementation of Unity Catalog at bp has significantly improved data security and compliance, enhanced data visibility, and increased operational efficiency across the organization.
The implementation of Unity Catalog at bp yielded several significant technical benefits:
Access to the Latest Databricks Features: Unity Catalog unlocked access to cutting-edge features like Databricks Genie, Delta Sharing, and Databricks Assistant, providing substantial benefits to bp's data engineers.
Addressing Technical Debt: The migration allowed bp to upgrade from runtime version 9.1, which had reached the end of its serviceable life, resolving issues with their custom entitlement framework.
Improved Data Visibility: Unity Catalog significantly enhanced data visibility across the organization. As a result, bp can now avoid duplication and silos.
Improved Data Security and Compliance: Unity Catalog’s centralized governance and robust security features have streamlined compliance processes and ensured that only authorized personnel have access to sensitive data.
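As an illustration of the centralized access model described above, the sketch below generates the Unity Catalog SQL GRANT statements a group would typically need in order to read a single table. The catalog, schema, table, and group names are hypothetical, and the helper `read_grants` is not part of any Databricks library; bp's actual entitlement framework is not described in this post.

```python
def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, escaping any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def read_grants(catalog: str, schema: str, table: str, group: str) -> list[str]:
    """Return the GRANT statements needed for read access to one table.

    Unity Catalog requires USE CATALOG and USE SCHEMA on the parent
    objects in addition to SELECT on the table itself.
    """
    principal = quote(group)
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    tbl = f"{sch}.{quote(table)}"
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {principal}",
        f"GRANT SELECT ON TABLE {tbl} TO {principal}",
    ]


# Hypothetical names, for illustration only.
for statement in read_grants("finance", "erp", "invoices", "finance-analysts"):
    print(statement)  # on Databricks, each statement could be passed to spark.sql(...)
```

Because the grants live in one metastore rather than in each workspace, the same three statements apply wherever the table is queried.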
The Unity Catalog migration is expected to deliver substantial financial benefits to bp:
Cost Savings: The Databricks consolidation effort, included as part of the Unity Catalog migration, has the potential to save nearly $1 million annually.
Operational Efficiency: By enabling access to cost and runtime job data directly from system tables, bp eliminated the need for explicit management of Overwatch jobs, saving approximately $4,000 weekly.
Optimized Runtime Costs: Access to the latest optimized Unity Catalog-compatible Databricks runtime has the potential to significantly reduce runtime costs.
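To make the system-tables point under Operational Efficiency concrete, here is a minimal sketch of the kind of query that can stand in for a separately managed cost-monitoring job. It assumes the `system.billing.usage` table is enabled and readable in the workspace; the function name, the 7-day default window, and the grouping are illustrative choices and are not taken from bp's setup.

```python
def job_usage_query(days: int = 7, limit: int = 20) -> str:
    """Build a SQL query that sums billed usage per job over the last `days` days."""
    if days <= 0 or limit <= 0:
        raise ValueError("days and limit must be positive")
    return f"""
SELECT
  workspace_id,
  usage_metadata.job_id AS job_id,
  sku_name,
  usage_unit,
  SUM(usage_quantity) AS total_usage
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), {int(days)})
  AND usage_metadata.job_id IS NOT NULL
GROUP BY workspace_id, usage_metadata.job_id, sku_name, usage_unit
ORDER BY total_usage DESC
LIMIT {int(limit)}
""".strip()


print(job_usage_query())
# In a Databricks notebook, where `spark` is predefined, the query could be run with:
#   spark.sql(job_usage_query(days=30)).show()
```

The query reads data the platform already collects, so there is no extra pipeline to schedule or maintain.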
The migration to Unity Catalog brought about several operational improvements:
Dedicated Service Line: bp can now operate Unity Catalog as a dedicated Service Line, enhancing their ability to manage operations and support.
Enhanced Data Access Framework: bp deployed a Platform Experience Portal with a new Data Access framework, strengthening security, compliance, and efficiency in data management.
Streamlined Access Management: The new framework facilitates updates and reduces administrative overhead for over 10,000 users, promoting collaboration among teams.
Unity Catalog has positioned bp to leverage several advanced features:
Delta Sharing: A full Delta-Parquet conversion was part of the migration, unlocking Delta Sharing for non-Databricks consumers.
SQL Serverless Warehouses: These have been extensively implemented, enhancing bp's data processing capabilities.
AI Features: bp is now well-placed to use advanced AI features like model serving, fine-tuning, and vector databases.
GenAI Augmentation: bp can now use auto-generated comments, AI assistants and Genie to increase engineering productivity and business user accessibility.
Enhanced Telemetry: Access to system tables and granular telemetry provides bp with deeper insights into their data operations.
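For the non-Databricks consumers mentioned under Delta Sharing, a recipient can read a shared table with the open-source `delta-sharing` Python connector. The sketch below assumes the data provider has issued a credential (profile) file; the file name and the share, schema, and table names are hypothetical.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the connector expects."""
    for part in (profile_path, share, schema, table):
        if not part:
            raise ValueError("profile path, share, schema, and table must be non-empty")
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame.

    Requires the open-source connector: pip install delta-sharing
    """
    import delta_sharing  # imported lazily so table_url works without the package

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


# Hypothetical profile file and table names, for illustration only.
print(table_url("config.share", "energy_share", "operations", "well_output"))
```

The recipient needs only the profile file and the connector, not a Databricks workspace, which is what makes this route usable outside the platform.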
bp's journey with Unity Catalog exemplifies how a unified data governance solution can transform data management practices in large enterprises. bp plans to continue building on this by further consolidating workspaces and decommissioning the legacy Hive metastore. By addressing key challenges and leveraging the robust features of Unity Catalog, bp has set a new standard for data governance in the energy sector. This migration not only solved immediate data governance challenges but also positioned bp to leverage advanced data and AI capabilities in the future, cementing its position as a data-driven leader in the energy industry.
This blog post was jointly authored by Liam Donohoe (bp), Srinivas Chandolu (bp), Krishanu Mrinal Roy (bp), Marjorie Adriaenssens (Databricks) and Bimal Sebastian (Databricks).