Leading the AI journey for audit, tax and advisory services
KPMG modernizes the data estate with Azure Databricks
As one of the world’s leading accounting firms, the KPMG global organization operates in 143 countries and territories worldwide, offering audit, tax and advisory services. Serving more than 82% of the Fortune Global 500 and more than 80% of the Forbes Global 1000, KPMG LLP is on a mission to deliver high-quality, timely and relevant insights to its clients. As a data-driven organization pivoting in the age of AI, KPMG quickly saw that its legacy Hadoop infrastructure struggled with increasing speed-to-market requirements and prohibitive costs due to rapidly growing amounts of data. KPMG turned to the Databricks Data Intelligence Platform as the core engine to drive its data transformation, analytics and model development processes. Since moving to Databricks, KPMG has seen increased productivity, faster time to market and increased scale and performance.
Legacy Hadoop architecture slows innovation
As a professional services firm operating in 143 countries and territories, KPMG International is a trusted partner for businesses in all industries, offering audit, tax and advisory services. KPMG LLP, the American member firm of KPMG International, employs more than 40,000 people and serves all 50 U.S. states. KPMG prides itself on a commitment to quality and service excellence in all that it does. Still, an aging Hadoop infrastructure hindered its ability to meet speed-to-market requirements and keep costs down as data volumes and variety rapidly increased. What’s more, KPMG understood the increasing demands of AI and needed a solution that was flexible, open and agile enough to meet the needs of a quickly evolving market.
Seeing the need to future-proof its data platform, KPMG embarked on a pivotal transition in 2021 to move its data operations from on-premises Hadoop to Microsoft Azure, with Azure Databricks as a core pillar of KPMG LLP’s cloud data strategy. Dennis Tally, who leads the firm’s delivery of a modern data platform as part of its Chief Digital Office, explained, “Databricks has enabled KPMG to build a scalable and resilient lakehouse architecture, powering data transformation, analytics and AI workloads to meet our emerging AI requirements from across the firm, while also reducing data integration complexity and costs.”
At present, KPMG boasts a strong base of over 850 active power users successfully utilizing Databricks, supporting use cases within the Chief Digital Office and across the organization in areas like finance, risk prediction and human resources. “We truly use Databricks as the foundation for everything we do,” Tally said.
Seamless federation and data governance in the modern era
Since choosing to modernize its platform with Databricks, KPMG has been able to federate its data to easily support downstream analytics and machine learning needs. Unity Catalog and Delta Sharing have allowed KPMG to break down data silos that existed across the firm, enabling data federation from a variety of sources, including on-premises SQL servers, cloud data warehouses and Azure Data Lake Storage. This versatility allows for a faster time to value for business projects, as it eliminates the need for manual data integration. The addition of automatically captured and updated lineage further simplifies the process, making it easier to identify the sources for data assets and their downstream usage. “Federation is a key plank in our strategy, and we are currently using Unity Catalog to federate different data sources in one place in production across our environments. The governed sharing of data has never been easier,” Tally said. Unity Catalog also gives KPMG a single place to administer data access policies that apply across all workspaces. KPMG can now define its policies once and have them enforced across the lakehouse, which simplifies the management of access controls, helps the firm meet regulatory compliance requirements and lets it better support its customers. The amount of data managed under Databricks Unity Catalog currently exceeds 500 TB and continues to grow. With more than 850 active users on the platform, the value created continues to grow as well, with thousands of additional platform customers utilizing Power BI dashboards built on data ingested into the lakehouse.
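The "define once, enforce everywhere" idea can be illustrated with a small, self-contained Python sketch. This is a toy model rather than Unity Catalog's actual API: the `CentralCatalog`, `Grant` and `Workspace` classes, the group names and the table name are all hypothetical and exist only to show that workspaces hold no rules of their own and defer to one shared policy store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One centrally defined rule: a group may perform an action on a table."""
    group: str
    action: str  # e.g. "SELECT"
    table: str   # hypothetical fully qualified name


@dataclass
class CentralCatalog:
    """Hypothetical stand-in for a shared governance layer (not a real API)."""
    grants: set = field(default_factory=set)

    def grant(self, group, action, table):
        # The rule is recorded exactly once, in the shared catalog.
        self.grants.add(Grant(group, action, table))

    def is_allowed(self, groups, action, table):
        # Access is allowed if any of the user's groups holds a matching grant.
        return any(Grant(g, action, table) in self.grants for g in groups)


@dataclass
class Workspace:
    """A workspace keeps no rules of its own; it defers to the catalog."""
    name: str
    catalog: CentralCatalog

    def can_query(self, user_groups, table):
        return self.catalog.is_allowed(user_groups, "SELECT", table)


# Define the policy once.
catalog = CentralCatalog()
catalog.grant("finance_analysts", "SELECT", "finance.reporting.ledger")

# Every workspace attached to the catalog sees the same decision.
workspaces = [Workspace(n, catalog) for n in ("audit", "tax", "advisory")]
allowed = {
    ws.name: ws.can_query(["finance_analysts"], "finance.reporting.ledger")
    for ws in workspaces
}
denied = {
    ws.name: ws.can_query(["contractors"], "finance.reporting.ledger")
    for ws in workspaces
}
```

In this sketch, changing or revoking the single grant would change the answer in all three workspaces at once, which is the property that makes centralized access control simpler to audit than per-workspace rules.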
Not only has Databricks provided an ideal governance solution for KPMG, but Databricks SQL enables users across the business to run BI and ETL workloads at scale while easily integrating with Power BI for dashboarding. Serverless compute removes the need for KPMG to manage, configure or scale its Azure cloud infrastructure for the lakehouse: Databricks SQL Serverless provides elastic SQL compute, decoupled from storage, that scales automatically for KPMG LLP’s high-concurrency use cases. On the machine learning front, MLflow provides an end-to-end MLOps platform, streamlining the process of taking ML models to production and then maintaining and monitoring those models. With Unity Catalog and MLflow, KPMG can leverage the power of AI to automate monitoring, diagnose errors and uphold data and ML model quality, freeing up data scientists to focus on innovation in areas like engagement risk evaluation, next-best service and forecasting.
Beyond core data engineering and AI workloads, Databricks also serves data scientists and analysts in other departments at KPMG with use cases in finance, growth and strategy, and risk. KPMG uses Databricks SQL to support the development of finance and accounting dashboards in Power BI, scaling beyond the limitations of legacy data warehousing systems and expanding accessible value for the organization. On the growth and strategy side, Databricks Clean Rooms let KPMG provision access to full-volume production data and a set of privilege and engineering tools for distributed development of analytics and AI. On the risk side, the Chief Digital Office team is deploying Databricks with the KPMG risk organization to develop models that evaluate and predict the risk associated with KPMG LLP’s engagements and determine when engagements are not properly aligned for client success. Databricks has provided a common playing field for KPMG LLP’s data science and engineering teams looking to develop the next generation of AI.
Accelerating into the future with data and AI
Today, KPMG has built a best-in-class modern data platform that relies heavily on Azure Databricks as the underlying engine for executing business logic across the organization. With Databricks, KPMG has unified all its data and AI across the firm. Tally said, “For the first time, we’ve created a true enterprise platform that can support the dynamic needs of both our front- and back-office professionals at KPMG.” Using Databricks, KPMG has seen increased productivity, faster time to market, and increased scale and performance.
Databricks has also enabled KPMG to become an industry leader in data and artificial intelligence, allowing the firm to experiment with cutting-edge generative AI technologies like Databricks Assistant, a context-aware AI assistant fully integrated into the Databricks Platform. Databricks Assistant lets KPMG developers query data using natural language through its conversational interface. It then generates the corresponding SQL or Python queries using context from code cells, libraries, tags, popular tables and Unity Catalog schemas.
In the age of new technologies like GenAI, KPMG will continue to be at the forefront of leveraging emerging technology to ensure the best outcomes for its customers. Tally stated, “AI will continue to dominate our business, from the way we run ourselves internally to taking advantage of the right tooling to deliver impact to our engagement teams and ultimately to our customers.” Looking ahead, KPMG has plans for continued innovations across its organization with the Databricks Data Intelligence Platform. “We are a professional services organization leading others on their AI journey, and it is imperative that we excel in this domain,” Tally concluded.