How Home Trust Modernized Batch Processing with Databricks Data Intelligence Platform and dbt Cloud

Published: March 17, 2025

Summary

  • Home Trust transitioned to the Databricks Data Intelligence Platform and dbt Cloud to overcome the limitations of their legacy ETL solution
  • The adoption of dbt Cloud facilitated better collaboration between analytics and engineering teams
  • Home Trust improves the customer experience by using Databricks AI/BI Genie to derive insights from its data through natural language queries, and by using LLMs to automate tasks such as underwriting and drafting follow-up emails

At Home Trust, we measure success in terms of relationships. Whether we’re working with individuals or businesses, we strive to help them stay “Ready for what’s next.”

Staying one step ahead of our customers’ financial needs means keeping their data readily available for analytics and reporting in an enterprise data warehouse, which we call the Home Analytics & Reporting Platform (HARP). Our data team now uses Databricks Data Intelligence Platform and dbt Cloud to build efficient data pipelines so that we can collaborate on business workloads and share them with the critical partner systems outside the enterprise. In this blog, we share the details of our work with Databricks and dbt and outline the use cases that are helping us be the partner our customers deserve.

The perils of slow batch processing

When it comes to data, HARP is our workhorse. We could hardly run our business without it. This platform encompasses analytics tools such as Power BI, Alteryx and SAS. For years, we used IBM DataStage to orchestrate the different solutions within HARP, but this legacy ETL solution eventually began to buckle under its own weight. Batch processing ran through the night, finishing as late as 7:00 AM and leaving us little time to debug the data before sending it off to partner organizations. We struggled to meet our service level agreements with our partners.

It wasn’t a difficult decision to move to Databricks Data Intelligence Platform. We worked closely with the Databricks team to start building our solution – and just as importantly, planning a migration that would minimize disruptions. The Databricks team recommended we use DLT-META, a framework that works with Databricks Delta Live Tables. DLT-META served as our data flow specification, which enabled us to automate the bronze and silver data pipelines we already had in production.
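
To give a feel for what DLT-META automates, the sketch below shows the kind of bronze and silver pipeline it generates from its metadata specification, expressed in Delta Live Tables SQL. This is a simplified illustration, not our production code; the table names, ADLS path and columns are hypothetical.

```sql
-- Illustrative bronze/silver flow of the kind DLT-META generates from metadata;
-- table names, paths and columns are hypothetical.

-- Bronze: ingest raw loan files landed in ADLS with Auto Loader
CREATE OR REFRESH STREAMING TABLE bronze_loans
COMMENT "Raw loan files ingested from ADLS"
AS SELECT *, current_timestamp() AS _ingested_at
FROM cloud_files(
  "abfss://raw@harpdatalake.dfs.core.windows.net/loans",
  "csv",
  map("header", "true")
);

-- Silver: apply data quality expectations and standardize types
CREATE OR REFRESH STREAMING TABLE silver_loans (
  CONSTRAINT valid_loan_id EXPECT (loan_id IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT "Cleansed loans ready for gold-layer modeling in dbt"
AS SELECT
  CAST(loan_id AS BIGINT)        AS loan_id,
  CAST(funded_date AS DATE)      AS funded_date,
  CAST(balance AS DECIMAL(18,2)) AS balance
FROM STREAM(LIVE.bronze_loans);
```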

We still faced the challenge of fast-tracking a migration with a team whose skill sets revolved around SQL. All our previous transformations in IBM solutions had relied on SQL coding. Looking for a modern solution that would allow us to leverage these skills, we decided on dbt Cloud.

Right from our initial trial of dbt Cloud, we knew we had made the right choice. It supports a wide range of development environments and provides a browser-based user interface, which minimizes the learning curve for our team. For example, we performed a very familiar Slowly Changing Dimensions-based transformation and cut our development time considerably.
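
For readers less familiar with dbt, a Type 2 Slowly Changing Dimension can be expressed as a dbt snapshot along the following lines. This is a simplified sketch; the source, schema and column names are illustrative, not our actual models.

```sql
-- Illustrative dbt snapshot implementing a Type 2 Slowly Changing Dimension;
-- source, schema and column names are hypothetical.
{% snapshot dim_customer_snapshot %}
{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}
select customer_id, customer_name, risk_rating, updated_at
from {{ source('harp_silver', 'customers') }}
{% endsnapshot %}
```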

How the lakehouse powers our mission-critical processes

Every batch processing run at Home Trust now relies on Databricks Data Intelligence Platform and our lakehouse architecture. The lakehouse doesn’t just ensure we can access data for reporting and analytics – as important as those activities are. It processes the data we use to:

  • Enable loan renewal processes in the broker community
  • Exchange data with the U.S. Treasury
  • Update FICO scores
  • Send important business fraud alerts
  • Run our default recovery queue

In short, if our batch processing were to get delayed, our bottom line would take a hit. With Databricks and dbt, our nightly batch now ends around 4:00 AM, leaving us ample time for debugging before we feed our data into at least 12 external systems. We finally have all the computing power we need. We no longer scramble to hit our deadlines. And so far, the costs have been fair and predictable.

Here’s how it works from end to end:

  1. Azure Data Factory drops data files into Azure Data Lake Storage (ADLS). For SAP source files, SAP Data Services drops the files into ADLS.
  2. From there, DLT-META processes bronze and silver layers.
  3. dbt Cloud then handles the gold-layer transformations so the data is ready for downstream analysis (a simplified gold-layer model sketch follows this list).
  4. The data then hits our designated pipelines for activities such as loans, underwriting and default recovery.
  5. We use Databricks Workflows and Azure Data Factory for all our orchestration between platforms.
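
As an example of step 3, a gold-layer dbt model might look like the following simplified sketch. The silver tables are referenced as dbt sources because they are built upstream by DLT-META; the table and column names are illustrative, not our production code.

```sql
-- models/gold/fct_funded_mortgages.sql (hypothetical gold-layer dbt model)
-- Aggregates cleansed silver data into a fact table for downstream pipelines
-- and Power BI; table and column names are illustrative.
{{ config(materialized='table') }}

select
    l.funded_date,
    b.broker_region,
    count(*)        as mortgages_funded,
    sum(l.balance)  as total_funded_amount
from {{ source('harp_silver', 'loans') }} as l
join {{ source('harp_silver', 'brokers') }} as b
  on l.broker_id = b.broker_id
where l.product_type = 'mortgage'
group by 1, 2
```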

None of this would be possible without intense collaboration between our analytics and engineering teams – which is to say none of it would be possible without dbt Cloud. This platform brings both teams together in an environment where they can do their best work. We’re continuing to add dbt users so that more of our analysts can build proper data models without help from our engineers. Meanwhile, our Power BI users will be able to leverage these data models to create better reports. The results will be greater efficiency and more trustworthy data for everyone.

Data aggregation happens almost suspiciously quickly

Within Databricks Data Intelligence Platform, depending on the team’s background and comfort level, some users access code through Notebooks while others use SQL Editor.

By far the most useful tool for us is Databricks SQL – an intelligent data warehouse. Before we can power our analytics dashboards, we have to run complex SQL to aggregate our data. And because that data all sits in one place in Databricks SQL, many different analytics tools such as Power BI can access it.
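
To give a sense of the kind of aggregation involved, here is a simplified, illustrative query of the sort that might run on a Databricks SQL warehouse before a dashboard refresh; the table and columns are hypothetical.

```sql
-- Illustrative dashboard-feeding aggregation on a Databricks SQL warehouse;
-- table and column names are hypothetical.
SELECT
  date_trunc('month', funded_date) AS funded_month,
  broker_region,
  SUM(total_funded_amount)         AS funded_amount,
  SUM(total_funded_amount)
    - LAG(SUM(total_funded_amount)) OVER (
        PARTITION BY broker_region
        ORDER BY date_trunc('month', funded_date)) AS month_over_month_change
FROM harp.gold.fct_funded_mortgages
GROUP BY 1, 2
ORDER BY 1, 2;
```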

Our teams continue to be amazed by the performance of Databricks SQL. Some of our analysts used to aggregate data in Azure Synapse Analytics. When they began running on Databricks SQL, they had to double-check the results because they couldn’t believe an entire job ran so quickly. This speed enables them to add more detail to reports and crunch more data. Instead of sitting back and waiting for jobs to finish, they’re answering more questions from our data.

Unity Catalog is another game changer for us. So far, we’ve only implemented it for our gold layer of data, but we plan to eventually extend it to our silver and bronze layers across the entire organization.
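
Governance of this kind can be expressed in a few SQL statements. The following is a simplified sketch with hypothetical catalog, schema and group names, not our actual permission model.

```sql
-- Illustrative Unity Catalog grants for a gold layer;
-- catalog, schema and group names are hypothetical.
GRANT USE CATALOG ON CATALOG harp      TO `analysts`;
GRANT USE SCHEMA  ON SCHEMA  harp.gold TO `analysts`;
GRANT SELECT      ON SCHEMA  harp.gold TO `analysts`;
```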

Built-in AI capabilities deliver speedy answers and streamline development

Like every financial services provider, we’re always looking for ways to derive more insights from our data. That’s why we started using Databricks AI/BI Genie to engage with our data through natural language.

We plugged Genie into our loan data – our most important data set – after using Unity Catalog to mask personally identifiable information (PII) and provision role-based access to the Genie room. Genie uses generative AI that understands the unique semantics of our business. The solution continues to learn from our feedback. Team members can ask Genie questions and get answers that are informed by our proprietary data. Genie learns about every loan we make and can tell you how many mortgages we funded yesterday or the total outstanding receivables from our credit card business.
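
For illustration, PII can be masked in Unity Catalog with a column mask along these lines before a table is exposed to a Genie room. This is a simplified sketch; the function, table, column and group names are hypothetical, not our production setup.

```sql
-- Illustrative Unity Catalog column mask to hide PII from non-privileged users;
-- function, table, column and group names are hypothetical.
CREATE OR REPLACE FUNCTION harp.gold.mask_sin(sin STRING)
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN sin
  ELSE '***-***-***'
END;

ALTER TABLE harp.gold.dim_customer
  ALTER COLUMN sin SET MASK harp.gold.mask_sin;
```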

Our goal is to use more natural language systems like Genie to avoid the operational overhead of building and maintaining such systems from scratch. We hope to expose Genie as a chatbot that everyone across our business can use to get speedy answers.

Meanwhile, the Databricks Data Intelligence Platform offers even more AI capabilities. Databricks Assistant lets us query data through Databricks Notebooks and SQL Editor. We can describe a task in plain language and then let the system generate SQL queries, explain segments of code and even fix errors. All of this saves us many hours during coding.

Lower overhead means a better customer experience

Although we’re still in our first year with Databricks and dbt Cloud, we’re already impressed by the time and cost savings these platforms have generated:

  • Lower software licensing fees. With Unity Catalog, we’re running data governance through Databricks rather than using a separate platform. We also eliminated the need for a legacy ETL tool by running all our profiling rules through Databricks Notebooks. In all, we’ve reduced software licensing fees by 70%.
  • Faster batch processing. Compared to our legacy IBM DataStage solution, Databricks and dbt process our batches 90% faster.
  • Faster coding. Thanks to increased efficiency through Databricks Assistant, we’ve reduced our coding time by 70%.
  • Easier onboarding of new hires. It was getting hard to find IT professionals with 10 years of experience with IBM DataStage. Today, we can hire new graduates from good STEM programs and put them right to work on Databricks and dbt Cloud. As long as they studied Python and SQL and used technologies such as Anaconda and Jupyter, they’ll be a good fit.
  • Less underwriting work. Now that we’re mastering the AI capabilities within Databricks, we’re training a large language model (LLM) to perform adjudication work. This project alone could reduce our underwriting work by 80%.
  • Fewer manual tasks. Using the LLM capabilities within the Databricks Data Intelligence Platform, we write follow-up emails to brokers and place them in our CRM system as drafts (see the sketch after this list). Each of these drafts saves a few valuable minutes for a team member. Multiply that by thousands of transactions per year, and it represents significant time savings for our business.
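
As an illustration of that last pattern, a batch of draft emails can be generated with the ai_query() SQL function. This is a sketch only; the model endpoint, table and column names are hypothetical, not our production pipeline.

```sql
-- Illustrative use of ai_query() to draft broker follow-up emails;
-- the endpoint, table and column names are hypothetical.
SELECT
  application_id,
  broker_email,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT(
      'Write a short, professional follow-up email to a mortgage broker. ',
      'Application status: ', application_status,
      '. Missing documents: ', missing_documents
    )
  ) AS draft_email
FROM harp.gold.pending_applications;
```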

With more than 500 dbt models in our gold layer of data and about half a dozen data science models in Databricks, Home Trust is poised to continue innovating. Each of the technology enhancements we’ve described supports an unchanging goal: to help our customers stay “Ready for what’s next.”

To learn more, check out this MIT Technology Review report. It features insights from in-depth interviews with leaders at Apixio, Tibber, Fabuwood, Starship Technologies, StockX, Databricks and dbt Labs.
