Databricks SQL enhances decision-making in the financial services industry
Faster querying
Cost reduction
CRED is a members-only credit card bill payment and lifestyle platform. Members earn exclusive rewards and experiences for clearing their bills on time. The company uses data to customize the member experience, including features, products and rewards. CRED uses the Databricks Data Intelligence Platform, Databricks SQL and Unity Catalog to improve its data management and governance capabilities.
Improved data management for better decision-making
Data analytics helps CRED to turn structured and unstructured data into insights for better decision-making. In the initial stage, CRED managed data with its own data warehouse and Redshift. But as it grew and rapidly scaled with multiple business lines, data expansion strained its systems. In CRED’s previous data warehouse solution, segregating compute was a challenge. For example, if a query in one queue consumed substantial resources, it would impact the entire cluster, resulting in query slowdowns. In one case, a slow query response resulted in the delay of a critical report.
At the same time, loading structured, unstructured and clickstream data from Amazon S3 to Redshift or directly querying the data from Redshift Spectrum was difficult and costly and resulted in business reporting slowdowns.
There were also challenges in isolating workloads at the line-of-business or team level. As the company grew, it became harder to identify which areas were consuming the most resources. User management also grew more complex as the company added more tools.
The company needed a simplified, cohesive solution that would allow it to do more with its data without slowing systems down. CRED chose to work with Databricks for its ability to query batch and streaming data tables using an open data platform and because it helped avoid vendor lock-in.
Mining data at scale with Databricks SQL
With the Databricks Data Intelligence Platform, which is built on Delta Lake — an open format storage layer that manages both streaming and batch operations — teams at CRED now have a single source for structured, semi-structured and unstructured data. And rather than managing all its tools separately, CRED employees can access them via a single platform.
CRED also uses Databricks SQL, a serverless data warehouse on the Databricks Data Intelligence Platform, to ingest, store and govern business-critical data at scale. “In Databricks, I can easily spin up a warehouse for a team and another for another team. It is very intuitive,” says Deepanshu Rai, Data Engineer at CRED.
Databricks also allows the company to isolate workloads and break down costs so they can be allocated to specific departments. Since adopting Databricks, CRED can assess usage by team, down to the user level.
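To make the idea of a cost breakdown concrete, here is a minimal sketch in Python of rolling usage up by team and by user. It is illustrative only: the record layout, team names, user names, warehouse names and cost figures are all hypothetical, and this is not how CRED or Databricks computes usage.

```python
from collections import defaultdict

# Hypothetical usage records: (team, user, warehouse, cost_units).
# All names and numbers here are made up for illustration.
USAGE = [
    ("reporting", "asha", "reporting_wh", 12.5),
    ("reporting", "ravi", "reporting_wh", 7.5),
    ("data_science", "meera", "ds_wh", 20.0),
    ("data_science", "meera", "ds_wh", 5.0),
]


def cost_breakdown(records):
    """Sum cost units per team and per (team, user) pair."""
    by_team = defaultdict(float)
    by_user = defaultdict(float)
    for team, user, _warehouse, cost in records:
        by_team[team] += cost
        by_user[(team, user)] += cost
    return dict(by_team), dict(by_user)


by_team, by_user = cost_breakdown(USAGE)
```

Because each team's queries run on its own warehouse, a roll-up like this can attribute spend to a department first and then to individual users within it.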
CRED also uses Databricks Unity Catalog for data governance and data lineage. “Unity Catalog enables us to manage all users across multiple workspaces in one central place, so it makes user management significantly easier,” says Omesh Patil, Data Architect at CRED. “Also, the data lineage feature is out of the box, which helps us identify the downstream dependencies without any manual overheads. In the case of BI data, we can figure out how the BI data is flowing and where the users are putting it using data lineage.”
Finally, Databricks enabled CRED to support streaming use cases, which were not feasible with its previous warehouse, Redshift. That helped the company pull real-time data from its internal data sources into its data lake, enabling a new feature. “It’s very easy to configure the new pipelines, and performance-wise Databricks is very efficient,” says Patil.
Improving user experience and decision-making
Using Databricks has improved productivity, performance and ease of use for CRED. For example, Databricks SQL enables it to create different warehouses for reporting use cases. Because Databricks SQL warehouses auto-scale, query execution is faster, system slowdowns are eliminated and critical reports are now delivered on time. “Databricks SQL is intuitive, and from an admin perspective, managing permissions and governance is very easy with Unity Catalog,” says Rai.
CRED can also now isolate workloads using Databricks SQL warehouses, which translates to better query times and removes “noisy neighbor” behavior. Users can quickly display dashboards or digital queries, and monitoring queries and cluster health is easy.
CRED data scientists can also now download data from Databricks SQL directly instead of downloading to S3 first and then downloading a specific file. Eliminating that step improves productivity and provides users with “one source of the truth.”
Databricks also powers a streaming pipeline for CRED, which was not feasible with its previous data warehouse. For example, CRED partners often request a statement of transactions completed using CRED Pay. CRED uses Databricks SQL to run the query against a single data source on the lakehouse, which combines historical and streaming data, and delivers statements within seconds. “With Redshift, we had to do hack-arounds to make that happen. It was very inefficient,” says Rai. “With Databricks, we have all types of data in one place, allowing us to respond to requests much faster, which improves the experience.”
Finally, Unity Catalog lets CRED manage all its users across multiple workspaces in one place, which improves the user experience, makes user management easy, and enables data lineage to identify any downstream dependencies. “Team collaboration and sharing notebooks or code are also easier using Databricks, as is scheduling queries or building dashboards,” says Patil. “Databricks has many capabilities that are helping us.”