YipitData provides data-driven research to empower investors by combining alternative data sources with web data for comprehensive coverage. By leveraging Databricks, YipitData’s data team has been able to reduce data processing time by up to 90%, increasing their analysts’ ability to deliver impactful, reliable insights to their clients. Additionally, by moving to Databricks on AWS, YipitData has reduced database expenses by almost 60%.
Making sound investment choices requires information. The more actionable information financial services institutions (FSIs) have, the better their odds of understanding their customers, markets, and businesses. YipitData is in the business of providing data-driven insights to the world’s largest hedge funds and corporations to help them gain a real competitive edge and provide better service to their customers. Specifically, they leverage alternative data sources and web scrapes to help banking institutions and asset managers alike make better decisions by revealing valuable information about consumer behavior (e.g., utility payment history, transaction information). This work extends across a variety of use cases, including trade analyses, credit risk, and ESG risk.
“Alternative data is a critical key to the success of our financial services customers,” explained Anup Segu, Senior Software Engineer at YipitData. “However, most organizations don’t have the means to leverage alternative data to the greatest extent possible. That’s where we come in to help.”
The challenge the YipitData team faced, however, was not only the sheer volume and variety of alternative data (each month they make billions of requests collecting data from hundreds of websites) but also siloed teams and an inability to scale their data processing and analytics. Running queries and scaling their previous data warehouse proved challenging and time-consuming.
“We were constantly running into performance bottlenecks,” explained Segu. “Very large queries could take up to six hours which slowed our ability to answer questions.”
Collaboration across teams was also an issue as they struggled to share learnings and code. “We struggled with siloes of tribal knowledge, which hampered our ability to scale and operate with speed,” explained Bill Mensch, data analyst at YipitData.
With Databricks, the team at YipitData is now able to manage the entire data analytics workflow from data ingestion to downstream analytics. Integrated cluster management with features like autoscaling has greatly simplified infrastructure management while lowering operational costs. “Since we can manage compute and storage independently, Databricks has allowed us to optimize our cluster management and AWS spend,” explained Segu.
Databricks has empowered YipitData’s 40+ data analysts to evolve their roles into hybrid data engineers and analysts, enabling them to independently create data ingestion systems, manage ETL workflows, and produce meaningful financial research for their clients.
“Databricks has given our analysts flexibility so that they can be in control,” explained Steve Pulec, CTO at YipitData. “As a result, data engineering doesn’t even need to be involved and can focus on higher valued tasks.”
Now they can rapidly construct and deploy robust ETL workflows within Databricks notebooks and use their programming language of choice (Python or SQL) to explore, visualize, and analyze their data.
The biggest gains from using Databricks have been the sheer processing power at scale, the improved cost efficiencies of a cloud platform, and the democratization of data. With scalable cloud infrastructure at their fingertips, they’ve been able to accelerate their data pipelines by up to 90%. In some cases, very large queries that used to take up to 6 hours can now be completed in roughly 7 seconds.
“Databricks allows us to effortlessly trade scale for speed, which was not possible before,” said Andrew Gross, Staff Engineer from YipitData. “Now we are able to answer more questions with the same resources.”
Not only are they processing more data faster, but they are also doing so more efficiently, which has helped drive the business forward. “COVID has created tons of questions in the market, and we have gone into overdrive in terms of analyzing data to uncover answers,” said Pulec. “All of that additional work has had a huge impact on our top line and has really helped our business to be able to answer those questions for investors in a timely manner. It probably would not have been possible in the old world.”
Although the scale of analyses and reporting to customers has increased by 4-5x, Pulec estimates that overall operational spending has decreased significantly. “Databricks has reduced our operations costs by almost 60%,” said Pulec. And overall, with the help of Databricks and some savvy cost-cutting techniques, they were able to cut their annual AWS bill by 50% or $2.5 million.
With Databricks serving as the foundation for their data analytics workflow, YipitData is looking to expand the adoption of Databricks across the company, promoting greater transparency and cross-team collaboration. Looking ahead, YipitData is well-positioned to take full advantage of the explosion of alternative data and unlock new insights that help FSIs and corporations make smarter business decisions.
Meet the great data team that’s behind YipitData
“With Databricks, we’re innovating faster than ever before across our data engineering and analyst functions, and paying less in database expenses every year.”
– Steve Pulec, CTO at YipitData