- Date & Time: October 19th, 2016 at 10:00am PT / 1:00pm ET / 5:00pm UTC
- Presenters: Shaun Elliott – Senior Software Engineer & Team Lead, Edmunds.com and Christian Lugo – Software Engineer, Edmunds.com
Edmunds.com is a leading online car information and shopping marketplace serving nearly 20 million visitors to its website each month. Its ability to drive revenue is directly correlated with the user experience of its web and mobile applications, and one of the most impactful ways to increase customer engagement is to ensure the highest levels of data quality on its auto listings pages.
However, identifying missing and inaccurate details across thousands of auto listing pages was difficult to do in a cost-effective manner, primarily because of a 10x increase in data over the past four years to 100+ TBs spread across various siloed data sources, both internal and paid external. For example, what percentage of Subarus have "sunroof" inaccurately listed under options? These are the questions Edmunds' data teams are trying to answer.
This approach introduced technical challenges of its own: DevOps teams spent a growing share of their time integrating an expanding number of data sources and maintaining the resource-intensive MapReduce jobs required to deliver the insights needed to make the right data source decisions.
Edmunds.com turned to Databricks to simplify the management of its Apache Spark infrastructure while accelerating data exploration at scale by 6x. Now its teams can quickly analyze large datasets to determine the best sources for car data on the website.
Join this webinar to learn:
- Why Edmunds.com moved from MapReduce to Databricks for ad hoc data exploration.
- How Databricks democratized data access across teams to improve decision making and feature innovation.
- Best practices for doing ETL and building a robust data pipeline with Databricks.