

  • Date & Time: October 19th, 2016 at 10:00am PT / 1:00pm ET / 5:00pm UTC
  • Presenters: Shaun Elliott - Senior Software Engineer & Team Lead, Edmunds.com and Christian Lugo - Software Engineer, Edmunds.com

Register for this webinar now.

Edmunds.com is a leading online car information and shopping marketplace whose website serves nearly 20 million visitors each month. Their ability to drive revenue is directly correlated with the user experience of their web and mobile applications. One of the most impactful ways to increase customer engagement is to ensure the highest levels of data quality on their auto listing pages.

However, identifying missing and inaccurate details across thousands of auto listing pages was difficult to do in a cost-effective manner. The main reason was a 10x increase in data over the past four years, to more than 100 TB, spread across siloed data sources, both internal and paid external. For example, what percentage of Subarus have “sunroof” inaccurately listed under options? These are the kinds of questions Edmunds’ data teams are trying to answer.
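To make the Subaru question concrete, here is a minimal sketch in plain Python. It is not Edmunds' pipeline, which runs on Apache Spark at a far larger scale. The `Listing` record, its field names, the `pct_inaccurate_option` helper, and the sample data are all hypothetical. The sketch also assumes that a trusted reference set of options exists for each vehicle, and that the percentage is taken over all listings of the given make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical listing record; field names are illustrative only."""
    vin: str
    make: str
    listed_options: frozenset    # options shown on the listing page
    verified_options: frozenset  # options from an assumed trusted reference


def pct_inaccurate_option(listings, make, option):
    """Percentage of `make` listings that show `option` although the
    reference record for the same vehicle does not include it."""
    of_make = [l for l in listings if l.make == make]
    if not of_make:
        return 0.0  # no listings of this make, so nothing can be inaccurate
    wrong = sum(
        1
        for l in of_make
        if option in l.listed_options and option not in l.verified_options
    )
    return 100.0 * wrong / len(of_make)


# Made-up sample data: four Subaru listings, one of which wrongly
# advertises a sunroof, plus one listing of another make.
listings = [
    Listing("VIN1", "Subaru", frozenset({"sunroof"}), frozenset({"sunroof"})),
    Listing("VIN2", "Subaru", frozenset({"sunroof"}), frozenset()),
    Listing("VIN3", "Subaru", frozenset(), frozenset()),
    Listing("VIN4", "Subaru", frozenset({"navigation"}), frozenset({"navigation"})),
    Listing("VIN5", "Honda", frozenset({"sunroof"}), frozenset()),
]

subaru_sunroof_pct = pct_inaccurate_option(listings, "Subaru", "sunroof")
print(f"{subaru_sunroof_pct:.1f}% of Subaru listings show a sunroof inaccurately")
```

At the data volumes described here, the same filter-and-count logic would more likely be written as a distributed aggregation than as an in-memory loop.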

Answering these questions introduced technical challenges: a large amount of DevOps time went into integrating a growing number of data sources and maintaining the resource-intensive MapReduce jobs required to deliver the insights needed to make the right data source decisions.

Edmunds.com turned to Databricks to simplify the management of their Apache Spark infrastructure while accelerating data exploration at scale by 6x. Now they can quickly analyze large datasets to determine the best sources for car data on their website.

Join this webinar to learn:

  • Why Edmunds.com moved from MapReduce to Databricks for ad hoc data exploration.
  • How Databricks democratized data access across teams to improve decision making and feature innovation.
  • Best practices for doing ETL and building a robust data pipeline with Databricks.

