customer story
Enabling the connected car with AI

INDUSTRY: Automotive

SOLUTION: Connected car

TECHNICAL USE CASE: Data ingest and ETL, streaming, machine learning

wejo was founded with the global ambition to be the world’s largest connected car company. To date, wejo has curated over 140 billion miles of data and expects to have 17 million cars on its platform by the end of the year. With more than 15 billion data points arriving every day and counting, wejo trusts Databricks to help it deliver ML-powered innovations to the automotive industry and better driving experiences.

Pipelines stretched to ingest 3 trillion monthly data points

To deliver value to its customers, wejo ingests streaming data from 50 million connected vehicles, processing data from OEMs and satellite navigation systems every three seconds. This data provides insights that help improve traffic flow, reduce accidents, support safety alerts and emergency services, and enable new innovations in optimizing parking. With data streams coming in from multiple disjointed sources, harnessing these insights through data science was very difficult and resource-intensive.

  • Massive data volumes: wejo processes over three trillion data points per month in a streaming environment, moving data from car to marketplace in less than 40 seconds. This requires significant scale in a low-latency environment.
  • Challenges to scale: With so much data to ingest, wejo struggled with its reliance on MapReduce clusters, which were rigid in size and limited in the libraries available. Teams could wait days for the right Python modules to be installed, which slowed innovation.
  • Slow performance: Long-running jobs could take hours, if not days, to process.
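
To put the volume figure in perspective, the arithmetic can be sketched as follows. This is a rough back-of-the-envelope calculation, not wejo's own sizing: it assumes a 30-day month and perfectly uniform load (neither is stated in the story), and the function name is purely illustrative. Under those assumptions, three trillion data points per month works out to roughly 1.2 million data points per second on average.

```python
# Back-of-the-envelope throughput implied by "three trillion data points per month".
# Assumptions (not from the story): a 30-day month and uniform load over time.

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumed month length


def average_rate_per_second(points_per_month: float, days: int = DAYS_PER_MONTH) -> float:
    """Average ingest rate in data points per second."""
    return points_per_month / (days * SECONDS_PER_DAY)


monthly_points = 3_000_000_000_000  # "over three trillion data points per month"
latency_budget_s = 40               # "car to marketplace in less than 40 seconds"

rate = average_rate_per_second(monthly_points)

# Upper bound on data points inside the pipeline at any moment,
# if every point used the full 40-second budget (rate x time in system).
in_flight = rate * latency_budget_s

print(f"Average rate: {rate:,.0f} data points per second")
print(f"In flight at most: {in_flight:,.0f} data points")
```

Real traffic is not uniform, so peak rates during commuting hours would be higher than this average.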

Reliable, performant data pipelines at scale with Delta Lake

Databricks provides wejo with a unified data analytics platform that has fostered a scalable and collaborative environment across data science and engineering, allowing data teams to more quickly innovate and deliver ML-powered innovations to the automotive industry.

  • Managed platform in the cloud simplifies the provisioning of compute clusters of any size.
  • Support for multiple languages (SQL, Scala, Python, R) improves collaboration across data engineers, data scientists, and analysts.
  • Native support for Delta Lake allows their data engineering team to reliably run and scale both batch and streaming pipelines on the same data.

Making the roads safer with ML innovations

With Databricks, wejo can now run large-scale data processing and machine learning faster and at lower cost. Most importantly, the team can now easily share its output across the organization, enabling others to drive innovation into the market.

  • Improved operational efficiency: Features such as auto-scaling clusters have improved data engineering operations, accelerating pipelines for downstream analytics from weeks to minutes.
  • Better cross-team collaboration: A shared notebook environment with support for multiple languages has improved team productivity.
  • Faster time-to-insight: wejo now sees more than a 20x performance benefit over open-source Spark with Databricks and a 90% decrease in time-to-market.
  • 50x
    Faster time-to-insight due to improved IT operations
  • 20x
    Faster data processing of vehicle and road data
  • 90%
    Decrease in time-to-market of new innovations

“Before Databricks, the time to market would’ve been weeks, if not months, to meet the analysis requirements for some of our customers. Now, it takes hours.”

– Steve Pimblett, Chief Information Officer and Data Officer, wejo