Customer story
Accurate insurance pricing with data and ML

INDUSTRY: Insurance

SOLUTION: Predictive analytics

TECHNICAL USE CASE: Data ingest and ETL, machine learning, deep learning

The explosive growth in data availability and increasing market competition are challenging insurance providers to offer better pricing to their customers. With hundreds of millions of insurance records to analyze for downstream ML, Nationwide realized that their legacy batch analysis process was slow and inaccurate, offering limited insight for predicting the frequency and severity of claims. With Databricks, they have been able to employ deep learning models at scale to produce more accurate pricing predictions, with a material impact on revenue.

Batch Jobs and Manual Processes, Inaccurate Models

The key to accurate insurance pricing lies in leveraging information from insurance claims. However, this data was hard to work with: because claims are infrequent and unpredictable, the insurance records were volatile, which led to inaccurate pricing.

  • Big data challenges: hundreds of millions of insurance records needed to be analyzed for insurance pricing. Furthermore, insurance claims were infrequent and arose largely by chance, making it difficult to home in on accurate pricing.
  • Limited scalability: Infrastructure limitations meant they could only leverage individual workstations to analyze limited sets of data to make predictions.
  • Cross-team silos: a large data organization spanning data analysts, data engineers, and data scientists made collaboration on data and models challenging.
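Pricing work of this kind is commonly split into claim frequency (claims per unit of exposure) and claim severity (average cost per claim), whose product is the expected loss per unit of exposure, often called the "pure premium". The sketch illustrates that textbook decomposition in Python; the `PolicyRecord` fields and the `pure_premium` helper are hypothetical names, and this simple aggregate estimate is not Nationwide's deep learning model.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PolicyRecord:
    # Hypothetical record layout, for illustration only.
    exposure_years: float      # how long the policy was in force
    claim_count: int           # number of claims filed
    total_claim_amount: float  # total paid across those claims


def pure_premium(records: Iterable[PolicyRecord]) -> Tuple[float, float, float]:
    """Return (frequency, severity, pure premium) aggregated over all records."""
    records = list(records)
    exposure = sum(r.exposure_years for r in records)
    claims = sum(r.claim_count for r in records)
    losses = sum(r.total_claim_amount for r in records)
    if exposure <= 0:
        raise ValueError("total exposure must be positive")
    frequency = claims / exposure                   # claims per policy-year
    severity = losses / claims if claims else 0.0   # average cost per claim
    return frequency, severity, frequency * severity


book = [
    PolicyRecord(1.0, 0, 0.0),
    PolicyRecord(1.0, 1, 2000.0),
    PolicyRecord(0.5, 0, 0.0),
    PolicyRecord(1.5, 1, 4000.0),
]
freq, sev, premium = pure_premium(book)
print(f"frequency={freq}, severity={sev}, pure premium={premium}")
```

Because claims are rare, a handful of extra claims in a small sample can swing `frequency` substantially, which is one reason to estimate against the full set of records rather than a workstation-sized subset.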

Databricks Unlocks Scale and ML Innovation

Nationwide leverages the Databricks Unified Analytics Platform to manage the entire analytics process from data ingestion to the deployment of deep learning models. The fully managed platform has simplified IT operations and unlocked new data-driven opportunities for their data science teams.

  • Fully managed platform: automated cluster management simplifies infrastructure and operations at any scale.
  • Collaborative Workspaces: interactive notebooks improve cross-team collaboration and data science creativity, allowing Nationwide to greatly accelerate model prototyping for faster iteration.
  • Simplified ML Lifecycle: managed MLflow simplifies training deep learning models (hierarchical neural networks) and deploying them into production.

Deep Learning Results in More Accurate Insurance Pricing

With Databricks in use across data engineering and data science, Nationwide has seen significant improvements in data processing speed and in the ability to quickly train accurate models for their use cases.

  • Data processing at scale: Improved runtime of their entire data pipeline from 34 hours to less than 4 hours, a 9x performance gain.
  • Faster featurization: Data engineering is able to identify features 15x faster — from 5 hours to around 20 minutes.
  • Faster model training: Reduced training times by 50%, enabling faster time-to-market of new models.
  • Improved model scoring: Accelerated model scoring from 3 hours to less than 5 minutes, a 60x improvement.
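The headline speedups are ratios of old runtime to new runtime. A quick check in Python using the figures quoted above (the function names are illustrative): since several "after" times are stated as upper bounds ("less than 4 hours", "less than 5 minutes"), the computed ratios are lower bounds on the actual speedup.

```python
def speedup(before_min: float, after_min: float) -> float:
    """Ratio of old runtime to new runtime; higher means faster."""
    return before_min / after_min


def implied_after(before_min: float, factor: float) -> float:
    """New runtime (minutes) implied by a reported speedup factor."""
    return before_min / factor


# (before, after) in minutes, taken from the results listed above.
results = {
    "data pipeline": (34 * 60, 4 * 60),  # 34 h -> under 4 h
    "featurization": (5 * 60, 20),       # 5 h -> about 20 min
    "model scoring": (3 * 60, 5),        # 3 h -> under 5 min
}

for name, (before, after) in results.items():
    print(f"{name}: {speedup(before, after):.1f}x")
```

For scoring, 3 hours to 5 minutes works out to 36x, so the reported 60x implies a scoring time of about 3 minutes (`implied_after(180, 60)`), which is consistent with "less than 5 minutes". Likewise, 34 hours to 4 hours is 8.5x, so the reported 9x implies a runtime of roughly 3.8 hours.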

“With Databricks, we are able to train models against all our data more quickly, resulting in more accurate pricing predictions that have had a material impact on revenue.”

– Bryn Clark, Data Scientist, Nationwide

Related content

Technical Talk at Spark + AI Summit NA 2019