customer story
Democratizing data for better shopping experiences

INDUSTRY: Retail

SOLUTION: Personalized customer experience

TECHNICAL USE CASE: Data ingest and ETL, machine learning, deep learning

As a leading e-commerce company in fashion, Wehkamp has dedicated itself to providing the best possible shopping experience for its customers. With that mission in mind, the company relies on Databricks for data analytics and machine learning, allowing it to build an exciting and highly engaging web shop that is personalized to each customer.

Legacy data warehouse not keeping up with website demand

Wehkamp was using a traditional corporate data warehouse, but as the business grew, the warehouse could not scale without intensive DevOps support, and that was slowing things down. Furthermore, the legacy systems were not collaborative: only the data analysts could access the data, and most of it went unused because of the resulting silos. Together, these problems limited Wehkamp's ability to innovate with machine learning, and when the team did build new features, it could not roll them out across the company's global websites at the rate the business required.

  • Massive volumes of data: Data generated by 500,000 visitors and 400,000 products every single day.
  • Data silos and inability to scale: Struggled to scale operations to support data science efforts against huge amounts of data due to data silos and a traditional data warehouse environment. As a result, time-to-insight was slower than they required to drive innovation across their various global websites.
  • Inefficient machine learning: Inability to scale model building and training to meet business needs.
  • Slow time to market: Building new features was slow, taking them over one year to go from ideation to production — impeding their ability to quickly scale regional successes across their global websites.

Democratizing data and machine learning

Databricks provides Wehkamp with a Unified Data Analytics Platform that fosters a collaborative and democratic environment across the entire company, enabling its teams to ingest large volumes of high-velocity data and to develop powerful image classification and recommendation engines that improve the customer experience.

  • Fully managed platform on AWS: Automated cluster management simplifies the infrastructure and operations at any scale.
  • More efficient data flow: Easy integration with other tools such as Airflow and Kubernetes allows them to build automated data pipelines while establishing CI/CD best practices.
  • Improved cross-team collaboration: A collaborative notebook environment with support for multiple languages (SQL, Scala, Python, R) enables a diverse team of users to work together in their preferred language, accelerating data science operations and innovation.
  • Streamlined ML lifecycle: Native support for MLflow enables data science teams to easily replicate experiments, track model performance, and rapidly iterate across their models in a systematic fashion.

Enabling a shopping experience that converts

With Databricks, anyone at Wehkamp can easily access data, work with it, visualize it, and integrate it with other services to get more value from it. The machine learning use cases have delivered tremendous value and a direct impact on revenue.

  • Improved data team productivity: With data analysts, scientists, and engineers working together efficiently, Wehkamp broke down its data silos, making the data easier to use. Wehkamp has enabled all of its analysts to use Databricks and Tableau to analyze data and drive better business decisions.
  • Improved operational efficiency: Features such as auto-scaling clusters and MLflow have improved operations from data ingest through the entire machine learning lifecycle, allowing them to build and train hundreds of models per day. In addition, Wehkamp uses Tableau to consume data directly from Delta Lake running on Databricks, enabling analysts to more easily visualize their entire data lake.
  • Reduced operational costs: The transition from Hadoop to Databricks reduced operational costs by roughly 70%.
  • More data science innovations: Image classification automates the display of products and supports a more personalized shopping experience for customers. Wehkamp serves 10 different kinds of recommendations at scale, which increased customer engagement through more personalized content and eventually doubled revenue.
  • 70% reduction in operational costs
  • 2x increase in revenue due to increased customer engagement
  • 100s of models built per day

“Databricks is an incredible platform that enables engineers, scientists and analysts to collaborate to build some of the most state-of-the-art products out there.”

– Arnoud de Munnik, Data Scientist, Wehkamp
