Customer Case Study: Brandless - Databricks



Brandless is an e-commerce company that manufactures and sells an assortment of products under its own Brandless label.

Vertical Use Case

  • Leveraging data and AI to create personalized online shopping experiences

Technical Use Case

  • Ingest/ETL
  • Machine Learning
  • Deep Learning

The Challenges

  • Massive Volumes of Disparate Data: Brandless generates terabytes of data across millions of customers, ranging from supply chain and merchandising information to web analytics on customer interactions with its website.
  • Infrastructure Complexity: With a small team, Brandless struggled to manage complex infrastructure on AWS and the variety of tools needed to support downstream data science. As a result, the data science team spent too much time setting up and maintaining infrastructure rather than working with the data.
  • Machine Learning at Scale: Building, training, and deploying ML models in a repeatable and reproducible manner was impossible because the team lacked a distributed system that could handle the scale they needed to be successful.

The Solution

Databricks has provided Brandless with a fully managed analytics platform on AWS that simplifies data engineering and accelerates AI innovation.

  • Fully Managed Platform: A fully managed cloud platform on AWS simplifies operations and delivers superior performance of data pipelines at scale.
  • Automated Infrastructure Management: Simplified cluster management with auto-scaling significantly reduced time spent on data engineering and development.
  • Easier Data Engineering: Comprehensive audit logs and GitHub integration made implementation, debugging, testing, and code review easier, faster, and more interactive.
  • Robust Machine Learning Infrastructure: MLflow greatly streamlined their machine learning lifecycle, simplifying model deployment and model versioning, which enables reproducibility and a repeatable process.
  • Collaborative Notebooks: Data scientists can collaborate, share, and track data and insights across various programming languages, fostering an environment of transparency and improving productivity.

The Results

  • 3X Faster Model Training: A distributed system enables model training at massive scale, cutting average training time from 3 hours to less than 1 hour. This has allowed Brandless to train more models more frequently.
  • Improved Data Science Productivity: Automated infrastructure management, built-in debugging tools, and collaborative notebooks on a centralized platform, where the team can share and reuse code, accelerated data science innovation.

Databricks and MLflow have allowed a relatively small data team at Brandless to achieve goals that we otherwise thought were only possible for very large companies.

Adam Bernhard – Head of Data at Brandless