Mike Vedomske is a Senior Data Scientist in PetSmart’s Advanced Analytics Group. PetSmart is the largest specialty pet retailer of services and solutions for the lifetime needs of pets and is headquartered in Phoenix, Arizona, USA. Before PetSmart, Mike worked for startups and defense contractors, applying a range of modeling techniques to problems as diverse as the US healthcare system, marketing, cybersecurity, the Internet of Things, critical infrastructure protection, fraud, medical informatics, and more. Mike is a National Science Foundation Graduate Research Fellow and received his PhD from the Department of Systems Engineering at the University of Virginia.
PetSmart, with over 1,600 stores in North America, is the largest specialty pet retailer of services and solutions for the lifetime needs of pets. The Advanced Analytics Group is a small team of highly business-oriented strategy and data science professionals. The team applies a range of data and modeling methodologies to generate breakthrough insights for business units throughout the company and to deliver top- and bottom-line growth for PetSmart.
As a retailer, PetSmart has many years of transaction data, loss prevention store reports, customer feedback, labor schedules, supply chain data, and other records. Loss prevention deals with reducing preventable losses, whether from theft, fraud, vandalism, waste, abuse, incidents, accidents, or misconduct.
Store leaders at PetSmart locations submit free-text reports to the Loss Prevention team of investigators, and these reports must be prioritized for further resolution. Most are low priority and are filed as a matter of policy, but some require further investigation. Even so, the team must read every report to filter out the low-priority ones before it can spend time investigating the higher-priority ones. The Advanced Analytics Group was asked whether we could help prioritize these reports automatically. A system trusted to prioritize reports without human review would need near-human performance. To reach that level (96% accuracy), we used FastAI’s ULMFiT NLP classifier.
FastAI is not natively supported on Azure Databricks, so setup required special configuration. Azure Databricks’ newly released ML Beta and GPU clusters were instrumental in enabling that setup. Other challenges included extracting the data from a legacy reporting system. Without the flexibility Azure Databricks provides, iterating on, training, and eventually operationalizing the model would have taken much longer and cost more.