James Conner is the lead architect for the 84.51°/Kroger data science and data management platforms. His big data journey began with Hadoop in 2008 and continued with Spark in 2012. His education is in Archaeology and Biological Anthropology from the University of Alaska. He is an avid wildlife photographer, SCUBA Dive Master, and beer connoisseur, and he is learning electrical engineering in his spare time.
Productionizing data science models is a challenging task for many organizations. At 84.51°, we've built an end-to-end data science architecture on the Google and Azure cloud platforms that allows us to take data science from proofs of concept to production products. In this presentation, I will demonstrate how we use Azure and Databricks services to develop data science code, push that code to production environments, and then orchestrate it to generate predictions. We'll apply the software development concepts of Continuous Integration and Continuous Delivery (CI/CD) to productionization, and discuss why developing data science code as libraries simplifies that process. The demonstration uses Azure Databricks, Azure Data Factory, Azure DevOps, Python, PySpark, and Python wheels as its technology stack. Instructions, code, and diagrams will be made available on GitHub so attendees can try the process themselves.