Phan is a data engineer on the T-Mobile Marketing Solutions team. His current focus is scaling machine learning models on telecommunication network traffic using Apache Spark and supporting marketing decisions with data insights. With broad experience spanning data warehousing, data analysis, and software and product development, Phan builds strong connections between teams and helps deliver complete, end-to-end machine learning products.
The development of big data products and solutions at scale brings many challenges to teams of platform architects, data scientists, and data engineers. While it is easy to find ourselves working in silos, successful organizations collaborate intensively across disciplines so that problems are well understood and proposed models and solutions can be scaled and optimized on multiple terabytes of data. In this session, the T-Mobile Marketing Solutions (TMS) Data Science team will present a platform architecture and production framework supporting TMS internal products and services. Powered by Apache Spark technologies, these services operate in a hybrid of on-premises and cloud environments. As a showcase example, we will discuss key lessons learned and best practices from our Advertising Fraud Detection service. An important focus is how we scaled data science algorithms outside of the Spark MLlib framework. We will also demonstrate various Spark optimization tips for improving product performance, along with the use of MLflow for tracking and reporting. We hope to share the best practices we've learned from our journey of building end-to-end big data products.