Tatiana Dashevskiy leads the data science team at T-Mobile Marketing Solutions. Tatiana has a PhD in Physics, focused on applied math in dynamical systems, and has experience building predictive models to characterize network behavior, applying graph theory and machine learning algorithms, and deploying models into production in big data environments using Python and Apache Spark.
Developing big data products and solutions at scale brings many challenges to teams of platform architects, data scientists, and data engineers. While it is easy to find ourselves working in silos, successful organizations collaborate intensively across disciplines so that problems are understood and proposed models and solutions can be scaled and optimized on multiple terabytes of data. In this session, the T-Mobile Marketing Solutions (TMS) Data Science team will present a platform architecture and production framework supporting TMS internal products and services. Powered by Apache Spark technologies, these services operate in a hybrid of on-premises and cloud environments. As a showcase example, we will discuss key lessons learned and best practices from our Advertising Fraud Detection service. An important focus is how we scaled data science algorithms outside of the Spark MLlib framework. We will also demonstrate Spark optimization tips that improve product performance, and the use of MLflow for tracking and reporting. We hope to share the best practices we have learned on our journey of building end-to-end big data products.