As a Data Scientist at Capital One, I work on customer-facing products that directly impact millions of our credit card customers. I design, develop, implement, and deploy data pipelines and data products that improve the experience of all Capital One customers. I love coding in Python and Scala, and I use Spark on a daily basis. Before joining Capital One, I was a Ph.D. student majoring in Geographic Information Science.
In this talk we will introduce the business use case behind the real-time platform we built for our Second Look project using Spark and Kafka. Second Look is a feature created by Capital One to detect potential mistakes and unexpected charges on customers' card statements and to notify cardholders about them. We bring these charges to customers' attention automatically through email alerts and push notifications so that customers can take timely action. A flagged charge can be resolved through a conversation with the merchant or by filing a dispute directly with Capital One, and our user experience guides customers along this resolution path. We use Spark extensively to build the infrastructure for this project. Before we adopted Spark and Kafka, alerts were not sent in real time, and there was a delay of days between when a customer transacted and when the customer received an alert. With the power of Spark and Kafka, we are able to send alerts in a much more timely manner. We will share how we connect each part of the pipeline, from data ingestion to processing, alert generation, and alert delivery, and we will demonstrate the critical role Spark plays in the whole infrastructure. What's next? We plan to leverage machine learning in Spark to generate more types of alerts.
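To make the alert-generation step concrete, here is a minimal sketch of one kind of rule-based check a Second Look-style system could run over a transaction stream: flagging duplicate charges. All names here (Transaction, find_duplicate_charges, the 24-hour window) are illustrative assumptions for this abstract, not Capital One's actual implementation, which runs at scale on Spark and Kafka rather than in plain Python.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative transaction record; field names are assumptions.
@dataclass
class Transaction:
    card_id: str
    merchant: str
    amount: float
    timestamp: datetime

def find_duplicate_charges(transactions, window=timedelta(hours=24)):
    """Flag pairs of identical charges (same card, merchant, and amount)
    occurring within `window` of each other -- one plausible
    'potential mistake' an alerting rule might catch."""
    alerts = []
    # Group transactions by the fields that make two charges "identical".
    buckets = defaultdict(list)
    for txn in sorted(transactions, key=lambda t: t.timestamp):
        key = (txn.card_id, txn.merchant, txn.amount)
        for earlier in buckets[key]:
            if txn.timestamp - earlier.timestamp <= window:
                alerts.append((earlier, txn))
        buckets[key].append(txn)
    return alerts

# Example: two identical coffee charges five minutes apart trigger an
# alert; the same charge two days later does not.
t1 = Transaction("c1", "Coffee Shop", 4.50, datetime(2019, 1, 1, 9, 0))
t2 = Transaction("c1", "Coffee Shop", 4.50, datetime(2019, 1, 1, 9, 5))
t3 = Transaction("c1", "Coffee Shop", 4.50, datetime(2019, 1, 3, 9, 0))
alerts = find_duplicate_charges([t1, t2, t3])
```

In the streaming version of this idea, the same per-key, time-windowed comparison maps naturally onto Spark's windowed aggregations over a Kafka-sourced stream, which is what lets the alerts go out in near real time instead of days later.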