Sarah is a data scientist at Bitly. She loves Python, machine learning, and the startup world. She is an accomplished conference speaker and an O’Reilly Media author, and is very involved in the Python community.
Bitly handles over 9 billion clicks on shortened links a month, as well as over 100 million unique link shortens. Analyzing data at this scale is not without its challenges. At Bitly, we have started adopting Apache Spark as a way to process our data. In this talk, I'll elaborate on how I use Spark as part of my data science workflow. I'll cover how Spark fits into our existing architecture, the kinds of problems I'm solving with Spark, and the benefits and challenges of using Spark for large-scale data science.