Realtime Risk Management Using Kafka, Python, and Spark Streaming

At Shopify, we underwrite credit card transactions, which exposes us to the risk of losing money. We need to respond to risky events as they happen, and a traditional ETL pipeline just isn't fast enough. Spark Streaming is a powerful realtime data processing framework built on Apache Spark. It lets you process realtime streams from sources such as Apache Kafka using Python with incredible simplicity.

Related Articles:

  • Real-Time End-to-End Integration with Apache Kafka in Apache Spark's Structured Streaming
  • Processing Data in Apache Kafka with Structured Streaming in Apache Spark 2.2
  • Building Realtime Data Pipelines with Kafka Connect and Spark Streaming

  • About Nick Evans

    Nick has applied his statistics education to epidemiology, survey collection, and, more recently, data science. He works for Shopify and spends his days writing PySpark jobs. He also leads the development of Shopify's realtime risk management software, which recently switched to using Spark Streaming. Nick hails from Northern Ontario, Canada, and is one of those crazy few who love cold weather.