Blockchain has become a buzzword: people are excited about distributed ledgers and cryptocurrencies, but these technologies are shrouded in myths and misunderstanding. This talk will shed some light on how this awesome technology is actually used in practice, using Apache Spark to analyze blockchain transactions.
We’ll start with a brief introduction to blockchain transactions and how to ETL transaction graph data from the blockchain's public binary format. Then we will look at how to model graph data in Spark, briefly comparing GraphFrames and GraphX. The majority of the presentation will be a live demo, running on Spark in the cloud, showing how to run various queries on the transaction graph data, apply graph algorithms such as PageRank to identify significant BTC addresses, observe how the network evolves over time, and more.
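To give a feel for the PageRank step, here is a minimal, single-machine sketch of PageRank by power iteration over a toy transaction graph. It assumes each transaction is reduced to a (sender address, receiver address) edge; the addresses, the `pagerank` function, and its parameters are hypothetical illustrations, not the talk's code. On Spark, GraphFrames provides this as `GraphFrame.pageRank`, where `resetProbability` corresponds to `1 - damping` here.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (src, dst) edges.

    Hypothetical sketch: repeated edges count as multiple links, and
    addresses with no outgoing transactions spread their rank uniformly.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by "dangling" addresses (no outgoing transactions).
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Every address gets the teleport share plus a share of dangling rank.
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Each address splits its damped rank evenly across its out-links.
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three addresses pay "C", and "C" pays "A".
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
ranks = pagerank(edges)
for address, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(address, round(score, 4))
```

Address "C" receives funds from three other addresses and ends up with the highest score. This is the intuition behind using PageRank to surface significant BTC addresses.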
All of the work described in this talk is published as open-source code, and all of the data, as well as all of the containers, are publicly available for community experimentation. You will leave this talk with a better understanding of blockchain technology and graph processing in Spark, and you will have the concrete tools to reproduce my research or start answering your own questions.
Session hashtag: #Exp6SAIS
Jiri is a developer, open-source enthusiast, Red Hatter, juggler, geek, data scientist and father of two.