Jiri is a developer, open-source enthusiast, Red Hatter, juggler, geek, data scientist, and father of two.
October 15, 2019 05:00 PM PT
Have you ever wondered how to implement your own operator pattern for your service X in Kubernetes? In this session you will learn how, and you will see an example of an open-source project that spawns Apache Spark clusters on Kubernetes and OpenShift following the pattern. You will leave this talk with a better understanding of how the native Spark-on-Kubernetes scheduling mechanism can be leveraged, and how you can wrap your own service in the operator pattern, not only in Go but also in Java. The pod with the Spark operator and, optionally, the Spark clusters expose metrics for Prometheus, so it makes it eas
June 4, 2018 05:00 PM PT
Blockchain has become a buzzword: people are excited about distributed ledgers and cryptocurrencies, but these technologies are shrouded in myths and misunderstanding. This talk will shed some light on how this awesome technology is actually used in practice, using Apache Spark to analyze blockchain transactions.
We'll start with a brief introduction to blockchain transactions and how to ETL transaction graph data from the public binary format. Then we will look at how to model graph data in Spark, briefly comparing GraphFrames and GraphX. Most of the presentation will be a live demo, running on Spark in the cloud, showing how to run various queries on the transaction graph data, run graph algorithms such as PageRank to identify significant BTC addresses, observe how the network evolves, and more.
All of the work described in this talk is published as open-source code, and all of the data and containers are publicly available for community experimentation. You will leave this talk with a better understanding of blockchain technology and graph processing in Spark, and you will have concrete tools to reproduce my research or start answering your own questions.
Session hashtag: #Exp6SAIS