Andy is a mathematician turned distributed computing engineer with an entrepreneurial streak. He is a certified Scala/Spark trainer and the author of the book Learning Play! Framework 2. He has worked on many projects built on Spark, Cassandra, and other distributed technologies, in fields including geospatial, IoT, automotive, and smart cities. He created the spark-notebook (https://github.com/andypetrella/spark-notebook/), one of the top GitHub projects related to Apache Spark and Scala. He also co-founded, with Xavier Tordoir, Data Fellas, a company dedicated to data science and distributed computing.
Genomics and health data are among today's hot topics, requiring heavy computation and especially machine learning. This computation helps a science with highly relevant societal impact achieve even better outcomes, which is why Apache Spark and the ADAM library built on top of it are must-haves. This talk has two parts. First, we'll show how Apache Spark, MLlib and ADAM can be plugged together to extract information from even huge and wide genomics datasets. Everything will be packaged as examples in the Spark Notebook, showing how bioscientists can work interactively with such a system. Second, we'll explain how these methodologies, and even the datasets themselves, can be shared at very large scale between remote entities such as hospitals or laboratories, using microservices built on Apache Spark, ADAM, Play Framework 2, Avro and Tachyon.