Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While Looking for Signs of Extra-Terrestrial Life - Databricks

In this session, IBM will present details on advanced Apache Spark analytics currently being performed through a collaborative project with the SETI Institute, NASA, Swinburne University, Stanford University and IBM. The Allen Telescope Array in northern California has been continuously scanning the skies for over a decade, generating data archives with over 200 million signal events.
Come and learn how astronomers and researchers are using Apache Spark, in conjunction with assets such as IBM’s Cognitive Compute Cluster with over 700 GPUs, to train neural net models for signal classification, and to perform computationally intensive Spark workloads on multi-terabyte binary signal files. The speakers will also share details on one of the key components of this implementation: Stocator, an open source (Apache License 2.0) object store connector for Hadoop and Apache Spark, specifically designed to optimize their performance with object stores. Learn how Stocator works, and see how it was able to greatly improve performance and reduce the quantity of resources used, both for ground-to-cloud uploads of very large signal files, and for subsequent access of radio data for analysis using Spark.

Session hashtag: #SFeco2

About Graham Mackintosh

Graham is a project executive in the IBM Emerging Technology Division, focusing on the use of Apache Spark in complex analytic areas such as radio signal processing, high energy particle physics jet discrimination, homeland security, and cyber crime detection. He received his degree in computer science from Queen’s University in Canada, and has over 25 years of career experience in advanced analytics and artificial intelligence. Most recently, Graham has focused on the application of Apache Spark to very large volumes of scientific data, with an emphasis on the application of deep learning to large data sets.

About Gil Vernik

Gil is a researcher in IBM's Storage Clouds, Security, and Analytics group. He received his PhD in Mathematics from the University of Haifa and completed a postdoctoral position in Germany. At IBM he works with Apache Spark, Hadoop, object stores, and NoSQL databases. He has more than 25 years of experience as a code developer, both server side and client side, and knows Java, Python, Scala, C/C++, and Erlang.