Panos Labropoulos - Databricks

Panos Labropoulos

Sr. Support Engineer, Bright Computing, Inc. / Univ. of Athens

Panos Labropoulos is a Sr. Support Engineer at Bright Computing, assisting Bright’s customers in using the company’s flagship HPC and Big Data solutions. Prior to that he was a postdoctoral fellow at the Dutch Institute of Radio Astronomy and the University of Groningen. He graduated with a Ph.D. in Astronomy from the University of Groningen, where he was a member of the LOFAR Epoch of Reionization key science project. His research interests include high-performance and distributed computing, image processing, and big data analytics.


Distributed Data Processing Using Spark in Radio Astronomy

The new generation of radio interferometric arrays will enable scientists to observe the Universe with unprecedented imaging capabilities, both in terms of sensitivity and resolution. An innovative feature of these arrays is that, instead of using a mechanical pointing system, the electric signals from a very large number of receiving elements will be combined in software in order to form beams in various directions. Imaging software is one of the most important components for processing the high-volume data streams produced by such an array, and it is one of the best places to use Spark. Moreover, the computational requirements are so extreme that the cost of the processing systems will be dominated by power consumption, mostly through data transfers. Spark's in-memory processing model and its extra algorithmic flexibility in comparison to MapReduce offer a clear advantage. The required data processing can be categorized as two distinct processes: calibration and imaging. The crucial point is to exploit the data parallelism across observing frequency, as well as the smoothness of the variation of calibration parameters across frequency. The imaging problem itself is a deconvolution problem. Traditionally, this problem has been approached as a regularized inversion; however, novel approaches originating from deep learning, such as artificial neural networks trained on simulated and high-SNR data, can be used for optimal results. In this talk we will discuss the feasibility as well as the advantages of using Spark as a distributed processing framework for large-scale (radio-)astronomical data analysis.
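
As a rough illustration of what "forming beams in software" means, the sketch below applies textbook delay-and-sum (phase-shift) beamforming to a simulated plane wave. It is not taken from the talk. The uniform linear array, the 16 elements, the 150 MHz observing frequency, the source direction, and all function names are assumptions chosen for the example.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def plane_wave(positions_m, freq_hz, angle_rad):
    """Simulated narrowband signal seen by each element for a source at angle_rad."""
    k = 2.0 * math.pi * freq_hz / C  # wavenumber
    return [cmath.exp(1j * k * x * math.sin(angle_rad)) for x in positions_m]


def steering_weights(positions_m, freq_hz, angle_rad):
    """Phase weights that undo the geometric delay for direction angle_rad."""
    k = 2.0 * math.pi * freq_hz / C
    return [cmath.exp(-1j * k * x * math.sin(angle_rad)) for x in positions_m]


def beam_power(signals, weights):
    """Normalized power of the weighted sum of all element signals."""
    total = sum(w * s for w, s in zip(weights, signals)) / len(signals)
    return abs(total) ** 2


def scan(positions_m, freq_hz, signals, angles_deg):
    """Steer the beam to each trial angle; return (best_angle_deg, best_power)."""
    powers = [
        beam_power(signals, steering_weights(positions_m, freq_hz, math.radians(a)))
        for a in angles_deg
    ]
    best = max(range(len(angles_deg)), key=lambda i: powers[i])
    return angles_deg[best], powers[best]


# Hypothetical setup: 16 elements spaced half a wavelength apart at 150 MHz,
# with a single source 20 degrees away from broadside.
FREQ_HZ = 150e6
SPACING_M = 0.5 * C / FREQ_HZ
POSITIONS_M = [i * SPACING_M for i in range(16)]
SIGNALS = plane_wave(POSITIONS_M, FREQ_HZ, math.radians(20))

best_angle_deg, best_power = scan(POSITIONS_M, FREQ_HZ, SIGNALS, list(range(-90, 91)))
print(best_angle_deg, best_power)
```

The scan peaks at the simulated source direction, because only there do the phase weights bring all element signals into alignment. Each trial direction, and likewise each frequency channel, is computed independently of the others, which is the kind of data parallelism the abstract refers to.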