Distributed Data Processing Using Spark in Radio Astronomy - Databricks

The new generation of radio interferometric arrays will enable scientists to observe the Universe with unprecedented imaging capabilities, both in terms of sensitivity and resolution. An innovative feature of these arrays is that, instead of using a mechanical pointing system, the electric signals from a very large number of receiving elements will be combined in software to form beams in various directions. Imaging software is a key component in processing the high-volume data streams produced by such an array, and it is one of the best places to use Spark. Moreover, the computational requirements are so extreme that the cost of the processing systems will be dominated by power consumption, mostly through data transfers. Spark's in-memory processing model and its extra algorithmic flexibility in comparison to MapReduce offer a clear advantage. The required data processing can be categorized as two distinct processes: calibration and imaging. The crucial point is to exploit the data parallelism across observing frequency as well as the smoothness of the variation of calibration parameters across frequency. The imaging problem itself is a deconvolution problem. Traditionally, this problem has been approached as a regularized inversion; however, novel approaches originating from deep learning, such as artificial neural networks trained on simulated and high-SNR data, can be used for optimal results. In this talk we will discuss the feasibility as well as the advantages of using Spark as a distributed processing framework for large-scale (radio-)astronomical data analysis.
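The point about parallelism across observing frequency can be illustrated with a small sketch. It is not the speakers' pipeline: the toy data model, the single scalar gain per channel, the linear fit used as a stand-in for "smoothness", and the names `simulate_channel`, `solve_channel`, and `fit_line` are all assumptions made for illustration. Plain Python `map` is used so the example runs without a cluster; since `solve_channel` touches only one channel's record, the same function could in principle be handed to a distributed map such as PySpark's `RDD.map`.

```python
import random

def simulate_channel(chan, n_samples=200, seed=0):
    """Hypothetical toy data: observed = true_gain * model + noise for one channel."""
    rng = random.Random(seed + chan)
    true_gain = 1.0 + 0.01 * chan  # assumed: gain varies smoothly (linearly) with channel
    model = [rng.uniform(0.5, 1.5) for _ in range(n_samples)]
    observed = [true_gain * m + rng.gauss(0.0, 0.05) for m in model]
    return chan, model, observed

def solve_channel(record):
    """Least-squares gain for one channel: g = sum(o*m) / sum(m*m).

    Uses only this channel's data, so channels can be processed independently.
    """
    chan, model, observed = record
    num = sum(o * m for o, m in zip(observed, model))
    den = sum(m * m for m in model)
    return chan, num / den

def fit_line(points):
    """Ordinary least-squares line through (x, y) points; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# One record per frequency channel (32 channels is an arbitrary choice).
channels = [simulate_channel(c) for c in range(32)]

# Data-parallel step: each channel is solved on its own.
per_channel = list(map(solve_channel, channels))

# Smoothness step: combine the per-channel solutions with a low-order fit
# across frequency, which averages down the per-channel noise.
intercept, slope = fit_line(per_channel)
smoothed = [(c, intercept + slope * c) for c, _ in per_channel]
```

The split into an independent per-channel step followed by a cheap cross-channel fit is the design choice worth noting: only the small per-channel solutions need to be gathered in one place, not the raw data.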

About Panos Labropoulos

Panos Labropoulos is a Sr. Support Engineer at Bright Computing, assisting Bright's customers in using the company's flagship HPC and Big Data solutions. Prior to that he was a postdoctoral fellow at the Dutch Institute of Radio Astronomy and the University of Groningen. He graduated with a Ph.D. in Astronomy from the University of Groningen, where he was a member of the LOFAR Epoch of Reionization key science project. His research interests include high-performance and distributed computing, image processing, and big data analytics.

About Sarod Yatawatta

Sarod Yatawatta is a system researcher at the Netherlands Institute for Radio Astronomy (ASTRON). Since obtaining his PhD in electrical engineering from Drexel, he has been working on developing novel data processing techniques for radio astronomical observations. He is interested in statistical signal processing and in using distributed algorithms for signal processing, especially in the calibration and imaging of radio astronomical data. He is a core member of the LOFAR Epoch of Reionization key science project and a senior member of the IEEE.