Stephen Plaza received a PhD in computer engineering from the University of Michigan, researching computer architecture, transistor-level chip optimization, and computational algorithms. After spending some time in industry working on semiconductor optimization, he shifted focus to bioinformatics at the Janelia Research Campus. After working on image segmentation and graph-based optimization strategies, he became a technical and managerial lead of the FlyEM project. The goal of FlyEM is to identify the connectivity of neurons in the fly brain using both automatic and manual techniques, with the hope of understanding brain functionality and uncovering neurological motifs potentially relevant across many organisms.
The emerging field of connectomics aims to unlock the mysteries of the brain by understanding the connectivity between neurons. To map this connectivity, we acquire thousands of electron microscopy (EM) images with nanometer-scale resolution. Once analyzed, these images have the potential to reveal neuronal shapes and the synaptic connections between them. However, extracting connectivity information from such large-scale image data is very time-consuming. Imaging the brain of even a tiny organism like the fruit fly yields terabytes of data, and it can take years of manual effort to examine such image volumes and trace their neuronal connections. To reduce this manual effort, we apply cutting-edge image segmentation algorithms to semi-automatically segment neurons from EM datasets. In particular, we propose a novel strategy to segment large neurons whose volumes exceed the capacity of a single machine. Our solution is robust to the segmentation errors that inevitably arise in such large datasets, whether from anomalies in the raw image data or from regions for which we lack representative training samples. To realize this solution, we implement a Spark application that runs a scalable batch segmentation on several overlapping subvolumes and applies a global algorithm to stitch these subvolumes together. To help verify this solution, we implement scalable tools for analyzing segmentation quality. Our segmentation results are accurate for significant portions of many neurons spanning a large dataset, highlighting the quality of our global stitching strategy. Our implementation exposes flexible configuration and a plugin architecture to allow custom components for parts of the workflow.
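To make the overlapping-subvolume idea concrete, the following is a minimal single-machine sketch, not the Spark implementation described above. It assumes that each subvolume has already been segmented with globally unique, nonzero label IDs, and that two labels belong to the same neuron if they share at least `min_voxels` voxels in the overlap region. The function names, the `UnionFind` helper, and the merge rule itself are hypothetical choices made for illustration.

```python
import numpy as np

def overlapping_blocks(length, block, overlap):
    """Yield (start, stop) ranges along one axis; neighbors share `overlap` voxels."""
    assert block > overlap >= 0
    start = 0
    while True:
        stop = min(start + block, length)
        yield start, stop
        if stop == length:
            break
        start += block - overlap

def matching_pairs(labels_a, labels_b, min_voxels=1):
    """Return label pairs (a, b) that coincide on at least `min_voxels` voxels
    of the shared overlap region. Label 0 is treated as background."""
    mask = (labels_a != 0) & (labels_b != 0)
    if not mask.any():
        return []
    stacked = np.stack([labels_a[mask], labels_b[mask]], axis=1)
    pairs, counts = np.unique(stacked, axis=0, return_counts=True)
    return [(int(a), int(b)) for (a, b), c in zip(pairs, counts) if c >= min_voxels]

class UnionFind:
    """Tracks which labels have been merged into the same global body."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # keep the smaller ID as root

def relabel(labels, uf):
    """Map every label in a subvolume to its merged (root) label."""
    out = labels.copy()
    for lab in np.unique(labels):
        out[labels == lab] = uf.find(int(lab))
    return out

# Toy 1-D example: a 10-voxel axis split into two blocks sharing 2 voxels.
ranges = list(overlapping_blocks(10, block=6, overlap=2))
seg_a = np.array([1, 1, 1, 2, 2, 2])        # labels for voxels 0..5
seg_b = np.array([11, 11, 11, 12, 12, 12])  # labels for voxels 4..9

uf = UnionFind()
for a, b in matching_pairs(seg_a[4:6], seg_b[0:2]):  # compare voxels 4..5
    uf.union(a, b)

stitched = np.concatenate([relabel(seg_a, uf), relabel(seg_b, uf)[2:]])
```

In this toy case, label 11 from the second block is merged into label 2 from the first, so the body crossing the block boundary ends up with one ID. A real stitching rule would likely need a stricter overlap criterion to avoid propagating false merges across the whole dataset.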
Other key features of our solution include 1) fast compression and serialization of large NumPy data structures, 2) rollback mechanisms that enable the mid-application recovery needed for long-running Spark applications, and 3) reading and writing data through DVID, an image-oriented, versioned data service.
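As a rough illustration of the first feature, the sketch below serializes a NumPy array to a compressed byte string and restores it. It uses the standard-library `zlib` codec at a low compression level as a stand-in; the codec actually used is not named here, and the helper names `serialize_array` and `deserialize_array` are hypothetical.

```python
import io
import zlib
import numpy as np

def serialize_array(arr, level=1):
    """Write `arr` in .npy format (preserving dtype and shape), then compress.
    A low zlib level trades compression ratio for speed."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return zlib.compress(buf.getvalue(), level)

def deserialize_array(blob):
    """Inverse of serialize_array: decompress and parse the .npy payload."""
    return np.load(io.BytesIO(zlib.decompress(blob)), allow_pickle=False)

# Label volumes tend to contain long runs of identical IDs, so they compress well.
labels = np.zeros((64, 64, 64), dtype=np.uint64)
labels[10:20] = 7

blob = serialize_array(labels)
restored = deserialize_array(blob)
```

Keeping arrays compressed while they move between workers reduces both memory pressure and shuffle traffic, at the cost of a decompression step whenever a subvolume is actually processed.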