Jason T. Brown is a Senior Data Scientist at Astraea, Inc., where he applies machine learning to Earth-observing data to provide actionable insights into clients’ and partners’ challenges. He brings a background in mathematical modeling and statistics together with an appreciation for data visualization, geography, and software development.
Overhead imagery from satellites and drones has entered the mainstream of how we explore, understand, and tell stories about our world. These images are undeniable and arresting descriptions of cultural events, environmental disasters, economic shifts, and more. Data scientists recognize that their value goes far beyond anecdotal storytelling: imagery is unstructured data full of distinctive patterns in a high-dimensional space. With machine learning, we can extract structured data from the vast set of imagery available. RasterFrames extends Spark SQL with a strong Python API to enable processing of satellite, drone, and other spatial image data. This talk will discuss the fundamental ideas needed to make sense of this imagery data. We will discuss how the RasterFrames custom DataSource exploits convergent trends in how public and private providers publish images. Through deep Spark SQL integration, RasterFrames lets users work with imagery and other location-aware data sets in their existing data pipelines. RasterFrames fully supports Spark ML and interoperates smoothly with TensorFlow, Keras, and PyTorch. To crystallize these ideas, we will discuss a practical data science case study using overhead imagery in PySpark.