Charcey is a Software Engineer within John Deere’s Intelligent Solutions Group. Her formal training is in Geographic Information Systems and Remote Sensing, but she has found an interest in systems engineering, specifically in building complex data ingestion pipelines that account for scale, speed, and cost. At John Deere, she develops data pipelines that contribute to advanced machine learning algorithms.
May 27, 2021 04:25 PM PT
John Deere ingests petabytes of precision agriculture data every year from its customers' farms across the globe. To scale our data science efforts globally, our data scientists need to perform geospatial analysis on our data lake efficiently. In this talk, we will describe some of the methods our data engineering team developed for efficient geospatial queries, including:
- Leveraging Quadtree spatial indexing to partition our Delta Lake tables
- Extending the Spark Catalyst Optimizer to perform efficient geospatial joins in our data lake
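
As a rough illustration of the first point, the sketch below shows how a quadtree cell key can be derived from a longitude/latitude pair. The abstract does not describe John Deere's actual implementation, so the function name `quadtree_key`, the quadrant numbering, and the choice of a digit-string key are all assumptions made for this example. The idea is that each level splits the current bounding box into four quadrants, and the resulting key could serve as a partition column so that spatially close records land in the same partition.

```python
def quadtree_key(lon: float, lat: float, depth: int) -> str:
    """Return a string of `depth` digits (0-3) naming the quadtree cell
    that contains the point. Hypothetical helper, not from the talk."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("lon/lat out of range")

    # Start from a bounding box covering the whole globe.
    west, east = -180.0, 180.0
    south, north = -90.0, 90.0
    digits = []

    for _ in range(depth):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2

        # Assumed numbering: 0 = SW, 1 = SE, 2 = NW, 3 = NE.
        quadrant = 0
        if lon >= mid_lon:
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat

        digits.append(str(quadrant))

    return "".join(digits)


# A coarser key is always a prefix of the finer key for the same point,
# so a prefix comparison acts as a cheap "is inside this cell" test.
coarse = quadtree_key(-90.5, 41.5, 2)
fine = quadtree_key(-90.5, 41.5, 8)
same_cell = fine.startswith(coarse)
```

Because of this prefix property, a query over a region can be narrowed to the partitions whose keys share the region's cell prefix, which is one plausible way a quadtree index can reduce the amount of data scanned.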