Kiavash Kianfar, Ph.D., is a Sr. Software Engineer at Databricks. He develops algorithms and software for the Delta and streaming projects, as well as for genomics applications (including Project Glow). Kiavash was a tenured associate professor at Texas A&M University before joining Databricks. In addition to software development, he has years of teaching and research experience in algorithmics, bioinformatics, and optimization.
May 26, 2021 04:25 PM PT
Machine learning practitioners are most comfortable using high-level programming languages such as Python. This can make it hard for them to parallelize algorithms with big data frameworks such as Apache Spark, which are written in lower-level JVM languages such as Scala. Databricks partnered with the Regeneron Genetics Center to create the Glow library for population-scale genomics data storage and analytics. Glow v1.0.0 includes PySpark-based implementations of both existing and novel machine learning algorithms. We will discuss how leveraging tooling for Python users, especially Pandas UDFs, accelerated our development velocity and affected our algorithms' computational performance.
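To make the Pandas UDF idea concrete, here is a minimal sketch. It is not Glow's implementation. It assumes a hypothetical long-format table with one row per sample per variant and the columns `variant_id`, `genotype` (an allele count of 0, 1, or 2), and `phenotype`. The pattern it illustrates is the grouped-map Pandas UDF: an ordinary pandas function that fits `phenotype ~ genotype` by least squares for a single variant. Spark can then apply that function to every group in parallel. The Spark call appears only as a comment, so the block runs with pandas and NumPy alone.

```python
import numpy as np
import pandas as pd


def per_variant_regression(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit phenotype ~ genotype by ordinary least squares for one variant.

    Hypothetical schema (an assumption, not Glow's): one row per sample with
    columns 'variant_id', 'genotype' (0/1/2 allele count) and 'phenotype'.
    """
    x = pdf["genotype"].to_numpy(dtype=float)
    y = pdf["phenotype"].to_numpy(dtype=float)
    # Design matrix with an intercept column; lstsq minimizes ||X @ b - y||.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Return one output row per group: the variant and its genotype effect.
    return pd.DataFrame({"variant_id": [pdf["variant_id"].iloc[0]], "beta": [coef[1]]})


# Toy data (made up for illustration): v1 follows y = 1 + 2x, v2 is constant.
samples = pd.DataFrame({
    "variant_id": ["v1"] * 4 + ["v2"] * 4,
    "genotype": [0, 1, 2, 1, 0, 0, 2, 2],
    "phenotype": [1.0, 3.0, 5.0, 3.0, 2.0, 2.0, 2.0, 2.0],
})

# Local, single-machine equivalent of the per-group application.
result = pd.concat(
    [per_variant_regression(group) for _, group in samples.groupby("variant_id")],
    ignore_index=True,
)

# With PySpark 3.x (not run here), the same function could be distributed as:
#   spark_df.groupBy("variant_id").applyInPandas(
#       per_variant_regression, schema="variant_id string, beta double")
```

The appeal of this design is that the per-group logic is plain pandas and NumPy code. You can unit-test it on a laptop, as above, before handing it to Spark, which takes care of partitioning the data and moving it between the JVM and Python through Apache Arrow.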