Supporting Over a Thousand Custom Hive User Defined Functions - Databricks

Over the years, Facebook has used Hive as the primary query engine for our data engineers. Because the built-in User Defined Functions (UDFs) in Hive's SQL-like query language, HQL, did not always satisfy our customers' requirements, an extensive list of custom UDFs was developed over time. As we started migrating pipelines from Hive to Spark SQL, a number of custom UDFs turned out to be incompatible with Spark, and many others performed poorly. In this talk, we will first take a deep dive into how Hive UDFs work with Spark. We will then share the challenges we overcame on the way to supporting 99.99% of the custom UDFs in Spark.

About Sergey Makagonov

Sergey Makagonov is a software engineer on the Big Compute team at Facebook. Sergey is passionate about building large-scale distributed systems to solve real-world problems. Prior to Facebook, he worked as a software engineer at Ipsy, where he scaled the personalization platform of the subscription service using Apache Spark. Sergey obtained a Master's degree in Computer Science from Kazakh-British Technical University.

About Xin Yao

Xin Yao is a software engineer on the Spark team at Facebook. Before Facebook, Xin worked as a Senior Software Engineer at Hulu, where he built the real-time ETL pipeline and scaled the data warehouse. Xin received his Master's degree from Beijing University of Posts and Telecommunications in 2013.