Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction

Many open source machine learning frameworks exist, such as Spark’s MLlib and the Hadoop-based Mahout project. These frameworks are great for getting started with ML in products, but because they are so generic they may lack certain production-driven features. In this talk we will present the ML framework used to generate Appraiser and discuss some production-driven concepts that inform the development of the framework, such as:

- Configurable feature engineering
  - Feature code is written once and configured through text files that drive a feature transformation pipeline
  - Interactions between features are picked to make sense, and thus we can scale boosting to many millions of bushy trees
- Debuggability
  - Boosted random forests are hard to debug
  - Product quantization enables engineers to rapidly debug models and check for data quality
- Production constraints
  - Creating smooth models
  - Enforcing monotonicity (e.g. demand should always decrease with increasing price)
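
To make the monotonicity constraint concrete, here is a minimal sketch in Python. The abstract does not say how Appraiser enforces monotonicity, so this uses a generic stand-in technique, the pool-adjacent-violators algorithm, applied after the fact to raw model scores. The function name `fit_non_increasing`, the price points, and the demand scores are all hypothetical and chosen only for illustration.

```python
def fit_non_increasing(values, weights=None):
    """Least-squares non-increasing fit via pool-adjacent-violators.

    Illustrative only: not the method used by Appraiser, which the
    abstract does not describe.
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Merge while the newest block is higher than the one before it,
        # since that would violate the non-increasing constraint.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            mean2, weight2, count2 = blocks.pop()
            mean1, weight1, count1 = blocks.pop()
            total = weight1 + weight2
            pooled = (mean1 * weight1 + mean2 * weight2) / total
            blocks.append([pooled, total, count1 + count2])
    # Expand each pooled block back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical raw demand scores at increasing price points.
prices = [50, 75, 100, 125, 150]
raw_demand = [0.9, 0.7, 0.8, 0.4, 0.5]
smoothed = fit_non_increasing(raw_demand)

for price, raw, fit in zip(prices, raw_demand, smoothed):
    print(f"price={price:>3}  raw={raw:.2f}  monotone={fit:.2f}")
```

In this made-up input, the raw score rises between prices 75 and 100 and again between 125 and 150. The sketch pools each violating pair into its average, so the fitted curve never increases with price. A production system might instead build the constraint into training itself; this post-hoc fit is only one possible approach.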