Willem Pienaar leads the data science platform team at GOJEK, working on the GOJEK ML platform, which supports a wide variety of models and handles over 100 million orders every month. His main focus is building data and ML platforms that allow organizations to scale machine learning and drive decision making. In a previous life, Willem founded and sold a networking startup and worked as a software engineer in industrial control systems.
Gojek, Indonesia's first billion-dollar startup, has seen explosive growth in both users and data over the past three years. Today, it uses big-data-powered machine learning to inform decision making in its ride-hailing, lifestyle, logistics, food delivery, and payment products, from selecting the right driver to dispatch, to dynamically setting prices, to serving food recommendations, to forecasting real-world events. Hundreds of millions of orders per month, across 18 products, are all driven by machine learning. Features are at the heart of what makes these machine learning systems effective. However, many challenges still exist in the feature engineering lifecycle. Developing features from big data is often an engineering-heavy task, with challenges in both scaling data processes and serving features in production systems.
Teams also face challenges in enabling discovery, reducing duplication, improving understanding, and standardizing features across their organizations. In this talk, Willem Pienaar will explain the need for features at organizations like Gojek and will discuss the challenges faced in creating, managing, and serving them in production. He will describe how leveraging open source software like Spark and MLflow allowed his team to build Feast, an open source feature store that bridges data engineering and machine learning. He will explain how Feast and Spark allow them to overcome these challenges, the lessons they learned along the way, and the impact the feature store had at Gojek. Finally, he will demonstrate how democratizing the process of creating, sharing, and managing features dramatically reduces time to market and leads to key insights.
GOJEK, the Southeast Asian super-app, has seen explosive growth in both users and data over the past three years. Today the technology startup uses big-data-powered machine learning to inform decision making in its ride-hailing, lifestyle, logistics, food delivery, and payment products, from selecting the right driver to dispatch, to dynamically setting prices, to serving food recommendations, to forecasting real-world events. Hundreds of millions of orders per month, across 18 products, are all driven by machine learning. Building production-grade machine learning systems at GOJEK wasn't always easy. Data processing and machine learning pipelines were brittle, long-running, and hard to reproduce. Models and experiments were difficult to track, which led to downstream problems in production during serving and model evaluation. In this talk we will cover these and other challenges that we faced while trying to scale end-to-end machine learning systems at GOJEK. We will then introduce MLflow and explore the key features that make it useful as part of an ML platform. Finally, we will show how introducing MLflow into the ML lifecycle has helped to solve many of the problems we faced while scaling machine learning at GOJEK.