Ben Weber is a distinguished data scientist at Zynga with past experience at Twitch, Electronic Arts, Daybreak Games, and Microsoft Studios. He received his PhD in computer science from UC Santa Cruz.
June 25, 2020 05:00 PM PT
At Zynga, we've opened up our PySpark environment to our full analytics organization, which includes game analytics, data science, and engineering teams across the globe. The result of democratizing Spark is that more of our teams are able to perform analyses at scale, and our data scientists are now responsible for productizing predictive modeling pipelines. The biggest impact of opening up our data platform has been teams identifying novel applications of PySpark, including large-scale experimentation, player segmentation, recommendation systems, and anomaly detection. PySpark is the latest step in the transformation of our analytics organization, which has migrated from SQL to Python to Spark. We've focused on three key areas to make Spark accessible at Zynga: infrastructure, onboarding, and features.
One of the prerequisites we had for scaling our usage of Spark was building connections between Databricks and the rest of our data platform. To accomplish this, we authored a set of libraries that enable our Spark environment to work seamlessly with our data lake, data warehouse, and application databases. To onboard our teams onto PySpark, we created templated notebooks, held training sessions during our onsite conferences, and provided sandbox environments for learning. To ease the transition from Python to PySpark, we've been leveraging newer features in Spark, including Pandas UDFs and Koalas, to provide familiar interfaces. The result of this effort is that the majority of our teams are now using PySpark for large-scale analyses, and our data science teams are responsible for multiple data products in production. This session will discuss our approach for enabling our full analytics organization to leverage PySpark, the growing pains that we encountered, and the successes that came from democratizing Spark.