SESSION

Best Practices for Unit Testing PySpark


OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Engineering and Streaming
TECHNOLOGIES: Apache Spark, Developer Experience, ETL
SKILL LEVEL: Intermediate
DURATION: 20 min

This talk covers best practices for unit testing PySpark code. Unit tests help you reduce production bugs and make your codebase easier to refactor. You will learn how to create PySpark unit tests that run locally and in CI via GitHub Actions, and how to structure PySpark code so it’s easy to unit test. You’ll also see how to run integration tests on a cluster against staging datasets; integration tests provide an additional level of safety.
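As a flavor of the kind of test the session describes, here is a minimal sketch of a locally runnable PySpark unit test using pytest. The transformation function, file name, and column names are hypothetical, not taken from the talk; the structural idea is to keep logic in small DataFrame-in, DataFrame-out functions backed by a local SparkSession so tests run on a laptop or in CI without a cluster.

```python
# test_transformations.py -- hypothetical example, not code from the talk
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so tests run without a cluster (laptop or CI runner)
    return (
        SparkSession.builder.master("local[2]")
        .appName("pyspark-unit-tests")
        .getOrCreate()
    )


def with_full_name(df):
    # Transformation under test: a small DataFrame-in, DataFrame-out function
    return df.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))


def test_with_full_name(spark):
    source = spark.createDataFrame(
        [("ada", "lovelace"), ("grace", "hopper")],
        ["first_name", "last_name"],
    )
    actual = with_full_name(source)
    expected = spark.createDataFrame(
        [("ada", "lovelace", "ada lovelace"), ("grace", "hopper", "grace hopper")],
        ["first_name", "last_name", "full_name"],
    )
    # Compare collected rows; for larger suites a DataFrame equality helper can be used
    assert actual.collect() == expected.collect()
```

A test like this can be wired into a GitHub Actions workflow that installs PySpark and runs pytest on every push.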

SESSION SPEAKERS

Matthew Powers

Staff Developer Advocate
Databricks