PySpark in 2023: A Year in Review
March 25, 2024 by Hyukjin Kwon, Takuya Ueshin, Allison Wang, Ruifeng Zheng, Xinrong Meng, Haejoon Lee and Amanda Liu in Industries
With the releases of Apache Spark 3.4 and 3.5 in 2023, we focused heavily on improving PySpark performance, flexibility, and ease of use...

Simplify PySpark testing with DataFrame equality functions
March 6, 2024 by Haejoon Lee, Allison Wang and Amanda Liu in Engineering Blog
The DataFrame equality test functions were introduced in Apache Spark™ 3.5 and Databricks Runtime 14.2 to simplify PySpark unit testing. The full set...

10 Minutes from pandas to Koalas on Apache Spark
March 31, 2020 by Haejoon Lee, Yifan Cao, Hyukjin Kwon and Takuya Ueshin in Solutions
This is a guest community post from Haejoon Lee, a software engineer at Mobigen in South Korea and a Koalas contributor. pandas is...