Articles by Hyukjin Kwon - Databricks Blog

How to Monitor Streaming Queries in PySpark

May 27, 2022 by Hyukjin Kwon, Karthik Ramasamy and Alexander Balikov in Engineering Blog

Streaming is one of the most important data processing techniques for ingestion and analysis. It provides users and developers with low latency and...

Introducing Apache Spark™ 3.2

October 19, 2021 by Gengliang Wang, Wenchen Fan, Hyukjin Kwon, Xiao Li and Reynold Xin in Engineering Blog

We are excited to announce the availability of Apache Spark™ 3.2 on Databricks as part of Databricks Runtime 10.0. We want to...

Pandas API on Upcoming Apache Spark™ 3.2

October 4, 2021 by Hyukjin Kwon and Xinrong Meng in Engineering Blog

We're thrilled to announce that the pandas API will be part of the upcoming Apache Spark™ 3.2 release. pandas is a powerful, flexible...

Benchmark: Koalas (PySpark) and Dask

April 7, 2021 by Xinrong Meng and Hyukjin Kwon in Engineering Blog

Koalas is a data science library that implements the pandas APIs on top of Apache Spark so data scientists can use their favorite...

Introducing Apache Spark™ 3.1

March 2, 2021 by Hyukjin Kwon, Wenchen Fan, Xiao Li and Reynold Xin in Engineering Blog

We are excited to announce the availability of Apache Spark 3.1 on Databricks as part of Databricks Runtime 8.0. We want to...

How to Manage Python Dependencies in PySpark

December 22, 2020 by Hyukjin Kwon in Engineering Blog

Controlling the environment of an application is often challenging in a distributed computing environment; it is difficult to ensure all nodes have...

Python Autocomplete Improvements for Databricks Notebooks

December 15, 2020 by Richard Fung, Xinrong Meng, Takuya Ueshin, Hyukjin Kwon and Austin Ford in Engineering Blog

At Databricks, we strive to provide a world-class development experience for data scientists and engineers, and new features are constantly getting added to...

An Update on Project Zen: Improving Apache Spark for Python Users

September 4, 2020 by Hyukjin Kwon and Matei Zaharia in Solutions

Apache Spark™ has reached its 10th anniversary with Apache Spark 3.0, which has many significant improvements and new features including but not limited...

Interoperability between Koalas and Apache Spark

August 11, 2020 by Takuya Ueshin, Hyukjin Kwon and Xiao Li in Solutions

Koalas is an open-source project that provides a drop-in replacement for pandas, enabling efficient scaling out to hundreds of worker nodes for...

A Comprehensive Look at Dates and Timestamps in Apache Spark™ 3.0

July 22, 2020 by Maxim Gekk, Wenchen Fan and Hyukjin Kwon in Engineering Blog

Apache Spark is a very popular tool for processing structured and unstructured data. When it comes to processing structured data, it supports many...