Introducing the Support of Lateral Column Alias
September 19, 2023 by Xinyi Yu, Wenchen Fan and Gengliang Wang in Engineering Blog
We are thrilled to introduce the support of a new SQL feature in Apache Spark and Databricks: Lateral Column Alias (LCA). This feature...

Introducing Apache Spark™ 3.5
September 15, 2023 by Yuanjian Li, Daniel Tenedorio, Martin Grund, Allan Folting, Hyukjin Kwon, Herman van Hövell, Wenchen Fan, Weichen Xu, Gengliang Wang, Allison Wang, Jungtaek Lim, Xiao Li and Reynold Xin in Engineering Blog
Today, we are happy to announce the availability of Apache Spark™ 3.5 on Databricks as part of Databricks Runtime 14.0. We extend our...

Introducing English as the New Programming Language for Apache Spark
June 29, 2023 by Gengliang Wang, Xiangrui Meng, Reynold Xin, Allison Wang, Amanda Liu and Denny Lee in Open Source
We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™...

Introducing Apache Spark™ 3.2
October 19, 2021 by Gengliang Wang, Wenchen Fan, Hyukjin Kwon, Xiao Li and Reynold Xin in Engineering Blog
We are excited to announce the availability of Apache Spark™ 3.2 on Databricks as part of Databricks Runtime 10.0. We want to...

Apache Avro as a Built-in Data Source in Apache Spark 2.4
November 30, 2018 by Gengliang Wang, Wenchen Fan and Michael Armbrust in Solutions
Apache Avro is a popular data serialization format. It is widely used in the Apache Spark and Apache...

Benchmarking Apache Spark on a Single Node Machine
May 3, 2018 by Gengliang Wang, Reynold Xin and Jules Damji in Solutions
Apache Spark has become the de facto unified analytics engine for big data processing in a distributed environment. Yet we are seeing more...