Improving the Spark Exclusion Mechanism in Databricks
Ed Note: This article contains references to the term blacklist, a term that the Spark community is actively working to remove from Spark. The feature name will be changed in the upcoming Spark 3.1 release to be more inclusive, and we look forward to this new release. Why Exclusion? The exclusion mechanism was introduced for...
Interoperability between Koalas and Apache Spark
Koalas is an open source project that provides a drop-in replacement for pandas, enabling efficient scaling out to hundreds of worker nodes for everyday data science and machine learning. After more than a year of development since its introduction, Koalas 1.0 has been released. pandas is a Python package commonly used among data...
Introducing Koalas 1.0
Koalas was first introduced last year to provide data scientists using pandas with a way to scale their existing big data workloads by running them on Apache Spark™ without significantly modifying their code. Today at Spark + AI Summit 2020, we announced the release of Koalas 1.0. It now implements the most commonly used pandas...
Introducing Apache Spark 3.0
We’re excited to announce that the Apache Spark™ 3.0.0 release is available on Databricks as part of our new Databricks Runtime 7.0. The 3.0.0 release includes over 3,400 patches and is the culmination of tremendous contributions from the open-source community, bringing major advances in Python and SQL capabilities and a focus on ease of use...
Now on Databricks: A Technical Preview of Databricks Runtime 7 Including a Preview of Apache Spark 3.0
Introducing Databricks Runtime 7.0 Beta. We’re excited to announce that the Apache Spark™ 3.0.0-preview2 release is available on Databricks as part of our new Databricks Runtime 7.0 Beta. The 3.0.0-preview2 release is the culmination of tremendous contributions from the open-source community to deliver new capabilities, performance gains and expanded compatibility for the Spark ecosystem. Using...
Introducing Apache Spark 2.4
UPDATED: 11/19/2018 We are excited to announce the availability of Apache Spark 2.4 on Databricks as part of the Databricks Runtime 5.0. We want to thank the Apache Spark community for all their valuable contributions to the Spark 2.4 release. Continuing with the objectives to make Spark faster, easier, and smarter, Spark 2.4 extends its...
Introducing Apache Spark 2.3
Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want to thank the Apache Spark community for all their valuable contributions to the Spark 2.3 release. Continuing with the objectives to make Spark faster, easier, and smarter, Spark 2.3 marks a major milestone...