Introducing New Built-in and Higher-Order Functions for Complex Data Types in Apache Spark 2.4
Apache Spark 2.4 introduces 29 new built-in functions for manipulating complex types (for example, array type), including higher-order functions. Before Spark 2.4, for manipulating the complex types directly, there were two typical solutions: 1) Exploding the nested structure into individual rows, and applying some functions, and then creating the structure again 2) Building a User...
Working with Nested Data Using Higher Order Functions in SQL on Databricks
Nested data types offer Databricks customers and Apache Spark users powerful ways to manipulate structured data. In particular, they allow you to put complex objects like arrays, maps and structures inside of columns. This can help you model your data in a more natural way. While this feature is certainly useful, it can be a...
SQL Subqueries in Apache Spark 2.0
In the upcoming Apache Spark 2.0 release, we have substantially expanded the SQL standard capabilities. In this brief blog post, we will introduce subqueries in Apache Spark 2.0, including their limitations, potential pitfalls and future expansions, and through a notebook, we will explore both the scalar and predicate type of subqueries, with short examples that...