Bucketing in Spark SQL 2.3

Bucketing is an optimization technique in Spark SQL that uses buckets and bucketing columns to determine data partitioning. When applied properly, bucketing can enable join optimizations by avoiding shuffles (also known as exchanges) of the tables participating in the join. This talk gives you the information you need to use bucketing to optimize Spark SQL structured queries.
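
To make the idea concrete, here is a minimal conceptual sketch in plain Python. It is not Spark code and not how Spark SQL is implemented. It assumes a hypothetical bucket count of 4 and uses CRC32 as a stand-in hash (Spark SQL uses its own hash function), and the helper names `bucket_id`, `bucketize`, and `bucketed_join` are invented for illustration. It shows why two tables bucketed the same way on the join key can be joined bucket by bucket, with no redistribution of rows at join time.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen for illustration


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Stand-in hash for illustration only; Spark SQL uses its own hash function.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    # Assign each row to a bucket based on its first field (the bucketing column).
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, num_buckets=NUM_BUCKETS):
    # Equal keys always land in the same bucket number on both sides,
    # so bucket i of `left` only needs to be matched against bucket i of `right`.
    joined = []
    for b in range(num_buckets):
        lookup = defaultdict(list)
        for rrow in right.get(b, []):
            lookup[rrow[0]].append(rrow)
        for lrow in left.get(b, []):
            for rrow in lookup.get(lrow[0], []):
                joined.append(lrow + rrow[1:])
    return joined


# Made-up sample data: (customer_id, item) and (customer_id, name).
orders = [(1, "book"), (2, "pen"), (3, "lamp"), (1, "desk")]
customers = [(1, "Ann"), (2, "Bob"), (4, "Cy")]

joined = bucketed_join(bucketize(orders), bucketize(customers))
```

Both inputs must use the same bucket count and the same bucketing column for the per-bucket matching to be valid. In this sketch that is guaranteed by sharing `NUM_BUCKETS` and `bucket_id` between the two `bucketize` calls.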

Session hashtag: #SAISDev12

About Jacek Laskowski

Jacek Laskowski is an independent consultant, software engineer, and trainer focusing exclusively on Apache Spark and Apache Kafka (with Scala and sbt, and, as much as necessary, with Apache Mesos, Hadoop YARN, and DC/OS). He is best known for his gitbooks at https://jaceklaskowski.gitbooks.io about Apache Spark, Spark Structured Streaming, and Apache Kafka. Find him at https://twitter.com/jaceklaskowski.