If you want to get even slightly better performance out of your structured queries (regardless of whether they are batch or streaming), you have to peek at the foundations of the Dataset API, starting with QueryExecution. That is where every structured query ends up and where my talk starts. The talk will show you the stages a structured query has to go through before it is executed in Spark SQL. I’ll be talking about the different phases of query execution and about the logical and physical optimizations. I’ll show the different optimizations in Spark SQL 2.3 and how to write one yourself (in Scala).
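As a rough illustration of the phases mentioned above, the following minimal sketch prints the plan a query has at each stage exposed by QueryExecution and registers a custom logical optimization. It is not taken from the talk. The names `RemoveTrueFilter` and `QueryExecutionDemo` and the sample query are hypothetical, and the rule only duplicates work that Spark's built-in optimizer already performs, so treat it as a template rather than a useful optimization. It assumes Spark SQL 2.3 is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.BooleanType

// Hypothetical custom logical optimization: drops Filter operators whose
// condition is the literal `true`. Illustrative only; Spark's own optimizer
// already handles this case.
object RemoveTrueFilter extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(Literal(true, BooleanType), child) => child
  }
}

object QueryExecutionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("query-execution-demo")
      .getOrCreate()
    import spark.implicits._

    // Register the custom rule with the session's logical optimizer.
    spark.experimental.extraOptimizations = Seq(RemoveTrueFilter)

    // An arbitrary sample query (assumption, chosen only for the demo).
    val query = spark.range(10)
      .filter($"id" % 2 === 0)
      .select(($"id" * 2).as("doubled"))

    // QueryExecution holds the plan of the query at every phase.
    val qe = query.queryExecution
    println(qe.logical)        // logical plan as written, before analysis
    println(qe.analyzed)       // analyzed logical plan (references resolved)
    println(qe.optimizedPlan)  // logical plan after logical optimizations
    println(qe.sparkPlan)      // physical plan chosen by the planner
    println(qe.executedPlan)   // physical plan after preparation, ready to run

    spark.stop()
  }
}
```

Calling `query.explain(true)` prints a similar overview (parsed, analyzed, and optimized logical plans plus the physical plan) without accessing `queryExecution` directly.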
Session hashtag: #SAISDD1
Jacek Laskowski is an independent consultant, software engineer, and trainer focusing exclusively on Apache Spark and Apache Kafka (with Scala and sbt and, as much as necessary, Apache Mesos, Hadoop YARN, and DC/OS). He is best known for his gitbooks at https://jaceklaskowski.gitbooks.io about Apache Spark, Spark Structured Streaming, and Apache Kafka. Find him on Twitter at https://twitter.com/jaceklaskowski.