If you want to get even slightly better performance out of your structured queries (regardless of whether they are batch or streaming), you have to peek at the foundations of the Dataset API, starting with QueryExecution. That’s where every structured query ends up and where my talk starts. I’ll walk you through the phases a structured query goes through before it is executed in Spark SQL, including the logical and physical optimizations applied along the way. I’ll show the optimizations available in Spark SQL 2.3 and how to write one yourself (in Scala).
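As a rough illustration of the phases mentioned above, the following sketch prints the plan a query holds at each stage of its QueryExecution and registers a custom logical optimization through `spark.experimental.extraOptimizations`. It is not taken from the talk: the object name `QueryExecutionDemo`, the helper `phases`, the do-nothing rule `NoopRule`, and the sample query are all hypothetical, and the code assumes Spark SQL 2.3 on the classpath with a local master.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

object QueryExecutionDemo {

  // Hypothetical custom optimization: a rule that returns the plan unchanged.
  // A real rule would pattern-match on the plan and rewrite parts of it.
  object NoopRule extends Rule[LogicalPlan] {
    override def apply(plan: LogicalPlan): LogicalPlan = plan
  }

  // Builds a sample query and returns the plan at each phase, in order.
  def phases(spark: SparkSession): Seq[(String, String)] = {
    import spark.implicits._

    // Register the extra logical optimization (assumption: local experiment only).
    spark.experimental.extraOptimizations = Seq(NoopRule)

    val query = spark.range(10)
      .filter($"id" % 2 === 0)
      .select(($"id" * 2).as("doubled"))

    val qe = query.queryExecution
    Seq(
      "logical"   -> qe.logical.toString,       // plan as written, before analysis
      "analyzed"  -> qe.analyzed.toString,      // after the analyzer resolves references
      "optimized" -> qe.optimizedPlan.toString, // after logical optimizations
      "spark"     -> qe.sparkPlan.toString,     // physical plan chosen by the planner
      "executed"  -> qe.executedPlan.toString   // physical plan prepared for execution
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("query-execution-demo")
      .getOrCreate()
    try {
      phases(spark).foreach { case (name, plan) =>
        println(s"== $name ==")
        println(plan)
      }
    } finally {
      spark.stop()
    }
  }
}
```

Comparing the printed plans side by side is one simple way to see which phase changed what; `query.explain(true)` prints a similar overview without the helper.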
Jacek is an independent consultant who offers development and training services for Apache Spark (and Scala and sbt, with a bit of Hadoop YARN, Apache Kafka, Apache Hive, Apache Mesos, Akka Actors/Stream/HTTP, and Docker). He leads the Warsaw Scala Enthusiasts and Warsaw Spark meetups. His latest project is to gain an in-depth understanding of Apache Spark, captured at https://jaceklaskowski.gitbooks.io/mastering-apache-spark/.