The Joy of Nested Types with Spark

If you have a parent-child relationship or a many-to-many relationship in your data model, you will want to learn about nested dataset functionality in Spark. Ted Malaska (co-author of Hadoop Application Architectures) will walk through why nested types may change your life when solving common problems like large joins and even Cartesian joins. This talk will include a full code example of creating nested tables with Spark SQL, populating those tables, and finally accessing them in a number of ways.

About Ted Malaska

Ted works on the Battle.net team at Blizzard, helping support great titles like World of Warcraft, Overwatch, Hearthstone, and many more. Previously, he was a Principal Solutions Architect at Cloudera, helping clients succeed with Hadoop and the Hadoop ecosystem. Before that, he was a Lead Architect at the Financial Industry Regulatory Authority (FINRA). He has also contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is also a co-author of O’Reilly’s “Hadoop Application Architectures”, a frequent speaker at conferences, and a frequent blogger on data architectures.