Extending Spark SQL 2.4 with New Data Sources (Live Coding Session)—continues - Databricks

Spark SQL 2.4.x offers two Data Source APIs that structured queries can use to access data in custom formats, possibly in storage systems that Spark does not support out of the box. There is the older, almost legacy DataSource API V1, and the more modern DataSource API V2. This talk introduces the main entities of each DataSource API and walks through the steps of writing a new data source live on stage. That should give you enough knowledge to extend Spark SQL with data sources of your own.

 

See More Spark + AI Summit Europe 2019 Videos

About Jacek Laskowski

Development and training services

Jacek is an independent consultant who offers development and training services for Apache Spark (as well as Scala and sbt, with a bit of Hadoop YARN, Apache Kafka, Apache Hive, Apache Mesos, Akka Actors/Stream/HTTP, and Docker). He leads the Warsaw Scala Enthusiasts and Warsaw Spark meetups. His latest project is to gain an in-depth understanding of Apache Spark at https://jaceklaskowski.gitbooks.io/mastering-apache-spark/.