Extending Apache Spark’s Ingestion: Building Your Own Java Data Source

Apache Spark is a wonderful platform for running your analytics jobs. It has great built-in ingestion features for CSV, Hive, JDBC, and more. However, you may have your own data sources or formats that you want to use. One solution is to convert your data to a CSV or JSON file and then ask Spark to ingest it through its built-in tools. For better performance, however, we will explore how to build a data source, in Java, that extends Spark’s ingestion capabilities. We will first understand how Spark works for ingestion, then walk through the development of this data source plug-in.

Targeted audience: Software and data engineers who need to expand Spark’s ingestion capability.

Key takeaways: Requirements, needs & architecture – 15%. Build the required tool set in Java – 85%.
Session hashtag: #EUdev6

About Jean Georges Perrin

Jean Georges "JGP" Perrin is a Software Architect for Zaloni. He is proud to have been the first in France to be named an IBM Champion, and to have received the honor for the ninth consecutive year. Active within the Raleigh-Durham Spark community, JGP shares his more than 20 years of experience in IT as a presenter and participant at conferences and by publishing articles in print and online media. His blog is at http://jgp.net. When he is not immersed in IT, which he loves, he enjoys exploring his adopted region of North Carolina with his kids.