You have the perfect use case for your Spark applications, whether it be batch processing or super-fast near-real-time streaming. Now, where do you store your valuable data? In this talk we take a look at four storage options: HDFS, HBase, Solr and Kudu. With so many to choose from, which will fit your use case? What considerations should be taken into account? What are the pros and cons, what are the similarities and differences, and how do they fit in with your Spark application? Learn the answers to these questions and more with a look at design patterns and techniques, and sample code to integrate into your application immediately. Walk away with the confidence to propose the right architecture for your use cases and the development know-how to implement and deliver with success.
Session hashtag: #EUdev10
Mladen Kovacevic is a Senior Solutions Architect at Cloudera who has architected and developed end-to-end Hadoop applications providing meaningful insight for clients. He has operationalized Hadoop clusters targeted for multi-tenant use, designed and implemented numerous batch and real-time pipelines, and helped clients meet security best practices as well as performance objectives. Mladen has over a decade of professional experience in software development in RDBMS technology as well as SQL on Hadoop. He has architected Hadoop applications leveraging Spark, Hive, Impala, Oozie and more, and has contributed to several open source projects.