Jules S. Damji is an Apache Spark Community and Developer Advocate at Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, LoudCloud/Opsware, VeriSign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc and M.Sc in Computer Science and MA in Political Advocacy and Communication from Oregon State University, Cal State, and Johns Hopkins University respectively.
We all know what they say - the bigger the data, the better. But when the data gets really big, how do you use it? This talk will cover three of the most popular deep learning frameworks: TensorFlow, Keras, and Deep Learning Pipelines, and when, where, and how to use them. We'll also discuss their integration with distributed computing engines such as Apache Spark (which can handle massive amounts of data), as well as help you answer questions such as:
Of all the developers' delight, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for doing distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs-RDDs, DataFrames, and Datasets-available in Apache Spark 2.x. In particular, I will emphasize three takeaways: 1) why and when you should use each set as best practices 2) outline its performance and optimization benefits; and 3) underscore scenarios when to use DataFrames and Datasets instead of RDDs for your big data distributed processing. Through simple notebook demonstrations with API code examples, you'll learn how to process big data using RDDs, DataFrames, and Datasets and interoperate among them. (this will be vocalization of the blog, along with the latest developments in Apache Spark 2.x Dataframe/Datasets and Spark SQL APIs: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html) Session hashtag: #EUdev12
Today, we have 625 and 430K spark meetups and members respectively around the globe. How can we work, share, collaborate, and promote speakers and sessions?This BoF is for anyone who's Spark Meetup Orangizer, attendee, speaker, or anyone interested to share ideas for better sharing and collaborating on tech-talks and content.