Jules Damji - Databricks

Jules Damji

Developer Advocate, Databricks

Jules S. Damji is an Apache Spark Community and Developer Advocate at Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, LoudCloud/Opsware, VeriSign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc and M.Sc in Computer Science and MA in Political Advocacy and Communication from Oregon State University, Cal State, and Johns Hopkins University respectively.


A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learning PipelinesSummit 2018

We all know what they say - the bigger the data, the better. But when the data gets really big, how do you use it? This talk will cover three of the most popular deep learning frameworks: TensorFlow, Keras, and Deep Learning Pipelines, and when, where, and how to use them. We'll also discuss their integration with distributed computing engines such as Apache Spark (which can handle massive amounts of data), as well as help you answer questions such as:

  • As a developer how do I pick the right deep learning framework for me?
  • Do I want to develop my own model or should I employ an existing one?
  • How do I strike a trade-off between productivity and control through low-level APIs?
In this session, we will show you how easy it is to build an image classifier with Tensorflow, Keras, and Deep Learning Pipelines in under 30 minutes. After this session, you will walk away with the confidence to evaluate which framework is best for you, and perhaps with a better sense for how to fool an image classifier! Session hashtag: #DL4SAIS


A Tale of Three Apache Spark APIs: RDDs, DataFrames, and DatasetsSummit Europe 2017

Of all the developers' delight, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for doing distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs-RDDs, DataFrames, and Datasets-available in Apache Spark 2.x. In particular, I will emphasize three takeaways: 1) why and when you should use each set as best practices 2) outline its performance and optimization benefits; and 3) underscore scenarios when to use DataFrames and Datasets instead of RDDs for your big data distributed processing. Through simple notebook demonstrations with API code examples, you'll learn how to process big data using RDDs, DataFrames, and Datasets and interoperate among them. (this will be vocalization of the blog, along with the latest developments in Apache Spark 2.x Dataframe/Datasets and Spark SQL APIs: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html) Session hashtag: #EUdev12

BoF Discussion-Apache Spark Meetup OrganizersSummit Europe 2017

Today, we have 625 and 430K spark meetups and members respectively around the globe. How can we work, share, collaborate, and promote speakers and sessions?This BoF is for anyone who's Spark Meetup Orangizer, attendee, speaker, or anyone interested to share ideas for better sharing and collaborating on tech-talks and content.