Moving beyond batch ETL, large enterprises are looking for ways to generate more up-to-date insights. As we step into the age of Continuous Applications, this session will explore the increasingly popular Structured Streaming API in Apache Spark, its application to R, and examples of machine learning use cases built on it. Starting with an introduction to the high-level concepts, the session will dive into the internals of the execution plan and examine how SparkR extends the existing system to add streaming capability. Learn how to build various data science applications on data streams, integrating with R packages to leverage the rich R ecosystem of 10k+ packages. Session hashtag: #SFdev2
Felix started in the big data space about 5 years ago with the then state-of-the-art MapReduce. Since then, he has (re-)built Hadoop clusters from bare metal more times than he would like, created a Hadoop "distro" from two dozen or so projects packaged as .rpm/.deb, and kicked off clusters in the cloud with hundreds of cores on demand. He built a few interesting apps with Apache Spark over 3.5 years, ended up contributing to it for more than 3 years, and became a Committer and PMC member along the way. In addition to building stuff, he has frequently presented at conferences, meetups, and workshops.