Scalable Data Science with SparkR - Databricks


R is a very popular platform for Data Science. Apache Spark is a highly scalable data platform. How could we have the best of both worlds? How could a Data Scientist leverage the more than 9,000 packages on CRAN and integrate Spark into their existing Data Science toolset? In this talk we will walk through many examples of how several new features in Apache Spark 2.x enable this. We will also look at exciting changes already in, and coming next in, the Apache Spark 2.x releases.
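As a taste of the kind of Spark 2.x features the talk walks through, here is a minimal sketch using two SparkR APIs introduced in Spark 2.0: dapply(), which applies an R function to each partition of a Spark DataFrame, and spark.lapply(), which distributes independent R computations (including calls into CRAN packages) across the cluster. The faithful dataset and the derived column are illustrative choices, not from the talk itself.

```r
library(SparkR)
sparkR.session()

# Create a Spark DataFrame from a local R data.frame
df <- createDataFrame(faithful)

# dapply() runs an R function on each partition; the schema
# describes the columns of the returned data.frame
schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"),
                     structField("waiting_hours", "double"))
result <- dapply(df, function(x) {
  x$waiting_hours <- x$waiting / 60
  x
}, schema)
head(collect(result))

# spark.lapply() distributes independent R computations
# (e.g. model fits using any CRAN package) over the cluster
means <- spark.lapply(1:4, function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
})
```

Because the function passed to spark.lapply() runs in an R process on each executor, any package installed on the workers can be used inside it.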

About Felix Cheung

Felix started in the big data space about 5 years ago with the then state-of-the-art MapReduce. Since then, he has (re-)built Hadoop clusters from bare metal more times than he would like, packaged a Hadoop "distro" from two dozen or so projects into .rpm/.deb, and kicked off clusters in the cloud with hundreds of cores on demand. He built a few interesting apps with Apache Spark over 3.5 years, ended up contributing to it for more than 3 years, and became a Committer & PMC member along the way. In addition to building things, he frequently presents at conferences, meetups, and workshops.