Scalable Data Science with SparkR

R is a very popular platform for Data Science. Apache Spark is a highly scalable data platform. How could we have the best of both worlds? How could a Data Scientist leverage the rich 9000+ packages on CRAN and integrate Spark into their existing Data Science toolset? In this talk we will walk through many examples of how several new features in Apache Spark 2.x enable this. We will also look at exciting changes in current and upcoming Apache Spark 2.x releases.

About Felix Cheung

Felix is the VP of Engineering at SafeGraph, bringing over 20 years of engineering and 7 years of data experience. He led teams in Uber's Data Platform and was pivotal in rebuilding their open-source program. Previously he spent time at Microsoft and startups. Felix is a strong proponent of open-source; as a Member of the Apache Software Foundation, he works on Apache Spark (data), Apache Zeppelin (notebook), and also helps mentor 6 projects in the Apache Incubator, including geospatial project Apache Sedona, and leading Apache Superset (visualization) to graduate.