Two months ago we held a live webinar – Enabling Exploratory Analysis of Large Data with Apache Spark and R – to demonstrate one of the most important use cases of SparkR: exploratory analysis of very large data. The webinar shows how Spark's capabilities, such as caching distributed data and integrated SQL execution, complement R's strengths, such as visualization and its diverse package ecosystem, in a real-world big data analysis project.
Below, we answer the common questions raised by webinar viewers. If you have additional questions, please check out the Databricks Forum.
Common webinar questions and answers
Click on a question to see its answer:
- Can R reside on my local PC and collect results from Spark running in AWS?
- Is it possible to work with RDDs in R? For example, I want to run reduceByKey on a huge dataset.
- Are any new MLlib functions being exposed to SparkR in 2.0, or just k-means (glm is already available)?
- How do I access a temp table created in R from Scala/spark-shell? That is, how do I share the same SparkContext between R and Scala?
- The attached notebook references the ETL notebook for instructions on how to get the songsTable. Is there a link to the ETL notebook somewhere?
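As a brief illustration of the cross-language question above: within a single Databricks notebook, R and Scala cells share one SparkSession, so a temp view registered in SparkR is visible to Scala by name. The sketch below assumes Spark 2.0's SparkR API; the view name `faithful_view` and the use of R's built-in `faithful` dataset are illustrative choices, not from the webinar.

```r
library(SparkR)
sparkR.session()  # in a Databricks notebook, a session already exists

# Create a SparkDataFrame from a local R data frame and register a temp view
df <- createDataFrame(faithful)
createOrReplaceTempView(df, "faithful_view")

# A Scala cell in the same notebook shares the SparkSession, so it can
# read the same view:
#   %scala
#   val df = spark.table("faithful_view")
#   df.count()
```

The key point is that it is the shared SparkSession (not anything R-specific) that makes the view visible; a separate spark-shell started outside the notebook would have its own session and would not see it.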