This is a guest post from Neil Dewar, a senior data science manager at a global asset management firm. In this blog, Neil shares lessons learned using R and Apache Spark.
If you know how to use R and are learning Apache Spark, then this blog post and notebook contain key tips to smooth out your road ahead.
Try this notebook in Databricks
As the notebook explains:
I’m an R user — certainly not an object-oriented programmer, and I have no experience with distributed computing. As my team started to explore options for distributed processing of big data, I took on the task of evaluating SparkR.
After much exploration, I eventually figured out what was missing: contextual advice for people who already know R, to help them understand what's different about SparkR and how to adapt their thinking to make the best use of it. That's the purpose of this blog and notebook -- to document the "aha!" moments in a journey from R to SparkR. I hope my hard-earned discoveries help you get there faster!
The notebook lists 10 key pieces of knowledge, with code snippets and explanations, tailored for R users. Check out the notebook to learn more!
[btn href="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/1792412399382575/3601578643761083/latest.html?utm_campaign=Open%20Source&utm_source=Databricks%20Blog" target="_blank"]View this Notebook[/btn]