by Brian Dirking, Hossein Falaki and Denny Lee
At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now with this RMarkdown notebook (Rmd | HTML) or visit us at databricks.com/partners/rstudio.
For R practitioners looking at scaling out R-based advanced analytics to big data, Databricks provides a Unified Analytics Platform that gets up and running in seconds, integrates with RStudio to provide ease of use, and enables you to automatically run and execute R workloads at unprecedented scale across single or multiple nodes.
Integrating Databricks and RStudio together allows data scientists to address a number of challenges including:
Introducing Databricks RStudio Integration
With Databricks RStudio Integration, both popular R packages for interacting with Apache Spark, SparkR or sparklyr can be used the inside the RStudio IDE on Databricks. When multiple users use a cluster, each creates a separate SparkR Context or sparklyr connection, but they are all talking to a single Databricks managed Spark application allowing unique opportunities for collaboration between users. Together, RStudio can take advantage of Databricks’ cluster management and Apache Spark to perform such as a massive model selection as noted in the figure below.
You can run this demo on your own using this k-nearest neighbors (KNN) RMarkdown regression demo (Rmd | HTML).
Next Steps
Our goal is to make R-based analytics easier to use and more scalable with RStudio and Databricks. To dive deeper into the RStudio integration architecture, technical details on how users can access RStudio on Databricks clusters, and examples of the power of distributed computing and the interactivity of RStudio - and to get started today, visit databricks.com/partners/rstudio.