Announcing RStudio and Databricks Integration

At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now...

Accelerating R Workflows on Databricks

October 6, 2017 by Hossein Falaki
At Databricks we strive to make our Unified Analytics Platform the best place to run big data analytics. For big data, Apache Spark...

On-Demand Webinar and FAQ: Parallelize R Code Using Apache Spark

August 21, 2017 by Hossein Falaki and Jules Damji
On August 15th, Data Science Central hosted a live webinar, Parallelize R Code Using Apache Spark, with Databricks’ Hossein Falaki. This webinar introduced SparkR...

Shell Oil Use Case: Parallelizing Large Simulations with Apache SparkR on Databricks

This blog post is a joint engineering effort between Shell’s Data Science Team (Wayne W. Jones and Dennis Vallinga) and Databricks...

Using sparklyr in Databricks

May 25, 2017 by Hossein Falaki
Try this notebook on Databricks, with all instructions as explained in this post. In September 2016, RStudio announced sparklyr, a new...

SparkR Tutorial at useR 2016

AMPLab and Databricks gave a tutorial on SparkR at the useR conference. The conference was held from June 27 - June 30 at...

Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles

Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...

Introducing R Notebooks in Databricks

July 13, 2015 by Hossein Falaki
Apache Spark 1.4 was released on June 11 and one of the exciting new features was SparkR. I am happy to announce...

Statistics Functionality in Apache Spark 1.1

August 27, 2014 by Doris Xin, Burak Yavuz and Hossein Falaki
One of our philosophies in Apache Spark is to provide rich and friendly built-in libraries so that users can easily assemble data pipelines. With Spark, and MLlib in particular, quickly gaining traction among data scientists and machine learning practitioners, we’re observing a growing demand for data analysis support outside of model fitting. To address this need, we have started to add scalable implementations of common statistical functions to facilitate v