In this talk you will learn how to easily configure Apache Arrow with R on Apache Spark, which speeds up your data science workflows and expands their scope, for instance by letting you transfer data efficiently between your local environment and Apache Spark. The talk will present use cases for running R at scale on Apache Spark.
It will also introduce the Apache Arrow project and the recent developments that make it possible to run R with Apache Arrow on Apache Spark, significantly improving performance and efficiency. We will end the talk by discussing performance and recent developments in this space.
Javier is the author of the book "Mastering Spark with R" and of sparklyr, mlflow, and many other R packages for deep learning and data science. He holds a double degree in Math and Software Engineering and has decades of industry experience with a focus on data analysis. He currently works at RStudio and previously worked at Microsoft Research and SAP.