Javier is the author of “Mastering Spark with R” and of sparklyr, mlflow, and many other R packages for deep learning and data science. He holds a double degree in Math and Software Engineering and has decades of industry experience with a focus on data analysis. He currently works at RStudio and previously worked at Microsoft Research and SAP.
We provide an update on developments at the intersection of R and the broader machine learning ecosystem. The packages covered let R users apply the latest big data analytics and deep learning technologies within their existing workflows, and they also make collaboration easier within multidisciplinary data science teams. Topics covered include:
- MLflow: managing the ML lifecycle, with improved dependency management and more deployment targets
- TensorFlow: the TF 2.0 update and probabilistic (deep) machine learning with TensorFlow Probability
- Spark: latest improvements and extensions, including text processing at scale with Spark NLP
In this talk you will learn how to easily configure Apache Arrow with R on Apache Spark. This setup can speed up your work and expand the scope of your data science workflows, for instance by letting data move efficiently between your local environment and Apache Spark. The talk presents use cases for running R at scale on Apache Spark. It also introduces the Apache Arrow project and the recent developments that let R use Apache Arrow on Apache Spark to significantly improve performance and efficiency. We will close by discussing performance and recent developments in this space.
This session will start with a recap of what sparklyr is and how it can be used to analyze, visualize, and perform machine learning in Spark from R. We will walk through installation, configuration, data wrangling with SQL or dplyr, and modeling with MLlib or H2O. We will also cover extending sparklyr by calling Scala functions from R or by writing Scala modules accessible from R. You'll then get a detailed update on new sparklyr features: after sparklyr 0.4 was released to CRAN last year, RStudio released 0.5, which implements new connections, features, and architecture changes worth reviewing. We will wrap up with a discussion of use cases relevant to the R ecosystem. These use cases will demonstrate how to model data with popular R frameworks, using sparklyr for seamless interaction between Spark and R. Session hashtag: #SFdd8