Marius van Niekerk - Databricks

Engineer and Data Scientist, MaxPoint

I am an engineer and data scientist passionate about building tools that make the lives of analysts and data scientists easier. I am an active contributor to open source in the big data space, notably Apache Toree (incubating).


Apache Toree: A Jupyter Kernel for Spark

Many data scientists already make heavy use of the Jupyter ecosystem to analyze data in interactive notebooks. Apache Toree (incubating) is a Jupyter kernel designed to act as a gateway to Spark by letting users work with Spark from standard Jupyter notebooks. This allows users to integrate Spark into their existing Jupyter deployments and to move between languages and contexts without switching to a different set of tools. Apache Toree is designed expressly for interactive work. It supports interpreters in Scala, Python, and R. In this talk, I will cover the design of Toree, how it interacts with the Jupyter ecosystem, and the various ways users can extend the functionality of Apache Toree through its powerful plugin system.