Niall Turbitt - Databricks

Niall Turbitt

Data Scientist, Databricks

Niall Turbitt is a Data Scientist on the Machine Learning Practice team at Databricks. Working with Databricks customers, he builds and deploys machine learning solutions, as well as delivers training classes focused on machine learning with Spark. He received his MS in Statistics from University College Dublin and has previous experience building scalable data science solutions across a range of domains, from e-commerce to supply chain and logistics.

UPCOMING SESSIONS

PAST SESSIONS

Koalas: Pandas on Apache SparkSummit Europe 2019

In this tutorial we will present Koalas, a new open source project that we announced at the Spark + AI Summit in April. Koalas is an open-source Python package that implements the pandas API on top of Apache Spark, to make the pandas API scalable to big data. Using Koalas, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework.

We will demonstrate Koalas' new functionalities since its initial release, discuss its roadmaps, and how we think Koalas could become the standard API for large scale data science.

What you will learn:

  • How to get started with Koalas
  • Easy transition from Pandas to Koalas on Apache Spark
  • Similarities between Pandas and Koalas APIs for DataFrame transformation and feature engineering
  • Single machine Pandas vs distributed environment of Koalas

Prerequisites:

  • A fully-charged laptop (8-16GB memory) with Chrome or Firefox
  • Python 3 and pip pre-installed
  • pip install koalas from PyPI
  • Pre-register for Databricks Community Edition
  • Read koalas docs

Koalas: Pandas on Apache Spark (continued)Summit Europe 2019

In this tutorial we will present Koalas, a new open source project that we announced at the Spark + AI Summit in April. Koalas is an open-source Python package that implements the pandas API on top of Apache Spark, to make the pandas API scalable to big data. Using Koalas, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework.

We will demonstrate Koalas' new functionalities since its initial release, discuss its roadmaps, and how we think Koalas could become the standard API for large scale data science.

What you will learn:

  • How to get started with Koalas
  • Easy transition from Pandas to Koalas on Apache Spark
  • Similarities between Pandas and Koalas APIs for DataFrame transformation and feature engineering
  • Single machine Pandas vs distributed environment of Koalas

Prerequisites:

  • A fully-charged laptop (8-16GB memory) with Chrome or Firefox
  • Python 3 and pip pre-installed
  • pip install koalas from PyPI
  • Read koalas docs