Niall Turbitt

Senior Data Scientist, Databricks

Niall Turbitt is a Senior Data Scientist on the Machine Learning Practice team at Databricks. Working with Databricks customers, he builds and deploys machine learning solutions and delivers training classes focused on machine learning with Spark. He received his MS in Statistics from University College Dublin and has previous experience building scalable data science solutions across a range of domains, from e-commerce to supply chain and logistics.

Past sessions

Summit 2021 Drifting Away: Testing ML Models in Production

May 27, 2021 11:35 AM PT

Deploying a machine learning model has become a relatively frictionless process. Deploying it with a robust testing and monitoring framework, however, is a vastly more complex task. There is no one-size-fits-all solution for productionizing ML models; custom implementations often combine multiple libraries and tools. That said, there is a core set of statistical tests and metrics you should have in place to detect phenomena such as data drift and concept drift, so that models do not become unknowingly stale and detrimental to the business.

Combining our experiences from working with Databricks customers, we take a deep dive into how to test your ML models in production using open-source tools such as MLflow, SciPy, and statsmodels. You will come away from this talk with knowledge of the key tenets for testing both model and data validity in production, along with a generalizable demo that uses MLflow to help make this process reproducible.
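
As a flavour of the kind of check the talk covers, here is a minimal sketch (not the speakers' actual demo) of a data-drift test: a two-sample Kolmogorov–Smirnov test from SciPy on a single numeric feature, with the results logged to MLflow. The function, metric names, and sample data are illustrative only.

```python
# Minimal, illustrative data-drift check: compare an incoming feature sample
# against a reference sample with a two-sample KS test and log the result.
import mlflow
import numpy as np
from scipy import stats


def check_feature_drift(reference, incoming, feature_name, alpha=0.05):
    """Return True if the incoming feature distribution differs
    significantly from the reference distribution."""
    statistic, p_value = stats.ks_2samp(reference, incoming)
    # Record the test results against the active MLflow run so drift checks
    # are reproducible and auditable over time.
    mlflow.log_metric(f"{feature_name}_ks_statistic", statistic)
    mlflow.log_metric(f"{feature_name}_ks_p_value", p_value)
    return p_value < alpha


with mlflow.start_run(run_name="drift_checks"):
    # Illustrative data: the "incoming" sample is deliberately shifted.
    reference = np.random.normal(loc=0.0, scale=1.0, size=1000)
    incoming = np.random.normal(loc=0.5, scale=1.0, size=1000)
    drift_detected = check_feature_drift(reference, incoming, "example_feature")
    print(f"Drift detected: {drift_detected}")
```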

In this session watch:
Chengyin Eng, Data Science Consultant, Databricks
Niall Turbitt, Senior Data Scientist, Databricks


Summit Europe 2020 Scaling Machine Learning with Apache Spark

November 17, 2020 04:00 PM PT

Spark has become synonymous with big data processing; however, the majority of data scientists still build models using single-machine libraries. This talk will explore the multitude of ways Spark can be used to scale machine learning applications. In particular, we will guide you through distributed solutions for training and inference, distributed hyperparameter search, deployment issues, and new machine learning features in Apache Spark 3.0. Niall Turbitt and Holly Smith combine their years of experience working with Spark to summarize best practices for scaling ML solutions.

Speakers: Holly Smith and Niall Turbitt
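
As one illustration of the distributed hyperparameter search the talk discusses, here is a hedged sketch using Hyperopt's SparkTrials to fan trials out across a Spark cluster. The scikit-learn model, dataset, and search space below are illustrative and not taken from the session itself.

```python
# Illustrative distributed hyperparameter search with Hyperopt + SparkTrials.
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)


def objective(params):
    model = RandomForestClassifier(n_estimators=int(params["n_estimators"]),
                                   max_depth=int(params["max_depth"]),
                                   random_state=0)
    accuracy = cross_val_score(model, X, y, cv=3).mean()
    # Hyperopt minimises the objective, so return the negative accuracy.
    return {"loss": -accuracy, "status": STATUS_OK}


search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 300, 25),
    "max_depth": hp.quniform("max_depth", 3, 12, 1),
}

# SparkTrials distributes the trials across the cluster's workers, while each
# individual model still trains with single-machine scikit-learn.
best_params = fmin(fn=objective,
                   space=search_space,
                   algo=tpe.suggest,
                   max_evals=32,
                   trials=SparkTrials(parallelism=8))
```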

Summit 2020 Koalas: Pandas on Apache Spark NA

June 25, 2020 05:00 PM PT

In this hands-on tutorial we will present Koalas, a new open-source Python package that implements the pandas API on top of Apache Spark, making the pandas API scalable to big data. Using Koalas, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework.

We will demonstrate Koalas' new functionality since its initial release, including support for Apache Spark 3.0, discuss its roadmap, and explain how we think Koalas could become the standard API for large-scale data science.

What you will learn:

  • How to get started with Koalas
  • How to make an easy transition from pandas to Koalas on Apache Spark
  • Similarities between the pandas and Koalas APIs for DataFrame transformation and feature engineering
  • How single-machine pandas compares with Koalas in a distributed environment
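
The sketch below illustrates the kind of pandas-to-Koalas transition the tutorial walks through; the toy data and column names are made up for this example and are not the tutorial's actual notebook.

```python
# Illustrative pandas-to-Koalas transition: the same groupby expressed with
# single-machine pandas and with Koalas running on Apache Spark.
import pandas as pd
import databricks.koalas as ks

data = {"user_id": [1, 2, 1, 3], "amount": [10.0, 25.0, 5.0, 40.0]}

# Single-machine pandas.
pdf = pd.DataFrame(data)
pandas_totals = pdf.groupby("user_id")["amount"].sum()

# The same operations on a Koalas DataFrame, executed by Spark under the hood.
kdf = ks.DataFrame(data)
koalas_totals = kdf.groupby("user_id")["amount"].sum()

# Converting between the two is explicit and easy to express.
kdf_from_pandas = ks.from_pandas(pdf)
back_to_pandas = koalas_totals.to_pandas()
```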

Summit Europe 2019 Koalas: Pandas on Apache Spark EU

October 15, 2019 05:00 PM PT

In this tutorial we will present Koalas, a new open-source project that we announced at the Spark + AI Summit in April. Koalas is an open-source Python package that implements the pandas API on top of Apache Spark, making the pandas API scalable to big data. Using Koalas, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework.

We will demonstrate Koalas' new functionality since its initial release, discuss its roadmap, and explain how we think Koalas could become the standard API for large-scale data science.

What you will learn:

  • How to get started with Koalas
  • Easy transition from pandas to Koalas on Apache Spark
  • Similarities between the pandas and Koalas APIs for DataFrame transformation and feature engineering
  • Single-machine pandas vs. Koalas in a distributed environment

Prerequisites:

  • A fully charged laptop (8–16 GB of memory) with Chrome or Firefox
  • Python 3 and pip pre-installed
  • pip install koalas from PyPI
  • Pre-register for Databricks Community Edition
  • Read the Koalas docs
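
For a flavour of the API similarity the tutorial highlights, here is a brief, illustrative sketch of pandas-style feature engineering written against Koalas. The file path and column names are hypothetical, not part of the tutorial materials.

```python
# Illustrative pandas-style feature engineering on a Koalas DataFrame.
import databricks.koalas as ks

# read_csv mirrors pandas.read_csv, but the file is read through Spark.
# The path and columns below are hypothetical.
kdf = ks.read_csv("/databricks-datasets/samples/transactions.csv")

# Column arithmetic and boolean filtering look exactly like pandas.
kdf["amount_per_item"] = kdf["amount"] / kdf["quantity"]
large_orders = kdf[kdf["amount"] > 100]

# One-hot encoding with the familiar get_dummies, distributed by Spark.
features = ks.get_dummies(large_orders, columns=["category"])
print(features.head())
```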

Summit Europe 2019 Koalas: Pandas on Apache Spark (continued)

October 15, 2019 05:00 PM PT

This session is a continuation of the hands-on Koalas tutorial above; the description, learning objectives, and prerequisites are listed under "Koalas: Pandas on Apache Spark EU."