Brooke Wenig

Principal Consultant, Databricks

Brooke Wenig is the Machine Learning Practice Lead at Databricks. She advises customers on machine learning pipelines, implements them, and educates customers on using Spark for machine learning and deep learning. She received an MS in Computer Science from UCLA with a focus on distributed machine learning. She speaks Mandarin Chinese fluently and enjoys cycling.

PAST SESSIONS

New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, and Koalas (Summit Europe 2019)

In this talk, we will highlight major efforts happening in the Spark ecosystem. In particular, we will dive into the details of adaptive and static query optimizations in Spark 3.0 that make Spark easier to use and faster to run. We will also demonstrate how new features in Koalas, an open source library that provides a pandas-like API on top of Spark, help data scientists gain insights from their data more quickly.
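
For a quick taste, the adaptive query execution discussed in this talk sits behind a single configuration flag in Spark 3.0. A minimal PySpark sketch (the application name is an arbitrary placeholder):

    from pyspark.sql import SparkSession

    # Enable adaptive query execution (greatly expanded in Spark 3.0): Spark
    # can then re-optimize shuffle partition counts and join strategies at
    # runtime, using statistics gathered while the query executes
    spark = (SparkSession.builder
             .appName("aqe-demo")  # placeholder name
             .config("spark.sql.adaptive.enabled", "true")
             .getOrCreate())

    # Any query run through this session is now eligible for adaptive planning
    df = spark.range(1_000_000)
    df.groupBy((df.id % 10).alias("bucket")).count().show()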

Koalas: Pandas on Apache Spark (Summit Europe 2019)

In this tutorial we will present Koalas, a new open source project that we announced at the Spark + AI Summit in April. Koalas is an open-source Python package that implements the pandas API on top of Apache Spark, making pandas code scalable to big data. Using Koalas, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework.

We will demonstrate Koalas' new functionality since its initial release, discuss its roadmap, and explain how we think Koalas could become the standard API for large-scale data science.
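
To preview that transition, here is a minimal sketch (the toy data is illustrative): the pandas-style code stays the same, but after conversion it runs distributed on Spark.

    import pandas as pd
    import databricks.koalas as ks  # pip install koalas

    # Start with an ordinary single-machine pandas DataFrame (toy data)
    pdf = pd.DataFrame({"city": ["London", "Paris", "London"],
                        "sales": [10, 20, 30]})

    # Convert it to a Koalas DataFrame backed by Spark
    kdf = ks.from_pandas(pdf)

    # Familiar pandas-style operations now execute on the Spark cluster
    print(kdf.groupby("city")["sales"].sum().sort_index())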

What you will learn:

  • How to get started with Koalas
  • How to transition easily from pandas to Koalas on Apache Spark
  • Similarities between the pandas and Koalas APIs for DataFrame transformation and feature engineering
  • How single-machine pandas compares with Koalas in a distributed environment

Prerequisites:

  • A fully charged laptop (8-16 GB memory) with Chrome or Firefox
  • Python 3 and pip pre-installed
  • Koalas installed from PyPI (pip install koalas)
  • Pre-registration for Databricks Community Edition
  • Familiarity with the Koalas docs

Koalas: Pandas on Apache Spark, continued (Summit Europe 2019)

This session is a continuation of the tutorial above; the description, learning goals, and prerequisites are the same.

Official Announcement of Koalas Open Source Project (Summit 2019)

Keynote from Spark + AI Summit 2019: Reynold Xin (Databricks) and Brooke Wenig (Databricks)

The Pursuit of Happiness: Building a Scalable Pipeline Using Apache Spark and NLP to Measure Customer Service Quality (Summit 2019)

How do we get better than good enough? Leveraging NLP techniques, we can determine the general sentiment of a sentence, phrase, or paragraph of text. We can mine the world of social data to get a sense of what is being said. But how do you get control of the factors that create happiness? How do you become proactive about making end users happy? Chatbots, human chats, and conversations are the means we use to express our ideas to each other. NLP is great for helping us process and understand this data, but it can fall short.

In our session, we will explore how to expand NLP and sentiment analysis to investigate the intense interactions that can occur between humans, or between humans and robots. We will show how to pinpoint the things that work to improve quality, and how to use those data points to measure the effectiveness of chatbots. Learn how we have applied popular NLP frameworks such as NLTK, Stanford CoreNLP, and John Snow Labs NLP to financial customer service data. Explore techniques to analyze conversations for actionable insights. Leave with an understanding of how to influence your customers' happiness.
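
As a small taste of the sentence-level scoring such a pipeline starts from, here is a minimal sketch using NLTK's VADER sentiment analyzer (NLTK is one of the frameworks named above; the sample utterances are invented):

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
    sia = SentimentIntensityAnalyzer()

    # Score each customer-service utterance (invented examples); the compound
    # score ranges from -1.0 (most negative) to +1.0 (most positive)
    for utterance in ["Thanks, that fixed my account!",
                      "I have been on hold for an hour."]:
        scores = sia.polarity_scores(utterance)
        print(f"{scores['compound']:+.2f}  {utterance}")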

A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch (Summit Europe 2018)

We all know what they say: the bigger the data, the better. But when the data gets really big, how do you mine it, and which deep learning framework should you use? This talk will survey, from a developer's perspective, three of the most popular deep learning frameworks (TensorFlow, Keras, and PyTorch), as well as when to use their distributed implementations.

We’ll compare code samples from each framework and discuss their integration with distributed computing engines such as Apache Spark (which can handle massive amounts of data) as well as help you answer questions such as:

  • As a developer, how do I pick the right deep learning framework?

  • Do I want to develop my own model or should I employ an existing one?

  • How do I strike a trade-off between productivity and control through low-level APIs?

  • What language should I choose?

In this session, we will explore how to build a deep learning application with TensorFlow, Keras, or PyTorch in under 30 minutes. After this session, you will walk away with the confidence to evaluate which framework is best for you.
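
As a rough illustration of how quickly a model comes together in one of these frameworks, here is a minimal Keras sketch (the architecture and hyperparameters are illustrative, not the session's exact demo):

    from tensorflow import keras

    # Load a small benchmark dataset (handwritten digits) and scale to [0, 1]
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A tiny fully connected classifier in the high-level Keras API
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))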

Session hashtag: #SAISDL3

A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learning Pipelines (Summit 2018)

We all know what they say: the bigger the data, the better. But when the data gets really big, how do you use it? This talk will cover three of the most popular deep learning frameworks: TensorFlow, Keras, and Deep Learning Pipelines, and when, where, and how to use them. We'll also discuss their integration with distributed computing engines such as Apache Spark (which can handle massive amounts of data), as well as help you answer questions such as:

  • As a developer, how do I pick the right deep learning framework for me?

  • Do I want to develop my own model or should I employ an existing one?

  • How do I strike a trade-off between productivity and control through low-level APIs?

In this session, we will show you how easy it is to build an image classifier with TensorFlow, Keras, and Deep Learning Pipelines in under 30 minutes. After this session, you will walk away with the confidence to evaluate which framework is best for you, and perhaps with a better sense of how to fool an image classifier!

Session hashtag: #DL4SAIS
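
For a flavor of the Deep Learning Pipelines approach, here is a minimal sketch in the spirit of the library's published examples: a pre-trained network featurizes images inside a standard Spark ML pipeline. The directory paths and labels are illustrative assumptions.

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.image import ImageSchema
    from pyspark.sql.functions import lit
    from sparkdl import DeepImageFeaturizer  # Deep Learning Pipelines package

    # Load two classes of images and attach numeric labels (paths are made up)
    daisies = ImageSchema.readImages("/data/flowers/daisy").withColumn("label", lit(0))
    roses = ImageSchema.readImages("/data/flowers/rose").withColumn("label", lit(1))
    train_df = daisies.union(roses)

    # A pre-trained InceptionV3 network turns each image into a feature vector;
    # a simple logistic regression is then trained on top of those features
    featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features",
                                     modelName="InceptionV3")
    lr = LogisticRegression(maxIter=20, regParam=0.05, labelCol="label")
    model = Pipeline(stages=[featurizer, lr]).fit(train_df)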