Andreas Neumann

Software Engineer, Databricks

Andreas Neumann is a software engineer at Databricks, where he focuses on Structured Streaming and Delta Lake. He has previously built big data systems at Google, Cask Data, Yahoo! and IBM. Andreas holds a PhD in computer science from the University of Trier, Germany.

PAST SESSIONS

Building Reliable Data Lakes at Scale with Delta Lake (Summit Europe 2019)

Most data practitioners grapple with data reliability issues—it's the bane of their existence. Data engineers, in particular, strive to design, deploy, and serve reliable data in a performant manner so that their organizations can make the most of their valuable corporate data assets.

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. Built on open standards, Delta Lake co-designs compute and storage and is compatible with the Apache Spark APIs. It delivers the data reliability and query performance needed for big data use cases, from batch and streaming ingest and fast interactive queries to machine learning. In this tutorial we will discuss the requirements of modern data engineering, the challenges data engineers face around data reliability and performance, and how Delta Lake can help. Through presentations, code examples, and notebooks, we will explain these challenges and show how Delta Lake addresses them. You will walk away understanding how to apply this innovation to your own data architecture and the benefits you can gain.
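To make this concrete, below is a minimal sketch of the kind of Spark code the tutorial works through, assuming a local PySpark environment with the open-source delta-spark package installed (pip install delta-spark). The table path /tmp/events and the column names are illustrative placeholders, not taken from the tutorial materials.

```python
# Minimal Delta Lake sketch; assumes `pip install pyspark delta-spark`.
# The path and columns below are illustrative placeholders.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Configure a local Spark session with the Delta Lake extensions enabled.
builder = (
    SparkSession.builder.appName("delta-lake-demo")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Writing a DataFrame in Delta format is an atomic, ACID commit:
# concurrent readers never observe a partially written table.
events = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "action"])
events.write.format("delta").mode("overwrite").save("/tmp/events")

# Appends use the ordinary Spark DataFrame API; only the format changes.
spark.createDataFrame([(3, "click")], ["id", "action"]) \
    .write.format("delta").mode("append").save("/tmp/events")

# Each commit produces a new table version; "time travel" reads an old one.
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/events").show()
spark.read.format("delta").load("/tmp/events").show()  # latest version
```

Because a Delta table is just Parquet files plus a transaction log, the same spark.read and spark.write calls used for other formats apply unchanged, which is what compatibility with the Spark APIs means in practice.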

This tutorial will be an instructor-led, hands-on interactive session. Instructions for getting the tutorial materials will be provided in class.

What you’ll learn:

  • The key data reliability challenges that data lakes face
  • How Delta Lake brings reliability to data lakes at scale
  • How Delta Lake fits within an Apache Spark™ environment
  • How to use Delta Lake to realize data reliability improvements (a short streaming-ingest sketch follows this list)
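As referenced above, here is a sketch of streaming ingest into the same illustrative table. It reuses the Spark session from the previous sketch and Spark's built-in rate source, which emits one test row per second; in a real pipeline the source would be Kafka, files, or similar.

```python
# Streaming ingest into the same illustrative Delta table; reuses the
# `spark` session configured in the previous sketch.
stream = (
    spark.readStream.format("rate")   # built-in test source: 1 row/second
    .load()
    .selectExpr("value AS id", "'stream' AS action")
)

# Delta as the streaming sink: appends are tracked through the checkpoint
# and land in the same table that batch jobs write to and query.
query = (
    stream.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/events/_checkpoint")
    .start("/tmp/events")
)
query.awaitTermination(30)  # let the demo run briefly
query.stop()
```

Converging batch and streaming writers on one ACID table is the pattern behind the "batch and streaming ingest" use cases mentioned in the abstract.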

Prerequisites

  • A fully charged laptop (8-16 GB of memory) with Chrome or Firefox
  • Pre-register for Databricks Community Edition
