Neil Conway

Co-Founder and CTO, Determined AI

Neil Conway is co-founder and CTO of Determined AI, a startup that builds software to dramatically accelerate deep learning model development. Neil was previously a technical lead at Mesosphere and a major developer of both Apache Mesos and PostgreSQL. Neil holds a PhD in Computer Science from UC Berkeley, where he did research on large-scale data management, distributed systems, and programming languages.

PAST SESSIONS

Deep Learning at Scale with Apache Spark and Determined

Summit 2020

Despite its enormous potential to enable new applications, deep learning remains prohibitively expensive, difficult, and time-consuming for the vast majority of companies. Training DL models at scale is particularly challenging: training a single model can take days or weeks, and DL engineers are often forced to spend much of their time doing DevOps or writing boilerplate code to handle routine tasks like data loading, distributed training, or fault tolerance.

In this talk, we introduce Determined, an open source platform that enables deep learning teams to train models more quickly, easily share GPU resources, and effectively collaborate. The talk gives an overview of the problems that Determined aims to solve and the high-level architecture of the system, and shows how Determined and Spark can be used together effectively. We’ll also dive deep into some key technical features, such as:

  • Distributed training without changing your model code
  • Intelligent hyperparameter search
  • Flexible GPU scheduling, including automatic management of cloud GPU instances
  • Automatic fault tolerance and checkpoint management
  • Seamless integration into the Spark ecosystem, e.g., for performing ETL or model inference