Neil Conway is co-founder and CTO of Determined AI, a startup that builds software to dramatically accelerate deep learning model development. Neil was previously a technical lead at Mesosphere and a major developer of both Apache Mesos and PostgreSQL. Neil holds a PhD in Computer Science from UC Berkeley, where he did research on large-scale data management, distributed systems, and programming languages.
Despite its enormous potential to enable new applications, deep learning remains prohibitively expensive, difficult, and time-consuming for the vast majority of companies. Training DL models at scale is particularly challenging: training a single model can take days or weeks, and DL engineers are often forced to spend much of their time doing DevOps or writing boilerplate code to handle routine tasks like data loading, distributed training, or fault tolerance.
In this talk, we introduce Determined, an open source platform that enables deep learning teams to train models more quickly, easily share GPU resources, and effectively collaborate. The talk will cover the problems that Determined aims to solve, the high-level architecture of the system, and how Determined and Spark can be used together effectively. We’ll also dive deep into some key technical features, such as: