Gaël Varoquaux is an Inria faculty researcher working on data science and brain imaging. He holds a joint position at Inria (the French national research institute for computer science) and at the Neurospin brain research institute. His research focuses on using data and machine learning for scientific inference, applying it to brain-imaging data to understand cognition, as well as developing tools that make it easier for non-specialists to use machine learning. Years before the NSA, he was hoping to make bleeding-edge data processing available across new fields, and he has been working on a mastermind plan of building easy-to-use open-source software in Python. He is a core developer of scikit-learn, joblib, Mayavi and nilearn, a nominated member of the PSF, and often teaches scientific computing with Python using the scipy lecture notes.
Once an obscure branch of applied mathematics, machine learning is now the darling of tech. I will talk about lessons learned democratizing machine learning: how libraries like scikit-learn were designed to empower users, simplifying while avoiding ambiguous behaviors; how the Python data ecosystem was built from scientific computing tools, and why good numerics matter; and how some machine-learning patterns readily provide value in real-world situations. I will also discuss the challenges that remain and the progress we are making. Scaling up brings bottlenecks different from those of numerics. Integrating data into statistical models, a hurdle to data-science practice, requires rethinking data-cleaning pipelines.
This talk will draw on my experience as a scikit-learn developer, but also as a researcher in machine learning and its applications.