Sue Ann is a software engineer on the machine learning team at Databricks. Before Databricks, she worked at Facebook on Ads Targeting and Commerce. Sue Ann holds a PhD in computer science, specializing in machine learning, from Carnegie Mellon University.
Clemens Mewald - Next Generation Data Science Workspace (Databricks) - 9:06
Lauren Richie - DEMO: Next Generation Data Science Workspace (Databricks) - 17:55
Matei Zaharia - MLflow Community and Product Updates (Databricks) - 27:40
Sue Ann Hong - DEMO: MLflow (Databricks) - 42:57
Rohan Kumar - Responsible ML (Microsoft) - 51:52
Sarah Bird - DEMO: Responsible ML (Microsoft) - 1:00:21
Anurag Sehgal - Data and AI (Credit Suisse) - 1:12:58
Introducing the Next Generation Data Science Workspace
Ali Ghodsi, Clemens Mewald and Lauren Richie
It is no longer a secret that data-driven insights and decision making are essential in any company’s strategy to keep up with today’s rapid pace of change and remain relevant. Although we take this realization for granted, we are still at a very early stage of enabling data teams to deliver on their promise. One of the reasons is that we haven’t equipped this profession with the modern toolkit it deserves.
Existing solutions leave data teams with impossible trade-offs. Giving Data Scientists the freedom to use any open source tools on their laptops doesn’t provide a clear path to production and governance. Simply hosting those same tools in the Cloud may solve some of the data privacy and security issues, but doesn’t improve productivity or collaboration. On the other hand, most robust and scalable production environments hinder innovation and experimentation by slowing Data Scientists down.
In this talk, we will unveil the next generation of the Databricks Data Science Workspace: an open and unified experience for modern data teams, specifically designed to address these hard trade-offs. We will introduce new features that leverage the open source tools you are familiar with to give you a laptop-like experience that provides the flexibility to experiment and the robustness to create reliable and reproducible production solutions.
Simplifying Model Development and Management with MLflow
Matei Zaharia and Sue Ann Hong
As organizations continue to develop their machine learning (ML) practice, the need for robust and reliable platforms capable of handling the entire ML lifecycle is becoming crucial for successful outcomes. Building models is difficult enough to do once, but deploying them into production in a reproducible, agile, and predictable way is exponentially harder due to the dependencies on parameters, environments, and the ever-changing nature of data and business needs.
Introduced by Databricks in 2018, MLflow is the most widely used open source platform for managing the full ML lifecycle. With over 2 million PyPI downloads a month and over 200 contributors, the growing support from the developer community demonstrates the need for an open source approach to standardizing the tools, processes, and frameworks involved throughout the ML lifecycle. MLflow significantly simplifies the complex process of standardizing MLOps and productionizing ML models. In this talk, we’ll cover what’s new in MLflow, including simplified experiment tracking, innovations in the model format to improve portability, new features to manage and compare model schemas, and new capabilities for deploying models faster.
Responsible ML - Bringing Accountability to Data Science
Rohan Kumar and Sarah Bird
Responsible ML is the most talked-about field in AI at the moment. With the growing importance of ML, it is even more important for us to exercise ethical AI practices and ensure that the models we create live up to the highest standards of inclusiveness and transparency. Join Rohan Kumar as he talks about how Microsoft brings cutting-edge research into the hands of customers to make them more accountable for their models and responsible in their use of AI. For the AI community, this is an open invitation to collaborate and contribute to shaping the future of Responsible ML.
How Credit Suisse Is Leveraging Open Source Data and AI Platforms to Drive Digital Transformation, Innovation and Growth
Despite the increasing embrace of big data and AI, most financial services companies still experience significant challenges around data types, privacy, and scale. Credit Suisse is overcoming these obstacles by standardizing on open, cloud-based platforms, including Azure Databricks, to increase the speed and scale of operations and to democratize ML across the organization. Now, Credit Suisse is leading the way by successfully employing data and analytics to drive digital transformation, delivering new products to market faster, and driving business growth and operational efficiency.
2017 continues to be an exciting year for Apache Spark. I will talk about new updates in two major areas in the Spark community this year: stream processing with Structured Streaming, and deep learning with high-level libraries such as Deep Learning Pipelines and TensorFlowOnSpark. In both areas, the community is making powerful new functionality available in the same high-level APIs used in the rest of the Spark ecosystem (e.g., DataFrames and ML Pipelines), and improving both the scalability and ease of use of stream processing and machine learning.