Spark + AI Summit 2020: Thursday Morning Keynotes

Clemens Mewald – Next Generation Data Science Workspace (Databricks) – 9:06
Lauren Richie – DEMO: Next Generation Data Science Workspace (Databricks) – 17:55
Matei Zaharia – MLflow Community and Product Updates (Databricks) – 27:40
Sue Ann Hong – DEMO: MLflow (Databricks) – 42:57
Rohan Kumar – Responsible ML (Microsoft) – 51:52
Sarah Bird – DEMO: Responsible ML (Microsoft) – 1:00:21
Anurag Sehgal – Data and AI (Credit Suisse) – 1:12:58


Introducing the Next Generation Data Science Workspace
Ali Ghodsi, Clemens Mewald and Lauren Richie

It is no longer a secret that data-driven insights and decision making are essential in any company’s strategy to keep up with today’s rapid pace of change and remain relevant. Although we take this realization for granted, we are still in the very early stages of enabling data teams to deliver on their promise. One of the reasons is that we haven’t equipped this profession with the modern toolkit it deserves.

Existing solutions leave data teams with impossible trade-offs. Giving Data Scientists the freedom to use any open source tools on their laptops doesn’t provide a clear path to production and governance. Simply hosting those same tools in the Cloud may solve some of the data privacy and security issues, but doesn’t improve productivity or collaboration. On the other hand, most robust and scalable production environments hinder innovation and experimentation by slowing Data Scientists down.

In this talk, we will unveil the next generation of the Databricks Data Science Workspace: an open and unified experience for modern data teams specifically designed to address these hard trade-offs. We will introduce new features that leverage the open source tools you are familiar with to give you a laptop-like experience that provides the flexibility to experiment and the robustness to create reliable and reproducible production solutions.


Simplifying Model Development and Management with MLflow
Matei Zaharia and Sue Ann Hong

As organizations continue to develop their machine learning (ML) practice, the need for robust and reliable platforms capable of handling the entire ML lifecycle is becoming crucial for successful outcomes. Building models is difficult enough to do once, but deploying them into production in a reproducible, agile, and predictable way is exponentially harder due to the dependencies on parameters, environments, and the ever-changing nature of data and business needs.

Introduced by Databricks in 2018, MLflow is the most widely used open source platform for managing the full ML lifecycle. With over 2 million PyPI downloads a month and over 200 contributors, the growing support from the developer community demonstrates the need for an open source approach to standardize tools, processes, and frameworks involved throughout the ML lifecycle. MLflow significantly simplifies the complex process of standardizing MLOps and productionizing ML models. In this talk, we’ll cover what’s new in MLflow, including simplified experiment tracking, new innovations to the model format to improve portability, new features to manage and compare model schemas, and new capabilities for deploying models faster.


Responsible ML – Bringing Accountability to Data Science
Rohan Kumar and Sarah Bird

Responsible ML is the most talked-about field in AI at the moment. With the growing importance of ML, it is even more important for us to exercise ethical AI practices and ensure that the models we create live up to the highest standards of inclusiveness and transparency. Join Rohan Kumar as he talks about how Microsoft brings cutting-edge research into the hands of customers to make them more accountable for their models and responsible in their use of AI. For the AI community, this is an open invitation to collaborate and contribute to shaping the future of Responsible ML.


How Credit Suisse Is Leveraging Open Source Data and AI Platforms to Drive Digital Transformation, Innovation and Growth
Anurag Sehgal

Despite the increasing embrace of big data and AI, most financial services companies still experience significant challenges around data types, privacy, and scale. Credit Suisse is overcoming these obstacles by standardizing on open, cloud-based platforms, including Azure Databricks, to increase the speed and scale of operations, and the democratization of ML across the organization. Now, Credit Suisse is leading the way by successfully employing data and analytics to drive digital transformation, delivering new products to market faster, and driving business growth and operational efficiency.


 
About Ali Ghodsi

Databricks

Ali Ghodsi is the CEO and co-founder of Databricks, responsible for the growth and international expansion of the company. He previously served as the VP of Engineering and Product Management before taking the role of CEO in January 2016. In addition to his work at Databricks, Ali serves as an adjunct professor at UC Berkeley and is on the board at UC Berkeley’s RISELab. Ali was one of the original creators of the open source project Apache Spark, and ideas from his academic research in the areas of resource management, scheduling, and data caching have been applied to Apache Mesos and Apache Hadoop. Ali received his MBA from Mid-Sweden University in 2003 and his PhD from KTH/Royal Institute of Technology in Sweden in 2006 in the area of Distributed Computing.

About Clemens Mewald

Databricks

Clemens Mewald is the director of product management, machine learning and data science at Databricks, where he leads the product team. Previously, he spent four years on the Google Brain team building ML infrastructure for Google, Google Cloud, and open source users, including TensorFlow and TensorFlow Extended (TFX). Clemens holds an MSc in computer science from UAS Wiener Neustadt, Austria, and an MBA from MIT Sloan.

About Lauren Richie

Databricks

Lauren Richie is a software engineer on the workspace team at Databricks, where she works to enable interactive data science and engineering workflows to collaborate and scale. Before Databricks, she worked in wildlife conservation and holds a Master of Environmental Management from the Yale School of Forestry and Environmental Studies.

About Matei Zaharia

Databricks

Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly in datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Today, Matei tech-leads the MLflow development effort at Databricks in addition to other aspects of the platform. Matei’s research work was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).

About Sue Ann Hong

Databricks

Sue Ann is a software engineer on the machine learning team at Databricks. Before Databricks, she worked at Facebook on Ads Targeting and Commerce. Sue Ann holds a PhD in computer science, specializing in machine learning, from Carnegie Mellon University.

About Rohan Kumar

Microsoft

As the Corporate Vice President of Azure Data, Rohan is the engineering leader responsible for the product strategy, technical vision, long range planning, design, development/implementation, and engineering process involving the certification and release of SQL Server and all Azure Data Services, including SQL DB, Cosmos DB, Database for MySQL, Database for PostgreSQL, Database for MariaDB, SQL Data Warehouse, Azure Databricks, Azure Data Lake, HDInsight, Azure Stream Analytics, Azure Data Factory, Azure Data Catalog and Microsoft’s Analytics Platform System (APS).

As part of his charter, Rohan is focused on delivering core data platform services for Microsoft that allow IT professionals, DBAs, Data Scientists, Data Engineers and Developers to successfully develop, deploy, and manage data applications across Azure Data Services and SQL Server workloads.

Rohan joined Microsoft in July 1998 as a software development engineer in the core Windows file systems and storage team. He contributed to Windows XP and Windows Server 2003 before moving to the SQL Server team in July 2003. He has held engineering leadership roles at various levels in SQL Server since then and has contributed to SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Azure, HDInsight Service and APS.

Rohan graduated with a Bachelor of Technology degree in Computer Science and Engineering from the Indian Institute of Technology, BHU and a Master of Science degree in Computer Science from the University of Massachusetts at Amherst.

About Sarah Bird

Microsoft

Sarah leads research and emerging technology strategy for Azure AI. Sarah works to accelerate the adoption and impact of AI by bringing together the latest research innovations with the best of open source and product expertise to create new tools and technologies. Sarah is currently leading the development of responsible AI tools in Azure Machine Learning. She is also an active member of the Microsoft AETHER committee, where she works to develop and drive company-wide adoption of responsible AI principles, best practices, and technologies. Sarah was one of the founding researchers in the Microsoft FATE research group and, prior to joining Microsoft, worked on AI fairness at Facebook.

Sarah is an active contributor to the open source ecosystem. She co-founded ONNX, Fairlearn, and WhiteNoise and was a leader in the PyTorch 1.0 and InterpretML projects. She was an early member of the machine learning systems research community and has been active in growing and forming the community. She co-founded the MLSys research conference and the Learning Systems workshops. She has a Ph.D. in computer science from UC Berkeley, advised by Dave Patterson, Krste Asanovic, and Burton Smith.

About Anurag Sehgal

Credit Suisse

In his role as Head of Data Analytics, AI & Digital Innovation at Credit Suisse - Global Markets division, Anurag is responsible for driving Data, AI & Digital opportunities to transform, enhance and create new data-driven businesses and enable business growth, operational efficiency and Client focus. Anurag has worked for Credit Suisse for over 15 years in a diverse set of roles across Risk, Finance, Banking & Capital Markets, Client, Sales, Trading & Research and has a deep understanding of front-to-back business processes and Technology.