Rohan Kumar

Corporate Vice President, Azure Data, Microsoft

As the Corporate Vice President of Azure Data, Rohan is the engineering leader responsible for the product strategy, technical vision, long-range planning, design, development/implementation, and engineering process involving the certification and release of SQL Server and all Azure Data Services, including SQL DB, Cosmos DB, Database for MySQL, Database for PostgreSQL, Database for MariaDB, SQL Data Warehouse, Azure Databricks, Azure Data Lake, HDInsight, Azure Stream Analytics, Azure Data Factory, Azure Data Catalog and Microsoft’s Analytics Platform System (APS).

As part of his charter, Rohan is focused on delivering core data platform services for Microsoft that allow IT professionals, DBAs, Data Scientists, Data Engineers and Developers to successfully develop, deploy, and manage data applications across Azure Data Services and SQL Server workloads.

Rohan joined Microsoft in July 1998 as a software development engineer in the core Windows file systems and storage team. He contributed to Windows XP and Windows Server 2003 before moving to the SQL Server team in July 2003. He has held engineering leadership roles at various levels in SQL Server since then and has contributed to SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Azure, HDInsight Service and APS.

Rohan graduated with a Bachelor of Technology degree in Computer Science and Engineering from the Indian Institute of Technology, BHU and a Master of Science degree in Computer Science from the University of Massachusetts at Amherst.

 

Watch this speaker at Data + AI Summit 2021

Past sessions

Join the Wednesday morning keynote to hear from Databricks co-founders and original creators of popular projects Apache Spark, Delta Lake, and MLflow on how the open source community is tackling the biggest challenges in data.

Stay tuned for them to reveal some of the latest innovations in data engineering and data analytics to simplify and scale your work.


Thursday Morning Keynote

November 18, 2020 04:00 PM PT

Welcome from Ali Ghodsi, Databricks


Taking Machine Learning to Production with New Features in MLflow

Matei Zaharia
Assistant Professor of Computer Science; Original Creator of Apache Spark & MLflow, Databricks

Deploying and operating machine learning applications is challenging because they are highly dependent on input data and can fail in complex ways. Problems such as training/inference differences in data format, data skew, and misconfigured software environments can easily sneak into a production application and impact its quality. To address these types of problems, organizations are adopting ML Platform software and MLOps practices specifically for managing machine learning applications.

In this talk, I’ll present some of the latest functionality added for productionizing machine learning in MLflow, the popular open source machine learning platform started by Databricks in 2018. This includes built-in support for model management and review using the Model Registry, APIs for automatic Continuous Integration and Delivery (CI/CD), model schemas to catch differences in a model’s expected data format, and integration with model explainability tools. I’ll also talk about other work happening in the open source MLflow community, including deep integration with PyTorch and its growing ecosystem of model productionization tools.


Demo: CI/CD and MLOps with MLflow

Kasey Uhlenhuth
Sr Product Manager, Machine Learning, Databricks


PyTorch and MLflow, from Research to Production

Lin Qiao
Engineering Director, PyTorch, Facebook

Lin Qiao, engineering director on the Facebook AI team, talks about bringing machine learning to production at scale, including the PyTorch integration with MLflow. She talks about the guiding principles for PyTorch and the goals set back in 2016 during initial development through the present day, with a focus on ecosystem compatibility.

Lin reviews the PyTorch production ecosystem and discusses how MLflow and PyTorch are integrated for tracking, models and model serving.


Introducing the Next Generation Data Science Workspace

Clemens Mewald
Director of Product Management, Data Science and Machine Learning, Databricks

It is no longer a secret that data-driven insights and decision making are essential in any company’s strategy to keep up with today’s rapid pace of change and remain relevant. Although we take this realization for granted, we are still in the very early stage of enabling data teams to deliver on their promise. One of the reasons is that we haven’t equipped this profession with the modern toolkit it deserves.

Existing solutions leave data teams with impossible trade-offs. Giving Data Scientists the freedom to use any open source tools on their laptops doesn’t provide a clear path to production and governance. Simply hosting those same tools in the Cloud may solve some of the data privacy and security issues, but doesn’t improve productivity or collaboration. On the other hand, most robust and scalable production environments hinder innovation and experimentation by slowing Data Scientists down.

In this talk, we will give an update on the next generation Data Science Workspace on Databricks, originally unveiled at Spark + AI Summit 2020. Specifically, we will cover new capabilities added to Databricks Notebooks as well as Git-based Databricks Projects. Until now, the industry has assumed that collaborative notebooks are for experimentation only, and not for production. Our approach solves these challenges and, for the first time, provides a single platform for data teams to rapidly and confidently move from experimentation to production.



Discussion with Daimler

Stephan Schwarz
Production Planning: Manager Smart Data Processing (Mercedes Operations), Daimler

Sebastian Findeisen
Data Scientist, Daimler

When we think about luxury cars, what first comes to mind is often the end product: the sleek design, how fast it goes, and so on. But we often overlook the enormous amount of effort it takes before that car rolls off the assembly line. In this talk, Daimler will give us a peek into how data and ML are playing a critical role in driving car production automation, with MLOps and tools like MLflow being leveraged to automate a number of complex processes and provide insights that create production efficiencies.


Responsible ML – Bringing Accountability to Data Science Keynote

Rohan Kumar
Corporate Vice President, Azure Data, Microsoft

Responsible ML is the most talked-about field in AI at the moment. With the growing importance of ML, it is even more important for us to exercise ethical AI practices and ensure that the models we create live up to the highest standards of inclusiveness and transparency. Join Rohan Kumar as he talks about how Microsoft brings cutting-edge research into the hands of customers to make them more accountable for their models and responsible in their use of AI. For the AI community, this is an open invitation to collaborate and contribute to shape the future of Responsible ML. This keynote is brought to you as an encore presentation from the global Summit.


Demo: Azure Tools for Responsible AI

Sarah Bird
Principal Program Manager, Microsoft Azure AI


Pursuing the Extraordinary: A Data Revolution

Keynote from Mae Jemison
First woman of color in the world to go into space, former NASA astronaut

An exploration of the opportunities and obstacles encountered, and the clarity of purpose needed, to achieve an extraordinary future (such as human interstellar travel or a sustainable human existence on planet Earth), and of the roles that big data and advancing IT can play.

Summit 2020 Spark + AI Summit 2020: Thursday Morning Keynotes

June 24, 2020 05:00 PM PT

Clemens Mewald - Next Generation Data Science Workspace (Databricks) - 9:06
Lauren Richie - DEMO: Next Generation Data Science Workspace (Databricks) - 17:55
Matei Zaharia - MLflow Community and Product Updates (Databricks) - 27:40
Sue Ann Hong - DEMO: MLflow (Databricks) - 42:57
Rohan Kumar - Responsible ML (Microsoft) - 51:52
Sarah Bird - DEMO: Responsible ML (Microsoft) - 1:00:21
Anurag Sehgal - Data and AI (Credit Suisse) - 1:12:58


Introducing the Next Generation Data Science Workspace
Ali Ghodsi, Clemens Mewald and Lauren Richie

It is no longer a secret that data-driven insights and decision making are essential in any company’s strategy to keep up with today’s rapid pace of change and remain relevant. Although we take this realization for granted, we are still in the very early stage of enabling data teams to deliver on their promise. One of the reasons is that we haven’t equipped this profession with the modern toolkit it deserves.

Existing solutions leave data teams with impossible trade-offs. Giving Data Scientists the freedom to use any open source tools on their laptops doesn’t provide a clear path to production and governance. Simply hosting those same tools in the Cloud may solve some of the data privacy and security issues, but doesn’t improve productivity or collaboration. On the other hand, most robust and scalable production environments hinder innovation and experimentation by slowing Data Scientists down.

In this talk, we will unveil the next generation of the Databricks Data Science Workspace: An open and unified experience for modern data teams specifically designed to address these hard tradeoffs. We will introduce new features that leverage the open source tools you are familiar with to give you a laptop-like experience that provides the flexibility to experiment and the robustness to create reliable and reproducible production solutions.


Simplifying Model Development and Management with MLflow
Matei Zaharia and Sue Ann Hong

As organizations continue to develop their machine learning (ML) practice, the need for robust and reliable platforms capable of handling the entire ML lifecycle is becoming crucial for successful outcomes. Building models is difficult enough to do once, but deploying them into production in a reproducible, agile, and predictable way is exponentially harder due to the dependencies on parameters, environments, and the ever-changing nature of data and business needs.

Introduced by Databricks in 2018, MLflow is the most widely used open source platform for managing the full ML lifecycle. With over 2 million PyPI downloads a month and over 200 contributors, the growing support from the developer community demonstrates the need for an open source approach to standardize tools, processes, and frameworks involved throughout the ML lifecycle. MLflow significantly simplifies the complex process of standardizing MLOps and productionizing ML models. In this talk, we’ll cover what’s new in MLflow, including simplified experiment tracking, new innovations to the model format to improve portability, new features to manage and compare model schemas, and new capabilities for deploying models faster.


Responsible ML - Bringing Accountability to Data Science
Rohan Kumar and Sarah Bird

Responsible ML is the most talked-about field in AI at the moment. With the growing importance of ML, it is even more important for us to exercise ethical AI practices and ensure that the models we create live up to the highest standards of inclusiveness and transparency. Join Rohan Kumar as he talks about how Microsoft brings cutting-edge research into the hands of customers to make them more accountable for their models and responsible in their use of AI. For the AI community, this is an open invitation to collaborate and contribute to shape the future of Responsible ML.


How Credit Suisse Is Leveraging Open Source Data and AI Platforms to Drive Digital Transformation, Innovation and Growth
Anurag Sehgal

Despite the increasing embrace of big data and AI, most financial services companies still experience significant challenges around data types, privacy, and scale. Credit Suisse is overcoming these obstacles by standardizing on open, cloud-based platforms, including Azure Databricks, to increase the speed and scale of operations, and the democratization of ML across the organization. Now, Credit Suisse is leading the way by successfully employing data and analytics to drive digital transformation, delivering new products to market faster, and driving business growth and operational efficiency.

Summit 2019 Rohan Kumar, Microsoft | Keynote Spark + AI Summit

April 23, 2019 05:00 PM PT


Summit Europe 2018 Developing for the Intelligent Cloud and Intelligent Edge SAIS EU

June 24, 2021 11:33 AM PT

It seems as if there are multiple stories daily about the various ways AI is impacting organizations and people across the world. Whether it’s intelligent applications making data-driven recommendations to customers, or machine learning being used to detect potential health risks, it’s hard to be surprised these days. Cloud computing has been a fundamental force driving our ability to use data science and machine learning to solve for scenarios that were once believed to be achievable only in science fiction. But what about scenarios in remote places with little to no connectivity and with devices that have limited processing power? What if we could infuse this intelligence not only into applications in the cloud, but also onto the devices themselves, right where the action is? This talk introduces the concept of how the intelligent cloud and intelligent edge, maintained as one computing fabric, can extend the reach of AI.


Organized by Databricks

If you have questions, or would like information on sponsoring a Spark + AI Summit, please contact organizers@spark-summit.org.

Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event.