2018 Spark Summit + AI Keynotes - Databricks

KEYNOTES June 5th, 2018

The Power of Unified Analytics

Ali Ghodsi (Databricks)

Ali is the CEO and co-founder of Databricks, responsible for the growth and international expansion of the company. Ali was one of the original creators of the open source project Apache Spark, and ideas from his academic research in the areas of resource management, scheduling, and data caching have been applied to Apache Mesos and Apache Hadoop. Ali received his MBA from Mid-Sweden University in 2003 and his PhD from KTH/Royal Institute of Technology in Sweden in 2006 in the area of Distributed Computing.

Threat Detection and Response at Scale

Dominique Brezinski (Apple), Michael Armbrust (Databricks)

Security monitoring and threat response have diverse processing demands on large volumes of log and telemetry data. Processing requirements span from low-latency stream processing to interactive queries over months of data. To make things more challenging, we must keep the data accessible for a retention window measured in years. Having tackled this problem before in a massive-scale environment using Apache Spark, when it came time to do it again, there were a few things I knew worked and a few wrongs I wanted to right.

We approached Databricks with a set of challenges to collaborate on: provide a stable and optimized platform for Unified Analytics that allows our team to focus on value delivery using streaming, SQL, graph, and ML; leverage decoupled storage and compute while delivering high performance over a broad set of workloads; use S3 notifications instead of list operations; remove Hive Metastore from the write path; and approach indexed response times for our more common search cases, without hard-to-scale index maintenance, over our entire retention window. This talk is about the fruit of that collaboration.

Infrastructure for the Complete ML Lifecycle

Matei Zaharia (Databricks)

Data is the key ingredient to building high-quality, production AI applications. It comes in during the training phase, where more and higher-quality training data enables better models, as well as during the production phase, where understanding the model’s behavior in production and detecting changes in the predictions and input data are critical to maintaining a production application. However, so far most data management and machine learning tools have been largely separate. In this presentation, I’ll talk about several efforts from Databricks, in Apache Spark as well as other open source projects, to unify data and AI in order to make it significantly simpler to build production AI applications.

Project Hydrogen: Unifying State-of-the-art AI and Big Data in Apache Spark

Reynold Xin (Databricks)

Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. AI has always been one of the most exciting applications of big data and Apache Spark. Increasingly, Spark users want to integrate Spark with distributed deep learning and machine learning frameworks built for state-of-the-art training. This talk introduces a new project that substantially improves the performance and fault recovery of distributed deep learning and machine learning frameworks on Spark.

Developing for the Intelligent Cloud and Intelligent Edge

Rohan Kumar (Microsoft)

It seems as if there are multiple stories daily about the various ways AI is impacting organizations and people across the world. Whether it’s intelligent applications making data-driven recommendations to customers, or machine learning being used to detect potential health risks, it’s hard to be surprised these days. Cloud computing has been a fundamental force driving our ability to use data science and machine learning to solve for scenarios that were once believed to be achievable only in science fiction.

Accelerating Disruptive Innovation in the Construction Industry

Justin Leto (Bechtel Corporation)

Construction-related spending accounts for 13% of the world’s GDP, yet the industry is stuck. Where nearly every other industry is progressing, productivity in construction has advanced only 1% over the past 20 years. Bechtel, a 120-year-old, privately held company focused on global engineering, construction, and project management, builds some of the most massive and highly complex projects humanity has ever attempted.

As with other industries, deep learning and AI offer the greatest potential for disruption in the construction industry. As a result, the applied knowledge of data scientists is now the limited resource. Organizations must recognize and adapt to the reality that their future success may depend on the success of their data scientists, and that reducing or eliminating the constraints, limitations, and roadblocks data scientists face every day in their tooling is one important way to accelerate innovation. Bechtel’s Big Data & Analytics Center of Excellence leverages the Databricks Unified Analytics Platform to realize dramatic efficiency gains that enable their data scientists to achieve results faster.

The Future of AI and Security

Dawn Song (UC Berkeley)

In this talk, I will discuss the challenges and exciting new opportunities at the intersection of AI and security, including how AI and deep learning can enable better security, and how security can enable better AI. I will also give an overview of challenges and new techniques to enable privacy-preserving shared machine learning. Finally, I will talk about our recent project on confidentiality-preserving smart contracts and democratization of AI.