Reynold is an Apache Spark PMC member and the top contributor to the project. He initiated and led efforts such as DataFrames and Project Tungsten. He is also a co-founder and Chief Architect at Databricks.
Ali Ghodsi - Intro to Lakehouse, Delta Lake (Databricks) - 46:40 Matei Zaharia - Spark 3.0, Koalas 1.0 (Databricks) - 17:03 Brooke Wenig - DEMO: Koalas 1.0, Spark 3.0 (Databricks) - 35:46 Reynold Xin - Introducing Delta Engine (Databricks) - 1:01:50 Arik Fraimovich - Redash Overview & DEMO (Databricks) - 1:27:25 Vish Subramanian - Brewing Data at Scale (Starbucks) - 1:39:50
Realizing the Vision of the Data Lakehouse
Data warehouses have a long history in decision support and business intelligence applications. But, data warehouses were not well suited to dealing with the unstructured, semi-structured, and streaming data common in modern enterprises. This led to organizations building data lakes of raw data about a decade ago. But, they also lacked important capabilities. The need for a better solution has given rise to the data lakehouse, which implements similar data structures and data management features to those in a data warehouse, directly on the kind of low cost storage used for data lakes.
This keynote by Databricks CEO, Ali Ghodsi, explains why the open source Delta Lake project takes the industry closer to realizing the full potential of the data lakehouse, including new capabilities within the Databricks Unified Data Analytics platform to significantly accelerate performance. In addition, Ali will announce new open source capabilities to collaboratively run SQL queries against your data lake, build live dashboards, and alert on important changes to make it easier for all data teams to analyze and understand their data.
Introducing Apache Spark 3.0:
A retrospective of the Last 10 Years, and a Look Forward to the Next 10 Years to Come.
Matei Zaharia and Brooke Wenig
In this keynote from Matei Zaharia, the original creator of Apache Spark, we will highlight major community developments with the release of Apache Spark 3.0 to make Spark easier to use, faster, and compatible with more data sources and runtime environments. Apache Spark 3.0 continues the project’s original goal to make data processing more accessible through major improvements to the SQL and Python APIs and automatic tuning and optimization features to minimize manual configuration. This year is also the 10-year anniversary of Spark’s initial open source release, and we’ll reflect on how the project and its user base has grown, as well as how the ecosystem around Spark (e.g. Koalas, Delta Lake and visualization tools) is evolving to make large-scale data processing simpler and more powerful.
Delta Engine: High Performance Query Engine for Delta Lake
How Starbucks is Achieving its 'Enterprise Data Mission' to Enable Data and ML at Scale and Provide World-Class Customer Experiences
Starbucks makes sure that everything we do is through the lens of humanity – from our commitment to the highest quality coffee in the world, to the way we engage with our customers and communities to do business responsibly. A key aspect to ensuring those world-class customer experiences is data. This talk highlights the Enterprise Data Analytics mission at Starbucks that helps making decisions powered by data at tremendous scale. This includes everything ranging from processing data at petabyte scale with governed processes, deploying platforms at the speed-of-business and enabling ML across the enterprise. This session will detail how Starbucks has built world-class Enterprise data platforms to drive world-class customer experiences.
Keynote from Spark + AI Summit 2019: Reynold Xin, Databricks, Brooke Wenig, Databricks
Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models AI has always been on of the most exciting applications of big data and Apache Spark. Increasingly Spark users want to integrate Spark with distributed deep learning and machine learning frameworks built for state-of-the-art training. This talk introduces a new project that substantially improves the performance and fault-recovery of distributed deep learning and machine learning frameworks on Spark.