Vishwanath Subramanian is Director of Data and Analytics Engineering at Starbucks. He has over 15 years of experience, with a background in applied analytics, distributed systems, data warehouses, product management and software development. At Starbucks, his key focus is providing next-generation analytics for the enterprise, enabling large-scale data processing across various platforms and powering machine learning workflows for amazing customer experiences.
Ali Ghodsi - Intro to Lakehouse, Delta Lake (Databricks) - 46:40
Matei Zaharia - Spark 3.0, Koalas 1.0 (Databricks) - 17:03
Brooke Wenig - DEMO: Koalas 1.0, Spark 3.0 (Databricks) - 35:46
Reynold Xin - Introducing Delta Engine (Databricks) - 1:01:50
Arik Fraimovich - Redash Overview & DEMO (Databricks) - 1:27:25
Vish Subramanian - Brewing Data at Scale (Starbucks) - 1:39:50
Realizing the Vision of the Data Lakehouse
Data warehouses have a long history in decision support and business intelligence applications. However, they were not well suited to the unstructured, semi-structured, and streaming data common in modern enterprises. About a decade ago, this led organizations to build data lakes of raw data. But data lakes, too, lacked important capabilities. The need for a better solution has given rise to the data lakehouse, which implements data structures and data management features similar to those of a data warehouse directly on the kind of low-cost storage used for data lakes.
This keynote by Databricks CEO Ali Ghodsi explains why the open source Delta Lake project takes the industry closer to realizing the full potential of the data lakehouse, including new capabilities within the Databricks Unified Data Analytics Platform that significantly accelerate performance. In addition, Ali will announce new open source capabilities for collaboratively running SQL queries against your data lake, building live dashboards, and alerting on important changes, making it easier for all data teams to analyze and understand their data.
Introducing Apache Spark 3.0: A Retrospective of the Last 10 Years and a Look Forward to the Next 10 Years to Come
Matei Zaharia and Brooke Wenig
In this keynote from Matei Zaharia, the original creator of Apache Spark, we will highlight major community developments in the Apache Spark 3.0 release that make Spark easier to use, faster, and compatible with more data sources and runtime environments. Apache Spark 3.0 continues the project’s original goal of making data processing more accessible, through major improvements to the SQL and Python APIs and through automatic tuning and optimization features that minimize manual configuration. This year also marks the 10-year anniversary of Spark’s initial open source release, and we’ll reflect on how the project and its user base have grown, as well as how the ecosystem around Spark (e.g., Koalas, Delta Lake and visualization tools) is evolving to make large-scale data processing simpler and more powerful.
Delta Engine: High Performance Query Engine for Delta Lake
How Starbucks is Achieving its 'Enterprise Data Mission' to Enable Data and ML at Scale and Provide World-Class Customer Experiences
Starbucks makes sure that everything we do is seen through the lens of humanity: from our commitment to the highest quality coffee in the world, to the way we engage with our customers and communities to do business responsibly. A key aspect of ensuring those world-class customer experiences is data. This talk highlights the Enterprise Data Analytics mission at Starbucks, which helps the company make decisions powered by data at tremendous scale. This ranges from processing petabyte-scale data through governed processes to deploying platforms at the speed of business and enabling ML across the enterprise. This session will detail how Starbucks has built enterprise data platforms to drive world-class customer experiences.
In addition to the many data engineering initiatives at Starbucks, we are also working on many interesting data science initiatives. The business scenarios involved in our deep learning initiatives range (but are not limited to) from planogram analysis (the layout of our stores for efficient partner and customer flow) to predicting product pairings (e.g., if you purchase a caramel macchiato, you might also like a caramel brownie) from product components using graph convolutional networks. In this session, we will focus on how to run distributed Keras (TensorFlow backend) training to perform image analytics. This will be combined with MLflow to showcase the data science lifecycle and how Databricks and MLflow simplify it.