Caryl Yuhas is a Sr. Manager of Field Engineering at Databricks, where her team provides consultative and technical support for companies looking to optimize their data and machine learning pipelines. Caryl previously worked as a Product Manager at MediaMath, where she first began to work with distributed data processing and developed a passion for Spark and cloud computing. She studied at the University of Pennsylvania, where she received a degree in Chemical and Biomolecular Engineering.
November 17, 2020 04:00 PM PT
Co-founder & Chief Architect, Databricks
In this keynote from Reynold Xin, the top contributor to Apache Spark and PMC member, we will review the state of the project and highlight major community developments in the 10th anniversary release and beyond. Reynold will review how the recent Spark 3.0 release focused on making Spark easier to use, faster, and more ANSI standard compliant. With Python representing nearly 70% of notebook commands, he’ll focus on the development of Project Zen, the community effort to make Spark more Pythonic. This includes improvements in development tooling, API design, error handling and more, to make data scientists and engineers more productive with data.
Sr. Manager, Field Engineering, Databricks
Co-founder & CEO
Original Creator of Apache Spark, Databricks
Data warehouses have a long history in decision support and business intelligence applications, but they were not well suited to the unstructured, semi-structured, and streaming data common in modern enterprises. This led organizations to build data lakes of raw data about a decade ago, but data lakes also lacked important capabilities. The need for a better solution has given rise to lakehouse architecture, which implements data structures and data management features similar to those in a data warehouse, directly on the kind of low cost storage used for data lakes.
This keynote by Databricks CEO Ali Ghodsi explains how the open source Delta Lake project allows the industry to realize the full potential of lakehouse architecture. Additionally, Ali will discuss the newly announced SQL Analytics service that allows users to run traditional analytics on their data lake, instead of moving data out to data warehouses, without sacrificing performance, security, or quality. This service completes the vision of lakehouse architecture to allow the data lake to be a single source of truth for all data workloads.
Chief Product Officer, Tableau Software
Machine Learning Practice Lead, Databricks
Co-founder & Chief Architect, Databricks
In this keynote, Reynold Xin, Co-founder and Chief Architect at Databricks, will explore how SQL Analytics brings a new level of performance to data lakes for analytics workloads. Traditionally, data lakes have struggled with analytics because they cannot deliver fast query performance with low latency at high user concurrency. Reynold will provide a technical deep dive into how Databricks has addressed these challenges. First, Delta Engine, Databricks' polymorphic vectorized execution engine, delivers extremely fast single query throughput. Second, the new auto-scaling SQL-optimized clusters in SQL Analytics make it easy to match compute capacity to user load. And third, optimizations in the new SQL Analytics Endpoints reduce the time required to get query results by up to 6x. Altogether, SQL Analytics is able to provide users with data warehousing performance at data lake economics for their analytics workloads.
Professor, CWI & Vrije Universiteit Amsterdam
Head of Architecture, Information and Analytics, Unilever
In this talk, we’ll discuss how the Lakehouse architecture has become a critical part of Unilever’s information management infrastructure, limiting traditional enterprise data silos and enabling the agile access to data, both upstream and downstream, that’s needed for faster decision making. As a result, IT is helping Unilever to deliver higher quality predictions in many areas of the business, thereby building trust in AI throughout the company.
Best-selling author, journalist, and podcast host
Imagine what a data-driven response to the Covid-19 pandemic would have looked like — if we could set aside politics and ego. Award-winning author and journalist Malcolm Gladwell discusses the lessons we can learn from the current crisis, and how data and data teams will be critical in solving the world’s toughest problems – including future pandemic outbreaks. He also reveals the essential role that data teams play in his own work every day.