SESSION

Streaming Data Pipelines: From Supernovas to LLMs

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Engineering and Streaming
INDUSTRY: Health and Life Sciences, Public Sector
TECHNOLOGIES: AI/Machine Learning, Developer Experience, ETL
SKILL LEVEL: Intermediate
DURATION: 40

In this fun, hands-on, in-depth how-to, we use live streaming data for a comprehensive use case on the Databricks Data Intelligence Platform. The focus of this session is data engineering. We will tackle the challenge of analyzing real-time data, provided by NASA through its GCN project, from collapsing supernovas that emit gamma-ray bursts. You'll learn how to ingest data from message buses and how to decide between Delta Live Tables, DBSQL, and Databricks Workflows for stream processing. You'll also see how to code ETL pipelines in SQL, including Kafka ingestion. Once we have the cleaned data stream, I'll demonstrate how Databricks Data Rooms offer natural language analytics and compare that approach with a notebook that streams data into a vector database for open source LLMs with RAG. This session is ideal for data engineers, data architects who like code, genAI enthusiasts, and anyone fascinated by sparkling stars. You'll learn when to use which Databricks product, and how. The demo is easy to replicate at home.

SESSION SPEAKERS

Frank Munz

Principal TMM
Databricks