Spark Streaming — Advanced
Demo Type: Product Tutorial
Duration: Self-paced
What you’ll learn
The Databricks Lakehouse Platform dramatically simplifies data streaming to deliver real-time analytics, machine learning, and applications on one platform. In this demo, we'll show how the Databricks Lakehouse provides streaming capabilities to ingest and analyze clickstream data, typically coming from message queues such as Kafka.
Sessionization is the process of finding time-bounded user sessions in a flow of events, grouping together all the events that happen around the same time so you can derive per-session metrics (e.g., number of clicks, most-viewed pages).
Understanding sessions is critical for a lot of use cases:
- Detect cart abandonment in your online shop, and automatically trigger follow-up marketing actions to increase your sales
- Build better attribution models for your affiliate programs, based on the user's actions during each session
- Understand the user journey on your website, and provide a better experience to increase user retention
In this demo, we will:
- Ingest data from Kafka
- Save the data as Delta tables, ensuring quality and performance at scale
- Compute user sessions based on activity (a minimal code sketch of these three steps follows this list)
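The sketch below illustrates these steps with PySpark Structured Streaming; it is not the demo's actual notebooks. The broker address, topic name, event schema, table names, checkpoint paths, and the 10-minute inactivity gap are all illustrative assumptions, and `spark` is the session provided in a Databricks notebook.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Assumed shape of a clickstream event
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("page", StringType()),
    StructField("event_ts", TimestampType()),
])

# 1. Ingest clickstream events from Kafka
raw_events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
    .option("subscribe", "clickstream")                # assumed topic name
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*"))

# 2. Save the raw events as a Delta table
(raw_events.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/events_bronze")  # assumed path
    .outputMode("append")
    .toTable("events_bronze"))

# 3. Compute user sessions: group each user's events into session windows
#    that close after 10 minutes of inactivity
sessions = (spark.readStream.table("events_bronze")
    .withWatermark("event_ts", "30 minutes")
    .groupBy("user_id", F.session_window("event_ts", "10 minutes"))
    .agg(F.count("*").alias("click_count"),
         F.approx_count_distinct("page").alias("distinct_pages")))

# Sessions are emitted once the watermark confirms they are closed
(sessions.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/user_sessions")  # assumed path
    .outputMode("append")
    .toTable("user_sessions"))
```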
To install the demo, get a free Databricks workspace and execute the following two commands in a Python notebook:
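The two commands typically look like the following; the demo name passed to `dbdemos.install` is an assumption based on this tutorial's topic, so check the dbdemos catalog for the exact name.

```python
%pip install dbdemos
```

```python
import dbdemos
dbdemos.install('streaming-sessionization')  # assumed demo name for this tutorial
```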
Dbdemos is a Python library that installs complete Databricks demos in your workspaces. Dbdemos will load and start notebooks, Delta Live Tables pipelines, clusters, Databricks SQL dashboards, warehouse models, and more. See how to use dbdemos.
Dbdemos is distributed as a GitHub project.
For more details, please view the GitHub README.md file and follow the documentation.
Dbdemos is provided as is. See the License and Notice for more information.
Databricks does not offer official support for dbdemos and the associated assets.
For any issue, please open a ticket and the demo team will take a look on a best-effort basis.