Lakeflow pipelines and data freshness
Lakeflow Spark Declarative Pipelines (SDP) populate the analytical tables your app reads. Authoring pipelines is a data engineering task you almost never touch as an AppKit dev. Your job is the read side: displaying pipeline output, and answering "is this fresh enough to show?" before you render it.
Two SQL signals answer it: per-table refresh metadata for materialized views and streaming tables, and the pipeline update timeline. Both go through the Analytics plugin you set up in Analytical reads.
What's under Lakeflow
Lakeflow groups these products:
- Lakeflow Connect for ingestion. Managed connectors for Salesforce, Workday, SQL Server, and others ingest data into Unity Catalog.
- Lakeflow Spark Declarative Pipelines for transformation. Pipelines are authored in SQL or Python, run on Databricks Runtime, and produce materialized views and streaming tables.
- Lakeflow Jobs for orchestration. See Lakeflow Jobs for the app-trigger side.
- Lakeflow Designer for no-code visual pipeline building (Public Preview).
Two freshness signals
- Per-table refresh metadata:
  `DESCRIBE TABLE EXTENDED <name> AS JSON` returns a `refresh_information` block for materialized views and streaming tables. The block has `last_refreshed_at`, `last_refresh_type`, `latest_refresh_status`, `latest_refresh_link`, and `refresh_schedule`. See DESCRIBE TABLE for the full output schema, and the example after this list.
- Pipeline update timeline (Public Preview):
  `system.lakeflow.pipeline_update_timeline` records every pipeline update with `pipeline_id`, `update_id`, `period_start_time`, `period_end_time`, `result_state` (one of `COMPLETED`, `FAILED`, `CANCELED`), and trigger details. Filter by `pipeline_id` and `result_state = 'COMPLETED'` to find the most recent successful update for the pipeline that owns a table. See the system table reference for the full column list.
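A minimal sketch of the per-table check as a query file, assuming a materialized view named `main.analytics.orders_daily` (a placeholder; substitute one of your pipeline's outputs). Table names generally can't be bound as query parameters, so the name is written directly into the file:

```sql
-- Per-table freshness: the returned JSON includes the refresh_information
-- block (last_refreshed_at, latest_refresh_status, ...) described above.
-- main.analytics.orders_daily is a placeholder, not a real table.
DESCRIBE TABLE EXTENDED main.analytics.orders_daily AS JSON;
```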
For deeper troubleshooting (per-flow status, expectation results, lineage events), use the pipeline event log via the `event_log()` table-valued function. The event log is the right place for "why did this update fail" questions, not "is this data fresh enough to show".
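A minimal sketch of that kind of troubleshooting query, again using the placeholder table name from the example above:

```sql
-- Recent error-level events for the pipeline that owns the table.
-- event_log(TABLE(...)) scopes the event log to that table's pipeline;
-- main.analytics.orders_daily is a placeholder, not a real table.
SELECT timestamp, event_type, message
FROM event_log(TABLE(main.analytics.orders_daily))
WHERE level = 'ERROR'
ORDER BY timestamp DESC
LIMIT 20;
```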
A "Last updated" badge query
Put this in `config/queries/`. It runs through the Analytics plugin like any other SQL file.
```sql
-- @param pipelineId STRING
SELECT period_end_time, result_state
FROM system.lakeflow.pipeline_update_timeline
WHERE pipeline_id = :pipelineId
  AND result_state = 'COMPLETED'
ORDER BY period_end_time DESC
LIMIT 1;
```
Call the hook from a React component with the pipeline ID:
```tsx
import { useMemo } from "react";
import { useAnalyticsQuery } from "@databricks/appkit-ui/react";
import { sql } from "@databricks/appkit-ui/js";

export function LastUpdatedBadge() {
  // Memoize the bound parameter so the query isn't re-issued on every render.
  const params = useMemo(
    () => ({ pipelineId: sql.string("ec2a0ff4-d2a5-4c8c-bf1d-d9f12f10e749") }),
    [],
  );
  const { data } = useAnalyticsQuery("last_pipeline_update", params);
  // Render the row's period_end_time however your badge needs it.
  return <pre>{JSON.stringify(data, null, 2)}</pre>;
}
```
A filename ending in `.obo.sql` runs the query as the signed-in user. If your app's service principal has SELECT on `system.lakeflow.pipeline_update_timeline`, drop the `.obo` and the query runs as the app. See Author SQL files for the full filename rule.
Triggering a refresh from the app
There's no dedicated AppKit Pipelines plugin. Call the SDK directly from your handler with `w.pipelines.startUpdate({ pipelineId })`, or wrap the pipeline in a Lakeflow Job and use the Jobs plugin.
Where to next
Try Medallion Architecture from CDC History Tables for the canonical SDP pipeline that produces these tables, or Operational Data Analytics for the end-to-end UC + CDC + medallion pattern.