Efficient Upserts into Data Lakes with Databricks Delta

March 19, 2019 by Tathagata Das and Prakash Chockalingam in
Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. Simplify building big data...

Introducing Databricks Optimized Autoscaling on Apache Spark™

Databricks is thrilled to announce our new optimized autoscaling feature. The new Apache Spark™-aware resource manager leverages Spark shuffle and executor statistics to...

Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3

Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons...

Introducing Stream-Stream Joins in Apache Spark 2.3

Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner join and some types of outer joins) between...

Do your Streaming ETL at Scale with Apache Spark’s Structured Streaming

September 1, 2017 by Tathagata Das in
At the Spark Summit in San Francisco in June, we announced that Apache Spark’s Structured Streaming is marked as production-ready and shared...

Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming

This is the fourth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Continuous applications often...

Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1

In part 1 of this series on Structured Streaming blog posts, we demonstrated how easy it is to write an end-to-end streaming ETL...

Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1

Explore why lakehouses are the data architecture of the future with the father of the data warehouse, Bill Inmon. Try this notebook in...

Spark Structured Streaming

Apache Spark 2.0 adds the first version of a new higher-level API, Structured Streaming, for building continuous applications. The main goal is...

Faster Stateful Stream Processing in Apache Spark Streaming

February 1, 2016 by Tathagata Das and Shixiong Zhu in
Many complex stream processing pipelines must maintain state across a period of time. For example, if you are interested in understanding user behavior...