Articles by Tathagata Das - Databricks Blog


Efficient Upserts into Data Lakes with Databricks Delta

March 19, 2019 by Tathagata Das and Prakash Chockalingam in Product

Simplify building big data...

Introducing Databricks Optimized Autoscaling on Apache Spark™

May 2, 2018 by Prakash Chockalingam, Eric Liang, Tathagata Das and Jean-Yves Stephan in Product

Databricks is thrilled to announce our new optimized autoscaling feature. The new Apache Spark™-aware resource manager leverages Spark shuffle and executor statistics to...

Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3

March 20, 2018 by Joseph Torres, Michael Armbrust, Tathagata Das and Shixiong Zhu in Open Source

Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons...

Introducing Stream-Stream Joins in Apache Spark 2.3

March 13, 2018 by Tathagata Das and Joseph Torres in Engineering Blog

Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner joins and some types of outer joins) between...

Do your Streaming ETL at Scale with Apache Spark’s Structured Streaming

September 1, 2017 by Tathagata Das in Announcements

At the Spark Summit in San Francisco in June, we announced that Apache Spark’s Structured Streaming is marked as production-ready and shared...

Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming

May 8, 2017 by Tathagata Das in Engineering Blog

This is the fourth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Continuous applications often...

Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1

February 23, 2017 by Burak Yavuz, Michael Armbrust, Tathagata Das and Tyson Condie in Engineering Blog

In part 1 of this series on Structured Streaming blog posts, we demonstrated how easy it is to write an end-to-end streaming ETL...

Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1

January 19, 2017 by Tathagata Das, Michael Armbrust and Tyson Condie in Engineering Blog


Spark Structured Streaming

July 28, 2016 by Matei Zaharia, Tathagata Das, Michael Armbrust and Reynold Xin in Engineering Blog

Apache Spark 2.0 adds the first version of a new higher-level API, Structured Streaming, for building continuous applications. The main goal is...

Faster Stateful Stream Processing in Apache Spark Streaming

February 1, 2016 by Tathagata Das and Shixiong Zhu in Engineering Blog

Many complex stream processing pipelines must maintain state across a period of time. For example, if you are interested in understanding user behavior...