Native Support of Session Window in Spark Structured Streaming

Apache Spark™ Structured Streaming allows users to do aggregations on windows over event-time. Before Apache Spark 3.2™, Spark supported tumbling windows and...

Efficient Point in Polygon Joins via PySpark and BNG Geospatial Indexing

This is a collaborative post by Ordnance Survey, Microsoft and Databricks. We thank Charis Doidge, Senior Data Engineer, and Steve Kingston, Senior Data...

Pandas API on Upcoming Apache Spark™ 3.2

October 4, 2021 by Hyukjin Kwon and Xinrong Meng
We're thrilled to announce that the pandas API will be part of the upcoming Apache Spark™ 3.2 release. pandas is a powerful, flexible...

Shiny and Environments for R Notebooks

At Databricks, we want the Lakehouse ecosystem widely accessible to all data practitioners, and R is a great interface language for this purpose...

How We Built Databricks on Google Kubernetes Engine (GKE)

August 6, 2021 by Frank Munz and Li Gao
Our release of Databricks on Google Cloud Platform (GCP) was a major milestone toward a unified data, analytics and AI platform that is...

An Experimentation Pipeline for Extracting Topics From Text Data Using PySpark

This post is part of a series of posts on topic modeling. Topic modeling is the process of extracting topics from a set...

The Delta Between ML Today and Efficient ML Tomorrow

Delta Lake and MLflow both come up frequently in conversation but often as two entirely separate products. This blog will focus on the...

AML Solutions at Scale Using Databricks Lakehouse Platform

Anti-Money Laundering (AML) compliance has undoubtedly been one of the top agenda items for regulators providing oversight of financial institutions across the globe...

Get Your Free Copy of Delta Lake: The Definitive Guide (Early Release)

At the Data + AI Summit, we were thrilled to announce the early release of Delta Lake: The Definitive Guide, published by...

What’s New in Apache Spark™ 3.1 Release for Structured Streaming

Along with providing the ability for streaming processing based on Spark Core and SQL API, Structured Streaming is one of the most important...