Skip to main content
<
Page 4
>
Engineering blog

Efficient Point in Polygon Joins via PySpark and BNG Geospatial Indexing

This is a collaborative post by Ordnance Survey, Microsoft and Databricks. We thank Charis Doidge, Senior Data Engineer, and Steve Kingston, Senior Data...
Engineering blog

Pandas API on Upcoming Apache Spark™ 3.2

October 4, 2021 by Hyukjin Kwon and Xinrong Meng in Engineering Blog
We're thrilled to announce that the pandas API will be part of the upcoming Apache Spark™ 3.2 release. pandas is a powerful, flexible...
Engineering blog

Shiny and Environments for R Notebooks

At Databricks, we want the Lakehouse ecosystem widely accessible to all data practitioners, and R is a great interface language for this purpose...
Engineering blog

How We Built Databricks on Google Kubernetes Engine (GKE)

August 6, 2021 by Frank Munz and Li Gao in Engineering Blog
Our release of Databricks on Google Cloud Platform (GCP) was a major milestone toward a unified data, analytics and AI platform that is...
Engineering blog

An Experimentation Pipeline for Extracting Topics From Text Data Using PySpark

This post is part of a series of posts on topic modeling. Topic modeling is the process of extracting topics from a set...
Engineering blog

The Delta Between ML Today and Efficient ML Tomorrow

Delta Lake and MLflow both come up frequently in conversation but often as two entirely separate products. This blog will focus on the...
Engineering blog

AML Solutions at Scale Using Databricks Lakehouse Platform

Anti-Money Laundering (AML) compliance has been undoubtedly one of the top agenda items for regulators providing oversight of financial institutions across the globe...
Engineering blog

Get Your Free Copy of Delta Lake: The Definitive Guide (Early Release)

At the Data + AI Summit, we were thrilled to announce the early release of Delta Lake: The Definitive Guide , published by...
Engineering blog

What’s New in Apache Spark™ 3.1 Release for Structured Streaming

Along with providing the ability for streaming processing based on Spark Core and SQL API, Structured Streaming is one of the most important...
Engineering blog

Attack of the Delta Clones (Against Disaster Recovery Availability Complexity)

April 20, 2021 by Itai Weiss and Denny Lee in Engineering Blog
Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. Notebook: Using Deep Clone...