Engineering Blog | Databricks Blog

Data Exfiltration Protection with Azure Databricks

March 21, 2024 by Ganesh Rajagopal, Bruce Nelson and Bhavin Kukadia in Engineering Blog

In the previous blog, we discussed how to securely access Azure Data Services from Azure Databricks using Virtual Network Service Endpoints or...

Implementing LLM Guardrails for Safe and Responsible Generative AI Deployment on Databricks

March 13, 2024 by Debu Sinha, Margaret Qian and Jacqueline Li in Data Science and ML

Let’s explore a common scenario – your team is eager to leverage open source LLMs to build chatbots for customer support interactions...

Announcing the General Availability of Databricks Feature Serving

March 11, 2024 by Aakrati Talati, Mani Parkhe, Chenen Liang, Jasraj Dange, Mingyang Ge and Akhil Gupta in Data Science and ML

Today, we are excited to announce the general availability of Feature Serving. Features play a pivotal role in AI Applications, typically requiring considerable...

Databricks Expands Brickbuilder Program to Include Unity Catalog Accelerators

March 7, 2024 by Christine Gauthier in Partners

Today, we're excited to announce the launch of Brickbuilder Unity Catalog Accelerators. This is an expansion to the Brickbuilder Accelerator program, which...

Simplify PySpark testing with DataFrame equality functions

March 6, 2024 by Haejoon Lee, Allison Wang and Amanda Liu in Engineering Blog

The DataFrame equality test functions were introduced in Apache Spark™ 3.5 and Databricks Runtime 14.2 to simplify PySpark unit testing. The full set...

A Deep Dive into the Latest Performance Improvements of Stateful Pipelines in Apache Spark Structured Streaming

February 28, 2024 by Mojgan Mazouchi, Mrityunjay Kumar, Anish Shrigondekar and Karthikeyan Ramasamy in Engineering Blog

This post is the second part of our two-part series on the latest performance improvements of stateful pipelines. The first part of this...

Performance Improvements for Stateful Pipelines in Apache Spark Structured Streaming

February 28, 2024 by Mojgan Mazouchi, Mrityunjay Kumar, Anish Shrigondekar and Karthikeyan Ramasamy in Engineering Blog

Apache Spark™ Structured Streaming is a popular open-source stream processing platform that provides scalability and fault tolerance, built on top of the...

Databricks adds new migration Brickbuilder Solutions to help customers succeed with AI

February 15, 2024 by Christine Gauthier in Partners

For the past two years, Databricks has collaborated with leading consulting partners to build innovative solutions for industry, migration, and data and AI...

Announcing Ray Autoscaling support on Databricks and Apache Spark™

January 9, 2024 by Weichen Xu, Puneet Jain and Ben Wilson in Engineering Blog

Ray is an open-source unified compute framework that simplifies scaling AI and Python workloads in a distributed environment. Since we introduced support for...

Parameterized queries with PySpark

January 3, 2024 by Matthew Powers, Daniel Tenedorio and Hyukjin Kwon in Engineering Blog

PySpark has always provided wonderful SQL and Python APIs for querying data. As of Databricks Runtime 12.1 and Apache Spark 3.4, parameterized queries...