Engineering | Databricks Blog

Introducing Apache Spark™ 3.4 for Databricks Runtime 13.0

April 14, 2023 by Xinrong Meng, Daniel Tenedorio, Martin Grund, Allan Folting, Hyukjin Kwon, Herman van Hövell, Wenchen Fan, Ying Xiong, Jungtaek Lim, Xiao Li and Reynold Xin in Engineering

Today, we are happy to announce the availability of Apache Spark™ 3.4 on Databricks as part of Databricks Runtime 13.0. We extend...

How Collective Health uses Delta Live Tables and Structured Streaming for Data Integration

April 13, 2023 by Mragesh Khandelwal and Mahmoud Saleh in Customers

Collective Health is not an insurance company. We're a technology company that's fundamentally making health insurance work better for everyone, starting with the...

Synthetic Data for Better Machine Learning

April 11, 2023 by Sean Owen in Engineering

You've likely tried the buzziest advances in generative AI in the past year, tools like ChatGPT and DALL-E. They consume complex data...

Visual data modeling using erwin Data Modeler by Quest on the Databricks Lakehouse Platform

April 5, 2023 by Vani Mishra, Abhishek Dey, Leo Mao, Soham Bhatt and Pradeep Anandapu in Platform

This is a collaborative post between Databricks and Quest Software. We thank Vani Mishra, Director of Product Management at Quest Software for her...

Saving Mothers with ML: How CareSource uses MLOps to Improve Healthcare in High-Risk Obstetrics

April 3, 2023 by Chengyin Eng, Russ Scoville, Arpit Gupta and Alvaro Aleman in Engineering

This blog post is in collaboration with Russ Scoville (Vice President of Enterprise Data Services), Arpit Gupta (Director of Predictive Analytics and Data...

Pandas-Profiling Now Supports Apache Spark

April 2, 2023 by Miriam Santos, Fabiana Clemente and Corey Abshire in Engineering

Data profiling is the process of collecting statistics and summaries of data to assess its quality and other characteristics. It is an essential...

Run SQL Queries on Databricks From Visual Studio Code

March 29, 2023 by Bilal Aslam, Fabian Jakobs and Shant Hovsepian in Platform

Today, we are excited to announce that users can now run SQL queries on Databricks from within Visual Studio Code via a preview...

Fine-Tuning Large Language Models with Hugging Face and DeepSpeed

March 20, 2023 by Sean Owen in Engineering

Large language models (LLMs) are currently in the spotlight following the sensational release of ChatGPT. Many are wondering how to take advantage of...

Building the Lakehouse for Healthcare and Life Sciences - Processing DICOM images at scale with ease

March 15, 2023 by Douglas Moore in Healthcare & Life Sciences

One of the biggest challenges in understanding patient health status and disease progression is unlocking insights from the vast amounts of semi-structured and...

Unsupervised Outlier Detection on Databricks

March 13, 2023 by Iliya Kostov, Milos Colic and Michele Caputo in Engineering

Kakapo (KAH-kə-poh) implements a standard set of APIs for outlier detection at scale on Databricks. It provides an integration of the...