Engineering | Databricks Blog


Easy Ingestion to Lakehouse With COPY INTO

January 17, 2023 by Aemro Amare, Emma Liu, Amit Kara and Jasraj Dange in Platform

A new data management architecture known as the data lakehouse emerged independently across many organizations and use cases to support AI and BI...

Supercharging H3 for Geospatial Analytics

January 12, 2023 by Kent Marten, Michael Johns, Menelaos Karavelas and Desmond Cheong in Platform

On the heels of the initial release of H3 support in Databricks Runtime (DBR), we are happy to share ground-breaking performance improvements...

Databricks Power BI Connector Now Supports Native Query

January 11, 2023 by Moe Derakhshani, Can Efeoglu, Mahesh Prakriya and Bob Zhang in Platform

This is a collaborative post from Databricks and Microsoft. We thank Mahesh Prakriya (Director in Intelligence Platform, Microsoft) and Bob Zhang (Sr. Technical...

Building Geospatial Data Products

January 5, 2023 by Milos Colic in Engineering

Geospatial data has been driving innovation for centuries, through use of maps, cartography and more recently through digital content. For example, the oldest...

Accelerating SIEM Migrations With the SPL to PySpark Transpiler

December 15, 2022 by Serge Smertin and Jason Trost in Engineering

In this blog post, we introduce transpiler, a Databricks Labs open-source project that automates the translation of Splunk Search Processing Language (SPL)...

Scalable Kubernetes Upgrade Using Operators

December 14, 2022 by Ziyuan Chen in Engineering

At Databricks, we run our compute infrastructure on AWS, Azure, and GCP. We orchestrate containerized services using Kubernetes clusters. We develop and manage...

Build Reliable and Cost Effective Streaming Data Pipelines With Delta Live Tables’ Enhanced Autoscaling

December 7, 2022 by Paul Lappas, Li Zhang, Alex Ott, Kiavash Kianfar, Yuhong Chen and Prashanth Babu Velanati Venkata in Platform

This year we announced the general availability of Delta Live Tables (DLT), the first ETL framework to use a simple, declarative approach...

Spatial Analytics at Any Scale With H3 and Photon

December 7, 2022 by Kent Marten, Michael Johns and Menelaos Karavelas in Engineering

H3's global grid indexing system is driving new patterns for spatial analytics across a variety of geospatial use cases. Recently, Databricks added built-in support...

Memory Profiling in PySpark

November 29, 2022 by Xinrong Meng, Takuya Ueshin and Allan Folting in Engineering

There are many factors in a PySpark program's performance. PySpark supports various profiling tools to expose tight loops of your program and allow...

Introducing Ingestion Time Clustering with Databricks SQL and Databricks Runtime 11.2

November 17, 2022 by Piyush Revuri, Tom van Bussel, Joe Widen, Bart Samwel, Sabir Akhadov and Carmen Kwan in Engineering

Databricks customers are processing over an exabyte of data every day on the Databricks Lakehouse platform using Delta Lake, a significant amount...