How Data Lakehouses Solve Common Issues With Data Warehouses

February 4, 2021 by Ryan Boyd
Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data...

Ray & MLflow: Taking Distributed Machine Learning Applications to Production

This is a guest blog from software engineers Amog Kamsetty and Archit Kulkarni of Anyscale, contributors to Ray.io. In this blog post...

Strategies for Modernizing Investment Data Platforms

January 29, 2021 by Ricardo Portilla
The appetite for investment was at a historic high in 2020 for both individual and institutional investors. One study showed that "retail traders...

Burning Through Electronic Health Records in Real Time With Smolder

Check out the solution accelerator to download the notebook referred to throughout this blog. In previous blogs, we looked at two separate workflows...

How to Manage Python Dependencies in PySpark

December 22, 2020 by Hyukjin Kwon
Controlling the environment of an application is often challenging in a distributed computing environment - it is difficult to ensure all nodes have...

Natively Query Your Delta Lake With Scala, Java, and Python

Today, we’re happy to announce that you can natively query your Delta Lake with Scala and Java (via the Delta Standalone Reader) and...

Python Autocomplete Improvements for Databricks Notebooks

At Databricks, we strive to provide a world-class development experience for data scientists and engineers, and new features are constantly getting added to...

How to Train XGBoost With Spark

November 16, 2020 by Stephen Offer
XGBoost is currently one of the most popular machine learning libraries, and distributed training is becoming more frequently required to accommodate the rapidly...

Improving the Spark Exclusion Mechanism in Databricks

November 6, 2020 by Tianhan Hu, Xingbo Jiang and Xiao Li
Ed Note: This article contains references to the term blacklist, a term that the Spark community is actively working to remove from Spark...

Faster SQL: Adaptive Query Execution in Databricks

October 21, 2020 by MaryAnn Xue and Allison Wang
Earlier this year, Databricks wrote a blog on the whole new Adaptive Query Execution framework in Spark 3.0 and Databricks Runtime 7.0. The...