A Guide to Data Engineering Talks at Spark + AI Summit 2019

February 25, 2019 by Singh Garewal
Selected highlights from the new track. Big data practitioners grapple with data quality issues and data pipeline complexities; it's the bane of their existence...

How to Work with Avro, Kafka, and Schema Registry in Databricks

February 15, 2019 by Wenchen Fan and Michael Armbrust
In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can...

Databricks Runtime 5.2 ML Features Multi-GPU Workflow, Pregel API, and Performant GraphFrames

January 30, 2019 by Yifan Cao and Joseph Bradley
We are excited to announce the release of Databricks Runtime 5.2 for Machine Learning. This release includes several new features and performance improvements...

5 Reasons to Become an Apache Spark Expert

January 15, 2019 by Michael Ortega
Apache Spark™ has fast become the most popular unified analytics engine for big data and machine learning. It was originally developed at...

Apparate: Managing Libraries in Databricks with CI/CD

January 15, 2019 by Hanna Torrence
This is a guest blog from Hanna Torrence, Data Scientist at ShopRunner. Introduction: As leveraging data becomes a more vital component of organizations'...

Kicking Off 2019 with an MLflow User Survey

January 8, 2019 by Matei Zaharia
It’s been six months since we launched MLflow, an open source platform to manage the machine learning lifecycle, and the project has...

Introducing Databricks Runtime 5.1 for Machine Learning

Last week, we released Databricks Runtime 5.1 Beta for Machine Learning. As part of our commitment to provide developers with the latest deep...

Introducing Built-in Image Data Source in Apache Spark 2.4

December 10, 2018 by Tomas Nykodym and Weichen Xu
Introduction: With recent advances in deep learning frameworks for image classification and object detection, the demand for standard image processing in Apache Spark...

Apache Avro as a Built-in Data Source in Apache Spark 2.4

Apache Avro is a popular data serialization format. It is widely used in the Apache Spark and Apache...

Announcing Databricks Runtime 5.0

November 18, 2018 by Todd Greenstein
We’re excited to announce the general availability of Databricks Runtime 5.0. Included in this release is Spark 2.4. This release offers substantial...