Open Source | Databricks Blog


On-Demand Webinar and FAQ: Parallelize R Code Using Apache Spark

August 21, 2017 by Hossein Falaki and Jules Damji in Engineering

On August 15th, Data Science Central hosted a live webinar, “Parallelize R Code Using Apache Spark,” with Databricks’ Hossein Falaki. This webinar introduced SparkR...

Breaking the “curse of dimensionality” in Genomics using “wide” Random Forests

July 26, 2017 by Denis C. Bauer, Lynn Langit, Oscar Luo, Piotr Szul and Aidan O’Brien in Engineering

This is a guest blog from members of CSIRO’s transformational bioinformatics team in Sydney, Australia. CSIRO, Australia’s government research agency, is in the...

Integrating Apache Airflow with Databricks

July 19, 2017 by Andrew Chen in Engineering

This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. Today...

Benchmarking Big Data SQL Platforms in the Cloud

July 12, 2017 by Juliusz Sompolski and Reynold Xin in Engineering

For a deeper dive on these benchmarks, watch the webinar featuring Reynold Xin. Performance is often a key factor in choosing big data...

Introducing Apache Spark 2.2

July 11, 2017 by Michael Armbrust in Engineering

Today we are happy to announce the availability of Apache Spark 2.2.0 on Databricks as part of the Databricks Runtime 3.0. This release...

Declarative Infrastructure with the Jsonnet Templating Language

June 26, 2017 by Eric Liang and Aaron Davidson in Platform

This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. At...

Five Spark SQL Utility Functions to Extract and Explore Complex Data Types

June 13, 2017 by Jules Damji in Engineering

For developers, often the “how” is as important as the “why.” While our in-depth blog explains the...

A Vision for Making Deep Learning Simple

June 6, 2017 by Sue Ann Hong, Tim Hunter and Reynold Xin in Engineering

When MapReduce was introduced 15 years ago, it showed the world a glimpse into the future. For the...

Making Apache Spark the Fastest Open Source Streaming Engine

June 6, 2017 by Michael Lumb in Engineering

We started building Structured Streaming in Apache Spark one year ago as a new, simpler way to develop continuous applications. Not only...

Entropy-based Log Redaction for Apache Spark on Databricks

May 30, 2017 by Weiluo Ren and Yu Peng in Engineering

This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, tooling, monitoring, and provisioning. We love...