Databricks Blog

Breaking the “curse of dimensionality” in Genomics using “wide” Random Forests

July 26, 2017 by Denis C. Bauer, Lynn Langit, Oscar Luo, Piotr Szul and Aidan O’Brien in Engineering

This is a guest blog from members of CSIRO’s transformational bioinformatics team in Sydney, Australia. CSIRO, Australia’s government research agency, is in the...

Integrating Apache Airflow with Databricks

July 19, 2017 by Andrew Chen in Engineering

This blog post is part of our series of internal engineering blogs on Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. Today...

Serverless Continuous Delivery with Databricks and AWS CodePipeline

July 13, 2017 by Kevin Rasmussen in Product

Two characteristics commonly mark many companies' success. First, they quickly adapt to new technology. Second, as a result, they gain technological leadership and...

Benchmarking Big Data SQL Platforms in the Cloud

July 12, 2017 by Juliusz Sompolski and Reynold Xin in Engineering

For a deeper dive on these benchmarks, watch the webinar featuring Reynold Xin. Performance is often a key factor in choosing big data...

Introducing Apache Spark 2.2

July 11, 2017 by Michael Armbrust in Engineering

Today we are happy to announce the availability of Apache Spark 2.2.0 on Databricks as part of the Databricks Runtime 3.0. This release...

4 SQL High-Order and Lambda Functions to Examine Complex and Structured Data in Databricks

June 27, 2017 by Jules Damji in Product

Declarative Infrastructure with the Jsonnet Templating Language

June 26, 2017 by Eric Liang and Aaron Davidson in Platform

This blog post is part of our series of internal engineering blogs on Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. At...

Shell Oil Use Case: Parallelizing Large Simulations with Apache SparkR on Databricks

June 23, 2017 by Wayne W. Jones, Dennis Vallinga and Hossein Falaki in Product

This blog post is a joint engineering effort between Shell’s Data Science Team (Wayne W. Jones and Dennis Vallinga) and Databricks...

Managing and Securing Credentials in Databricks for Apache Spark Jobs

June 20, 2017 by Jason Pohl in Platform

Since Apache Spark separates compute from storage, every Spark Job requires a set of credentials to connect to disparate data sources. Storing those...

Analysing Metro Operations Using Apache Spark on Databricks

June 14, 2017 by Even Vinge, Senior Manager - EY Advisory, Data & Analytics in Product

This is a guest blog from the EY Advisory Data & Analytics team, who have been working with Sporveien in Oslo building a platform...