Solutions | Databricks Blog

Introducing the Natural Language Processing Library for Apache Spark

October 19, 2017 by David Talby in Solutions

This is a community blog and effort from the engineering team at John Snow Labs, explaining their contribution to an open-source Apache Spark...

Breaking the “curse of dimensionality” in Genomics using “wide” Random Forests

July 26, 2017 by Denis C. Bauer, Lynn Langit, Oscar Luo, Piotr Szul and Aidan O’Brien in Engineering Blog

This is a guest blog from members of CSIRO’s transformational bioinformatics team in Sydney, Australia. CSIRO, Australia’s government research agency, is in the...

Integrating Apache Airflow with Databricks

July 19, 2017 by Andrew Chen in Engineering Blog

This blog post is part of our series of internal engineering blogs on Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. Today...

Declarative Infrastructure with the Jsonnet Templating Language

June 26, 2017 by Eric Liang and Aaron Davidson in Platform Blog

This blog post is part of our series of internal engineering blogs on Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. At...

Using sparklyr in Databricks

May 25, 2017 by Hossein Falaki in Engineering Blog

Try this notebook on Databricks with all instructions as explained in this post. In September 2016, RStudio announced sparklyr, a new...

Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering

May 9, 2017 by Yun Ni, Kelvin Chu and Joseph Bradley in Solutions

This is a cross blog post effort between Databricks and Uber Engineering. Yun Ni is a software engineer on Uber’s Machine Learning Platform...

Analyse One Year of Radio Station Songs Aired with Apache Spark, Spark SQL, Spotify, and Databricks

March 27, 2017 by Paul Leclercq in Solutions

Voice from CERN: Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs

October 3, 2016 by Luca Canali in Engineering Blog

This is a guest post from CERN, the European Organization for Nuclear Research. In this blog, Luca Canali of CERN investigates performance improvements...

Apache Spark @Scale: A 60 TB+ production use case from Facebook

August 31, 2016 by Sital Kedia, Shuojie Wang and Avery Ching in Solutions

This is a guest Apache Spark community blog from Facebook Engineering. In this technical blog, Facebook shares their usage of Apache Spark...

Databricks Bi-Weekly Digest: 7/18/16

July 18, 2016 by Jules Damji in Engineering Blog

Today, we're kicking off a new series: the Databricks Bi-Weekly Digest. Our goal with this digest is to summarize Spark-related content, compiled...