Open Source | Databricks Blog

Introducing Apache Spark 2.0

July 26, 2016 by Reynold Xin, Michael Armbrust and Matei Zaharia in Engineering

Today, we're excited to announce the general availability of Apache Spark 2.0 on Databricks. This release builds on what the community has learned...

Databricks Bi-Weekly Digest: 7/18/16

July 18, 2016 by Jules Damji in Engineering

Today, we're kicking off a new series: the Databricks Bi-Weekly Digest. Our goal with this digest is to summarize Spark-related content, compiled...

A Tale of Three Apache Spark APIs: RDDs vs DataFrames and Datasets

July 14, 2016 by Jules Damji in Engineering

Of all the developers' delights, none is more attractive than a set of APIs that makes developers productive, that is easy to use...

SQL Subqueries in Apache Spark 2.0

June 17, 2016 by Davies Liu and Herman van Hövell in Engineering

In the upcoming Apache Spark 2.0 release, we have substantially expanded the SQL standard capabilities. In this brief...

Apache Spark 2.0: An Anthology of Technical Assets

June 1, 2016 by Jules Damji in Engineering

Older anthologies collected contributions from various authors around a theme, bound then as a journal or periodical. Newer anthologies include multiple...

Genome Sequencing in a Nutshell

May 24, 2016 by Deborah Siegel in Engineering

This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...

Parallelizing Genome Variant Analysis

May 24, 2016 by Deborah Siegel in Engineering

This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...

Predicting Geographic Population using Genome Variants and K-Means

May 24, 2016 by Deborah Siegel in Engineering

Spark Summit 2016 will be held in San Francisco on June 6–8. This is...

Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop

May 23, 2016 by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering

When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...

Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles

May 19, 2016 by Tim Hunter, Hossein Falaki and Joseph Bradley in Solutions

Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...