
Introducing Apache Spark 2.0

Today, we're excited to announce the general availability of Apache Spark 2.0 on Databricks. This release builds on what the community has learned...

Databricks Bi-Weekly Digest: 7/18/16

July 18, 2016 by Jules Damji
Today, we're kicking off a new series: the Databricks Bi-Weekly Digest. Our goal with this digest is to summarize Spark-related content, compiled...

A Tale of Three Apache Spark APIs: RDDs vs DataFrames and Datasets

July 14, 2016 by Jules Damji
Of all the developers' delights, none is more attractive than a set of APIs that makes developers productive, that is easy to use...

SQL Subqueries in Apache Spark 2.0

In the upcoming Apache Spark 2.0 release, we have substantially expanded the SQL standard capabilities. In this brief...

Apache Spark 2.0: An Anthology of Technical Assets

June 1, 2016 by Jules Damji
Older anthologies collated contributions from various authors around a theme, bound then as a journal or periodical. Newer anthologies include multiple...

Genome Sequencing in a Nutshell

May 24, 2016 by Deborah Siegel
This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...

Parallelizing Genome Variant Analysis

May 24, 2016 by Deborah Siegel
This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...

Predicting Geographic Population using Genome Variants and K-Means

May 24, 2016 by Deborah Siegel
Spark Summit 2016 will be held in San Francisco on June 6–8. Check out the full agenda and get your ticket. This is...

Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop

When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...

Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles

Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...