
Voice from Facebook: Using Apache Spark for Large-Scale Language Model Training

February 28, 2017 by Tejas Patil and Jing Zheng
This is a guest post from Facebook. Tejas Patil and Jing Zheng, software engineers in the Facebook engineering team, show how to use...

Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1

In part 1 of this series of blog posts on Structured Streaming, we demonstrated how easy it is to write an end-to-end streaming ETL...

Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?

This blog post describes our experience debugging a failing test case caused by a cross join query running “too fast.” Because the root...

Intel’s BigDL on Databricks

February 8, 2017 by Sue Ann Hong and Joseph Bradley
Try this notebook on Databricks. Intel recently released its BigDL project for distributed deep learning on Apache Spark. BigDL has native Spark integration...

Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1

Try this notebook in...

Top 10 Apache Spark Blog Posts from 2016

December 30, 2016 by Jules Damji
Spark Summit will be held in Dublin, Ireland on Oct 24-26, 2017. Check out the full agenda and get your ticket before it sells out! Here’s...

Introducing Apache Spark 2.1

December 28, 2016 by Reynold Xin
Spark Summit will be held in Boston on Feb 7-9, 2017. Check out the full agenda and get your ticket before it sells...

10 Things I Wish I Knew Before Using Apache SparkR

December 28, 2016 by Neil Dewar
This is a guest post from Neil Dewar, a senior data science manager at a global asset management firm. In this blog...

Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1

Apache Spark 2.1 is just around the corner: the community is going through the voting process for the release candidates. This blog post discusses...

Databricks Bi-Weekly Apache Spark Digest: 11/16/16

November 15, 2016 by Jules Damji
Spark Summit Talks and Apache Spark Roundup: Databricks and partners set a new world record for CloudSort 2016 Benchmark using Apache Spark...