Voice from Facebook: Using Apache Spark for Large-Scale Language Model Training

February 28, 2017 by Tejas Patil and Jing Zheng
This is a guest post from Facebook. Tejas Patil and Jing Zheng, software engineers in the Facebook engineering team, show how to use...

How Apache Spark on Databricks is Taming the Wild West of Wi-Fi

February 27, 2017 by Tomasz Magdanski
iPass is the world’s largest Wi-Fi provider, yet we don’t own a single hotspot. You can think of us as the Uber of...

Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1

In part 1 of this series on Structured Streaming blog posts, we demonstrated how easy it is to write an end-to-end streaming ETL...

Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?

This blog post describes our experience debugging a failing test case caused by a cross join query running “too fast.” Because the root...

Anonymizing Datasets at Scale Leveraging Databricks Interoperability

February 13, 2017 by Don Hillborn
A key challenge for data-driven companies across a wide range of industries is how to leverage the benefits of analytics at scale when...

Spark Summit East 2017: Another Record-Setting Spark Summit

February 9, 2017 by Jules Damji, Wayne Chan and Dave Wang
We’ve put together a short recap of the keynotes and highlights from Databricks’ speakers for Apache Spark enthusiasts who could not attend the...

Intel’s BigDL on Databricks

February 8, 2017 by Sue Ann Hong and Joseph Bradley
Intel recently released its BigDL project for distributed deep learning on Apache Spark. BigDL has native Spark integration...

Announcing the Spark Live 2017 World Tour

January 31, 2017 by Wayne Chan
Due to the enthusiasm and positive feedback from last year’s Spark Live tour, we will be hitting the road again in 2017 to...

Integrating Your Central Apache Hive Metastore with Apache Spark on Databricks

January 30, 2017 by Miklos Christine
Databricks provides a managed Apache Spark platform to simplify running production applications, real-time data exploration, and infrastructure complexity. A key piece of the...

Delivering Exceptional Care Through Data-Driven Medicine

January 25, 2017 by Jorge Caballero
This is a guest blog from our friends at Distal. Today, 96% of U.S. health care providers use electronic health records (EHRs) -...