Voice from Facebook: Using Apache Spark for Large-Scale Language Model Training
February 28, 2017 by Tejas Patil and Jing Zheng in Engineering Blog
This is a guest post from Facebook. Tejas Patil and Jing Zheng, software engineers in the Facebook engineering team, show how to use...
Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1
February 23, 2017 by Burak Yavuz, Michael Armbrust, Tathagata Das and Tyson Condie in Engineering Blog
In part 1 of this series on Structured Streaming blog posts, we demonstrated how easy it is to write an end-to-end streaming ETL...
Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?
February 16, 2017 by Reynold Xin, Ala Luszczak and Bogdan Raducanu in Engineering Blog
This blog post describes our experience debugging a failing test case caused by a cross join query running “too fast.” Because the root...
Intel’s BigDL on Databricks
February 8, 2017 by Sue Ann Hong and Joseph Bradley in Engineering Blog
Intel recently released its BigDL project for distributed deep learning on Apache Spark. BigDL has native Spark integration...
Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1
January 19, 2017 by Tathagata Das, Michael Armbrust and Tyson Condie in Engineering Blog
Top 10 Apache Spark Blog Posts from 2016
December 30, 2016 by Jules Damji in Engineering Blog
Here’s...
Introducing Apache Spark 2.1
December 28, 2016 by Reynold Xin in Engineering Blog
10 Things I Wish I Knew Before Using Apache SparkR
December 28, 2016 by Neil Dewar in Engineering Blog
This is a guest post from Neil Dewar, a senior data science manager at a global asset management firm. In this blog...
Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1
December 15, 2016 by Eric Liang, Michael Allman and Wenchen Fan in Engineering Blog
Apache Spark 2.1 is just around the corner: the community is going through the voting process for the release candidates. This blog post discusses...
Databricks Bi-Weekly Apache Spark Digest: 11/16/16
November 15, 2016 by Jules Damji in Engineering Blog
Spark Summit Talks and Apache Spark Roundup: Databricks and partners set a new world record for CloudSort 2016 Benchmark using Apache Spark...