Databricks Bi-Weekly Apache Spark Digest: 11/16/16

Published: November 16, 2016

Spark Summit Talks and Apache Spark Roundup

Databricks and partners set a new world record for the CloudSort 2016 Benchmark using Apache Spark, wrote Reynold Xin, chief architect at Databricks.
Databricks Chief Technologist Matei Zaharia delivered a keynote, “Simplifying Big Data Applications with Apache Spark 2.0,” at Spark Summit 2016 EU in Brussels, followed by a demo of continuous applications by Databricks software engineer Greg Owen.
Databricks CEO Ali Ghodsi shared his vision of “Democratizing AI with Apache Spark” in his keynote at Spark Summit 2016 EU in Brussels.
Executive Chairman of Databricks Ion Stoica announced “The Next AMPLab: Real-time, Intelligent, and Secure Computing,” in his keynote at Spark Summit 2016 EU in Brussels.
Sameer Agarwal, software engineer at Databricks, presented “Apache Spark’s Performance: Project Tungsten and Beyond,” at Spark Summit 2016 EU in Brussels.
Herman Van Hovell, software engineer at Databricks, gave a “Deep Dive into the Catalyst Optimizer” talk and a hands-on lab at Spark Summit 2016 EU in Brussels.
Echoing Ali Ghodsi’s keynote above, Tim Hunter, Databricks software engineer, showed how to use Apache Spark with TensorFlow: “TensorFrames: Deep Learning with TensorFlow on Apache Spark,” at Spark Summit 2016 EU in Brussels.
Databricks Solution Architect Miklos Christine shared challenges and pitfalls you can avoid with Spark Streaming in his talk “Paddling up the Stream,” at Spark Summit 2016 EU in Brussels.
Facebook’s Big Compute Team software engineer Sital Kedia described how Apache Spark scales in production in his talk: “Apache Spark at Scale: A 60 TB+ Production Use Case” at Spark Summit 2016 EU in Brussels.
Morning Paper blogger Adrian Colyer commented on the article by Michael Armbrust et al., “Scaling Spark in the Real World: Performance and Usability.”
Matei Zaharia, Reynold Xin, et al. contributed an article to Communications of the ACM: “Apache Spark: A Unified Engine for Big Data Processing.”
Tim Hunter, Databricks software engineer, participated in the panel “Modern Software Architectures and Data Pipelines” at Scala by the Bay.

Releases

GraphFrames 0.3.0 was released as a Spark package. Find out more at graphframes.github.io.
Apache Spark 1.6.3 was released. Try it on Databricks Community Edition.
Apache Spark 2.0.2 was released. Kafka 0.10 support and runtime metrics are the two notable Structured Streaming features in this release. Try it on Databricks Community Edition.
Databricks released the spark-redshift v3.0.0-preview1 Spark package with usability improvements. Learn more at Redshift Data Source for Apache Spark.

What’s Next

To stay abreast of what’s happening with Apache Spark, follow us on Twitter @databricks and visit SparkHub.
