Engineering | Databricks Blog

Understanding your Apache Spark Application Through Visualization

June 22, 2015 by Andrew Or in Engineering

"The greatest value of a picture is when it forces us to notice what we never expected to see." - John Tukey. In...

Announcing Apache Spark 1.4

June 11, 2015 by Patrick Wendell in Engineering

Today I’m excited to announce the general availability of Apache Spark 1.4! Spark 1.4 introduces SparkR, an R API targeted towards data scientists...

Announcing SparkR: R on Apache Spark

June 9, 2015 by Shivaram Venkataraman in Engineering

I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an R package that allows data scientists to...

Statistical and Mathematical Functions with DataFrames in Apache Spark

June 2, 2015 by Burak Yavuz and Reynold Xin in Engineering

We introduced DataFrames in Apache Spark 1.3 to make Apache Spark much easier to use. Inspired by data frames in R and Python...

Project Tungsten: Bringing Apache Spark Closer to Bare Metal

April 28, 2015 by Reynold Xin and Josh Rosen in Engineering

In a previous blog post, we looked back and surveyed performance improvements made to Apache Spark in the past year. In this...

Recent performance improvements in Apache Spark: SQL, Python, DataFrames, and More

April 24, 2015 by Reynold Xin in Engineering

New MLlib Algorithms in Apache Spark 1.3: FP-Growth and Power Iteration Clustering

April 17, 2015 by Jacky Li, Fan Jiang, Youhua Zhang, Stephen Boesch and Bing Xiao in Engineering

This is a guest blog post from Huawei’s big data global team. Huawei, a Fortune Global 500 private company, has put together a...

Running Apache Spark GraphX algorithms on Library of Congress subject heading SKOS

April 14, 2015 by Bob DuCharme in Engineering

This is a guest post from Bob DuCharme. Original article appeared in: http://www.snee.com/bobdc.blog/2015/04/running-spark-graphx-algorithm.html Well, one algorithm, but a very cool one. Last month...

Deep Dive into Spark SQL's Catalyst Optimizer

April 13, 2015 by Michael Armbrust, Yin Huai, Cheng Liang, Reynold Xin and Matei Zaharia in Engineering

Apache Spark 2.0: Rearchitecting Spark for Mobile Platforms

March 31, 2015 by Reynold Xin in Engineering

Yesterday, to celebrate Apache Spark’s fifth birthday, we looked back at the history of the project. Today, we are happy to...