The Best of The Databricks Blog: Most Read Posts of 2015

Dave Wang

Databricks developers are prolific blog authors when they are not writing code for the Databricks platform or Apache Spark. As 2015 draws to a close, we tallied page views across all the blog posts published this year to see which topics attracted the most interest among our readers.

The results indicate that readers are most interested in announcements of new Spark features and practical guides on tuning Spark. Although blog posts published earlier in the year have had more time to accumulate views, there is still a clear winner by a wide margin. So here is a countdown to the most popular Databricks blog posts of 2015.

The countdown

#10: Introducing Streaming K-Means in Spark 1.2

#9: ML Pipelines: A New High-Level API for MLlib

#8: Deep Dive into Spark SQL’s Catalyst Optimizer

#7: Tuning Java Garbage Collection for Spark Applications

#6: An Introduction to JSON Support in Spark SQL

#5: Using MongoDB with Spark

#4: Announcing Apache Spark 1.4

#3: Announcing SparkR: R on Spark

#2: Project Tungsten: Bringing Spark Closer to Bare Metal

And the winner is…

#1: Introducing DataFrames in Spark for Large Scale Data Science!

But wait, what about the classics?

A few blog posts written in 2014 remain extremely popular more than a year later. Spark SQL is the hands-down winner in this category:

Shark, Spark SQL, Hive on Spark, and the future of SQL on Spark

Spark SQL: Manipulating Structured Data Using Spark

Spark and Hadoop: Working Together

Spark the fastest open source engine for sorting a petabyte

We promise to keep creating great content for the community in the upcoming year. Keep an eye out here for tutorials and deep dives on the upcoming release of Apache Spark. Follow us on Twitter or LinkedIn to stay up to date!
