Databricks developers are prolific blog authors when they are not writing code for the Databricks platform or Apache Spark. As 2015 draws to a close, we did a quick tally of page views across all the blog posts published this year to understand which topics attracted the most interest amongst our readers.
The results indicate that people are most interested in announcements of new Spark features and practical guides on tuning Spark. While blog posts published earlier in the year have an advantage, there is a clear winner by a wide margin. So here we give you a countdown of the most popular Databricks blog posts of 2015.
The countdown
#10: Introducing Streaming K-Means in Spark 1.2
#9: ML Pipelines: A New High-Level API for MLlib
#8: Deep Dive into Spark SQL’s Catalyst Optimizer
#7: Tuning Java Garbage Collection for Spark Applications
#6: An Introduction to JSON Support in Spark SQL
#4: Announcing Apache Spark 1.4
#3: Announcing SparkR: R on Spark
#2: Project Tungsten: Bringing Spark Closer to Bare Metal
And the winner is…
#1: Introducing DataFrames in Spark for Large Scale Data Science!
But wait, what about the classics?
A few blog posts written in 2014 remain extremely popular more than a year later. Spark SQL is the hands-down winner in this category:
Shark, Spark SQL, Hive on Spark, and the future of SQL on Spark
Spark SQL: Manipulating Structured Data Using Spark
Spark and Hadoop: Working Together
Spark the fastest open source engine for sorting a petabyte
We promise to keep creating great content for the community in the upcoming year. Keep an eye out for tutorials and deep-dives on the upcoming release of Apache Spark here. Follow us on Twitter or LinkedIn to stay up to date!