Guest blog: PMML Support in Spark MLlib

This is a guest blog from our friend Vincenzo Selvaggio. The recently released Apache Spark 1.4 introduces PMML support to MLlib for linear models and k-means clustering. This achievement is the result of active discussions from the community on JIRA (https://issues.apache.org/jira/browse/SPARK-1406) and GitHub (https://github.com/apache/spark/pull/3062) and embraces interoperability between Apache Spark and other platforms when it…

MyFitnessPal Delivers New Feature, Speeds up Pipeline, and Boosts Team Productivity with Databricks

To learn more about how Databricks helped MyFitnessPal with analytics, check out an earlier article in The Wall Street Journal (log-in required) or download the case study. We are excited to announce that MyFitnessPal (an Under Armour company) uses Databricks to build the production pipeline for its new “Verified Foods” feature, gaining many performance and productivity…

Databricks Launches Second MOOC: Scalable Machine Learning

We have been working in collaboration with professors at UC Berkeley and UCLA to produce two freely available Massive Open Online Courses (MOOCs). The first MOOC was released earlier this month and has been a tremendous success, with over 60K students enrolled and a large number of active students. We are excited to announce that…

Understanding your Spark application through visualization

"The greatest value of a picture is when it forces us to notice what we never expected to see." - John Tukey

In the past, the Spark UI has been instrumental in helping users debug their applications. In the latest Spark 1.4 release, we are happy to announce that the data visualization wave has found…

A Look Back at Spark Summit 2015

UPDATE: Slides and videos from the Summit are now available! Check them out now! We are delighted about the success of Spark Summit 2015 in San Francisco on June 15th and 16th, with three different sold-out Spark Training sessions on June 17th. This is the largest Spark Summit to date, with more than 2,000 attendees!…

Guest blog: How Customers Win with Spark on Hadoop

This is a guest post from our friends at MapR. This blog summarizes my conversations over the last few months with users who have deployed Spark in production on the MapR Distribution including Hadoop. My key observations overall are that Spark is indeed making inroads into our user community, which is leveraging not just…

Guest blog: Zen and the Art of Spark Maintenance with Cassandra

This is a guest post from our friends at DataStax. Apache Cassandra™ is a fully distributed, highly scalable database that allows users to create online applications that are always-on and can process large amounts of data. Apache Spark™ is a processing engine that enables applications in Hadoop clusters to run up to 100X faster in…

Databricks is now Generally Available

We are excited to announce today, at Spark Summit 2015, the general availability of Databricks, a hosted data platform from the team that created Apache Spark. With Databricks, you can effortlessly launch Spark clusters, explore data interactively, run production jobs, and connect third-party applications. We believe Databricks is the easiest way to use…

Databricks and IBM Collaborate to Enhance Apache Spark Machine Learning

At today’s Spark Summit, Databricks and IBM announced a joint effort to contribute key machine learning capabilities to the Apache Spark Project. Over the course of the next few months, Databricks and IBM will collaborate to expand Spark’s machine learning capabilities. The companies plan to introduce new domain-specific algorithms to the Spark ecosystem and…

Announcing Apache Spark 1.4

Today I’m excited to announce the general availability of Spark 1.4! Spark 1.4 introduces SparkR, an R API targeted towards data scientists. It also evolves Spark’s DataFrame API with a large number of new features. Spark's ML pipelines API, first introduced in Spark 1.3, graduates from an alpha component. Finally, Spark Streaming and Core…
