Databricks at Strata San Jose

The Strata + Hadoop World Conference in San Jose last week was abuzz with "putting data to work" in keeping with this year's conference theme. This was a significant shift from last year's event where organizations were highly focused on getting their arms around their big data projects and being steeped in evaluating the multitude of tools…

Read

Extending MemSQL Analytics with Spark

This is a guest blog from one of our partners: MemSQL. Summary: Coupling operational data with the most advanced analytics puts data-driven business ahead. The MemSQL Spark Connector enables such configurations. Meeting Transactional and Analytical Needs: Transactional databases form the core of modern business operations. Whether that transaction is financial, physical in terms…

Read

Introducing DataFrames in Spark for Large Scale Data Science

Today, we are excited to announce a new DataFrame API designed to make big data processing even easier for a wider audience. When we first open sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on…

Read

Spark: A review of 2014 and looking ahead to 2015 priorities

2014 has been a year of tremendous growth for Apache Spark. It became the most active open source project in the Big Data ecosystem with over 400 contributors, and was adopted by many platform vendors, including all of the major Hadoop distributors. Through our ecosystem of products, partners, and training at Databricks, we also…

Read

Automatic Labs Selects Databricks Cloud for Primary Real-Time Data Processing

We're really excited to share that Automatic Labs has selected Databricks Cloud as its preferred big data processing platform (press release: http://www.marketwired.com/press-release/automatic-labs-turns-databricks-cloud-faster-innovation-dramatic-cost-savings-1991316.htm). Automatic Labs needed to run large and complex queries against their entire data set to explore and come up with new product ideas. Their prior solution using Postgres impeded the ability of Automatic’s team…

Read

“Learning Spark” book available from O’Reilly

Today we are happy to announce that the complete Learning Spark book is available from O’Reilly in e-book form with the print copy expected to be available February 16th. At Databricks, as the creators and driving force behind Spark, we have witnessed explosive growth in the interest and adoption of Spark, which has quickly become…

Read

Apache Spark selected for InfoWorld 2015 Technology of the Year Award

Recently InfoWorld unveiled the 2015 Technology of the Year Award winners, which range from open source software to stellar consumer technologies like the iPhone. Being the creators and driving force behind Spark, Databricks is thrilled to see Spark in their ranks. In fact, we built our flagship product, Databricks Cloud, on top of Spark with…

Read

An introduction to JSON support in Spark SQL

Note: Starting with Spark 1.3, SchemaRDD will be renamed to DataFrame. In this blog post, we introduce Spark SQL’s JSON support, a feature we have been working on at Databricks to make it dramatically easier to query and create JSON data in Spark. With the prevalence of web and mobile applications, JSON has become the de facto interchange format for web service APIs as well as long-term storage. With existing tools, users often engineer complex pipelines to read and write JSON data sets within analytical systems. Spark SQL’s JSON support, released in version 1.1 and enhanced in Spark 1.2, vastly simplifies the end-to-end experience of working with JSON data.

Read

Introducing streaming k-means in Spark 1.2

Much real-world data is acquired sequentially over time, whether messages from social media users, time series from wearable sensors, or, in a case we are particularly excited about, the firing of large populations of neurons. In these settings, rather than wait for all the data to be acquired before performing our analyses,…

Read

Big data projects are hungry for simpler and more powerful tools: Survey validates Apache Spark is gaining developer traction!

In partnership with Typesafe, we are excited to see the publication of the survey report representing the largest poll of Spark developers to date. Spark is currently the most active open source project in big data and has been rapidly gaining traction over the past few years. This survey of over 2100 respondents further validates…

Read