Spark Summit Europe Full Agenda Available Online

This October, join the Apache Spark community in Amsterdam at the Beurs van Berlage for the very first Spark Summit in Europe! We are happy to announce that the full agenda is now finalized. You can find the full list of 39 community talks, along with the first set of keynotes, on Spark-Summit.org. Those looking to

Spark 1.5 Preview Now Available in Databricks

We are excited to announce that starting today, Apache Spark 1.5.0 is available as a preview in Databricks. Our users can now provision clusters with Spark 1.5 or previous Spark versions, ready to go with a few clicks. Officially, Spark 1.5 is expected to be released in a few weeks, and the community is doing

From Pandas to Apache Spark’s DataFrame

This is a cross-post from the blog of Olivier Girardot. Olivier is a software engineer and the co-founder of Lateral Thoughts, where he works on Machine Learning, Big Data, and DevOps solutions. With the introduction of window operations in Spark 1.4, you can finally port pretty much any relevant piece of Pandas’ DataFrame computation to Apache Spark parallel

Announcing the Databricks Academic Partners Program

Databricks was born from academic research and today we are giving back to the academic community with the Databricks Academic Partners program. This program will provide academic instructors and researchers with free access to the Databricks platform for teaching and research. In collaboration with Amazon’s AWS in Education grants program, academics can also apply for

Helping the Democratization of Big Data

When we started Databricks, we thought that extracting insights from big data was insanely difficult for no good reason. You almost needed an advanced degree to be able to get any meaningful work done. As a result, only a select few in each organization could ask questions of their big data, the people who set

Guest blog: SequoiaDB Connector for Apache Spark

This is a guest blog from Tao Wang at SequoiaDB. He is the co-founder and CTO of SequoiaDB, leading its long-term technology vision, and is responsible for the leadership of advanced technology incubations. SequoiaDB is a JSON document-oriented transactional database. Why We Chose Spark SequoiaDB is a NoSQL database that has the capability to replicate

Diving into Spark Streaming’s Execution Model

With so many distributed stream processing engines available, people often ask us about the unique benefits of Spark Streaming. From early on, Apache Spark has provided a unified engine that natively supports both batch and streaming workloads. This is different from other systems that either have a processing engine designed only for streaming, or have

New Features in Machine Learning Pipelines in Spark 1.4

Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark 1.4, significantly extends the ML library. In this post, we highlight several new features in the ML Pipelines API, including: A stable API --- Pipelines have graduated from Alpha! New feature transformers Additional

Using 3rd Party Libraries in Databricks: Spark Packages and Maven Libraries

In an earlier post, we described how you can easily integrate your favorite IDE with Databricks to speed up your application development. In this post, we will show you how to import 3rd party libraries, specifically Spark Packages, into Databricks by providing Maven coordinates. Background on Spark Packages Spark Packages (http://spark-packages.org) is a community package

Yesware Deploys Production Data Pipeline in Record Time with Databricks

We are happy to announce that Yesware chose Databricks to build its production data pipeline, completing the project in record time -- in just under three weeks. Press release: http://www.marketwired.com/press-release/yesware-deploys-production-data-pipeline-in-record-time-with-databricks-2041188.htm Yesware, the leading sales acceleration software for sales teams at major enterprise companies such as eBay, New Relic, and IBM, enables sales professionals to have highly effective and
