Skip to main content

We are excited to announce today, at Spark Summit 2015, the general availability of the Databricks – a hosted data platform from the team that created Apache Spark. With Databricks, you can effortlessly launch Spark clusters, explore data interactively, run production jobs, and connect third-party applications. We believe Databricks is the easiest way to use big data.

Our vision at Databricks is to make big data simple and enable every organization to turn its data into value. We first unveiled Databricks at Spark Summit 2014, and launched it in limited availability in November. The excitement for the platform has been fantastic, with thousands of people requesting access and tremendous feedback from users. We’re now delighted to take the next step towards this vision by making Databricks available to everyone.

We want to enable that instant productivity for all of your data problems.

[embed]https://vimeo.com/130273206[/embed]

 

Why Databricks?

As many data scientists and engineers can attest, the majority of their time is spent not on the data analysis itself but on the supporting infrastructure. Key issues include deploying software, keeping production jobs up, and connecting disparate tools to process and visualize data. Equally problematic is the need for data engineers to re-implement the models developed by data scientists for production. With Databricks, data scientists and engineers can eliminate these issues and just spend their time focusing on their data.

 

Instant Spark Clusters

Instead of taking weeks to months to provision hardware, instantly launch and manage optimized Spark clusters on Amazon EC2. You can scale from a few nodes to hundreds, all with just a few clicks. You can also use Amazon spot instances to save on costs.

cluster-screen

 

Interactively Explore and Visualize Your Data

Databricks includes notebooks, an interactive and collaborative multi-user environment for exploration and visualization. You can combine text, code execution, visualization, and advanced analytics such as machine learning (MLlib) and graphs (GraphX) – all within the same notebook. You can write notebooks in SQL, Python, Scala, Java, and R. This way Databricks allows you to be instantly productive in your language of choice.

mobile devices by geo

 

Easily Deploy Production Pipelines

Databricks has a powerful Jobs feature for taking applications from prototype to production. With Jobs, you can run both notebooks and standalone Spark or Spark Streaming programs. Jobs include a flexible scheduler, email alerts, automatic retries, run history, and cluster reuse. Furthermore, Databricks Jobs are the first Software as a Service platform to support Spark Streaming, making it easy to deploy scalable, fault-tolerant streaming applications. Databricks Jobs are simply the easiest way to run Spark applications.

databricks_jobs_screenshot

 

New Features for GA

With the launch of General Availability today, we’re also releasing three new features that users have been requesting. These are immediately available in Databricks deployments:

  • Spark 1.4 support:  Choose Apache Spark 1.4 when provisioning a Databricks cluster.
  • Spark Streaming in notebooks: Experiment with Spark Streaming interactively in notebooks, or deploy it in production jobs.
  • Improved commenting: Comment on individual text selections within a notebook and respond to comments via the new sidebar.

new-cluster-version

 

What’s Coming Next?

In addition to general availability, at Spark Summit we have also announced several major new features that we are rolling out over the next few months. Expect to see these in your Databricks deployments soon:

  • R-language notebooks: Analyze data using R and SparkR, including all of R’s standard visualization and statistics packages.
  • Access control and private notebooks: Manage permissions to view and execute code at an individual level.
  • Version control and GitHub: Track changes to source code in Databricks, and store notebooks in GitHub to work with them from outside the platform.

 

How to Get Started

Databricks runs in your own Amazon Web Services account or Virtual Private Cloud. To try it out, sign up for a 14-day free trial today.

For more information on Databricks, check out:

Try Databricks for free

Related posts

Apache Spark Key Terms, Explained

June 22, 2016 by Jules Damji and Denny Lee in
This article was originally posted on KDnuggets The Spark Summit Europe call for presentations is open, submit your idea today As observed in...

A Deep Dive into the Latest Performance Improvements of Stateful Pipelines in Apache Spark Structured Streaming

This post is the second part of our two-part series on the latest performance improvements of stateful pipelines. The first part of this...

Apache Spark Turns Five Years Old!

March 31, 2015 by Matei Zaharia in
Today, we’re celebrating an important milestone for the Apache Spark project -- it’s now been five years since Spark was first open sourced...
See all Product posts