Databricks on AWS

One platform for all your workloads.

Unified Analytics Platform

Basic (Preview)

Run Apache Spark™ batch applications

$0.07/DBU

Databricks Unit (DBU)
A unit of processing capability per hour, billed on per-second usage. View the instance types Databricks supports.

Data Engineering

Run batch applications on Databricks’ optimized runtime for higher reliability and performance

$0.20/DBU

Data Analytics

Use the Databricks workspace to collaborate on projects, notebooks, and experiments

$0.40/DBU

The pricing shown above is for Databricks services only. It does not include the cost of any required AWS resources (e.g., EC2 instances).
 

Try Databricks Free

Feature Comparison

 

Basic

Included:

Apache Spark on Databricks platform
  • Clusters for running production jobs
  • Alerting and monitoring with retries

Easy to run production jobs including streaming with monitoring
  • Scheduler for running libraries
  • Production streaming with monitoring

Not included:

Ability to use Scala, Python, R and SQL notebooks and notebook workflows
  • Schedule Scala, Python, R, SQL notebooks
  • Notebook workflows

Easy to manage and cost-effective clusters
  • Optimized autoscaling of compute
  • Autoscaling of instance storage
  • Automatic start and termination of clusters

Out-of-the-box ML frameworks
  • Apache Spark / Horovod integration
  • XGBoost support
  • TensorFlow, PyTorch and Keras support

Run MLflow on Databricks platform to simplify the end-to-end ML lifecycle
  • MLflow remote execution on Databricks platform
  • Databricks hosted tracking server

Robust pipelines serving clean, quality data supporting high-performance batch and streaming analytics
  • ACID transactions
  • Schema management
  • Batch/stream read/write support
  • SSD caching
  • Indexing
  • Table snapshotting

High-concurrency mode for multiple users
  • Persistent clusters for analytics
  • High-concurrency clusters for multi-user sharing

Highly productive work among analysts and with other colleagues
  • Scala, Python, SQL and R notebooks
  • One-click visualization
  • Interactive dashboards
  • Collaboration
  • Revision history
  • Version control systems integration (GitHub, Bitbucket)

Ability to work with RStudio® and a range of third-party BI tools
  • RStudio integration
  • BI integration through JDBC/ODBC
 

Data Engineering

Included:

Apache Spark on Databricks platform
  • Clusters for running production jobs
  • Alerting and monitoring with retries

Easy to run production jobs including streaming with monitoring
  • Scheduler for running libraries
  • Production streaming with monitoring

Ability to use Scala, Python, R and SQL notebooks and notebook workflows
  • Schedule Scala, Python, R, SQL notebooks
  • Notebook workflows

Easy to manage and cost-effective clusters
  • Optimized autoscaling of compute
  • Autoscaling of instance storage
  • Automatic start and termination of clusters

Out-of-the-box ML frameworks
  • Apache Spark / Horovod integration
  • XGBoost support
  • TensorFlow, PyTorch and Keras support

Run MLflow on Databricks platform to simplify the end-to-end ML lifecycle
  • MLflow remote execution on Databricks platform
  • Databricks hosted tracking server

Not included:

High-concurrency mode for multiple users
  • Persistent clusters for analytics
  • High-concurrency clusters for multi-user sharing

Highly productive work among analysts and with other colleagues
  • Scala, Python, SQL and R notebooks
  • One-click visualization
  • Interactive dashboards
  • Collaboration
  • Revision history
  • Version control systems integration (GitHub, Bitbucket)

Ability to work with RStudio® and a range of third-party BI tools
  • RStudio integration
  • BI integration through JDBC/ODBC
 

Data Analytics

Included:

Apache Spark on Databricks platform
  • Clusters for running production jobs
  • Alerting and monitoring with retries

Easy to run production jobs including streaming with monitoring
  • Scheduler for running libraries
  • Production streaming with monitoring

Ability to use Scala, Python, R and SQL notebooks and notebook workflows
  • Schedule Scala, Python, R, SQL notebooks
  • Notebook workflows

Easy to manage and cost-effective clusters
  • Optimized autoscaling of compute
  • Autoscaling of instance storage
  • Automatic start and termination of clusters

Out-of-the-box ML frameworks
  • Apache Spark / Horovod integration
  • XGBoost support
  • TensorFlow, PyTorch and Keras support

Run MLflow on Databricks platform to simplify the end-to-end ML lifecycle
  • MLflow remote execution on Databricks platform
  • Databricks hosted tracking server

High-concurrency mode for multiple users
  • Persistent clusters for analytics
  • High-concurrency clusters for multi-user sharing

Highly productive work among analysts and with other colleagues
  • Scala, Python, SQL and R notebooks
  • One-click visualization
  • Interactive dashboards
  • Collaboration
  • Revision history
  • Version control systems integration (GitHub, Bitbucket)

Ability to work with RStudio® and a range of third-party BI tools
  • RStudio integration
  • BI integration through JDBC/ODBC

Add-Ons

Operational Security

$0.15/DBU

For those requiring enterprise security capabilities

This add-on includes all of the following:

  • Role-based access control for notebooks, clusters, jobs, tables
  • Single sign-on with SAML 2.0 support
  • JDBC / ODBC authentication

Custom Deployment

Custom pricing

For those requiring additional customization

You can choose one or more of the following:

  • Single tenant deployment
  • AWS GovCloud
  • HIPAA compliant
  • Audit logs
  • No public IPs for worker nodes
  • Customized CIDR range
  • Restricted network access for end users

Databricks Delta
(for legacy customers)

$0.15/DBU

Robust pipelines serving clean, quality data supporting high-performance batch and streaming analytics

This add-on includes all of the following:

  • ACID transactions
  • Schema management
  • Batch/Stream read/write support
  • SSD caching
  • Indexing
  • Table snapshotting

Pricing Example

A customer is using Databricks for three types of workloads:

 
Run basic analytics in production
Cluster Type: Basic
Usage: Streaming 24 hours/day on 6 c4.4xlarge instances
$604.80 per month

  6 instances
x 2 DBUs per instance hour
x 24 hours per day
x 30 days per month
x $0.07 per DBU (Basic rate)

 
High-performance streaming analytics
Cluster Type: Data Engineering
Usage: Runs 6 hours/day on 5 r4.4xlarge instances
$720 per month

  5 instances
x 4 DBUs per instance hour
x 6 hours per day
x 30 days per month
x $0.20 per DBU (Data Engineering rate)

 
Collaborative model development and analytics
Cluster Type: Data Analytics
Usage: Runs 8 hours/day on 4 c4.2xlarge instances
$384 per month

  4 instances
x 1 DBU per instance hour
x 8 hours per day
x 30 days per month
x $0.40 per DBU (Data Analytics rate)

Total: $1708.80 per month
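Each monthly figure above follows the same formula: instances x DBUs per instance-hour x hours per day x days per month x rate per DBU. A minimal sketch in Python (the `monthly_cost` helper is illustrative, not a Databricks API):

```python
def monthly_cost(instances, dbus_per_instance_hour, hours_per_day,
                 rate_per_dbu, days_per_month=30):
    """Databricks DBU charge for one workload; AWS EC2 costs are billed separately."""
    return instances * dbus_per_instance_hour * hours_per_day * days_per_month * rate_per_dbu

basic = monthly_cost(6, 2, 24, 0.07)       # Basic: 6 c4.4xlarge, streaming 24h/day
data_eng = monthly_cost(5, 4, 6, 0.20)     # Data Engineering: 5 r4.4xlarge, 6h/day
analytics = monthly_cost(4, 1, 8, 0.40)    # Data Analytics: 4 c4.2xlarge, 8h/day
total = basic + data_eng + analytics
print(f"${basic:.2f} + ${data_eng:.2f} + ${analytics:.2f} = ${total:.2f}")
# $604.80 + $720.00 + $384.00 = $1708.80
```

These figures cover Databricks charges only; AWS bills the underlying EC2 instances separately.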

FAQs

 

What is a DBU?

A Databricks Unit (“DBU”) is a unit of processing capability per hour, billed on per-second usage. Databricks supports many AWS EC2 instance types; the larger the instance, the more DBUs it consumes per hour. For example, 1 DBU is the equivalent of Databricks running on a c4.2xlarge machine for an hour. See the full list of supported instances and details.
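Using the instance-to-DBU figures that appear on this page (c4.2xlarge = 1 DBU per hour; from the pricing example above, c4.4xlarge = 2 and r4.4xlarge = 4), DBU consumption can be sketched as follows. The `dbus_consumed` helper is illustrative, not a Databricks API:

```python
# DBUs per instance-hour, taken from the figures on this page (illustrative subset)
DBUS_PER_HOUR = {"c4.2xlarge": 1, "c4.4xlarge": 2, "r4.4xlarge": 4}

def dbus_consumed(instance_type, num_instances, hours):
    """Total DBUs a cluster of identical instances consumes over `hours` hours."""
    return DBUS_PER_HOUR[instance_type] * num_instances * hours

# One day of the Basic streaming example: 6 c4.4xlarge instances for 24 hours
print(dbus_consumed("c4.4xlarge", 6, 24))  # 288 DBUs
```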

I see three different prices for different types of clusters – Basic, Data Engineering and Data Analytics. How can I pick the one that I want?

In the same workspace, you can run different workloads on different cluster types to meet the specific requirements and cost profile of your work. Databricks meters your usage of each cluster type and bills you at that cluster's unit price. For example, you might schedule complex jobs on a Data Engineering cluster at $0.20 per DBU, run overnight jobs on a Basic cluster at $0.07 per DBU, and perform notebook analytics on a Data Analytics cluster at $0.40 per DBU.

There are two cluster options for production jobs – Basic and Data Engineering. How do I decide which one to use?

Basic (in Preview) is Databricks’ equivalent of open-source Apache Spark. It targets simple, non-critical workloads that don’t need the performance, reliability, or autoscaling benefits provided by Databricks’ proprietary technologies. In comparison, the Data Engineering cluster provides all of these benefits, boosting your team's productivity and reducing your total cost of ownership.

What’s the difference between production and interactive analysis workloads?

Production workloads (automated workloads) are defined as jobs that both start and terminate the clusters on which they run. For example, a workload may be triggered by the Databricks Job Scheduler which launches a new Apache Spark cluster solely for the job and automatically terminates the cluster after the job is complete.

Interactive analysis workloads are workloads that are not automated, e.g., running a command within a Databricks notebook. These commands run on Apache Spark clusters that may persist until manually terminated. Multiple users can share a cluster for collaborative interactive analysis.

Databricks Operational Security add-on package has an additional charge for DBU usage. Does it apply to all workload types?

Yes, Databricks Operational Security applies to all cluster types offered. When you choose to deploy Databricks with the Operational Security package, an additional charge of $0.15 per DBU is applied on top of the cluster price. Please contact us for details.

What does the free trial include?

The 14-day free trial gives you access to all Databricks features except the Databricks Operational Security package and Custom Deployment options. Contact us if you are interested in either of these.

Note that during the trial, AWS will bill you directly for the EC2 instances created by Databricks.

What happens after the free trial?

At the end of the trial, you are automatically subscribed to Databricks without the Operational Security package. You can cancel your subscription at any time.

What is Databricks Community Edition?

Databricks Community Edition is a free, limited functionality platform designed for anyone who wants to learn Spark. Sign up here.

How will I be billed?

By default, you are billed monthly to your credit card based on per-second usage. Contact us for other billing options, such as billing by invoice or an annual plan.

Do you provide technical support?

We offer technical support with annual commitments. Contact us to learn more or get started.