Today, we are excited to announce the public preview of Databricks support for AWS Graviton2-based Amazon Elastic Compute Cloud (Amazon EC2) instances. Graviton processors are custom-designed and optimized by AWS to deliver the best price-performance for cloud workloads running in Amazon EC2. When used with Photon, the high-performance Databricks query engine, Graviton2-based Amazon EC2 instances can deliver up to 3x-4x better price-performance than comparable Amazon EC2 instances for your data lakehouse workloads. In this blog post, we go over the price-performance of Photon with Graviton2 and share additional tips to further reduce your AWS infrastructure cost.
Price-performance with Photon and Graviton2
To determine the price-performance of Photon + Graviton2, we ran a simple test with two different workloads (TPC-DS and a standard ETL workload with bulk inserts and merge statements) on a Graviton2-based R6gd EC2 instance and a comparable I3 EC2 instance. We found that the Photon engine alone significantly improved the price-performance for an EC2 instance. Photon on the Graviton2-based instance took it a step further and delivered 3.3x better price-performance for the ETL workload and 3.7x better price-performance for the TPC-DS workload compared to the previous Databricks runtime on the I3 instance. Customers who tried Graviton2-based instances have reported similar results and share our excitement! Here’s a quote from a Databricks customer who happens to know all about Arm-based Graviton instances:
“Cloud computing is driving significant innovation in semiconductor design, and by moving our design workloads to Arm-based AWS Graviton2-based instances that provide significant price performance gains, we see first-hand the benefits enabled by the Arm Neoverse N1 platform,” said Mark Galbraith, VP of productivity engineering, Arm. “This is especially evident for Databricks on Graviton2 and we look forward to migrating our production use of Databricks to Graviton2 to further enhance user experience and reduce expenses.”
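For readers who want to run a similar comparison on their own workloads, here is a minimal sketch of one common way to express price-performance: the cost to complete a fixed workload on a baseline configuration divided by the cost on the candidate configuration. This is an assumption about the method, not a description of how the figures above were produced. The function names, runtimes, and hourly rates are all hypothetical placeholders.

```python
def cost_to_complete(runtime_hours: float, hourly_rate_usd: float) -> float:
    """Cost of one run: wall-clock hours times the combined hourly rate.

    hourly_rate_usd is assumed to include every per-hour charge you care
    about (for example the EC2 instance price plus platform usage fees).
    """
    return runtime_hours * hourly_rate_usd


def price_performance_gain(baseline_cost: float, candidate_cost: float) -> float:
    """How many times cheaper the candidate is for the same workload."""
    if candidate_cost <= 0:
        raise ValueError("candidate_cost must be positive")
    return baseline_cost / candidate_cost


# Hypothetical numbers for illustration only; they are not measurements.
baseline = cost_to_complete(runtime_hours=2.0, hourly_rate_usd=1.50)
candidate = cost_to_complete(runtime_hours=0.8, hourly_rate_usd=1.25)
gain = price_performance_gain(baseline, candidate)

print(f"Baseline cost:  ${baseline:.2f}")
print(f"Candidate cost: ${candidate:.2f}")
print(f"Price-performance gain: {gain:.1f}x")
```

Substitute your own measured runtimes and the hourly rates from your bill to compare two cluster configurations on the same workload.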
Additional cost savings with Amazon EC2 Spot Instances and Amazon EBS gp3 volumes support
In addition to Graviton2 and Photon, there are other ways to improve price-performance for your Databricks workloads on AWS. These include:
- Amazon EC2 Spot Instances – Spot Instances let you take advantage of spare EC2 capacity and are available at up to a 90% discount compared to On-Demand prices. Depending on the nature of your workload, you may be able to replace the On-Demand or Reserved EC2 instances in your Databricks cluster with Spot Instances and reduce cost.
- Amazon EBS gp3 volumes – Storage can be a big part of your cloud infrastructure cost. Databricks supports gp3 SSD volumes for Amazon Elastic Block Store (Amazon EBS), which enable you to provision performance independent of storage capacity and can provide up to 20% better price-performance per GB than existing gp2 volumes.
To learn more about price-performance optimizations, please read our cluster best practices documentation.
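As a rough illustration of how the options above might come together, the sketch below builds a cluster specification as a plain Python dictionary, using field names from the Databricks Clusters API (`node_type_id`, `runtime_engine`, `aws_attributes`). The cluster name, instance size, worker count, and Spot settings are assumptions chosen for the example, and the runtime version is left as a placeholder to fill in. gp3 volume selection is not shown because this post does not describe how it is enabled.

```python
import json


def build_cluster_spec(num_workers: int = 2) -> dict:
    """Return an example cluster spec combining Graviton2, Photon, and Spot.

    All concrete values are illustrative assumptions; adjust them to
    match your workspace and workload.
    """
    return {
        "cluster_name": "graviton-photon-example",  # hypothetical name
        # Placeholder: set this to a runtime version that supports
        # Photon and Graviton in your workspace.
        "spark_version": "<photon-capable-runtime-version>",
        "node_type_id": "r6gd.xlarge",  # Graviton2-based; size is an assumption
        "runtime_engine": "PHOTON",
        "num_workers": num_workers,
        "aws_attributes": {
            # Keep the first node (the driver) On-Demand, use Spot for the
            # rest, and fall back to On-Demand if Spot capacity is unavailable.
            "first_on_demand": 1,
            "availability": "SPOT_WITH_FALLBACK",
            "spot_bid_price_percent": 100,
        },
    }


spec = build_cluster_spec(num_workers=4)
print(json.dumps(spec, indent=2))
```

The resulting JSON could be used as the body of a cluster-creation request once the placeholder is replaced. Keeping the driver On-Demand is a common design choice because losing the driver to a Spot interruption ends the whole job.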
Get Started with Graviton2
Public preview support for AWS Graviton2-based instances is currently rolling out and will be available in all supported regions in the next few weeks. To get started, and for guidance on migrating to Graviton2 and Photon, please read our Graviton documentation.