This is a collaborative post from Databricks and Intel. We thank the authors from Intel for their contributions.
Customers can now use Databricks Photon with AWS i4i instances, which feature 3rd Gen Intel® Xeon® Scalable processors (Ice Lake) with Intel® Advanced Vector Extensions 512 (Intel® AVX-512), to reduce the cost and increase the performance of their data processing, analytics, and ML/AI workloads.
Improving the performance of workloads on Databricks reduces the total cost of ownership (TCO) and shortens time to insight. The latest innovations from the Databricks and Intel partnership allow companies to realize these improvements simply by enabling Photon and using the AWS i4i instance family. No code changes are necessary, so users can continue to focus on delivering business value.
Here are the key characteristics of each instance type:
Characteristic | AWS i4i.2xlarge | AWS i3.2xlarge |
---|---|---|
CPU family | 3rd Gen Intel® Xeon® Scalable Processor (Ice Lake) | Intel® Xeon® E5-2686 v4 |
vCPUs | 8 | 8 |
Memory (GiB) | 64 | 61 |
Instance Storage (GiB) | 1 x 1,875 AWS Nitro SSD | 1 x 1,900 NVMe SSD |
Network Bandwidth (Gbps) | Up to 12 | Up to 10 |
On-Demand hourly rate ($/hr) (see footnote 2) | 0.686 | 0.624 |
The baseline runs the TPC-DS derived workload on AWS i3.2xlarge instances with Databricks Runtime (DBR) 11.0 and Photon disabled. The same workload was then run on AWS i4i.2xlarge instances, also without Photon. The change of instance family from i3 to i4i alone was responsible for a 1.4x speedup on both the 1TB and 10TB datasets.
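To put the 1.4x figure in cost terms, the speedup can be weighed against the higher hourly rate of the i4i instance. The snippet below is a minimal sketch that uses only the on-demand EC2 prices from the table; it ignores Databricks charges, which this post does not list, so treat the result as an EC2-only estimate. The function name `price_performance_gain` is illustrative.

```python
# On-demand EC2 hourly rates from the table above (us-east-1, 2022-09-07).
I3_RATE = 0.624   # $/hr, i3.2xlarge
I4I_RATE = 0.686  # $/hr, i4i.2xlarge


def price_performance_gain(speedup, old_rate, new_rate):
    """Return how many times cheaper a fixed workload becomes.

    Cost of a run = hourly rate x runtime. If the runtime shrinks by
    `speedup` while the rate changes from `old_rate` to `new_rate`,
    the old/new cost ratio is speedup * old_rate / new_rate.
    """
    return speedup * old_rate / new_rate


# Assumption: EC2 cost only; Databricks charges are not included.
gain = price_performance_gain(1.4, I3_RATE, I4I_RATE)
print(f"EC2-only price-performance gain: {gain:.2f}x")  # 1.27x
```

Under these assumptions, the roughly 10% higher hourly rate of i4i is more than offset by the 1.4x shorter runtime.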
After enabling Photon, the same workload was measured again.
With Photon and AWS i4i instances, up to a 5.3x speedup and a 2.5x price-performance improvement were observed compared with the non-Photon AWS i3 configuration.
Photon lowers the TCO of a workload: because queries complete significantly faster, the AWS instances need to run for less time, which improves price-performance.
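The relationship between speedup and price-performance can be made explicit. If a run costs its hourly rate times its runtime, then the price-performance gain equals the speedup divided by the ratio of new to old hourly cost. The sketch below inverts that relation for the reported figures. It assumes the 5.3x and 2.5x numbers describe the same run, which the post does not state (both are "up to" values), and that the hourly cost covers every charge for the cluster. The function name is hypothetical.

```python
def implied_rate_ratio(speedup, price_perf_gain):
    """Return the new/old hourly cost ratio implied by the two figures.

    From price_perf_gain = speedup / rate_ratio, it follows that
    rate_ratio = speedup / price_perf_gain.
    """
    return speedup / price_perf_gain


# Assumption: both reported figures refer to the same configuration and run.
ratio = implied_rate_ratio(5.3, 2.5)
print(f"Implied hourly cost ratio: {ratio:.2f}x")  # 2.12x
```

Read this way, the Photon on i4i configuration could cost roughly twice as much per hour as the baseline and still come out about 2.5x cheaper per completed workload, because it runs for far less time.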
The Databricks Lakehouse Platform makes it easy to take advantage of these performance improvements. When creating your Databricks cluster, select the "Use Photon Acceleration" option and choose AWS i4i instances for the worker type. There is no need to change any of your code, as Photon is fully compatible with the Apache Spark API.
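For clusters created programmatically rather than through the UI, the same two choices can be expressed in a cluster specification. The following is a hedged sketch of a request body for the Databricks Clusters API, where the `runtime_engine` field selects Photon. The cluster name, worker count, and `spark_version` string are illustrative assumptions and should be checked against the values available in your workspace.

```python
import json


def photon_i4i_cluster_spec(name, num_workers):
    """Build a cluster spec that enables Photon on i4i.2xlarge workers.

    The name, worker count, and runtime version string are
    illustrative assumptions, not values taken from the benchmark.
    """
    return {
        "cluster_name": name,
        "spark_version": "11.0.x-scala2.12",  # assumed DBR 11.0 version key
        "node_type_id": "i4i.2xlarge",        # worker instance type
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",           # API equivalent of the Photon checkbox
    }


spec = photon_i4i_cluster_spec("photon-i4i-demo", num_workers=4)
print(json.dumps(spec, indent=2))
```

The resulting JSON would be sent as the body of a cluster creation request; authentication and the workspace URL are omitted here because they are specific to each deployment.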
For these TPC-DS derived benchmarks, pairing Databricks Photon with AWS i4i instances powered by 3rd Gen Intel Xeon Scalable processors (Ice Lake) resulted in up to a 2.5x price-performance improvement and a 5.3x speedup compared with i3 instances without Photon. Follow the links below for additional information.
databricks.com/lakehouse
databricks.com/photon
intel.com/xeonscalable
intel.com/avx512
aws.amazon.com/ec2/instance-types/i4i/
Footnotes
1 Derived from the power test consisting of all 99 TPC-DS queries. These results are not comparable to an official, audited TPC benchmark. Databricks' official TPC-DS results can be found here.
2 On-demand EC2 pricing taken from us-east-1 as of 2022-09-07.