
Faster Insights With Databricks Photon Using AWS i4i Instances With the Latest Intel Ice Lake Scalable Processors

Up to 2.5x price/performance benefits and 5.3x speed up!

This is a collaborative post from Databricks and Intel. We thank the authors from Intel for their contributions.

 

Customers can now use Databricks Photon together with AWS i4i instance types, which run on the latest 3rd Gen Intel Xeon Scalable processors (Ice Lake) with Intel Advanced Vector Extensions 512 (Intel® AVX-512), to reduce the cost and increase the performance of their data processing, analytical and ML/AI workloads.

Improving the performance of workloads on Databricks reduces the total cost of ownership (TCO) and shortens time to insight. The latest innovations from the Databricks and Intel partnership allow companies to realize these improvements simply by enabling Photon and using the AWS i4i instance family. No code changes are necessary, so users can continue to focus on delivering business value.

Measuring price/performance and speed-up

A collection of workloads derived from familiar benchmarking datasets, such as the industry-standard TPC-DS¹, was run during July 2022 at both 1TB and 10TB scales on 20-worker clusters. To quantify the improvements, the cost and performance of Photon and non-Photon AWS i4i cluster configurations were measured and compared against a baseline configuration that used the AWS i3 instance family.

Here are the key characteristics of each instance type:

Characteristic | AWS i4i.2xlarge | AWS i3.2xlarge
CPU family | 3rd Gen Intel Xeon® Scalable Processor (Ice Lake) | Intel® Xeon® E5-2686 v4
vCPUs | 8 | 8
Memory (GiB) | 64 | 61
Instance storage (GiB) | 1 x 1,875 AWS Nitro SSD | 1 x 1,900 NVMe SSD
Network bandwidth (Gbps) | Up to 12 | Up to 10
On-demand hourly rate ($/hr)² | $0.686 | $0.624

The baseline uses AWS i3.2xlarge instances with Databricks Runtime (DBR) 11.0 and Photon disabled. The same workload was then run on AWS i4i.2xlarge instances, also without Photon. The change of instance family alone, from i3 to i4i, produced a 1.4x speed-up at both the 1TB and 10TB scales.
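
As a rough illustration of what the 1.4x figure means for infrastructure cost, the sketch below combines it with the on-demand rates listed for the two instance types. It assumes the same number of workers in both runs and that EC2 cost scales linearly with runtime. Databricks (DBU) charges are not given in this post and are left out, so this is not the full TCO. The function name `relative_ec2_cost` is illustrative only.

```python
# EC2-only cost comparison for the i3 -> i4i instance change (no Photon).
# Assumptions: equal worker counts, cost proportional to runtime,
# Databricks DBU charges excluded (not stated in this post).

I3_RATE = 0.624   # $/hr, i3.2xlarge on-demand, us-east-1 (from the post)
I4I_RATE = 0.686  # $/hr, i4i.2xlarge on-demand, us-east-1 (from the post)
SPEEDUP = 1.4     # i4i vs i3 without Photon (measured in the post)


def relative_ec2_cost(baseline_rate: float, new_rate: float, speedup: float) -> float:
    """Return the new run's EC2 cost as a fraction of the baseline run's EC2 cost."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # The new run pays a different hourly rate, but only for 1/speedup of the time.
    return (new_rate / baseline_rate) / speedup


cost_fraction = relative_ec2_cost(I3_RATE, I4I_RATE, SPEEDUP)
print(f"i4i EC2 cost relative to i3: {cost_fraction:.2f}")
print(f"EC2-only price/performance gain: {1 / cost_fraction:.2f}x")
```

Under these assumptions the i4i run costs roughly 79% of the i3 run in EC2 charges, even though its hourly rate is about 10% higher.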

1.4x relative speed up of i4i instances against i3 instances

The power of Photon

After enabling Photon on the i4i cluster, the same workload was measured again.

5.3x relative speed up of i4i Photon against the i3 DBR

With Photon and AWS i4i instances, up to a 5.3x speed-up and a 2.5x price-performance improvement were observed compared with the non-Photon AWS i3 configuration.

Caption: 2.5x relative price-performance improvement of i4i Photon

Photon lowers the TCO of a workload: because queries run significantly faster, the AWS instances are needed for less time, which is what delivers the price-performance improvement.
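
To make this reasoning concrete, here is a minimal sketch that assumes price-performance is the ratio of baseline total cost to new total cost, with total cost equal to an effective hourly rate multiplied by runtime. Under that assumption, and assuming the 5.3x and 2.5x figures describe the same run (both are quoted as "up to" values, so they may not), the two numbers imply how much higher the effective hourly cost of the Photon configuration could be. The function name is hypothetical and the cost model is a simplification, not the method used for the published results.

```python
def implied_hourly_cost_ratio(speedup: float, price_performance: float) -> float:
    """Effective hourly cost of the new configuration relative to the baseline.

    Assumed cost model: cost = hourly_rate * runtime, so
    price_performance = (rate_base * t_base) / (rate_new * t_new)
                      = speedup / (rate_new / rate_base).
    """
    if speedup <= 0 or price_performance <= 0:
        raise ValueError("both inputs must be positive")
    return speedup / price_performance


ratio = implied_hourly_cost_ratio(speedup=5.3, price_performance=2.5)
print(f"Implied hourly cost ratio (i4i Photon / i3 baseline): {ratio:.2f}")
```

Read this way, the Photon configuration could cost roughly twice as much per hour and still complete the workload for about 40% of the baseline's total cost, because it runs for far less time.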

How to enable Photon and i4i instances

Configuration for Photon and i4i instances when creating a Databricks cluster

The Databricks Lakehouse platform makes it easy to take advantage of these performance improvements. When creating your Databricks cluster you simply select the "Use Photon Acceleration" option and choose AWS i4i instances for the worker type. There is no need to change any of your code as Photon is fully compatible with the Apache Spark API.
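
For teams that provision clusters programmatically rather than through the UI, the same choices can in principle be written as a cluster specification. The sketch below only builds and prints a JSON payload in the general shape of a Databricks Clusters API request; it does not call any endpoint. The field names (`runtime_engine`, `node_type_id`, `num_workers`, `spark_version`) are recalled from that API rather than taken from this post, and the `spark_version` string and cluster name are placeholders, so check all of them against the current API reference before use.

```python
import json

# Hypothetical cluster specification; verify every field against the
# Databricks Clusters API reference for your workspace before using it.
cluster_spec = {
    "cluster_name": "photon-i4i-demo",    # illustrative name
    "spark_version": "11.0.x-scala2.12",  # assumed DBR 11.0 version key (placeholder)
    "runtime_engine": "PHOTON",           # intended equivalent of "Use Photon Acceleration"
    "node_type_id": "i4i.2xlarge",        # worker instance type used in this post
    "num_workers": 20,                    # cluster size used in the benchmark
}

print(json.dumps(cluster_spec, indent=2))
```

Keeping the specification as data makes it easy to review the Photon and instance-type settings alongside the rest of a deployment configuration.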

Summary of performance results

For these TPC-DS-derived benchmarks, pairing Databricks Photon with AWS i4i instances, which use 3rd Gen Intel Xeon Scalable processors (Ice Lake), resulted in up to a 2.5x price-performance improvement and a 5.3x speed-up compared with i3 instances without Photon. Follow the links below for additional information.

Learn more at

databricks.com/lakehouse
databricks.com/photon
intel.com/xeonscalable
intel.com/avx512
aws.amazon.com/ec2/instance-types/i4i/

Footnotes
1 Derived from the power test consisting of all 99 TPC-DS queries. These results are not comparable to an official, audited TPC benchmark. Databricks' official TPC-DS results can be found here.
2 On-demand EC2 pricing taken from us-east-1 as of 2022-09-07.
