Apache Spark is a popular data processing engine designed to execute advanced analytics on very large data sets, which are common in today’s enterprise use cases. To deliver high performance across different workloads (e.g., machine-learning applications), Spark has in-memory data storage capabilities built in.
However, Spark’s in-memory capabilities are limited by the memory available in the server; it is common for computing resources to sit idle during the execution of a Spark job even though the system’s memory is saturated. To mitigate this limitation, Spark’s distributed architecture can run on a cluster of nodes, taking advantage of the memory available across all of them. While adding nodes solves the server DRAM capacity problem, it does so at an increased cost. Intel(R) Memory Drive Technology is a software-defined memory (SDM) technology which, combined with an Intel(R) Optane(TM) SSD, expands the system’s memory.
This combination of an Intel(R) Optane(TM) SSD with Intel(R) Memory Drive Technology alleviates the memory limitations inherent to Spark by transparently making more memory available to the operating system and to Spark jobs.
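The capacity trade-off described above can be illustrated with a little arithmetic. The sketch below is only an illustration: the function name `nodes_needed` and every number in it are hypothetical, not figures from the session. It also assumes that memory capacity is the only constraint, ignoring CPU, network, and any performance difference between DRAM and SSD-backed memory.

```python
import math


def nodes_needed(working_set_gib: float, usable_mem_per_node_gib: float) -> int:
    """Return the smallest node count whose combined memory holds the working set.

    Hypothetical helper: assumes memory capacity is the only sizing constraint.
    """
    if working_set_gib <= 0 or usable_mem_per_node_gib <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(working_set_gib / usable_mem_per_node_gib)


# All values below are made-up examples, not measurements.
working_set_gib = 3000.0      # data the Spark job wants to keep in memory
dram_only_gib = 384.0         # usable memory per node with DRAM alone
expanded_gib = 1536.0         # usable memory per node with expanded memory

print(nodes_needed(working_set_gib, dram_only_gib))  # 8
print(nodes_needed(working_set_gib, expanded_gib))   # 2
```

With these example values, expanding per-node memory reduces the node count needed to hold the working set from 8 to 2; whether that translates into lower cost or comparable job runtime depends on factors this sketch does not model.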
Session hashtag: #HWCSAIS1
Optane(TM) and SSD Solutions Architect focusing on Big Data & AI. Currently responsible for producing technical collateral such as white papers, reference architecture documents, and technical briefs that cover the performance and benchmarks of Intel(R) Optane(TM) SSDs in Big Data solutions that use technologies such as Spark and Hadoop. Twenty years of total experience as a Big Data & Data Warehouse Architect / Sr. Developer, working with technologies including Hadoop, Spark, Greenplum, Oracle, and Informatica. Highly proficient in building data ingest pipelines for MPP platforms like Hadoop and Greenplum.