Iacovos G. Kolokasis

Graduate Student, University of Crete & ICS-FORTH

Iacovos G. Kolokasis is a graduate student in the Department of Computer Science at the University of Crete, working with Prof. Angelos Bilas and Prof. Polyvios Pratikakis. He is also a graduate research assistant at the Institute of Computer Science (ICS) of the Foundation for Research and Technology - Hellas (FORTH). During the summer of 2019 he was a research intern at SAP HANA VORA, SAP SE, Germany. His main research interests lie in computer systems, with an emphasis on memory and storage systems for high-performance data analytics frameworks. He is particularly interested in emerging memory technologies (e.g., non-volatile memories) and how they can be used in future data centers for applications such as machine learning and big-data analytics.

Past sessions

Summit Europe 2020 TeraCache: Efficient Caching Over Fast Storage Devices

November 18, 2020 04:00 PM PT

This talk will introduce TeraCache, a new scalable cache for Spark that avoids both garbage collection (GC) and serialization overheads. Existing Spark caching options incur either significant GC overheads for large managed heaps over persistent memory or significant serialization overheads to place objects off-heap on large storage devices. Our analysis shows that: (1) serialization increases execution time by up to 30% and (2) caching on the managed heap increases GC time by 20%. In addition, these overheads become worse as datasets grow.
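The trade-off between the two existing options can be illustrated with a small sketch. It is plain Python, not Spark: `DeserializedCache` and `SerializedCache` are hypothetical names, and `pickle` stands in for whatever serializer a real framework would use. The first cache keeps live objects, which a managed runtime's collector must keep tracing; the second keeps bytes and pays a decode on every read.

```python
import pickle


class DeserializedCache:
    """Keeps live objects: reads are free, but the objects stay on the managed heap."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store[key]


class SerializedCache:
    """Keeps pickled bytes: every read decodes them and builds a fresh object."""

    def __init__(self):
        self._store = {}
        self.decodes = 0  # number of deserializations performed so far

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        self.decodes += 1
        return pickle.loads(self._store[key])


# A made-up "partition" of records, read three times from each cache.
partition = [(i, i * 0.5) for i in range(1000)]

live = DeserializedCache()
live.put("p0", partition)
packed = SerializedCache()
packed.put("p0", partition)

for _ in range(3):
    assert live.get("p0") is partition    # same object, no decoding
    assert packed.get("p0") == partition  # equal contents, rebuilt each time
```

Each read from the serialized cache repeats the decoding work, which is the kind of cost the abstract attributes to off-heap caching.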

TeraCache eliminates serialization and GC overhead for cached objects. To achieve this, TeraCache extends the HotSpot JVM’s heap with a second managed heap that resides on a memory-mapped fast storage device and is used exclusively for cached data. To avoid GC over TeraCache, we extend the Java runtime to use semantic hints from Spark about when cached data objects are allocated and freed. We modify the collector to exclude cached objects while maintaining safety. Preliminary results show that TeraCache can speed up ML workloads by up to 37% compared to the supported RDD storage levels.
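The idea of a separate, file-backed region that is reclaimed by explicit hints rather than by scanning can be sketched in a few lines. This is not TeraCache's implementation: `MappedRegion` and its methods are hypothetical, a temporary file stands in for the fast storage device, and byte strings stand in for cached objects, since plain Python cannot place live objects in a mapped file.

```python
import mmap
import tempfile


class MappedRegion:
    """Bump-allocated byte region backed by a memory-mapped file (illustrative only)."""

    def __init__(self, size):
        self._file = tempfile.TemporaryFile()
        self._file.truncate(size)  # reserve the backing storage
        self._map = mmap.mmap(self._file.fileno(), size)
        self._size = size
        self._top = 0      # next free offset in the region
        self._index = {}   # key -> (offset, length)

    def put(self, key, payload):
        """Allocation hint: the caller marks `payload` as cached data."""
        end = self._top + len(payload)
        if end > self._size:
            raise MemoryError("mapped region is full")
        self._map[self._top:end] = payload
        self._index[key] = (self._top, len(payload))
        self._top = end

    def get(self, key):
        offset, length = self._index[key]
        return self._map[offset:offset + length]

    def release_all(self):
        """Free hint: the cached data is dead, so reset without scanning it."""
        self._index.clear()
        self._top = 0

    def close(self):
        self._map.close()
        self._file.close()


region = MappedRegion(4096)
region.put("rdd_0_part_0", b"cached partition bytes")
assert region.get("rdd_0_part_0") == b"cached partition bytes"
region.release_all()  # space is reclaimed in constant time
region.close()
```

The sketch only captures the lifecycle: data enters the region when the framework says it is cached and leaves when the framework says it is no longer needed, so no collector ever has to walk the region's contents.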

Speaker: Iacovos G. Kolokasis