Title: Garbage Collection or Serialization? Between a Rock and a Hard Place!

Abstract: Big data analytics frameworks, such as Spark and Giraph, need to process and cache massive amounts of data that do not always fit in the managed heap. Therefore, frameworks temporarily move long-lived objects outside the managed heap (off-heap) onto a fast storage device. Unfortunately, this practice results in: (1) high serialization/deserialization (S/D) cost, and (2) high memory pressure when off-heap objects are moved back to the managed heap for processing. In this paper, we propose TeraHeap, a system that eliminates S/D overhead and expensive GC scans for a large portion of the objects in big data frameworks. TeraHeap relies on three concepts. (1) It eliminates S/D cost by extending the managed runtime (JVM) to use a second high-capacity heap (H2) over a fast storage device. (2) It reduces GC cost by fencing the garbage collector off from H2 objects so that it does not scan them. (3) It offers a simple hint-based interface that allows frameworks to use their knowledge about objects to populate H2. We implement TeraHeap in OpenJDK and evaluate it with 15 widely used applications in two real-world big data frameworks, Spark and Giraph. Our evaluation shows that for the same DRAM size, TeraHeap improves performance by up to 73% and 28% compared to native Spark and Giraph, respectively. It also delivers better performance than native Spark and Giraph while using up to 8x and 1.2x less DRAM, respectively. Finally, it outperforms Panthera, a garbage collector for hybrid memories, by up to 69%.
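To make concepts (2) and (3) concrete, here is a toy model, not TeraHeap's implementation or interface. It assumes a framework gives a hint on the root of a long-lived object graph (for example a cached partition), that the runtime then moves that root and everything reachable from it into H2, and that the collector's mark phase simply stops at H2 objects. All names (`ToyRuntime`, `hint_move_to_h2`, `in_h2`) are hypothetical. The toy also ignores references from H2 objects back into the managed heap, which a real system would have to track so that live managed-heap objects are not reclaimed.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Obj:
    name: str
    refs: list = field(default_factory=list)
    in_h2: bool = False  # hypothetical flag: object resides in the second heap


class ToyRuntime:
    """Toy model of a managed heap (H1) plus a second heap (H2)."""

    def __init__(self):
        self.h1 = []  # objects the collector scans and may reclaim
        self.h2 = []  # objects fenced off from the collector

    def alloc(self, name, refs=()):
        obj = Obj(name, list(refs))
        self.h1.append(obj)
        return obj

    def hint_move_to_h2(self, root):
        # Hypothetical hint: move root and everything reachable from it to H2.
        # Moving the whole closure means H2 never points back into H1 here.
        stack = [root]
        while stack:
            obj = stack.pop()
            if obj.in_h2:
                continue
            obj.in_h2 = True
            self.h1.remove(obj)
            self.h2.append(obj)
            stack.extend(obj.refs)

    def collect(self, roots):
        # Mark phase fenced from H2: traversal stops at any H2 object.
        marked = set()
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if obj.in_h2 or obj in marked:
                continue
            marked.add(obj)
            stack.extend(obj.refs)
        # Sweep: drop unmarked H1 objects; H2 is left untouched.
        self.h1 = [o for o in self.h1 if o in marked]
        return len(marked)  # number of objects the GC had to scan


if __name__ == "__main__":
    rt = ToyRuntime()
    records = [rt.alloc(f"rec{i}") for i in range(1000)]
    cache = rt.alloc("cached_partition", records)
    rt.alloc("tmp")  # unreachable garbage
    driver = rt.alloc("driver", [cache])
    print(rt.collect([driver]))  # 1002: driver, cache, and 1000 records
    rt.hint_move_to_h2(cache)
    print(rt.collect([driver]))  # 1: only driver; the cached graph is in H2
```

In this model the per-collection scan cost drops from the size of the cached graph to the size of the remaining managed heap, which is the intuition behind fencing the collector from H2.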
Comments: 17 pages, 12 figures, ASPLOS '23 submission revision
Subjects: Programming Languages (cs.PL)
ACM classes: D.3.3; D.3.4; B.3.2; C.5.5
Cite as: arXiv:2111.10589 [cs.PL]
  (or arXiv:2111.10589v3 [cs.PL] for this version)

Submission history

From: Polyvios Pratikakis
[v1] Sat, 20 Nov 2021 13:36:35 GMT (4407kb,D)
[v2] Sat, 17 Dec 2022 18:28:06 GMT (2736kb,D)
[v3] Mon, 9 Jan 2023 12:43:23 GMT (2086kb,D)