Title: pLUTo: Enabling Massively Parallel Computation In DRAM via Lookup Tables

Abstract: Data movement between main memory and the processor is a key contributor to the execution time and energy consumption of memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory device. PuM yields high throughput and efficiency, but supports a limited range of operations. As a result, PuM architectures cannot efficiently perform some complex operations (e.g., multiplication, division, exponentiation) without sizeable increases in chip area and design complexity.
To overcome this limitation in DRAM-based PuM architectures, we introduce pLUTo (processing-using-memory with lookup table [LUT] operations), a DRAM-based PuM architecture that leverages the high area density of DRAM to enable massively parallel storage and querying of LUTs. The use of LUTs enables pLUTo to efficiently execute complex operations in memory via memory reads (i.e., LUT queries) instead of relying on complex extra logic or performing long sequences of DRAM commands. Across the evaluated workloads, pLUTo outperforms the optimized CPU and GPU baselines in performance/energy efficiency by an average of 1960×/307× and 4.2×/4×, respectively; for the LeNet-5 quantized neural network, the corresponding improvements are 33×/8× and 110×/80×. pLUTo also outperforms a state-of-the-art PiM baseline by 50×/342× in performance/energy efficiency.
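The central idea of replacing computation with table reads can be illustrated in ordinary software. The following is a minimal, hypothetical analogy, not pLUTo's actual mechanism or interface: the names `build_lut` and `query_lut`, the 4-bit operand width, and the convention that division by zero yields 0 are all illustrative assumptions. Every result of a binary operation on small operands is precomputed once, after which each operation reduces to a single indexed read. In pLUTo, such queries are performed inside DRAM in a massively parallel fashion; the Python loop below only stands in for that parallelism conceptually.

```python
from typing import Callable, List


def build_lut(op: Callable[[int, int], int], bits: int = 4) -> List[int]:
    """Precompute op(a, b) for every pair of `bits`-wide unsigned operands.

    The entry for (a, b) is stored at index (a << bits) | b, so the table
    holds 2**(2*bits) entries (256 entries for 4-bit operands).
    """
    n = 1 << bits
    return [op(a, b) for a in range(n) for b in range(n)]


def query_lut(lut: List[int], a_vals: List[int], b_vals: List[int],
              bits: int = 4) -> List[int]:
    """Replace each computation with one table read (a 'LUT query')."""
    mask = (1 << bits) - 1
    results = []
    for a, b in zip(a_vals, b_vals):
        if not (0 <= a <= mask and 0 <= b <= mask):
            raise ValueError(f"operands must fit in {bits} bits")
        results.append(lut[(a << bits) | b])  # a memory read, no arithmetic
    return results


# Hypothetical usage: multiplication and integer division, two operations
# that the abstract notes are costly for conventional PuM designs.
mul_lut = build_lut(lambda a, b: a * b)
div_lut = build_lut(lambda a, b: a // b if b else 0)  # assumed: x / 0 -> 0

print(query_lut(mul_lut, [3, 15, 0], [5, 15, 7]))
print(query_lut(div_lut, [15, 7, 9], [4, 2, 0]))
```

The trade-off this sketch makes visible is storage for logic: a table over two n-bit operands grows as 2^(2n) entries, which is why the approach benefits from DRAM's high area density and suits the narrow operand widths found in, for example, quantized neural networks.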
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
ACM classes: B.3.1; C.1.3
Cite as: arXiv:2104.07699 [cs.AR]
  (or arXiv:2104.07699v2 [cs.AR] for this version)

Submission history

From: João Dinis Ferreira
[v1] Thu, 15 Apr 2021 18:10:22 GMT (2084kb)
[v2] Thu, 25 Nov 2021 15:58:22 GMT (372kb,D)
