We gratefully acknowledge support from
the Simons Foundation and member institutions.

Hardware Architecture

New submissions

[ total of 3 entries: 1-3 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Thu, 20 Jan 22

[1]  arXiv:2201.07498 [pdf, other]
Title: A Mixed Precision, Multi-GPU Design for Large-scale Top-K Sparse Eigenproblems
Subjects: Hardware Architecture (cs.AR)

Graph analytics techniques based on spectral methods process extremely large sparse matrices with millions or even billions of non-zero values. Behind these algorithms lies the Top-K sparse eigenproblem, the computation of the largest eigenvalues and their associated eigenvectors. In this work, we leverage GPUs to scale the Top-K sparse eigenproblem to bigger matrices than previously achieved while also providing state-of-the-art execution times. We can transparently partition the computation across multiple GPUs, process out-of-core matrices, and tune precision and execution time using mixed-precision floating-point arithmetic. Overall, we are 67 times faster than the highly optimized ARPACK library running on a 104-thread CPU and 1.9 times than a recent FPGA hardware design. We also determine how mixed-precision floating-point arithmetic improves execution time by 50% over double-precision, and is 12 times more accurate than single-precision floating-point arithmetic.

[2]  arXiv:2201.07634 [pdf, other]
Title: FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks
Comments: 14 pages
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)

Convolutional Neural Networks (CNNs) demonstrate great performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization. They replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, though TWNs have higher accuracy and better sparsity, IMC acceleration for TWNs has limited research. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized, and the addition operation is not efficient.
In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to the memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of both activations and weights and increase the parallelism of memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency and 1.22X area efficiency compared with State-Of-The-Art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% sparsity

Cross-lists for Thu, 20 Jan 22

[3]  arXiv:2201.07375 (cross-list from cs.CR) [pdf, other]
Title: A 334uW 0.158mm$^2$ Saber Learning with Rounding based Post-Quantum Crypto Accelerator
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

National Institute of Standard & Technology (NIST) is currently running a multi-year-long standardization procedure to select quantum-safe or post-quantum cryptographic schemes to be used in the future. Saber is the only LWR based algorithm to be in the final of Round 3. This work presents a Saber ASIC which provides 1.37X power-efficient, 1.75x lower area, and 4x less memory implementation w.r.t. other SoA PQC ASIC. The energy-hungry multiplier block is 1.5x energyefficient than SoA.

[ total of 3 entries: 1-3 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2201, contact, help  (Access key information)