Adaptable Register File Organization for Vector Processors

Lazo, Cristóbal Ramírez; Reggiani, Enrico; Morales, Carlos Rojas; Bagué, Roger Figueras; Vargas, Luis Alfonso Villa; Salinas, Marco Antonio Ramírez; Cortés, Mateo Valero; Unsal, Osman Sabri; Cristal, Adrián

(Submitted on 9 Nov 2021 (this version), latest version 29 May 2022 (v2))

Abstract: Modern scientific applications are becoming more diverse, and the vector lengths in those applications vary widely. Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., Fujitsu A64FX with 512-bit ARM SVE vector support, or for long vectors, e.g., NEC Aurora Tsubasa with a 16 Kbit Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one hand, short-vector VP designs struggle to provide high efficiency for applications featuring long vectors with high Data Level Parallelism (DLP). On the other hand, long-vector VP designs waste resources and underutilize the Vector Register File (VRF) when executing low-DLP applications with short vector lengths. Long-vector VP implementations are therefore limited to a specialized subset of applications, where relatively high DLP must be present to achieve excellent performance with high efficiency. To overcome these limitations, we propose an Adaptable Vector Architecture (AVA) that combines the best of both worlds. AVA is designed for short vectors (MVL=16 elements) and is thus area- and energy-efficient. However, AVA can reconfigure the MVL, which lets it exploit the benefits of a longer-vector microarchitecture (up to 128 elements) when abundant DLP is present. We model AVA on the gem5 simulator and evaluate its performance with six applications taken from the RiVEC Benchmark Suite. To obtain area and power consumption metrics, we model AVA on McPAT for 22nm technology. Our results show that by reconfiguring our small VRF (8KB) and applying our novel issue queue scheme, AVA yields a 2X speedup over the default configuration for short vectors. Additionally, AVA shows competitive performance when compared to a long-vector VP, while saving 50% of area.
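
To make the trade-off behind MVL reconfiguration concrete, the following back-of-the-envelope sketch counts how many full-length vector registers fit in the 8KB VRF at several MVL settings. The 8KB capacity and the 16 to 128 element range come from the abstract; the 64-bit element width, the intermediate MVL values (32 and 64), and the helper name `registers_that_fit` are assumptions made for illustration only, not details stated in the abstract.

```python
# Capacity arithmetic for a fixed-size vector register file (VRF).
# From the abstract: VRF = 8KB, MVL ranges from 16 to 128 elements.
# ASSUMPTION: each element is 64 bits (8 bytes); the abstract does not say.

VRF_BYTES = 8 * 1024   # 8KB VRF (from the abstract)
ELEM_BYTES = 8         # assumed 64-bit elements (hypothetical)


def registers_that_fit(mvl_elements, vrf_bytes=VRF_BYTES, elem_bytes=ELEM_BYTES):
    """Return how many registers of `mvl_elements` elements fit in the VRF."""
    return vrf_bytes // (mvl_elements * elem_bytes)


# MVL values 32 and 64 are illustrative intermediate points (assumed).
table = {mvl: registers_that_fit(mvl) for mvl in (16, 32, 64, 128)}

for mvl, count in table.items():
    print(f"MVL={mvl:3d} elements -> {count:2d} registers in {VRF_BYTES} bytes")
```

Under these assumptions, doubling the MVL halves the number of registers the same storage can hold, which is the tension any reconfigurable register file organization has to manage.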

Comments:	This work is to appear at the 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022)
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2111.05301 [cs.AR]
	(or arXiv:2111.05301v1 [cs.AR] for this version)

Submission history

From: Cristóbal Ramírez Lazo
[v1] Tue, 9 Nov 2021 18:12:02 GMT (1281kb,D)
[v2] Sun, 29 May 2022 17:52:47 GMT (1282kb,D)