Computer Science > Distributed, Parallel, and Cluster Computing
Title: An Efficient Vectorization Scheme for Stencil Computation
(Submitted on 16 Mar 2021 (v1), last revised 18 Mar 2021 (this version, v2))
Abstract: Stencil computation is one of the most important kernels in various scientific and engineering applications. Much prior work has focused on vectorization and tiling techniques, which aim to exploit in-core data parallelism and data locality, respectively. In this paper, we analyze the downsides of existing vectorization schemes: they either incur data alignment conflicts or hurt data locality when integrated with tiling. We then propose a novel transpose layout that simultaneously preserves data locality for tiling and reduces the data reorganization overhead of vectorization. To further improve data reuse at the register level, we design a time loop unroll-and-jam strategy that performs multistep stencil computation along the time dimension. Experimental results on AVX-2 and AVX-512 CPUs show that our approach achieves competitive performance.
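For readers unfamiliar with the terminology, the following is a minimal scalar sketch of what a stencil kernel is and what computing several time steps in one sweep means. It is not the paper's transpose layout or its vectorized scheme. The function names, the 1D 3-point shape, the fixed boundary cells, and the coefficients are all assumptions chosen for illustration.

```python
def stencil_step(u, c0=0.25, c1=0.5, c2=0.25):
    """One time step of a 1D 3-point stencil; the two boundary cells stay fixed."""
    n = len(u)
    out = list(u)
    for i in range(1, n - 1):
        out[i] = c0 * u[i - 1] + c1 * u[i] + c2 * u[i + 1]
    return out


def stencil_two_steps_fused(u, c0=0.25, c1=0.5, c2=0.25):
    """Two time steps in a single sweep over the grid.

    The intermediate (time step 1) values live only in three scalars that
    slide along the grid, loosely mimicking register-level reuse. This is an
    illustrative assumption, not the authors' unroll-and-jam implementation.
    """
    n = len(u)
    if n < 3:
        return list(u)  # no interior cells to update

    def first(i):
        # Value of cell i after the first time step.
        if i == 0 or i == n - 1:
            return u[i]
        return c0 * u[i - 1] + c1 * u[i] + c2 * u[i + 1]

    out = list(u)
    left, mid = first(0), first(1)
    for i in range(1, n - 1):
        right = first(i + 1)
        out[i] = c0 * left + c1 * mid + c2 * right
        left, mid = mid, right  # slide the three-value window
    return out
```

Calling `stencil_two_steps_fused(u)` gives the same result as `stencil_step(stencil_step(u))`, but the intermediate time step is never written back to a full array, which is the general idea behind reusing data across time steps.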
Submission history
From: Kun Li
[v1] Tue, 16 Mar 2021 03:22:52 GMT (2233kb,D)
[v2] Thu, 18 Mar 2021 02:56:58 GMT (2233kb,D)