Title: An Efficient Vectorization Scheme for Stencil Computation

Abstract: Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, which aim to exploit in-core data parallelism and data locality, respectively. In this paper, the downsides of existing vectorization schemes are analyzed. Briefly, they either incur data alignment conflicts or hurt data locality when integrated with tiling. We then propose a novel transpose layout that preserves data locality for tiling and, at the same time, reduces the data reorganization overhead of vectorization. To further improve data reuse at the register level, a time loop unroll-and-jam strategy is designed to perform multistep stencil computation along the time dimension. Experimental results on AVX-2 and AVX-512 CPUs show that our approach achieves competitive performance.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2103.08825 [cs.DC]
  (or arXiv:2103.08825v2 [cs.DC] for this version)
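
As a rough illustration of the multistep idea mentioned in the abstract, the sketch below advances a 1D 3-point stencil by two time steps in a single sweep over space, keeping the intermediate time level in three scalars rather than in a full array. It is a scalar toy in Python, not the paper's transpose layout or its vectorized implementation; the function names `step` and `two_steps_fused`, the coefficients `a` and `b`, and the fixed-boundary treatment are all assumptions made for this example.

```python
def step(u, a, b):
    """One sweep of a 3-point stencil; the two end points are held fixed."""
    n = len(u)
    out = list(u)
    for i in range(1, n - 1):
        out[i] = a * u[i - 1] + b * u[i] + a * u[i + 1]
    return out


def two_steps_fused(u, a, b):
    """Advance two time steps in one sweep over space (hypothetical helper).

    The intermediate time level (t+1) is never stored as an array: only a
    sliding window of three values is kept, standing in for registers.
    """
    n = len(u)
    out = list(u)
    if n < 3:
        return out  # no interior points to update

    def mid(i):
        # Value of point i at time t+1; boundary points stay fixed.
        if i == 0 or i == n - 1:
            return u[i]
        return a * u[i - 1] + b * u[i] + a * u[i + 1]

    m_left, m_mid = mid(0), mid(1)
    for i in range(1, n - 1):
        m_right = mid(i + 1)
        # Time t+2 at point i, built from the three t+1 values in the window.
        out[i] = a * m_left + b * m_mid + a * m_right
        m_left, m_mid = m_mid, m_right
    return out


if __name__ == "__main__":
    u0 = [0, 0, 4, 0, 0]
    print(two_steps_fused(u0, 0.25, 0.5))
    print(step(step(u0, 0.25, 0.5), 0.25, 0.5))
```

Both calls in the usage lines should print the same list, since the fused version evaluates the same expressions as two separate sweeps; what changes is that the intermediate values are consumed immediately instead of being written out and read back.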

Submission history

From: Kun Li
[v1] Tue, 16 Mar 2021 03:22:52 GMT (2233kb,D)
[v2] Thu, 18 Mar 2021 02:56:58 GMT (2233kb,D)
