Title: AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs

Abstract: In recent years, general matrix-matrix multiplication with non-regular-shaped input matrices has been widely used in applications such as deep learning and has drawn increasing attention. However, conventional implementations are not well suited to non-regular-shaped matrix-matrix multiplication, and few works focus on optimizing tall-and-skinny matrix-matrix multiplication on CPUs. This paper proposes an auto-tuning framework, AutoTSMM, for building high-performance tall-and-skinny matrix-matrix multiplication. AutoTSMM selects the optimal inner kernels in the install-time stage and generates an execution plan for the pre-pack tall-and-skinny matrix-matrix multiplication in the runtime stage. Experiments demonstrate that AutoTSMM achieves performance competitive with state-of-the-art tall-and-skinny matrix-matrix multiplication and outperforms all conventional matrix-matrix multiplication implementations.
Comments: 8 pages, 12 figures, published in IEEE ISPA 2021
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
ACM classes: D.1.3
DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00034
Cite as: arXiv:2208.08088 [cs.DC]
  (or arXiv:2208.08088v1 [cs.DC] for this version)

Submission history

From: Chendi Li
[v1] Wed, 17 Aug 2022 05:58:44 GMT (2275kb,D)
