Title: Hierarchical Roofline Analysis: How to Collect Data using Performance Tools on Intel CPUs and NVIDIA GPUs

Authors: Charlene Yang
Abstract: This paper surveys a range of methods for collecting the performance data needed for hierarchical Roofline analysis on Intel CPUs and NVIDIA GPUs. As of mid-2020, two vendor performance tools, Intel Advisor and NVIDIA Nsight Compute, have integrated Roofline analysis into their feature sets. This paper fills the gap when these tools are not available or when users want a more customized workflow for certain analyses. Specifically, we discuss how to use Intel Advisor, RRZE LIKWID, Intel SDE and Intel Amplifier on Intel architectures, and nvprof, Nsight Compute metrics, and Nsight Compute section files on NVIDIA architectures. These tools are used to collect data for as many levels of the memory/cache hierarchy as possible, in order to provide insight into an application's data reuse and cache locality characteristics.
Comments: 5 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Performance (cs.PF)
Cite as: arXiv:2009.02449 [cs.DC]
  (or arXiv:2009.02449v4 [cs.DC] for this version)

Submission history

From: Charlene Yang
[v1] Sat, 5 Sep 2020 03:14:42 GMT (758kb,D)
[v2] Mon, 14 Sep 2020 05:27:51 GMT (758kb,D)
[v3] Tue, 22 Sep 2020 20:23:56 GMT (758kb,D)
[v4] Sun, 4 Oct 2020 17:04:40 GMT (758kb,D)
