We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.AR

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Hardware Architecture

Title: Data Cache Prefetching with Perceptron Learning

Abstract: Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's historical spatial and temporal memory access pattern. However, they are error prone and the mis-predictions lead to cache pollution and exert extra pressure on memory subsystem. In this paper, a novel scheme of data cache prefetching with perceptron learning is proposed. The key idea is a two-level prefetching mechanism. A primary decision is made by utilizing previous table-based prefetching mechanism, e.g. stride prefetching or Markov prefetching, and then, a neural network, perceptron is taken to detect and trace program memory access patterns, to help reject those unnecessary prefetching decisions. The perceptron can learn from both local and global history in time and space, and can be easily implemented by hardware. This mechanism boost execution performance by ideally mitigating cache pollution and eliminating redundant memory request issued by prefetcher. Detailed evaluation and analysis were conducted based on SPEC CPU 2006 benchmarks. The simulation results show that generally the proposed scheme yields a geometric mean of 60.64%-83.84% decrease in prefetching memory requests without loss in instruction per cycle(IPC)(floating between -2.22% and 2.55%) and cache hit rate(floating between -1.67% and 2.46%). Though it is possible that perceptron may refuse useful blocks and thus cause minor raise in cache miss rate, lower memory request count can decrease average memory access latency, which compensate for the loss, and in the meantime, enhance overall performance in multi-programmed workloads.
Subjects: Hardware Architecture (cs.AR)
Cite as: arXiv:1712.00905 [cs.AR]
  (or arXiv:1712.00905v1 [cs.AR] for this version)

Submission history

From: Zhiwei Luo [view email]
[v1] Mon, 4 Dec 2017 05:01:30 GMT (570kb,D)

Link back to: arXiv, form interface, contact.