Accelerating Deep Learning Inference via Learned Caches

Balasubramanian, Arjun; Kumar, Adarsh; Liu, Yuhan; Cao, Han; Venkataraman, Shivaram; Akella, Aditya

(Submitted on 18 Jan 2021)

Abstract: Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low-latency inference desired by user-facing applications. Current low-latency solutions either trade off accuracy or fail to exploit the temporal locality inherent in prediction-serving workloads.
We observe that caching hidden-layer outputs of the DNN can introduce a form of late binding, in which an inference request consumes only as much computation as it needs. This provides a mechanism for achieving low latency while also exploiting temporal locality. However, traditional caching approaches incur high memory overheads and lookup latencies, which leads us to design learned caches: caches that consist of simple ML models that are continuously updated. We present the design of GATI, an end-to-end prediction-serving system that incorporates learned caches for low-latency DNN inference. Results show that GATI can reduce inference latency by up to 7.69X on realistic workloads.
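
The late-binding idea in the abstract can be illustrated with a minimal sketch. It is not GATI's implementation: the abstract does not say how a learned cache decides between a hit and a miss, so the confidence-threshold rule, the linear cache model, and the names LearnedCache, lookup, and infer are all assumptions made for illustration. The continuous updating of the cache models is omitted.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LearnedCache:
    """Hypothetical per-layer cache: a tiny linear model over a hidden-layer output."""

    def __init__(self, weights: List[Vector], threshold: float) -> None:
        self.weights = weights      # one weight row per output class
        self.threshold = threshold  # assumed hit rule: top probability >= threshold

    def lookup(self, hidden: Vector) -> Optional[int]:
        """Return a predicted class on a cache hit, or None on a miss."""
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.weights]
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        return best if probs[best] >= self.threshold else None


def infer(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    caches: Sequence[Optional[LearnedCache]],
    classify: Callable[[Vector], int],
) -> Tuple[int, int]:
    """Run layers in order; return (label, layers_executed), exiting early on a hit."""
    hidden = x
    for depth, (layer, cache) in enumerate(zip(layers, caches), start=1):
        hidden = layer(hidden)
        if cache is not None:
            hit = cache.lookup(hidden)
            if hit is not None:
                return hit, depth  # late binding: skip the remaining layers
    return classify(hidden), len(layers)


# Toy three-layer "network" standing in for a real DNN (illustrative only).
layers = [
    lambda v: [2 * a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [a * a for a in v],
]
caches = [LearnedCache([[1.0, 0.0], [0.0, 1.0]], threshold=0.9), None, None]


def classify(v: Vector) -> int:
    """Final-layer decision: index of the largest activation."""
    return max(range(len(v)), key=v.__getitem__)


print(infer([3.0, 0.0], layers, caches, classify))  # easy input: hit after layer 1
print(infer([0.5, 0.6], layers, caches, classify))  # hard input: runs all 3 layers
```

In this toy setup the first input produces a confident cache prediction after one layer and skips the rest, while the second misses the cache and pays for the full network, which is the sense in which a request consumes only the computation it needs.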

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Cite as:	arXiv:2101.07344 [cs.LG]
	(or arXiv:2101.07344v1 [cs.LG] for this version)

Submission history

From: Arjun Balasubramanian
[v1] Mon, 18 Jan 2021 22:13:08 GMT (1245kb,D)
