Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications

Sreedhar, Kavya; Clemons, Jason; Venkatesan, Rangharajan; Keckler, Stephen W.; Horowitz, Mark

(Submitted on 6 Dec 2022 (v1), revised 23 Feb 2023 (this version, v2), latest version 15 Apr 2024 (v3))

Abstract: Many state-of-the-art deep learning models for computer vision tasks are based on the transformer architecture. Such models can be computationally expensive and are typically statically set to meet the deployment scenario. However, in real-time applications, the resources available for every inference can vary considerably and be smaller than what state-of-the-art models require. We can use dynamic models to adapt the model execution to meet real-time application resource constraints. While prior dynamic work primarily minimized resource utilization for less complex input images, we adapt vision transformers to meet the system's dynamic resource constraints, independent of the input image. We find that unlike early transformer models, recent state-of-the-art vision transformers rely heavily on convolution layers. We show that pretrained models are fairly resilient to skipping computation in the convolution and self-attention layers, enabling us to create a low-overhead system for dynamic real-time inference without extra training. Finally, we explore compute organization and memory sizes to find settings to efficiently execute dynamic vision transformers. We find that wider vector sizes produce a better energy-accuracy tradeoff across dynamic configurations despite limiting the granularity of dynamic execution, but scaling accelerator resources for larger models does not significantly improve the latency-area-energy tradeoffs. Our accelerator saves 20% of execution time and 30% of energy with a 4% drop in accuracy with the pretrained SegFormer B2 model in our dynamic inference approach, and 57% of execution time for the ResNet-50 backbone with a 4.5% drop in accuracy with the Once-For-All approach.
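
The idea of adapting execution to a per-inference resource budget, independent of the input image, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Config` class, the `pick_config` function, the configuration names, and every number below are hypothetical placeholders chosen only to show the selection step. The sketch assumes each dynamic configuration is summarized by the fraction of convolution and self-attention computation it keeps and by a cost relative to the full model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str             # hypothetical label for a dynamic configuration
    keep_fraction: float  # fraction of conv/attention computation executed
    rel_cost: float       # cost relative to the full model (placeholder)


# Placeholder values for illustration only; NOT measurements from the paper.
CONFIGS = [
    Config("full", 1.0, 1.0),
    Config("skip-25", 0.75, 0.8),
    Config("skip-50", 0.5, 0.6),
]


def pick_config(budget, configs=CONFIGS):
    """Pick the configuration that keeps the most computation within budget.

    If no configuration fits, fall back to the cheapest one.
    """
    feasible = [c for c in configs if c.rel_cost <= budget]
    if not feasible:
        return min(configs, key=lambda c: c.rel_cost)
    return max(feasible, key=lambda c: c.keep_fraction)


if __name__ == "__main__":
    # The budget depends on available resources, not on the input image.
    for budget in (1.0, 0.85, 0.1):
        print(budget, pick_config(budget).name)
```

Because selection depends only on the budget, the same pretrained weights could in principle serve every configuration, which is in the spirit of the abstract's claim that no extra training is needed.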

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR)
Cite as:	arXiv:2212.02687 [cs.CV]
	(or arXiv:2212.02687v2 [cs.CV] for this version)

Submission history

From: Kavya Sreedhar [view email]
[v1] Tue, 6 Dec 2022 01:10:31 GMT (6467kb,D)
[v2] Thu, 23 Feb 2023 21:25:53 GMT (6955kb,D)
[v3] Mon, 15 Apr 2024 22:13:39 GMT (4066kb,D)
