HEAT: Head-level Parameter Efficient Adaptation of Vision Transformers with Taylor-expansion Importance Scores

Zhong, Yibo; Zhou, Yao

Full-text links:

Download:

Current browse context:

cs.CV

< prev | next >

new | recent | 2404

Computer Science > Computer Vision and Pattern Recognition

Title: HEAT: Head-level Parameter Efficient Adaptation of Vision Transformers with Taylor-expansion Importance Scores

Authors: Yibo Zhong, Yao Zhou

(Submitted on 13 Apr 2024)

Abstract: Prior computer vision research extensively explores adapting pre-trained vision transformers (ViT) to downstream tasks. However, the substantial number of parameters requiring adaptation has led to a focus on Parameter Efficient Transfer Learning (PETL) as an approach to efficiently adapt large pre-trained models by training only a subset of parameters, achieving both parameter and storage efficiency. Although the significantly reduced parameters have shown promising performance under transfer learning scenarios, the structural redundancy inherent in the model still leaves room for improvement, which warrants further investigation. In this paper, we propose Head-level Efficient Adaptation with Taylor-expansion importance score (HEAT): a simple method that efficiently fine-tuning ViTs at head levels. In particular, the first-order Taylor expansion is employed to calculate each head's importance score, termed Taylor-expansion Importance Score (TIS), indicating its contribution to specific tasks. Additionally, three strategies for calculating TIS have been employed to maximize the effectiveness of TIS. These strategies calculate TIS from different perspectives, reflecting varying contributions of parameters. Besides ViT, HEAT has also been applied to hierarchical transformers such as Swin Transformer, demonstrating its versatility across different transformer architectures. Through extensive experiments, HEAT has demonstrated superior performance over state-of-the-art PETL methods on the VTAB-1K benchmark.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2404.08894 [cs.CV]
	(or arXiv:2404.08894v1 [cs.CV] for this version)

Submission history

From: Yibo Zhong [view email]
[v1] Sat, 13 Apr 2024 04:01:35 GMT (1133kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2404.08894

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Computer Vision and Pattern Recognition

Title: HEAT: Head-level Parameter Efficient Adaptation of Vision Transformers with Taylor-expansion Importance Scores

Submission history