AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

Tang, Yuwei; Lin, Zhenyi; Wang, Qilong; Zhu, Pengfei; Hu, Qinghua

(Submitted on 13 Apr 2024)

Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve few-shot ability of CLIP, key factors on the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introduce a unified formulation to analyze CLIP-based few-shot learning methods from a perspective of logit bias, which encourages us to learn an effective logit bias for further improving performance of CLIP-based few-shot learning methods. To this end, we disassemble three key components involved in computation of logit bias (i.e., logit features, logit predictor, and logit fusion) and empirically analyze the effect on performance of few-shot classification. Based on analysis of key components, this paper proposes a novel AMU-Tuning method to learn effective logit bias for CLIP-based few-shot classification. Specifically, our AMU-Tuning predicts logit bias by exploiting the appropriate $\underline{\textbf{A}}$uxiliary features, which are fed into an efficient feature-initialized linear classifier with $\underline{\textbf{M}}$ulti-branch training. Finally, an $\underline{\textbf{U}}$ncertainty-based fusion is developed to incorporate logit bias into CLIP for few-shot classification. The experiments are conducted on several widely used benchmarks, and the results show AMU-Tuning clearly outperforms its counterparts while achieving state-of-the-art performance of CLIP-based few-shot learning without bells and whistles.

Comments:	Accepted by CVPR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2404.08958 [cs.CV]
	(or arXiv:2404.08958v1 [cs.CV] for this version)

Submission history

From: Yuwei Tang
[v1] Sat, 13 Apr 2024 10:46:11 GMT (826kb,D)
