Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

He, Xuehai; Feng, Weixi; Fu, Tsu-Jui; Jampani, Varun; Akula, Arjun; Narayana, Pradyumna; Basu, Sugato; Wang, William Yang; Wang, Xin Eric

(Submitted on 18 May 2023 (v1), last revised 24 Apr 2024 (this version, v3))

Abstract: Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tunes the model via efficient attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks with superior results on few-shot image-text matching.
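
To make the idea of scoring an image-text pair from cross-attention concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes image positions supply query vectors and text tokens supply key vectors, computes scaled dot-product attention logits, and pools them with LogSumExp into one matching score. The pooling choice, the function names (attention_logits, matching_score, best_caption), and the toy vectors are all illustrative assumptions. The learned projections, the diffusion U-Net, and the prompt-learning step described in the abstract are omitted.

```python
import math

def attention_logits(image_queries, text_keys):
    """Scaled dot-product logits: one row per image position, one column per text token."""
    d = len(text_keys[0])
    return [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in text_keys]
        for query in image_queries
    ]

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a flat list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def matching_score(image_queries, text_keys):
    """Pool all attention logits into a single scalar (pooling choice is an assumption)."""
    logits = attention_logits(image_queries, text_keys)
    return logsumexp([v for row in logits for v in row])

def best_caption(image_queries, candidate_captions):
    """Return the index of the highest-scoring caption and all scores."""
    scores = [matching_score(image_queries, keys) for keys in candidate_captions]
    return max(range(len(scores)), key=scores.__getitem__), scores

# Hypothetical toy features: two image positions, two captions of two tokens each.
image_queries = [[1.0, 0.0], [0.9, 0.1]]
caption_a = [[1.0, 0.0], [0.8, 0.2]]   # roughly aligned with the image features
caption_b = [[0.0, 1.0], [-0.5, 0.5]]  # poorly aligned
index, scores = best_caption(image_queries, [caption_a, caption_b])
print(index, scores)
```

In this toy setup the aligned caption receives the higher score. Captions with different token counts would need length normalization, which the sketch does not handle.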

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.10722 [cs.CV]
	(or arXiv:2305.10722v3 [cs.CV] for this version)

Submission history

From: Xuehai He
[v1] Thu, 18 May 2023 05:41:36 GMT (969kb,D)
[v2] Wed, 15 Nov 2023 07:53:57 GMT (969kb,D)
[v3] Wed, 24 Apr 2024 23:10:17 GMT (4709kb,D)
