Title: Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Abstract: Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tunes the model via efficient attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks, with superior results on few-shot image-text matching.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2305.10722 [cs.CV]
  (or arXiv:2305.10722v3 [cs.CV] for this version)

Submission history

From: Xuehai He
[v1] Thu, 18 May 2023 05:41:36 GMT (969kb,D)
[v2] Wed, 15 Nov 2023 07:53:57 GMT (969kb,D)
[v3] Wed, 24 Apr 2024 23:10:17 GMT (4709kb,D)
