Title: DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation

Abstract: Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining (CLIP) have enabled zero-shot image manipulation guided by text prompts. However, applying them to diverse real images remains difficult because of the limited GAN inversion capability. Specifically, these approaches often fail to reconstruct images whose poses, views, and contents differ markedly from the training data, alter object identity, or produce unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models. Building on the full inversion capability and high-quality image generation power of recent diffusion models, our method successfully performs zero-shot image manipulation even between unseen domains, and it takes another step towards general application by manipulating images from the widely varying ImageNet dataset. Furthermore, we propose a novel noise combination method that allows straightforward multi-attribute manipulation. Extensive experiments and human evaluation confirm the robust and superior manipulation performance of our methods compared to existing baselines. Code is available at this https URL
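The abstract does not specify the sampler or the noise combination rule. The sketch below is a minimal illustration under two stated assumptions: (1) inversion and generation use a deterministic DDIM-style update (eta = 0), which can be run toward noise or back toward the image, and (2) "noise combination" is read as a weighted sum of noise predictions from several attribute-specific models. The names `ddim_step`, `run_ddim`, and `combine_noise`, the constant weights, and the toy predictors are hypothetical stand-ins for trained diffusion networks, not the authors' code.

```python
import numpy as np
from typing import Callable, Sequence

# Hypothetical interface: eps_model(x_t, t) returns predicted noise shaped like x_t.
NoiseModel = Callable[[np.ndarray, int], np.ndarray]


def combine_noise(models: Sequence[NoiseModel], weights: Sequence[float]) -> NoiseModel:
    """Assumed combination rule: weighted sum of several models' noise predictions.

    Weights are constant here for simplicity and must sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    if len(models) != len(w):
        raise ValueError("need exactly one weight per model")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")

    def combined(x_t: np.ndarray, t: int) -> np.ndarray:
        return sum(wi * m(x_t, t) for wi, m in zip(w, models))

    return combined


def ddim_step(x_t, t, t_next, eps_model, alpha_bar):
    """One deterministic (eta = 0) DDIM update from timestep t to t_next.

    t_next > t moves toward noise (inversion); t_next < t denoises (generation).
    """
    a_t, a_next = alpha_bar[t], alpha_bar[t_next]
    eps = eps_model(x_t, t)
    # Estimate the clean image implied by x_t and the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # Re-noise that estimate to the noise level of t_next.
    return np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps


def run_ddim(x, timesteps, eps_model, alpha_bar):
    """Apply ddim_step along consecutive pairs of timesteps."""
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, t, t_next, eps_model, alpha_bar)
    return x


if __name__ == "__main__":
    # Standard linear beta schedule and cumulative products alpha_bar_t.
    betas = np.linspace(1e-4, 0.02, 1000)
    alpha_bar = np.cumprod(1.0 - betas)

    # Two toy "attribute" predictors that ignore their input.
    model_a = lambda x, t: np.zeros_like(x)
    model_b = lambda x, t: np.ones_like(x)
    mixed = combine_noise([model_a, model_b], [0.25, 0.75])

    x0 = np.array([0.5, -0.2, 0.1])
    ts = [0, 100, 200, 300]
    x_latent = run_ddim(x0, ts, mixed, alpha_bar)          # invert toward noise
    x_back = run_ddim(x_latent, ts[::-1], mixed, alpha_bar)  # generate back
    print(np.abs(x_back - x0).max())
```

Because the toy predictors ignore their input, the inversion and generation passes cancel exactly here; with real networks the noise prediction depends on `x_t`, so the round trip is only approximate and its accuracy depends on the model and step schedule.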
Comments: Accepted to CVPR 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2110.02711 [cs.CV]
  (or arXiv:2110.02711v6 [cs.CV] for this version)

Submission history

From: Jong Chul Ye
[v1] Wed, 6 Oct 2021 12:59:39 GMT (9486kb,D)
[v2] Mon, 6 Dec 2021 09:20:17 GMT (41998kb,D)
[v3] Sun, 27 Mar 2022 14:28:31 GMT (28602kb,D)
[v4] Tue, 5 Apr 2022 05:32:57 GMT (27790kb,D)
[v5] Thu, 2 Jun 2022 06:07:29 GMT (27790kb,D)
[v6] Thu, 11 Aug 2022 13:36:19 GMT (27790kb,D)