Computer Science > Computer Vision and Pattern Recognition
Title: Towards Arbitrary Text-driven Image Manipulation via Space Alignment
(Submitted on 25 Jan 2023 (v1), last revised 21 Sep 2023 (this version, v3))
Abstract: Recent GAN inversion methods can successfully invert a real input image into a corresponding editable latent code in StyleGAN. By combining these with a language-vision model (CLIP), several text-driven image manipulation methods have been proposed. However, these methods incur extra cost, since they must perform optimization for each image or each new attribute editing mode. To achieve more efficient editing, we propose a new Text-driven image Manipulation framework via Space Alignment (TMSA). The Space Alignment module aims to align the same semantic regions in the CLIP and StyleGAN spaces. The text input can then be mapped directly into the StyleGAN space and used to find the semantic shift corresponding to the text description. The framework supports arbitrary image editing modes without additional cost. Our work provides the user with an interface to control the attributes of a given image according to text input and obtain the result in real time. Extensive experiments demonstrate superior performance over prior works.
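The abstract gives no implementation details, so the following is only a toy illustration of the general idea it describes: map a text embedding into the generator's latent space, derive a semantic shift, and add it to an inverted latent code. Everything here is an assumption made for illustration. A fixed random linear map stands in for the Space Alignment module, random vectors stand in for CLIP text embeddings and for a StyleGAN latent code, and the use of a neutral reference text and all function names are hypothetical. It is not the TMSA method itself.

```python
import random


def make_linear_map(in_dim, out_dim, seed=0):
    """Hypothetical stand-in for a learned alignment module: a random linear map."""
    rng = random.Random(seed)
    scale = in_dim ** 0.5
    return [[rng.gauss(0, 1) / scale for _ in range(in_dim)] for _ in range(out_dim)]


def apply_map(matrix, vec):
    """Matrix-vector product: project a text embedding into the latent space."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def semantic_shift(align, text_embedding, neutral_embedding):
    """Shift in latent space: aligned target text minus aligned neutral text (assumed scheme)."""
    target = apply_map(align, text_embedding)
    neutral = apply_map(align, neutral_embedding)
    return [t - n for t, n in zip(target, neutral)]


def edit_latent(latent, shift, strength=1.0):
    """Move the inverted latent code along the shift; no per-image optimization."""
    return [w + strength * s for w, s in zip(latent, shift)]


# Toy dimensions; real CLIP and StyleGAN spaces are much larger.
CLIP_DIM, LATENT_DIM = 8, 4
align = make_linear_map(CLIP_DIM, LATENT_DIM)

rng = random.Random(1)
text_emb = [rng.gauss(0, 1) for _ in range(CLIP_DIM)]     # placeholder for a text embedding
neutral_emb = [rng.gauss(0, 1) for _ in range(CLIP_DIM)]  # placeholder for a neutral text
latent = [rng.gauss(0, 1) for _ in range(LATENT_DIM)]     # placeholder for an inverted latent

shift = semantic_shift(align, text_emb, neutral_emb)
edited = edit_latent(latent, shift, strength=0.5)
print(len(edited))
```

In this sketch a single forward pass produces the edit, which is the property the abstract emphasizes: once the alignment exists, a new text prompt needs no further optimization. The `strength` parameter is an illustrative knob for how far the latent code moves.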
Submission history
From: Yunpeng Bai
[v1] Wed, 25 Jan 2023 16:20:01 GMT (34462kb,D)
[v2] Wed, 13 Sep 2023 14:57:44 GMT (34532kb,D)
[v3] Thu, 21 Sep 2023 03:14:18 GMT (34532kb,D)