Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior

Wang, Han; Chai, Xinning; Wang, Yiwen; Zhang, Yuhong; Xie, Rong; Song, Li

Full-text links:

Download:

Current browse context:

cs.CV

< prev | next >

new | recent | 2404

Change to browse by:

Computer Science > Computer Vision and Pattern Recognition

Title: Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior

Authors: Han Wang, Xinning Chai, Yiwen Wang, Yuhong Zhang, Rong Xie, Li Song

(Submitted on 25 Apr 2024)

Abstract: Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results due to incorrect semantic colors and unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize color with plausible semantics. To overcome the artifacts introduced by the diffusion prior, we apply the luminance conditional guidance. Moreover, we adopt multimodal high-level semantic priors to help the model understand the image content and deliver saturated colors. Besides, a luminance-aware decoder is designed to restore details and enhance overall visual quality. The proposed pipeline synthesizes saturated colors while maintaining plausible semantics. Experiments indicate that our proposed method considers both diversity and fidelity, surpassing previous methods in terms of perceptual realism and gain most human preference.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.16678 [cs.CV]
	(or arXiv:2404.16678v1 [cs.CV] for this version)

Submission history

From: Wang Han [view email]
[v1] Thu, 25 Apr 2024 15:28:22 GMT (1367kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2404.16678

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Computer Vision and Pattern Recognition

Title: Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior

Submission history