DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Shi, Jie; Wu, Chenfei; Liang, Jian; Liu, Xiang; Duan, Nan

Computer Science > Computer Vision and Pattern Recognition

(Submitted on 1 Jun 2022)

Abstract: Most recent successful image synthesis models are multi-stage pipelines that combine the advantages of different methods; these always include a VAE-like model that faithfully reconstructs an image from its embedding and a prior model that generates the image embedding. At the same time, diffusion models have shown the capacity to generate high-quality synthetic images. Our work proposes a model with a VQ-VAE architecture and a diffusion decoder (DiVAE) to serve as the reconstruction component in image synthesis. We explore how to feed the image embedding into the diffusion model to obtain strong performance and find that a simple modification to the diffusion model's UNet is sufficient. Trained on ImageNet, our model achieves state-of-the-art results and, in particular, generates more photorealistic images. In addition, we combine DiVAE with an auto-regressive generator on conditional synthesis tasks to produce more natural-looking and detailed samples.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2206.00386 [cs.CV]
	(or arXiv:2206.00386v1 [cs.CV] for this version)

Submission history

From: Jie Shi
[v1] Wed, 1 Jun 2022 10:39:12 GMT (2830kb,D)
