Improving face generation quality and prompt following with synthetic captions

Tarasiou, Michail; Moschoglou, Stylianos; Deng, Jiankang; Zafeiriou, Stefanos

(Submitted on 17 May 2024)

Abstract: Recent advancements in text-to-image generation using diffusion models have significantly improved the quality of generated images and expanded the ability to depict a wide range of objects. However, ensuring that these models adhere closely to the text prompts remains a considerable challenge. This issue is particularly pronounced when trying to generate photorealistic images of humans. Without significant prompt engineering efforts, models often produce unrealistic images and typically fail to incorporate the full extent of the prompt information. This limitation can be largely attributed to the nature of captions accompanying the images used in training large-scale diffusion models, which typically prioritize contextual information over details related to the person's appearance. In this paper, we address this issue by introducing a training-free pipeline designed to generate accurate appearance descriptions from images of people. We apply this method to create approximately 250,000 captions for publicly available face datasets. We then use these synthetic captions to fine-tune a text-to-image diffusion model. Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces and enhances adherence to the given prompts, compared to the baseline model. We share our synthetic captions, pretrained checkpoints, and training code.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2405.10864 [cs.CV]
	(or arXiv:2405.10864v1 [cs.CV] for this version)

Submission history

From: Michael Tarasiou
[v1] Fri, 17 May 2024 15:50:53 GMT (34217kb,D)
