Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging

Bai, Yeqi; Ma, Tao; Wang, Lipo; Zhang, Zhenjie

(Submitted on 10 Jun 2020)

Abstract: While deep learning technologies can now generate images realistic enough to confuse humans, research efforts are turning to the synthesis of images for more concrete and application-specific purposes. Facial image generation based on vocal characteristics from speech is one such important yet challenging task. It is a key enabler of influential use cases of image generation, especially for businesses in public security and entertainment. Existing solutions to the speech2face problem render limited image quality and fail to preserve facial similarity, owing to the lack of quality datasets for training and of appropriate integration of vocal features. In this paper, we investigate these key technical challenges and propose Speech Fusion to Face, or SF2F for short, which attempts to address the issue of facial image quality and the poor connection between the vocal feature domain and modern image generation models. By adopting new strategies for the data model and training, we demonstrate a dramatic performance boost over the state-of-the-art solution, doubling the recall of individual identity and lifting the quality score from 15 to 19 based on the mutual information score with a VGGFace classifier.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2006.05888 [cs.CV]
	(or arXiv:2006.05888v1 [cs.CV] for this version)

Submission history

From: Yeqi Bai
[v1] Wed, 10 Jun 2020 15:19:31 GMT (5550kb,D)
