Subjects: Audio and Speech Processing (Electrical Engineering and Systems Science)
Title: Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation
(Submitted on 9 Oct 2021 (v1), last revised 21 May 2022 (this version, v2))
Abstract: We propose a semantically aware, speech-driven model that generates expressive and natural upper-facial and head motion for Embodied Conversational Agents (ECAs). In this work, we aim to produce natural, continuous head motion and upper-facial gestures synchronized with speech. The model generates these gestures from multimodal input features: the first modality is text, and the second is speech prosody. It uses Transformers and convolutions to map the multimodal features of an utterance to continuous eyebrow and head gestures. We conduct subjective and objective evaluations to validate our approach and compare it with the state of the art.
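To make the described architecture more concrete, here is a toy sketch in Python (standard library only) showing one way a text stream and a prosody stream could be fused with a temporal convolution and self-attention into frame-level motion values. It is not the authors' model. The following are all assumptions: that word embeddings are repeated per frame, that prosody is a couple of per-frame features (e.g. F0 and energy), that the streams are fused by concatenation, and that there is a single attention head. The layer sizes, the five output channels, and the random untrained weights are assumptions too, and the names `ToyUpperFaceModel`, `conv1d_same` and `self_attention` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights in [-0.1, 0.1]; stand-ins for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) @ (k x m) -> (n x m)."""
    return [[sum(row[i] * b[i][j] for i in range(len(b))) for j in range(len(b[0]))]
            for row in a]


def conv1d_same(x: Matrix, kernels: List[Matrix]) -> Matrix:
    """Temporal 1D convolution with zero 'same' padding, followed by ReLU.

    x: T x C_in frames; kernels: one C_in x C_out matrix per tap.
    """
    t_len, half = len(x), len(kernels) // 2
    c_out = len(kernels[0][0])
    out = []
    for t in range(t_len):
        acc = [0.0] * c_out
        for tap, kern in enumerate(kernels):
            src = t + tap - half
            if 0 <= src < t_len:
                for ci, v in enumerate(x[src]):
                    for co in range(c_out):
                        acc[co] += v * kern[ci][co]
        out.append([max(0.0, a) for a in acc])
    return out


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product attention over frames, plus a residual add."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_t, k_s)) / math.sqrt(d) for k_s in k]
        w = softmax(scores)
        ctx = [sum(w[s] * v[s][j] for s in range(len(v))) for j in range(d)]
        out.append([xi + ci for xi, ci in zip(x[t], ctx)])
    return out


class ToyUpperFaceModel:
    """Hypothetical text + prosody -> per-frame motion model (untrained)."""

    def __init__(self, prosody_dim: int, text_dim: int, hidden: int,
                 out_dim: int, kernel: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.conv = [rand_matrix(prosody_dim, hidden, rng) for _ in range(kernel)]
        self.text_proj = rand_matrix(text_dim, hidden, rng)
        d = 2 * hidden  # size after concatenating both streams
        self.wq, self.wk, self.wv = (rand_matrix(d, d, rng) for _ in range(3))
        self.head = rand_matrix(d, out_dim, rng)

    def forward(self, prosody: Matrix, text: Matrix) -> Matrix:
        assert len(prosody) == len(text), "streams must be frame-aligned"
        p = conv1d_same(prosody, self.conv)        # T x hidden (local prosodic context)
        w = matmul(text, self.text_proj)           # T x hidden (text features)
        fused = [pi + wi for pi, wi in zip(p, w)]  # list concat -> T x 2*hidden
        h = self_attention(fused, self.wq, self.wk, self.wv)
        return matmul(h, self.head)                # T x out_dim continuous values


T = 20
rng = random.Random(1)
prosody = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(T)]  # e.g. F0, energy
text = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(T)]     # per-frame word embedding
# Hypothetical 5 outputs, e.g. 2 eyebrow channels + 3 head rotation angles.
model = ToyUpperFaceModel(prosody_dim=2, text_dim=8, hidden=4, out_dim=5)
motion = model.forward(prosody, text)
print(len(motion), len(motion[0]))  # 20 5
```

The usage lines print one five-channel motion vector per input frame. The convolution gives each frame local prosodic context, and attention lets each frame draw on the whole utterance. A practical system would train these weights, and would most likely use a deep-learning framework instead of pure Python.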
Submission history
From: Nicolas Obin
[v1] Sat, 9 Oct 2021 09:38:40 GMT (954kb,D)
[v2] Sat, 21 May 2022 10:38:33 GMT (2115kb,D)