Multimodal generation of upper-facial and head gestures with a Transformer Network using speech and text

Fares, Mireille; Pelachaud, Catherine; Obin, Nicolas

(Submitted on 9 Oct 2021 (this version), latest version 21 May 2022 (v2))

Abstract: We propose a semantically aware, speech-driven method to generate expressive and natural upper-facial and head motion for Embodied Conversational Agents (ECAs). In this work, we tackle two key challenges: producing natural and continuous head motion and upper-facial gestures. We propose a model that generates gestures from two input modalities: text and speech prosody. Our model uses Transformers and convolutions to map the multimodal features of an utterance to continuous eyebrow and head gestures. We conduct subjective and objective evaluations to validate our approach.

Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2110.04527 [eess.AS]
	(or arXiv:2110.04527v1 [eess.AS] for this version)

Submission history

From: Nicolas Obin
[v1] Sat, 9 Oct 2021 09:38:40 GMT (954kb,D)
[v2] Sat, 21 May 2022 10:38:33 GMT (2115kb,D)
