Multimodal analysis of the predictability of hand-gesture properties

Kucherenko, Taras; Nagy, Rajmund; Neff, Michael; Kjellström, Hedvig; Henter, Gustav Eje

Computer Science > Human-Computer Interaction

(Submitted on 12 Aug 2021 (v1), last revised 14 Jan 2022 (this version, v3))

Abstract: Embodied conversational agents benefit from being able to accompany their speech with gestures. Although many data-driven approaches to gesture generation have been proposed in recent years, it is still unclear whether such systems can consistently generate gestures that convey meaning. We investigate which gesture properties (phase, category, and semantics) can be predicted from speech text and/or audio using contemporary deep learning. In extensive experiments, we show that gesture properties related to gesture meaning (semantics and category) are predictable from text features (time-aligned FastText embeddings) alone, but not from prosodic audio features, whereas rhythm-related gesture properties (phase) can be predicted better from audio features than from text. These results are encouraging, as they indicate that it is possible to equip an embodied agent with content-wise meaningful co-speech gestures using a machine-learning model.

Comments:	Accepted at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2022
Subjects:	Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
Cite as:	arXiv:2108.05762 [cs.HC]
	(or arXiv:2108.05762v3 [cs.HC] for this version)

Submission history

From: Taras Kucherenko
[v1] Thu, 12 Aug 2021 14:16:00 GMT (4856kb,D)
[v2] Wed, 13 Oct 2021 08:46:37 GMT (5372kb,D)
[v3] Fri, 14 Jan 2022 10:34:25 GMT (5808kb,D)