Title: Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

Abstract: This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement acoustic detection of the active speaker, thus improving system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, which keeps it consistent with the constraints of cognitive development. Instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting. However, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.
Comments: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Machine Learning (stat.ML)
ACM classes: I.2; I.4; I.5
DOI: 10.1109/TCDS.2019.2927941
Cite as: arXiv:1711.08992 [cs.CV]
  (or arXiv:1711.08992v2 [cs.CV] for this version)
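
The abstract's central idea, using the auditory modality to supervise learning in the visual domain, can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the energy threshold used as an audio-based speech detector, the one-dimensional "mouth motion" feature, the synthetic data, and the nearest-centroid classifier standing in for whatever visual model the paper actually uses.

```python
import random


def audio_pseudo_labels(frame_energies, threshold=0.5):
    """Hypothetical audio cue: mark a frame as 'speaking' (1) when its
    audio energy exceeds a fixed threshold, otherwise 'silent' (0)."""
    return [1 if e > threshold else 0 for e in frame_energies]


def fit_centroids(visual_features, labels):
    """Fit a nearest-centroid classifier on visual features, using the
    audio-derived pseudo-labels in place of human annotations."""
    dim = len(visual_features[0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in zip(visual_features, labels):
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    if counts[0] == 0 or counts[1] == 0:
        raise ValueError("both pseudo-label classes must be present")
    return {y: [s / counts[y] for s in sums[y]] for y in (0, 1)}


def predict(centroids, x):
    """Classify one frame from visual features alone (no audio needed)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if sq_dist(centroids[1]) < sq_dist(centroids[0]) else 0


def demo(n_frames=200, seed=0):
    """Synthetic example: returns the fraction of frames where the
    visual-only prediction matches the true speaking state."""
    rng = random.Random(seed)
    speaking = [rng.random() < 0.5 for _ in range(n_frames)]
    # Synthetic audio energy: high when speaking, low when silent.
    energies = [(1.0 if s else 0.1) + rng.uniform(-0.05, 0.05) for s in speaking]
    # Synthetic visual feature (e.g. mouth-motion magnitude), one value per frame.
    motion = [[(0.8 if s else 0.2) + rng.uniform(-0.1, 0.1)] for s in speaking]

    labels = audio_pseudo_labels(energies)      # supervision comes from audio
    centroids = fit_centroids(motion, labels)   # learning happens in vision
    preds = [predict(centroids, x) for x in motion]
    truth = [1 if s else 0 for s in speaking]
    return sum(p == t for p, t in zip(preds, truth)) / n_frames


if __name__ == "__main__":
    print(demo())
```

Once trained, `predict` needs only visual input, which mirrors the abstract's stated goal of complementing acoustic detection in noisy conditions. The demo scores the classifier on the same frames it was fitted on, so it says nothing about the speaker-dependent versus speaker-independent gap reported in the abstract.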

Submission history

From: Kalin Stefanov
[v1] Fri, 24 Nov 2017 14:45:06 GMT (490kb,D)
[v2] Thu, 18 Jul 2019 17:55:38 GMT (2338kb,D)
