In the Eye of the Beholder: Gaze and Actions in First Person Video

Li, Yin; Liu, Miao; Rehg, James M.



(Submitted on 31 May 2020 (v1), last revised 31 Oct 2020 (this version, v2))

Abstract: We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a head-worn camera. To facilitate our research, we first introduce the EGTEA Gaze+ dataset. Our dataset comes with videos, gaze tracking data, hand masks and action annotations, thereby providing the most comprehensive benchmark for First Person Vision (FPV). Moving beyond the dataset, we propose a novel deep model for joint gaze estimation and action recognition in FPV. Our method describes the participant's gaze as a probabilistic variable and models its distribution using stochastic units in a deep network. We further sample from these stochastic units, generating an attention map to guide the aggregation of visual features for action recognition. Our method is evaluated on our EGTEA Gaze+ dataset and exceeds the state of the art by a significant margin. More importantly, we demonstrate that our model can be applied to a larger-scale FPV dataset, EPIC-Kitchens, even without using gaze, offering new state-of-the-art results on FPV action recognition.
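
The sampling-then-aggregation idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the abstract does not say which sampling scheme the stochastic units use, so Gumbel-softmax sampling is assumed here purely for illustration, and the names `sample_attention_map`, `attention_pool`, and `temperature` are hypothetical. The sketch draws one soft attention map from a grid of unnormalized gaze scores and uses it to take a weighted average of per-location feature vectors.

```python
import math
import random


def sample_attention_map(logits, temperature=1.0, rng=random):
    """Draw one soft attention map from a 2D grid of unnormalized gaze scores.

    Assumption: Gumbel-softmax sampling stands in for the "stochastic units";
    the abstract does not specify the sampling scheme.
    """
    flat = [v for row in logits for v in row]
    noisy = []
    for v in flat:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((v - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over all spatial locations.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    probs = [e / total for e in exps]
    width = len(logits[0])
    return [probs[i:i + width] for i in range(0, len(probs), width)]


def attention_pool(features, attention):
    """Aggregate per-location feature vectors as an attention-weighted sum."""
    channels = len(features[0][0])
    pooled = [0.0] * channels
    for feat_row, att_row in zip(features, attention):
        for feat, weight in zip(feat_row, att_row):
            for c in range(channels):
                pooled[c] += weight * feat[c]
    return pooled


# Toy 2x2 grid: hypothetical gaze scores and 2-channel features per location.
rng = random.Random(0)
logits = [[0.0, 2.0], [0.0, 0.0]]
features = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
attention = sample_attention_map(logits, temperature=0.5, rng=rng)
pooled = attention_pool(features, attention)
```

Each call to `sample_attention_map` yields a different map, so the pooled feature varies from sample to sample. A lower `temperature` concentrates each sampled map on fewer locations, while a higher one spreads the weight more evenly.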

Comments:	Submitted to TPAMI
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2006.00626 [cs.CV]
	(or arXiv:2006.00626v2 [cs.CV] for this version)

Submission history

From: Yin Li
[v1] Sun, 31 May 2020 22:06:06 GMT (6138kb,D)
[v2] Sat, 31 Oct 2020 05:00:32 GMT (10746kb,D)
