Computer Science > Computer Vision and Pattern Recognition
Title: Rescaling Egocentric Vision
(Submitted on 23 Jun 2020 (v1), last revised 17 Sep 2021 (this version, v4))
Abstract: This paper introduces the pipeline used to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras. Compared to its previous version, EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (128% more action segments) annotations of fine-grained actions. This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
Submission history
From: Dima Damen
[v1] Tue, 23 Jun 2020 18:28:04 GMT (6678kb,D)
[v2] Thu, 14 Jan 2021 20:11:27 GMT (28944kb,D)
[v3] Sat, 13 Feb 2021 11:11:01 GMT (28943kb,D)
[v4] Fri, 17 Sep 2021 17:17:48 GMT (20037kb,D)