Title: Transformed ROIs for Capturing Visual Transformations in Videos

Abstract: Modeling the visual changes that an action brings to a scene is critical for video understanding. Currently, CNNs process one local neighbourhood at a time, so contextual relationships over longer ranges, while still learnable, are captured only indirectly. We present TROI, a plug-and-play module that lets CNNs reason between mid-level feature representations that are otherwise separated in space and time. The module relates localized visual entities, such as hands and interacting objects, and transforms their corresponding regions of interest directly in the feature maps of convolutional layers. With TROI, we achieve state-of-the-art action recognition results on the large-scale datasets Something-Something-V2 and EPIC-Kitchens-100.
Comments: CVIU 2022 - Computer Vision and Image Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2106.03162 [cs.CV]
  (or arXiv:2106.03162v2 [cs.CV] for this version)

Submission history

From: Fadime Sener
[v1] Sun, 6 Jun 2021 15:59:53 GMT (7946kb,D)
[v2] Sat, 5 Nov 2022 17:57:37 GMT (8003kb,D)