Title: Region-based Non-local Operation for Video Classification

Abstract: Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms that directly capture long-range dependencies without a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at each position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. Experimentally, our method outperforms other attention mechanisms, and we achieve state-of-the-art performance on the Something-Something V1 dataset.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Journal reference: ICPR 2020
Cite as: arXiv:2007.09033 [cs.CV]
  (or arXiv:2007.09033v5 [cs.CV] for this version)
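
The recalibration step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how regions are summarized or how affinities are computed, so the sketch assumes mean pooling over a small spatio-temporal neighborhood, a scaled dot-product softmax between region summaries, and a residual connection. The names `region_mean` and `region_nonlocal` and the `size` parameter are hypothetical.

```python
import numpy as np


def region_mean(x, size=(3, 3, 3)):
    """Average each position's feature over a (t, h, w) neighborhood centered on it.

    x has shape (T, H, W, C); every entry of `size` is assumed to be odd.
    Mean pooling is an assumption of this sketch, not taken from the paper.
    """
    kt, kh, kw = size
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    # Replicate border values so every position has a full neighborhood.
    padded = np.pad(x, ((pt, pt), (ph, ph), (pw, pw), (0, 0)), mode="edge")
    T, H, W, _ = x.shape
    out = np.zeros_like(x, dtype=np.float64)
    # Sum the shifted copies of the feature map, one per neighborhood offset.
    for dt in range(kt):
        for dh in range(kh):
            for dw in range(kw):
                out += padded[dt:dt + T, dh:dh + H, dw:dw + W]
    return out / (kt * kh * kw)


def region_nonlocal(x, size=(3, 3, 3)):
    """Recalibrate every position using region summaries of all positions."""
    T, H, W, C = x.shape
    regions = region_mean(x, size).reshape(-1, C)   # (N, C), N = T * H * W
    # Assumed affinity: scaled dot product between region summaries.
    logits = regions @ regions.T / np.sqrt(C)       # (N, N)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over all positions
    aggregated = weights @ regions                  # (N, C)
    # Assumed residual connection, so the block can be dropped into a CNN.
    return x + aggregated.reshape(T, H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((4, 7, 7, 16))  # (T, H, W, C)
    print(region_nonlocal(feature_map).shape)
```

The affinity matrix has one row and one column per position, so memory grows quadratically with T * H * W; the sketch is only practical for small feature maps such as those in the later stages of a network.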

Submission history

From: Guoxi Huang
[v1] Fri, 17 Jul 2020 14:57:05 GMT (5414kb,D)
[v2] Fri, 24 Jul 2020 22:13:13 GMT (5494kb,D)
[v3] Wed, 21 Oct 2020 22:57:35 GMT (7088kb,D)
[v4] Mon, 21 Dec 2020 21:39:57 GMT (7088kb,D)
[v5] Tue, 2 Feb 2021 00:21:37 GMT (7088kb,D)
