TSI: Temporal Saliency Integration for Video Action Recognition

Su, Haisheng; Li, Kunchang; Feng, Jinyuan; Wang, Dongliang; Gan, Weihao; Wu, Wei; Qiao, Yu


(Submitted on 2 Jun 2021 (v1), last revised 17 Dec 2021 (this version, v4))

Abstract: Efficient spatiotemporal modeling is an important yet challenging problem for video action recognition. Existing state-of-the-art methods exploit differences between neighboring features to obtain motion cues for short-term temporal modeling with a simple convolution. However, a single local convolution cannot handle the variety of actions because of its limited receptive field. In addition, action-irrelevant noise introduced by camera movement degrades the quality of the extracted motion features. In this paper, we propose a Temporal Saliency Integration (TSI) block, which mainly contains a Salient Motion Excitation (SME) module and a Cross-perception Temporal Integration (CTI) module. Specifically, SME aims to highlight motion-sensitive areas through spatial-level local-global motion modeling, in which saliency alignment and pyramidal motion modeling are performed successively between adjacent frames to capture motion dynamics with less noise from misaligned backgrounds. CTI performs multi-perception temporal modeling through a group of separate 1D convolutions. Meanwhile, temporal interactions across different perceptions are integrated with an attention mechanism. Through these two modules, long- and short-term temporal relationships can be encoded efficiently while introducing only a limited number of additional parameters. Extensive experiments on several popular benchmarks (Something-Something V1 & V2, Kinetics-400, UCF-101, and HMDB-51) demonstrate the effectiveness of the proposed method.
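
The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the two general ideas it mentions: motion cues taken from differences between adjacent frames, and several 1D temporal convolutions with different receptive fields fused by attention weights. It is not the authors' SME or CTI module. The function names, the kernel sizes, and the pooling-based attention score are all assumptions made for illustration.

```python
import numpy as np


def frame_differences(x):
    """Motion cue from adjacent frames. x has shape (T, C): T frames, C channels.

    Returns x[t+1] - x[t], zero-padded at the last step so the shape stays (T, C).
    """
    diff = x[1:] - x[:-1]
    return np.concatenate([diff, np.zeros_like(x[:1])], axis=0)


def temporal_conv1d(x, kernel):
    """Sliding-window (cross-correlation) filter along time with zero padding.

    The same odd-length kernel is applied to every channel, so the output
    keeps the input shape (T, C).
    """
    pad = len(kernel) // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for i, w in enumerate(kernel):
        out += w * padded[i:i + x.shape[0]]
    return out


def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def multi_branch_temporal_fusion(x, kernels):
    """Run one temporal filter per kernel and fuse the branches with attention.

    Assumption: each branch is scored by its time-averaged response per channel,
    and a softmax over branches turns the scores into fusion weights.
    """
    branches = np.stack([temporal_conv1d(x, k) for k in kernels])  # (B, T, C)
    scores = branches.mean(axis=1)                                 # (B, C)
    weights = softmax(scores, axis=0)                              # sums to 1 over B
    fused = (weights[:, None, :] * branches).sum(axis=0)           # (T, C)
    return fused, weights


# Toy input: 8 frames with 4 feature channels each (hypothetical values).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))
motion = frame_differences(features)

# Three branches with temporal receptive fields of 1, 3, and 5 frames.
kernels = [np.ones(1), np.ones(3) / 3, np.ones(5) / 5]
fused, weights = multi_branch_temporal_fusion(motion, kernels)
```

In this sketch the kernels are fixed averaging filters. In a trained network the kernel values and the attention scoring would be learned, and the features would also carry spatial dimensions.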

Comments:	Submitted to CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.01088 [cs.CV]
	(or arXiv:2106.01088v4 [cs.CV] for this version)

Submission history

From: Haisheng Su
[v1] Wed, 2 Jun 2021 11:43:49 GMT (6867kb,D)
[v2] Wed, 8 Sep 2021 07:25:18 GMT (5751kb,D)
[v3] Wed, 15 Dec 2021 06:54:09 GMT (8298kb,D)
[v4] Fri, 17 Dec 2021 05:58:51 GMT (8298kb,D)
