Title: Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

Abstract: The Weakly-Supervised Temporal Action Localization (WS-TAL) task aims to recognize action instances in an untrimmed video and localize their temporal starts and ends, using only video-level labels as supervision. Because no negative samples of the background category are available, it is difficult for the network to separate foreground from background, which results in poor detection performance. In this report, we present our solution to the 2021 HACS Challenge - Weakly-supervised Learning Track, which builds on BaSNet to address this problem. Specifically, we first adopt pre-trained CSN, SlowFast, TDN, and ViViT models as feature extractors to obtain feature sequences. Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances using only video-level labels, based on Multi-Instance Learning (MIL). Finally, we ensemble multiple models to obtain the final detection results and reach 22.45% mAP on the test set.
Comments: CVPR-2021 HACS Challenge - Weakly-supervised Learning Track champion solution (1st Place)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Journal reference: CVPRW-2021
Cite as: arXiv:2106.11811 [cs.CV]
  (or arXiv:2106.11811v1 [cs.CV] for this version)
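
The MIL step mentioned in the abstract can be illustrated with a minimal sketch. This is not the LGBM-Net implementation: the abstract does not state how snippet scores are aggregated, so the top-k averaging used here is an assumption, and the function names `topk_mil_video_scores` and `threshold_segments` are hypothetical. The sketch shows how per-snippet class scores might be pooled into a video-level prediction (the only level at which labels exist) and how a per-snippet score sequence might be thresholded into temporal segments.

```python
import math


def topk_mil_video_scores(snippet_scores, k):
    """Pool per-snippet class logits into video-level class probabilities.

    snippet_scores: list of T snippets, each a list of C class logits.
    k: number of highest-scoring snippets averaged per class (assumed rule).
    """
    num_classes = len(snippet_scores[0])
    k = max(1, min(k, len(snippet_scores)))
    video_logits = []
    for c in range(num_classes):
        # Take the k largest logits of class c across all snippets.
        column = sorted((s[c] for s in snippet_scores), reverse=True)
        video_logits.append(sum(column[:k]) / k)
    # Numerically stable softmax over classes.
    peak = max(video_logits)
    exps = [math.exp(v - peak) for v in video_logits]
    total = sum(exps)
    return [e / total for e in exps]


def threshold_segments(scores, threshold):
    """Group consecutive snippets scoring above threshold into
    (start, end) snippet-index pairs, with end exclusive."""
    segments = []
    start = None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments
```

In training, the pooled video-level probabilities would be compared against the video-level label; at inference, the per-snippet scores of a predicted class would be thresholded into candidate segments. Both steps are simplified here, and background modeling is omitted entirely.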

Submission history

From: Xiang Wang
[v1] Sun, 20 Jun 2021 02:58:45 GMT (281kb,D)