VideoDG: Generalizing Temporal Relations in Videos to Novel Domains

Yao, Zhiyu; Wang, Yunbo; Wang, Jianmin; Yu, Philip S.; Long, Mingsheng

(Submitted on 8 Dec 2019 (v1), last revised 17 Sep 2021 (this version, v2))

Abstract: This paper introduces video domain generalization, a setting in which most video classification networks degrade because they are never exposed to target domains whose distributions diverge from the training data. We observe that global temporal features generalize poorly because of temporal domain shift: videos from unseen domains may exhibit an unexpected absence or misalignment of temporal relations. This finding motivates us to address video domain generalization by learning local-relation features at different timescales, which are more generalizable, and exploiting them together with global-relation features to maintain discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture named the Adversarial Pyramid Network, which improves the generalizability of video features by progressively capturing local-relation, global-relation, and cross-relation features. The second, built on the pyramid features, is a new and robust adversarial data augmentation approach that bridges different video domains by improving the diversity and quality of the augmented data. We construct three video domain generalization benchmarks in which domains are divided according to different datasets, different consequences of actions, or different camera views, respectively. VideoDG consistently outperforms combinations of previous video classification models and existing domain generalization methods on all benchmarks.
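
To make the distinction between local-relation features at different timescales and a global-relation feature concrete, the following is a minimal toy sketch. It is not the Adversarial Pyramid Network: the abstract does not specify the architecture, so mean pooling over sliding windows is used here purely as a stand-in for learned relation modules, and the function names, window sizes, and frame values are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average a sequence of equally sized per-frame feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def local_relation_features(frames: Sequence[Vector], window: int) -> List[Vector]:
    """Pool every run of `window` consecutive frames (stride 1).

    Assumption: pooling stands in for a learned local temporal relation.
    """
    return [
        mean_pool(frames[start:start + window])
        for start in range(len(frames) - window + 1)
    ]


def pyramid_features(
    frames: Sequence[Vector], windows: Sequence[int] = (2, 3)
) -> Dict[str, List[Vector]]:
    """Toy pyramid: local features per timescale plus one global feature."""
    pyramid = {f"local_{w}": local_relation_features(frames, w) for w in windows}
    # The global feature summarizes the whole clip at once.
    pyramid["global"] = [mean_pool(frames)]
    return pyramid


# Four frames with 2-D features (hypothetical values).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
features = pyramid_features(frames)
```

In this toy setup, shorter windows yield more features that each depend on fewer frames, so a missing or misaligned segment affects only some of them, whereas the single global feature depends on every frame.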

Comments:	Accepted by IEEE TPAMI, 2021. Code: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1912.03716 [cs.CV]
	(or arXiv:1912.03716v2 [cs.CV] for this version)

Submission history

From: Yunbo Wang
[v1] Sun, 8 Dec 2019 17:13:51 GMT (1844kb,D)
[v2] Fri, 17 Sep 2021 01:57:23 GMT (7965kb,D)
