Computer Science > Computer Vision and Pattern Recognition
Title: VideoDG: Generalizing Temporal Relations in Videos to Novel Domains
(Submitted on 8 Dec 2019 (v1), last revised 17 Sep 2021 (this version, v2))
Abstract: This paper introduces video domain generalization, a setting in which most video classification networks degrade because they are never exposed to target domains with divergent distributions. We observe that global temporal features generalize less well because of temporal domain shift: videos from unseen domains may exhibit an unexpected absence or misalignment of temporal relations. This finding motivates us to address video domain generalization by learning local-relation features at different timescales, which are more generalizable, and exploiting them together with global-relation features to maintain discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture, the Adversarial Pyramid Network, which improves the generalizability of video features by progressively capturing local-relation, global-relation, and cross-relation features. The second, built on these pyramid features, is a new and robust adversarial data augmentation approach that bridges different video domains by improving the diversity and quality of the augmented data. We construct three video domain generalization benchmarks in which domains are divided according to different datasets, different consequences of actions, or different camera views, respectively. VideoDG consistently outperforms combinations of previous video classification models and existing domain generalization methods on all benchmarks.
Submission history
From: Yunbo Wang
[v1] Sun, 8 Dec 2019 17:13:51 GMT (1844kb,D)
[v2] Fri, 17 Sep 2021 01:57:23 GMT (7965kb,D)