Learning Cross-modal Contrastive Features for Video Domain Adaptation

Kim, Donghyun; Tsai, Yi-Hsuan; Zhuang, Bingbing; Yu, Xiang; Sclaroff, Stan; Saenko, Kate; Chandraker, Manmohan

(Submitted on 26 Aug 2021)

Abstract: Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains a challenge to design a better method that considers the cross-modal inputs under the cross-domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces, which originally lack the connection across modalities or have less alignment across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.
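
The abstract's idea of treating each modality as a view can be illustrated with a generic InfoNCE-style contrastive loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `cross_modal_info_nce`, the temperature value, and the choice of "same clip, other modality" as the positive pair are illustrative assumptions. The paper's sampling strategies and its cross-domain terms are not described in the abstract and are not reproduced here.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_info_nce(rgb_feats, flow_feats, temperature=0.1):
    """Mean InfoNCE loss with RGB clip i as anchor and flow clip i as positive.

    Hypothetical helper: flow features of all other clips in the batch act
    as negatives. The temperature default is an assumption, not a value
    taken from the paper.
    """
    assert len(rgb_feats) == len(flow_feats) and rgb_feats
    rgb = [l2_normalize(v) for v in rgb_feats]
    flow = [l2_normalize(v) for v in flow_feats]

    total = 0.0
    for i, anchor in enumerate(rgb):
        # Cosine similarity of the anchor to every flow feature, scaled by temperature.
        logits = [dot(anchor, f) / temperature for f in flow]
        # Numerically stable log-sum-exp over positive and negatives.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching flow clip.
        total += log_denom - logits[i]
    return total / len(rgb)


# Toy batch of two clips with 2-D features per modality.
rgb = [[1.0, 0.0], [0.0, 1.0]]
flow_aligned = [[1.0, 0.0], [0.0, 1.0]]
flow_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = cross_modal_info_nce(rgb, flow_aligned)
loss_swapped = cross_modal_info_nce(rgb, flow_swapped)
```

In this toy batch the loss is small when each clip's RGB and flow features agree and large when they are swapped, which is the sense in which such an objective connects feature spaces across modalities. A cross-domain variant would additionally need a rule for choosing positives between source and target videos, which this sketch leaves out.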

Comments:	Accepted in ICCV'21
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2108.11974 [cs.CV]
	(or arXiv:2108.11974v1 [cs.CV] for this version)

Submission history

From: Donghyun Kim
[v1] Thu, 26 Aug 2021 18:14:18 GMT (12079kb,D)
