Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency

Prabhu, Viraj; Yenamandra, Sriram; Singh, Aaditya; Hoffman, Judy

(Submitted on 16 Jun 2022)

Abstract: Visual domain adaptation (DA) seeks to transfer trained models to unseen, unlabeled domains across distribution shift, but existing approaches typically focus on adapting convolutional neural network architectures initialized with supervised ImageNet representations. In this work, we shift focus to adapting modern architectures for object recognition, namely the increasingly popular Vision Transformer (ViT), and modern pretraining based on self-supervised learning (SSL). Inspired by the design of recent SSL approaches that learn from partial image inputs generated via masking or cropping (either by learning to predict the missing pixels, or by learning representational invariances to such augmentations), we propose PACMAC, a simple two-stage adaptation algorithm for self-supervised ViTs. PACMAC first performs in-domain SSL on pooled source and target data to learn task-discriminative features. It then probes the model's predictive consistency across a set of partial target inputs, generated via a novel attention-conditioned masking strategy, to identify reliable candidates for self-training. Our simple approach leads to consistent performance gains over competing methods that use ViTs and self-supervised initializations on standard object recognition benchmarks. Code available at this https URL
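The abstract does not specify how the partial views are built or how consistency is scored, so the Python sketch below is purely illustrative and is not the authors' implementation. It assumes per-patch attention scores (for example, [CLS]-token attention from the ViT) are available, builds disjoint partial views by dealing patches round-robin in order of attention (a hypothetical choice), and keeps a target image as a self-training candidate only if every partial view yields the same predicted class as the full image. Dropped patches are simply zeroed, and `predict` is a hypothetical stand-in for the adapted ViT classifier.

```python
import numpy as np


def attention_conditioned_masks(attn, num_masks):
    """Split patch indices into `num_masks` disjoint keep-sets.

    Hypothetical scheme: patches are sorted by attention score and dealt
    round-robin, so each partial view keeps a similar mix of high- and
    low-attention patches.
    """
    order = np.argsort(-attn)  # patch indices, most-attended first
    masks = np.zeros((num_masks, attn.shape[0]), dtype=bool)
    for rank, patch in enumerate(order):
        masks[rank % num_masks, patch] = True  # True means the patch is kept
    return masks


def select_consistent(patches, attn, predict, num_masks=3):
    """Return (keep, pseudo_labels) for a batch of target images.

    patches: (N, P, D) patch embeddings; attn: (N, P) attention scores;
    predict: hypothetical callable mapping (n, P, D) patches to (n, C) logits.
    """
    full_pred = predict(patches).argmax(axis=1)  # pseudo-labels from full images
    agree = np.ones(len(patches), dtype=bool)
    for i in range(len(patches)):
        for m in attention_conditioned_masks(attn[i], num_masks):
            # Simplification: zero out dropped patches instead of removing tokens.
            view = patches[i:i + 1] * m[None, :, None]
            agree[i] &= predict(view).argmax(axis=1)[0] == full_pred[i]
    return agree, full_pred
```

With a toy classifier that sums patch features, an image whose prediction rests on a single dominant patch fails the check, because the views that lose that patch flip to another class; an image whose evidence is spread across all patches passes. Dealing patches round-robin (rather than masking only the most-attended ones) is one way to make every view partial yet still informative, but the actual masking rule used by PACMAC should be taken from the paper and code.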

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2206.08222 [cs.CV]
	(or arXiv:2206.08222v1 [cs.CV] for this version)

Submission history

From: Viraj Prabhu
[v1] Thu, 16 Jun 2022 14:46:10 GMT (7307kb,D)