Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer

Camporese, Guglielmo; Izzo, Elena; Ballan, Lamberto

Computer Science > Computer Vision and Pattern Recognition

(Submitted on 1 Jun 2022 (v1), last revised 13 Oct 2022 (this version, v2))

Abstract: Vision Transformers (ViTs) enabled the use of the transformer architecture on vision tasks, showing impressive performance when trained on big datasets. However, on relatively small datasets, ViTs are less accurate given their lack of inductive bias. To this end, we propose a simple but still effective Self-Supervised Learning (SSL) strategy to train ViTs that, without any external annotation or external data, can significantly improve the results. Specifically, we define a set of SSL tasks based on relations of image patches that the model has to solve before or jointly with the supervised task. Differently from ViT, our RelViT model optimizes all the output tokens of the transformer encoder that are related to the image patches, thus exploiting more training signals at each training step. We investigated our methods on several image benchmarks, finding that RelViT improves the SSL state-of-the-art methods by a large margin, especially on small datasets. Code is available at: this https URL
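
As a rough illustration of what a patch-relation SSL target might look like, the sketch below builds a label for every ordered pair of patches in a grid, encoding the direction from one patch to the other as one of 9 classes. This is a hypothetical example, not the authors' implementation: the function name `relation_labels`, the 9-class direction encoding, and the row-major patch indexing are all assumptions made here for illustration, and the abstract does not specify the actual tasks used by RelViT.

```python
from typing import List


def sign(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def relation_labels(grid_h: int, grid_w: int) -> List[List[int]]:
    """Build pairwise direction labels for a grid of image patches.

    Patches are indexed in row-major order (an assumption of this sketch).
    labels[i][j] is a class in 0..8 that encodes the sign of the row offset
    and the sign of the column offset going from patch i to patch j.
    Class 4 means "same patch" (both offsets are zero).
    """
    n = grid_h * grid_w
    labels = [[0] * n for _ in range(n)]
    for i in range(n):
        ri, ci = divmod(i, grid_w)  # row and column of patch i
        for j in range(n):
            rj, cj = divmod(j, grid_w)  # row and column of patch j
            # Map (sign(dr), sign(dc)) in {-1, 0, 1}^2 to a single class id.
            labels[i][j] = (sign(rj - ri) + 1) * 3 + (sign(cj - ci) + 1)
    return labels


# A 2x2 patch grid: patch 0 is top-left, patch 3 is bottom-right.
labels = relation_labels(2, 2)
```

Because such labels are derived purely from patch positions, they require no external annotation, which is the property the abstract emphasizes. A model trained on a target like this would receive one training signal per patch pair rather than a single signal per image.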

Comments:	Accepted to BMVC 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2206.00481 [cs.CV]
	(or arXiv:2206.00481v2 [cs.CV] for this version)

Submission history

From: Lamberto Ballan
[v1] Wed, 1 Jun 2022 13:25:32 GMT (4287kb,D)
[v2] Thu, 13 Oct 2022 14:11:34 GMT (6399kb,D)
