On Data Scaling in Masked Image Modeling

Xie, Zhenda; Zhang, Zheng; Cao, Yue; Lin, Yutong; Wei, Yixuan; Dai, Qi; Hu, Han

(Submitted on 9 Jun 2022)

Abstract: An important goal of self-supervised learning is to enable model pre-training to benefit from almost unlimited data. However, one method that has recently become popular, namely masked image modeling (MIM), is suspected to be unable to benefit from larger data. In this work, we break this misconception through extensive experiments, with data scales ranging from 10% of ImageNet-1K to full ImageNet-22K, model sizes ranging from 49 million to 1 billion parameters, and training lengths ranging from 125K to 500K iterations. Our study reveals that: (i) Masked image modeling also demands larger data: we observed that very large models overfit when trained on relatively small data; (ii) The length of training matters: large models trained with masked image modeling can benefit from more data when trained longer; (iii) The validation loss in pre-training is a good indicator of how well the model performs when fine-tuned on multiple tasks. This observation allows us to evaluate pre-trained models in advance, without costly trial-and-error assessments on downstream tasks. We hope that our findings will advance the understanding of masked image modeling in terms of scaling ability.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2206.04664 [cs.CV]
	(or arXiv:2206.04664v1 [cs.CV] for this version)

Submission history

From: Zheng Zhang
[v1] Thu, 9 Jun 2022 17:58:24 GMT (8451kb,D)
