RGMIM: Region-Guided Masked Image Modeling for Learning Meaningful Representation from X-Ray Images

Li, Guang; Togo, Ren; Ogawa, Takahiro; Haseyama, Miki

(Submitted on 1 Nov 2022 (v1), last revised 21 May 2023 (this version, v4))

Abstract: Purpose: Self-supervised learning has been gaining attention in the medical field for its potential to improve computer-aided diagnosis. One popular self-supervised learning method is masked image modeling (MIM), which masks a subset of input pixels and predicts the masked pixels. However, traditional MIM methods typically use a random masking strategy, which may not be ideal for medical images, where the region of interest for disease detection is often small. To address this issue, this work aims to improve MIM for medical images and evaluate its effectiveness on an open X-ray image dataset. Methods: In this paper, we present a novel method called region-guided masked image modeling (RGMIM) for learning meaningful representations from X-ray images. Our method adopts a new masking strategy that uses organ mask information to identify valid regions for learning more meaningful representations. The proposed method was compared with five self-supervised learning techniques (MAE, SKD, Cross, BYOL, and SimSiam). We conducted quantitative evaluations on an open lung X-ray image dataset as well as masking ratio hyperparameter studies. Results: When using the entire training set, RGMIM outperformed the other comparable methods, achieving a lung disease detection accuracy of 0.962. In particular, RGMIM significantly improved performance over the other methods at small data volumes, such as 5% and 10% of the training set (846 and 1,693 images), and achieved a detection accuracy of 0.957 even when only 50% of the training set was used. Conclusions: RGMIM can mask more valid regions, facilitating the learning of discriminative representations and the subsequent high-accuracy lung disease detection. RGMIM outperforms other state-of-the-art self-supervised learning methods in experiments, particularly when limited training data is used.
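
The abstract describes the masking strategy only at a high level, so the following is a minimal sketch of one way region-guided patch masking could work, not the authors' implementation. It assumes a binary organ (lung) mask is already available, that the image is split into square patches, and that a patch counts as "valid" when at least a given fraction of its pixels lie inside the organ. The function name `region_guided_mask`, the `min_overlap` threshold, the default patch size and masking ratio, and the rule of capping the masked count at the number of valid patches are all assumptions made for illustration.

```python
import numpy as np


def region_guided_mask(organ_mask, patch_size=16, mask_ratio=0.75,
                       min_overlap=0.5, rng=None):
    """Pick patches to mask, restricted to patches that overlap the organ.

    organ_mask : 2-D array, nonzero where the organ (e.g. lung) is present.
    Returns a boolean grid of shape (H // patch_size, W // patch_size),
    True where a patch is masked.
    All parameter names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = organ_mask.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    gh, gw = h // patch_size, w // patch_size

    # Fraction of organ pixels inside each patch.
    binary = (np.asarray(organ_mask) > 0).astype(float)
    coverage = binary.reshape(gh, patch_size, gw, patch_size).mean(axis=(1, 3))

    # Valid patches are those sufficiently covered by the organ region.
    valid = np.flatnonzero(coverage.ravel() >= min_overlap)

    # Target count follows the masking ratio over all patches, but masking
    # is never applied outside the valid region (assumed rule).
    n_mask = min(valid.size, int(round(mask_ratio * gh * gw)))
    chosen = rng.choice(valid, size=n_mask, replace=False)

    masked = np.zeros(gh * gw, dtype=bool)
    masked[chosen] = True
    return masked.reshape(gh, gw)
```

For contrast, a random masking strategy would draw `chosen` from all `gh * gw` patch indices, so many masked patches could land on background rather than on the organ.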

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Cite as:	arXiv:2211.00313 [cs.CV]
	(or arXiv:2211.00313v4 [cs.CV] for this version)

Submission history

From: Guang Li
[v1] Tue, 1 Nov 2022 07:41:03 GMT (2232kb,D)
[v2] Fri, 17 Mar 2023 01:55:07 GMT (1367kb,D)
[v3] Thu, 20 Apr 2023 10:06:36 GMT (11176kb,D)
[v4] Sun, 21 May 2023 14:36:59 GMT (11182kb,D)
