Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation

Cho, Jae Won; Kim, Dong-Jin; Choi, Jinsoo; Jung, Yunjae; Kweon, In So

(Submitted on 13 Apr 2021)

Abstract: In this work, we address the missing-modality problem that arises in the Visual Question Answer-Difference prediction task and propose a novel method to solve it. The missing modality is the ground truth answers, which are not available at test time; we handle their absence with a privileged knowledge distillation scheme. To do so efficiently, we first introduce a model, the "Big" Teacher, that takes the image/question/answer triplet as its input and outperforms the baseline. We then use a combination of models to distill knowledge to a target network (student) that takes only the image/question pair as its input. We evaluate our models on the VizWiz and VQA-V2 Answer Difference datasets and show, through extensive experimentation and ablation, the performance of our method and diverse possibilities for future research.
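
The abstract does not spell out the training objective, so the following is only a generic sketch of how a student that lacks a privileged input can be trained against a teacher that has it. It assumes, purely for illustration, a multi-label output with one sigmoid per label; the function names and the weighting parameter `alpha` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard (0/1) or soft (in [0, 1])."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Hypothetical student objective.

    student_logits: outputs of a network that sees only image + question.
    teacher_logits: outputs of a network that also saw the answers
                    (the privileged input, unavailable at test time).
    alpha:          assumed weight between imitating the teacher and
                    fitting the ground-truth labels.
    """
    student_probs = [sigmoid(z) for z in student_logits]
    teacher_probs = [sigmoid(z) for z in teacher_logits]
    supervised = bce(student_probs, labels)        # fit the labels
    imitation = bce(student_probs, teacher_probs)  # match the teacher's soft outputs
    return (1.0 - alpha) * supervised + alpha * imitation
```

At test time only the student is evaluated, so the answers are never required as an input; the teacher's outputs are used solely as an extra training signal.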

Comments:	To appear in CVPR MULA Workshop
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2104.05965 [cs.CV]
	(or arXiv:2104.05965v1 [cs.CV] for this version)

Submission history

From: Jae Won Cho
[v1] Tue, 13 Apr 2021 06:41:11 GMT (506kb,D)
