On Modality Bias in the TVQA Dataset

Winterbottom, Thomas; Xiao, Sarah; McLean, Alistair; Moubayed, Noura Al

Computer Science > Computer Vision and Pattern Recognition

Title: On Modality Bias in the TVQA Dataset

Authors: Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed

(Submitted on 18 Dec 2020)

Abstract: TVQA is a large-scale video question answering (video-QA) dataset based on popular TV shows. The questions were specifically designed to require "both vision and language understanding to answer". In this work, we demonstrate an inherent bias in the dataset towards the textual subtitle modality. We infer this bias both directly and indirectly, notably finding that models trained with subtitles learn, on average, to suppress video feature contribution. Our results demonstrate that models trained on only the visual information can answer ~45% of the questions, while models using only the subtitles achieve ~68%. We find that a bilinear-pooling-based joint representation of modalities damages model performance by 9%, implying a reliance on modality-specific information. We also show that TVQA fails to benefit from the RUBi modality bias reduction technique popularised in VQA. By simply improving text processing using BERT embeddings with the simple model first proposed for TVQA, we achieve state-of-the-art results (72.13%) compared to the highly complex STAGE model (70.50%). We recommend a multimodal evaluation framework that can highlight biases in models and isolate visually reliant and textually reliant subsets of data. Using this framework, we propose subsets of TVQA that respond exclusively to either or both modalities in order to facilitate multimodal modelling as TVQA originally intended.
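
The idea of subsets that "respond exclusively to either or both modalities" can be illustrated with a minimal sketch. This is not the authors' framework: the function name `partition_by_modality` and the per-question correctness dictionaries are hypothetical, and the sketch assumes only that a subtitle-only model and a video-only model have each been scored on the same question ids.

```python
from typing import Dict, List


def partition_by_modality(
    subtitle_correct: Dict[str, bool],
    video_correct: Dict[str, bool],
) -> Dict[str, List[str]]:
    """Group question ids by which unimodal model answered them correctly.

    Both inputs are hypothetical: they map a question id to whether the
    subtitle-only (or video-only) model chose the right answer.
    """
    subsets: Dict[str, List[str]] = {
        "subtitle_only": [],
        "video_only": [],
        "both": [],
        "neither": [],
    }
    # Only consider questions scored by both models; sort for stable output.
    for qid in sorted(subtitle_correct.keys() & video_correct.keys()):
        s, v = subtitle_correct[qid], video_correct[qid]
        if s and v:
            key = "both"
        elif s:
            key = "subtitle_only"
        elif v:
            key = "video_only"
        else:
            key = "neither"
        subsets[key].append(qid)
    return subsets


# Toy correctness records, invented purely for illustration.
subtitle = {"q1": True, "q2": True, "q3": False, "q4": False}
video = {"q1": True, "q2": False, "q3": True, "q4": False}
subsets = partition_by_modality(subtitle, video)
```

In this toy run, `subsets["subtitle_only"]` holds the questions that only the subtitle model answered correctly. A real analysis would likely need repeated runs or several models per modality before treating a question as reliant on one modality.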

Comments:	10 pages, 4 Figures, 2 Tables, +Supp Mats, BMVC 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
MSC classes:	68T99
ACM classes:	I.2.10; I.2.7; I.2.4
Cite as:	arXiv:2012.10210 [cs.CV]
	(or arXiv:2012.10210v1 [cs.CV] for this version)

Submission history

From: Tom Winterbottom
[v1] Fri, 18 Dec 2020 13:06:23 GMT (2696kb,D)
