VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

Ying, Zhuofan; Hase, Peter; Bansal, Mohit

Authors: Zhuofan Ying, Peter Hase, Mohit Bansal

(Submitted on 22 Jun 2022 (v1), last revised 25 Oct 2022 (this version, v2))

Abstract: Many past works aim to improve visual reasoning in models by supervising feature importance (estimated by model explanation techniques) with human annotations such as highlights of important image regions. However, recent work has shown that performance gains from feature importance (FI) supervision for Visual Question Answering (VQA) tasks persist even with random supervision, suggesting that these methods do not meaningfully align model FI with human FI. In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility). Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets in terms of both in-distribution and out-of-distribution accuracy. While past work suggests that the mechanism for improved accuracy is through improved explanation plausibility, we show that this relationship depends crucially on explanation faithfulness (whether explanations truly represent the model's internal reasoning). Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful. Lastly, we show that, surprisingly, RRR metrics are not predictive of out-of-distribution model accuracy when controlling for a model's in-distribution accuracy, which calls into question the value of these metrics for evaluating model reasoning. All supporting code is available at this https URL
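The four objectives listed in the abstract can be read as four loss terms. The following is a minimal sketch under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical linear stand-in for a VQA model, unimportant or important regions are "removed" by zeroing their features, model FI is estimated by leave-one-out, and the specific divergences (KL to uniform, KL between predictions, cosine distance) are illustrative choices that the abstract does not specify.

```python
import math

EPS = 1e-12  # numerical guard, not part of any objective


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))


def toy_model(regions, weights):
    """Hypothetical model: sum region features, apply a linear layer, softmax."""
    pooled = [sum(col) for col in zip(*regions)]
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    return softmax(logits)


def keep(regions, mask):
    """Zero out regions whose mask entry is 0 (zero replacement is an assumption)."""
    return [[x * m for x in region] for region, m in zip(regions, mask)]


def leave_one_out_fi(regions, weights, label):
    """Importance of region i = drop in label probability when region i is zeroed."""
    full = toy_model(regions, weights)[label]
    scores = []
    for i in range(len(regions)):
        mask = [0 if j == i else 1 for j in range(len(regions))]
        scores.append(full - toy_model(keep(regions, mask), weights)[label])
    return scores


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + EPS)


def rrr_losses(regions, weights, label, human_fi, threshold=0.5):
    """Return one illustrative loss per objective for a single example."""
    n_classes = len(weights)
    uniform = [1.0 / n_classes] * n_classes
    important = [1 if h > threshold else 0 for h in human_fi]
    unimportant = [1 - m for m in important]

    p_full = toy_model(regions, weights)
    p_important_only = toy_model(keep(regions, important), weights)
    p_unimportant_only = toy_model(keep(regions, unimportant), weights)
    model_fi = leave_one_out_fi(regions, weights, label)

    return {
        # (1) Sufficiency: be accurate when only important regions are visible.
        "sufficiency": -math.log(p_important_only[label] + EPS),
        # (2) Uncertainty: be near-uniform when no important region is visible.
        "uncertainty": kl(p_unimportant_only, uniform),
        # (3) Invariance: prediction should not move when unimportant regions change
        #     (here the change is zeroing, so the input matches the sufficiency input).
        "invariance": kl(p_full, p_important_only),
        # (4) Plausibility: model FI should align with human FI.
        "plausibility": cosine_distance(model_fi, human_fi),
    }


# Usage: two regions with 2-d features, two answer classes, region 0 marked important.
example = rrr_losses(
    regions=[[2.0, 0.0], [0.0, 0.0]],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    label=0,
    human_fi=[1.0, 0.0],
)
print(example)
```

In a training setup these terms would typically be weighted and added to the ordinary task loss; the weights, the replacement strategy for removed features, and the explanation method are all choices this sketch leaves open.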

Comments:	NeurIPS 2022 (first two authors contributed equally)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2206.11212 [cs.CV]
	(or arXiv:2206.11212v2 [cs.CV] for this version)

Submission history

From: Peter Hase
[v1] Wed, 22 Jun 2022 17:02:01 GMT (1535kb,D)
[v2] Tue, 25 Oct 2022 19:25:54 GMT (2934kb,D)
