Interpretable Visual Reasoning via Induced Symbolic Space

Wang, Zhonghao; Yu, Mo; Wang, Kai; Xiong, Jinjun; Hwu, Wen-mei; Hasegawa-Johnson, Mark; Shi, Humphrey

(Submitted on 23 Nov 2020 (this version), latest version 24 Aug 2021 (v2))

Abstract: We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images, and obtain an interpretable model by working in the induced symbolic concept space. To this end, we first design a new framework, the object-centric compositional attention model (OCCAM), to perform the visual reasoning task with object-level visual features. We then propose a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by applying OCCAM to objects represented in the induced symbolic concept space. Experiments on the CLEVR dataset demonstrate that 1) OCCAM achieves a new state of the art without human-annotated functional programs; and 2) the induced concepts are both accurate and sufficient, as OCCAM achieves on-par performance whether objects are represented by visual features or in the induced symbolic concept space.
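The abstract mentions attention patterns between objects' visual features and question words as the clue for inducing concepts. The snippet below is only a generic illustration of that idea, not the OCCAM model: it computes plain dot-product attention of each question word over a handful of object feature vectors. The function names (`softmax`, `attend`), the two-dimensional vectors, and the example words are all hypothetical and chosen purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word_vec, object_feats):
    """Dot-product attention of one word vector over all object features."""
    scores = [sum(w * f for w, f in zip(word_vec, feat)) for feat in object_feats]
    return softmax(scores)

# Toy, made-up data: three objects and two question words (assumptions).
object_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
word_vecs = {"red": [4.0, 0.0], "cube": [0.0, 4.0]}

# Attention distribution over objects for each word.
attention = {w: attend(v, object_feats) for w, v in word_vecs.items()}

# Index of the object each word attends to most strongly.
top_object = {w: max(range(len(a)), key=a.__getitem__) for w, a in attention.items()}
```

In this toy setting, words that consistently attend to the same subset of objects could be read as naming a shared concept; how the paper actually turns such patterns into a symbolic concept space is not specified in the abstract.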

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2011.11603 [cs.CV]
	(or arXiv:2011.11603v1 [cs.CV] for this version)

Submission history

From: Zhonghao Wang
[v1] Mon, 23 Nov 2020 18:21:49 GMT (5437kb,D)
[v2] Tue, 24 Aug 2021 13:55:14 GMT (4859kb,D)
