A survey on VQA: Datasets and Approaches

Zou, Yeyun; Xie, Qiyu

doi:10.1109/ITCA52113.2020.00069

Title: A survey on VQA: Datasets and Approaches

Authors: Yeyun Zou, Qiyu Xie

(Submitted on 2 May 2021)

Abstract: Visual question answering (VQA) is a task that combines techniques from computer vision and natural language processing. It requires models to answer a text-based question using the information contained in a visual input. In recent years, the scope of VQA research has expanded: work that uses VQA to examine reasoning ability, and work on VQA over scientific diagrams, has received more attention. Meanwhile, more multimodal feature fusion mechanisms have been proposed. This paper reviews and analyzes existing datasets, metrics, and models proposed for the VQA task.

Comments:	10 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
DOI:	10.1109/ITCA52113.2020.00069
Cite as:	arXiv:2105.00421 [cs.CV]
	(or arXiv:2105.00421v1 [cs.CV] for this version)

Submission history

From: Yeyun Zou
[v1] Sun, 2 May 2021 08:50:30 GMT (283kb)
