Video Question Answering on Screencast Tutorials

Zhao, Wentian; Kim, Seokhwan; Xu, Ning; Jin, Hailin

doi:10.24963/ijcai.2020/148

(Submitted on 2 Aug 2020)

Abstract: This paper presents a new video question answering task on screencast tutorials. We introduce a dataset of question, answer, and context triples drawn from the tutorial videos for a software product. Unlike other video question answering work, all the answers in our dataset are grounded in the domain knowledge base. A one-shot recognition algorithm is designed to extract visual cues, which helps improve video question answering performance. We also propose several baseline neural network architectures based on various aspects of the video contexts in the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
DOI:	10.24963/ijcai.2020/148
Cite as:	arXiv:2008.00544 [cs.CL]
	(or arXiv:2008.00544v1 [cs.CL] for this version)

Submission history

From: Wentian Zhao
[v1] Sun, 2 Aug 2020 19:27:42 GMT (5424kb,D)
