Title: Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension

Abstract: Although Vietnamese is the 17th most widely spoken language in the world by number of native speakers, there has been little research on Vietnamese machine reading comprehension (MRC), the task of understanding a text and answering questions about it. One reason is the lack of high-quality benchmark datasets for this task. In this work, we construct a dataset of 2,783 multiple-choice question-answer pairs based on 417 Vietnamese texts commonly used to teach reading comprehension to elementary school pupils. In addition, we propose a lexical-based MRC method that uses semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text. We compare the proposed model with several lexical-based and neural network-based baselines. Our method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model. We also measure human performance on our dataset and find a large gap between machine and human performance, which indicates that there is substantial room for improvement on this task. The dataset is freely available on our website for research purposes.
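
To make the idea of a lexical-based approach to multiple-choice MRC concrete, the following is a minimal sketch of a generic word-overlap baseline. It is not the method proposed in the paper: it uses no semantic similarity measure and no external knowledge source, and the function names (`tokenize`, `overlap_score`, `choose_answer`) and the English toy passage are assumptions made purely for illustration. Real Vietnamese text would also need a proper word segmenter in place of the crude regex split used here.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters.

    Assumption: a crude stand-in for real Vietnamese word segmentation.
    """
    return [t for t in re.split(r"\W+", text.lower()) if t]


def overlap_score(passage_sentences, question, option):
    """Return the best fraction of question+option tokens found in one sentence."""
    query = set(tokenize(question)) | set(tokenize(option))
    if not query:
        return 0.0
    best = 0.0
    for sentence in passage_sentences:
        sentence_tokens = set(tokenize(sentence))
        best = max(best, len(query & sentence_tokens) / len(query))
    return best


def choose_answer(passage, question, options):
    """Return the index of the option with the highest overlap score."""
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    scores = [overlap_score(sentences, question, option) for option in options]
    return max(range(len(options)), key=scores.__getitem__)


# Hypothetical toy example (not taken from the dataset).
passage = "The cat sat on the mat. The dog slept in the garden."
question = "Where did the dog sleep?"
options = ["on the mat", "in the garden", "at school", "under the bed"]
predicted = choose_answer(passage, question, options)
print(options[predicted])
```

Accuracy on a multiple-choice dataset is then the fraction of questions for which the predicted index matches the gold answer. A purely surface-level matcher like this one cannot link "slept" to "sleep" or handle paraphrases, which is the kind of gap that semantic similarity and external knowledge are meant to address.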
Subjects: Computation and Language (cs.CL)
Journal reference: IEEE Access, 2020
DOI: 10.1109/ACCESS.2020.3035701
Cite as: arXiv:2001.05687 [cs.CL]
  (or arXiv:2001.05687v5 [cs.CL] for this version)

Submission history

From: Kiet Nguyen
[v1] Thu, 16 Jan 2020 08:09:51 GMT (204kb,D)
[v2] Tue, 10 Mar 2020 10:07:39 GMT (235kb,D)
[v3] Fri, 15 May 2020 03:45:33 GMT (271kb,D)
[v4] Tue, 19 May 2020 10:02:23 GMT (273kb,D)
[v5] Sun, 1 Nov 2020 16:04:33 GMT (13035kb,D)