LIQUID: A Framework for List Question Answering Dataset Generation

Lee, Seongyun; Kim, Hyunjae; Kang, Jaewoo

(Submitted on 3 Feb 2023 (v1), last revised 6 Feb 2023 (this version, v2))

Abstract: Question answering (QA) models often rely on large-scale training datasets, which necessitates the development of a data generation framework to reduce the cost of manual annotations. Although several recent studies have aimed to generate synthetic questions with single-span answers, no study has been conducted on the creation of list questions with multiple, non-contiguous spans as answers. To address this gap, we propose LIQUID, an automated framework for generating list QA datasets from unlabeled corpora. We first convert a passage from Wikipedia or PubMed into a summary and extract named entities from the summarized text as candidate answers. This allows us to select answers that are semantically correlated in context and are, therefore, suitable for constructing list questions. We then create questions using an off-the-shelf question generator with the extracted entities and original passage. Finally, iterative filtering and answer expansion are performed to ensure the accuracy and completeness of the answers. Using our synthetic data, we significantly improve the performance of the previous best list QA models by exact-match F1 scores of 5.0 on MultiSpanQA, 1.9 on Quoref, and 2.8 averaged across three BioASQ benchmarks.
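
The stages named in the abstract (summarize, extract entities, generate a question, then filter and expand the answers) can be pictured with a minimal Python sketch. It is not the authors' implementation: the function `generate_list_qa`, its parameters, and the toy stand-ins (`toy_summarize`, `toy_entities`, `toy_question`, `toy_reader`) are hypothetical names introduced here for illustration, and the filtering and expansion rule shown (keep candidates a QA reader also predicts, then add the reader's extra predictions) is an assumption about how such a step might work.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ListQAExample:
    passage: str
    question: str
    answers: List[str]


def generate_list_qa(
    passage: str,
    summarize: Callable[[str], str],
    extract_entities: Callable[[str], List[str]],
    generate_question: Callable[[str, List[str]], str],
    answer_question: Callable[[str, str], List[str]],
    max_rounds: int = 3,
    min_answers: int = 2,
) -> Optional[ListQAExample]:
    """Hypothetical pipeline; every callable is a pluggable placeholder."""
    # Steps 1-2: summarize, then take entities from the summary as candidates.
    summary = summarize(passage)
    candidates = [e for e in dict.fromkeys(extract_entities(summary)) if e in passage]
    if len(candidates) < min_answers:
        return None
    # Step 3: question from the candidate entities and the original passage.
    question = generate_question(passage, candidates)
    # Step 4 (assumed rule): repeat filtering and expansion until stable.
    answers = candidates
    for _ in range(max_rounds):
        predicted = answer_question(passage, question)
        kept = [a for a in answers if a in predicted]           # filtering
        extra = [p for p in predicted if p not in kept and p in passage]
        expanded = kept + extra                                  # expansion
        if expanded == answers:
            break
        answers = expanded
    if len(answers) < min_answers:
        return None
    return ListQAExample(passage, question, answers)


# Toy stand-ins so the sketch runs without any models (all hypothetical).
def toy_summarize(passage: str) -> str:
    return passage.split(". ")[0]  # first sentence only


def toy_entities(text: str) -> List[str]:
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)  # capitalized runs


def toy_question(passage: str, entities: List[str]) -> str:
    return "Who worked with Alice Moore?"


def toy_reader(passage: str, question: str) -> List[str]:
    return ["Bob Stone", "Carol Reed", "Dan Hale"]


passage = "Alice Moore worked with Bob Stone and Carol Reed. Dan Hale also assisted her."
example = generate_list_qa(passage, toy_summarize, toy_entities, toy_question, toy_reader)
print(example.question, example.answers)
```

In this toy run the candidate "Alice Moore" is filtered out because the stand-in reader does not predict it, and "Dan Hale" is added by expansion because the reader predicts it even though it is absent from the summary. In a real setting each callable would wrap an actual summarizer, named entity recognizer, question generator, and QA model.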

Comments:	AAAI 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2302.01691 [cs.CL]
	(or arXiv:2302.01691v2 [cs.CL] for this version)

Submission history

From: Hyunjae Kim
[v1] Fri, 3 Feb 2023 12:42:45 GMT (393kb,D)
[v2] Mon, 6 Feb 2023 08:04:56 GMT (393kb,D)
