Development of an Extractive Clinical Question Answering Dataset with Multi-Answer and Multi-Focus Questions

Moon, Sungrim; He, Huan; Liu, Hongfang; Fan, Jungwei W.

doi:10.2196/41818

(Submitted on 7 Jan 2022 (v1), last revised 10 Feb 2023 (this version, v2))

Abstract:

Background: Extractive question answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating the answers in the patients' clinical notes. Realistic clinical EQA can involve multiple answers to a single question and multiple focus points within one question. Existing datasets for developing artificial intelligence solutions lack both features.

Objective: To create a dataset for developing and evaluating clinical EQA systems that can handle natural multi-answer and multi-focus questions.

Methods: We leveraged the annotated relations from the 2018 National NLP Clinical Challenges (n2c2) corpus to generate an EQA dataset. Specifically, we included the 1-to-N, M-to-1, and M-to-N drug-reason relations to form the multi-answer and multi-focus QA entries, which represent more complex and natural challenges than the basic one-drug-one-reason cases. A baseline solution was developed and tested on the dataset.

Results: The derived RxWhyQA dataset contains 96,939 QA entries. Among the answerable questions, 25% require multiple answers, and 2% ask about multiple drugs within one question. Cues frequently appear around the answers in the text, and 90% of the drug and reason terms occur within the same sentence or an adjacent one. The baseline EQA solution achieved a best F1-measure of 0.72 on the entire dataset. On specific subsets, it achieved 0.93 on unanswerable questions, 0.48 on single-drug versus 0.60 on multi-drug questions, and 0.54 on single-answer versus 0.43 on multi-answer questions.

Discussion: The RxWhyQA dataset can be used to train and evaluate systems that need to handle multi-answer and multi-focus questions. Multi-answer EQA in particular appears challenging and therefore warrants more research investment.
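The Methods describe turning drug-reason relations into QA entries. The sketch below illustrates one way that mapping could work; it is not the authors' pipeline. It assumes relations arrive as plain `(drug, reason)` string pairs rather than annotated character spans, and it uses a hypothetical question template (`TEMPLATE`) and hypothetical helper names. It also assumes that a drug without any reason relation yields an unanswerable question. A drug linked to N reasons (1-to-N) becomes a multi-answer entry. Drugs that share an identical reason set (M-to-1, or a simplified form of M-to-N) are merged into one multi-drug question.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical question template; the dataset's actual wording may differ.
TEMPLATE = "Why is the patient on {}?"


@dataclass
class QAEntry:
    question: str
    drugs: tuple[str, ...]
    answers: list[str] = field(default_factory=list)  # reason strings (assumed, not spans)

    @property
    def is_answerable(self) -> bool:
        return bool(self.answers)

    @property
    def is_multi_answer(self) -> bool:
        return len(self.answers) > 1

    @property
    def is_multi_drug(self) -> bool:
        return len(self.drugs) > 1


def build_qa_entries(drugs: list[str], relations: list[tuple[str, str]]) -> list[QAEntry]:
    """Hypothetical helper: derive QA entries from one note's drugs and drug-reason pairs."""
    # Collect the distinct reasons per drug, preserving annotation order.
    reasons_by_drug: dict[str, list[str]] = defaultdict(list)
    for drug, reason in relations:
        if reason not in reasons_by_drug[drug]:
            reasons_by_drug[drug].append(reason)

    entries = []
    # Single-drug questions: 1-to-N relations give multi-answer entries,
    # and drugs with no reason relation give unanswerable entries (assumption).
    for drug in dict.fromkeys(drugs):  # de-duplicate while keeping order
        answers = list(reasons_by_drug.get(drug, []))
        entries.append(QAEntry(TEMPLATE.format(drug), (drug,), answers))

    # Multi-drug questions: group drugs whose reason sets are identical.
    drugs_by_reasons: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for drug, reasons in reasons_by_drug.items():
        drugs_by_reasons[tuple(sorted(reasons))].append(drug)
    for reasons, group in drugs_by_reasons.items():
        if len(group) > 1:
            question = TEMPLATE.format(" and ".join(group))
            entries.append(QAEntry(question, tuple(group), list(reasons)))
    return entries
```

For example, if aspirin and clopidogrel are both linked to "CAD" and lisinopril is linked to both "hypertension" and "CKD", this yields a merged two-drug question for aspirin and clopidogrel and a multi-answer question for lisinopril. A drug with no linked reason, such as vancomycin here, yields an unanswerable question. Grouping only on identical reason sets keeps the sketch simple. A fuller treatment of M-to-N relations would need a rule for drugs whose reason sets only partly overlap.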

Comments:	2 tables, 5 figures
Subjects:	Computation and Language (cs.CL)
Journal reference:	JMIR AI 2023;2:e41818
DOI:	10.2196/41818
Cite as:	arXiv:2201.02517 [cs.CL]
	(or arXiv:2201.02517v2 [cs.CL] for this version)

Submission history

From: Jungwei Fan [view email]
[v1] Fri, 7 Jan 2022 15:58:58 GMT (178kb)
[v2] Fri, 10 Feb 2023 03:10:34 GMT (737kb)