Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals

Blagec, Kathrin; Kraiger, Jakob; Frühwirt, Wolfgang; Samwald, Matthias

doi:10.1016/j.jbi.2022.104274

(Submitted on 18 Jan 2022 (v1), last revised 12 May 2022 (this version, v2))

Abstract: Publicly accessible benchmarks that allow for assessing and comparing model performances are important drivers of progress in artificial intelligence (AI). While recent advances in AI capabilities hold the potential to transform medical practice by assisting and augmenting the cognitive processes of healthcare professionals, the coverage of clinically relevant tasks by AI benchmarks is largely unclear. Furthermore, there is a lack of systematized meta-information that allows clinical AI researchers to quickly determine accessibility, scope, content and other characteristics of datasets and benchmark datasets relevant to the clinical domain.
To address these issues, we curated and released a comprehensive catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP), based on a systematic review of literature and online resources. A total of 450 NLP datasets were manually systematized and annotated with rich metadata, such as targeted tasks, clinical applicability, data types, performance metrics, accessibility and licensing information, and availability of data splits. We then compared tasks covered by AI benchmark datasets with relevant tasks that medical practitioners reported as highly desirable targets for automation in a previous empirical study.
Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed. In particular, tasks associated with routine documentation and patient data administration workflows are not represented despite significant associated workloads. Thus, currently available AI benchmarks are improperly aligned with desired targets for AI automation in clinical settings, and novel benchmarks should be created to fill these gaps.

Comments:	(this version extends the literature references)
Subjects:	Artificial Intelligence (cs.AI)
Journal reference:	Journal of Biomedical Informatics, January 2023
DOI:	10.1016/j.jbi.2022.104274
Cite as:	arXiv:2201.07040 [cs.AI]
	(or arXiv:2201.07040v2 [cs.AI] for this version)

Submission history

From: Matthias Samwald
[v1] Tue, 18 Jan 2022 15:05:28 GMT (641kb)
[v2] Thu, 12 May 2022 13:25:37 GMT (668kb)