Computer Science > Computation and Language
Title: Inspecting state of the art performance and NLP metrics in image-based medical report generation
(Submitted on 18 Nov 2020 (v1), last revised 15 Jan 2022 (this version, v3))
Abstract: Several deep learning architectures have been proposed in recent years to generate a written report from an imaging exam. Most works evaluate the generated reports using standard Natural Language Processing (NLP) metrics (e.g., BLEU, ROUGE) and report significant progress. In this article, we put this progress in perspective by comparing state-of-the-art (SOTA) models against weak baselines. We show that simple and even naive approaches yield near-SOTA performance on most traditional NLP metrics. We conclude that evaluation methods for this task should be studied further so that they correctly measure clinical accuracy, ideally with physicians involved in the process.
Submission history
From: Pablo Pino
[v1] Wed, 18 Nov 2020 13:09:12 GMT (140kb,D)
[v2] Sat, 21 Nov 2020 17:58:40 GMT (139kb,D)
[v3] Sat, 15 Jan 2022 06:05:51 GMT (143kb,D)