Automatic Pharma News Categorization

Adaszewski, Stanislaw; Kuner, Pascal; Jaeger, Ralf J.

(Submitted on 28 Dec 2021)

Abstract: We use a well-balanced text dataset consisting of 23 news categories relevant to pharma information science to compare the fine-tuning performance of multiple autoregressive and autoencoding transformer models in a classification task. To validate the winning approach, we perform diagnostics of model behavior on mispredicted instances, including inspection of category-wise metrics, evaluation of prediction certainty and assessment of latent space representations. Lastly, we propose an ensemble model consisting of the top-performing individual predictors and demonstrate that this approach offers a modest improvement in the F1 metric.
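
The abstract does not say how the ensemble combines its members, so the following is only a minimal sketch under an assumed scheme: soft voting, where per-class probabilities from each model are averaged and the highest-scoring class is chosen, followed by a macro-averaged F1 score. The function names (`soft_vote`, `macro_f1`), the three-class toy data, and the two hypothetical models are illustrative assumptions and are not taken from the paper, which uses 23 categories.

```python
from typing import List, Sequence

Probs = Sequence[Sequence[float]]  # one probability row per sample


def soft_vote(prob_sets: Sequence[Probs]) -> List[int]:
    """Average class probabilities across models; return the argmax per sample.

    Assumption: soft voting is used here for illustration only.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    preds = []
    for i in range(n_samples):
        n_classes = len(prob_sets[0][i])
        avg = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1; a class with no support or predictions scores 0."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy data (hypothetical): 4 samples, 3 classes, 2 fine-tuned models.
y_true = [0, 1, 2, 1]
model_a = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.5, 0.1, 0.4], [0.2, 0.6, 0.2]]
model_b = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.1, 0.8, 0.1]]

preds_a = soft_vote([model_a])  # a single model is the degenerate ensemble
preds_b = soft_vote([model_b])
preds_ens = soft_vote([model_a, model_b])

f1_a = macro_f1(y_true, preds_a, 3)
f1_b = macro_f1(y_true, preds_b, 3)
f1_ens = macro_f1(y_true, preds_ens, 3)
print(f1_a, f1_b, f1_ens)
```

In this toy case each model misclassifies a different sample, so averaging their probabilities corrects both errors. Real gains depend on how correlated the members' mistakes are, which is consistent with the modest improvement the abstract reports.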

Comments:	5 pages, 1 figure, 9 pages appendix
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2201.00688 [cs.IR]
	(or arXiv:2201.00688v1 [cs.IR] for this version)

Submission history

From: Stanislaw Adaszewski
[v1] Tue, 28 Dec 2021 08:42:16 GMT (870kb,D)
