Computer Science > Computation and Language
Title: ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling
(Submitted on 4 Jan 2022 (v1), last revised 4 Jun 2022 (this version, v3))
Abstract: Traditional text classification approaches often require a substantial amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This scarcity of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo performs better on long inputs and has a shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
Keywords: Low-Resource NLP, Unlabeled data, Zero-Shot Learning, Topic Modeling, Transformers.
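The general idea stated in the abstract, clustering first and then classifying the compressed representation rather than every document, can be illustrated with a toy sketch. This is not the authors' implementation. The clustering here is a simple greedy grouping by Jaccard word overlap standing in for topic modeling, and `zero_shot_stub` is a hypothetical keyword-overlap scorer standing in for an expensive Transformer-based zero-shot classifier. All function names, the threshold, and the example data are assumptions made for illustration only.

```python
def tokenize(text):
    """Lowercase a text and split it into a set of punctuation-stripped words."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.2):
    """Toy stand-in for topic modeling: greedy clustering by word overlap.

    Each document joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for index, doc in enumerate(docs):
        tokens = tokenize(doc)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(tokens, cluster["vocab"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"vocab": set(tokens), "members": [index]})
        else:
            best["vocab"] |= tokens
            best["members"].append(index)
    return clusters


def zero_shot_stub(summary, label_hints):
    """Hypothetical stand-in for a costly zero-shot model.

    Picks the label whose hint words overlap most with the cluster summary.
    """
    return max(label_hints, key=lambda label: len(summary & label_hints[label]))


def classify(docs, label_hints, threshold=0.2):
    """Classify each cluster once, then propagate the label to its members."""
    predictions = [None] * len(docs)
    classifier_calls = 0
    for cluster in cluster_documents(docs, threshold):
        label = zero_shot_stub(cluster["vocab"], label_hints)
        classifier_calls += 1
        for index in cluster["members"]:
            predictions[index] = label
    return predictions, classifier_calls


# Made-up example data, not taken from the FolhaUOL dataset.
docs = [
    "The team won the football match",
    "The football team lost the match",
    "The central bank raised interest rates",
    "Interest rates raised by the bank again",
]
label_hints = {
    "sports": {"football", "match", "team"},
    "economy": {"bank", "rates", "interest"},
}
predictions, classifier_calls = classify(docs, label_hints)
print(predictions, classifier_calls)
```

In this toy run the four documents fall into two clusters, so the stand-in classifier is called twice instead of four times. That is the intuition behind the execution-time claim: the cost of the expensive step scales with the number of clusters rather than the number of documents.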
Submission history
From: Thomas Palmeira Ferraz
[v1] Tue, 4 Jan 2022 20:08:17 GMT (366kb,D)
[v2] Thu, 27 Jan 2022 17:46:32 GMT (83kb,D)
[v3] Sat, 4 Jun 2022 21:02:16 GMT (63kb,D)