Towards Robustness to Label Noise in Text Classification via Noise Modeling

Garg, Siddhant; Ramakrishnan, Goutham; Thumbe, Varun

doi:10.1145/3459637.3482204

(Submitted on 27 Jan 2021 (v1), last revised 7 Nov 2021 (this version, v3))

Abstract: Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign a probability score to each training sample of having a noisy label, through a beta mixture model fitted on the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy, and prevent over-fitting to the noise.
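
The scoring step described in the abstract (a beta mixture model fitted on per-sample losses) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `noisy_label_scores`, the min-max normalization of losses, the EM procedure with a weighted method-of-moments update, and the choice to treat the higher-mean component as the "noisy" one are all assumptions made for illustration.

```python
import math

import numpy as np


def beta_pdf(x, a, b):
    """Density of Beta(a, b) evaluated at x, with x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x))


def noisy_label_scores(losses, n_iter=100, eps=1e-3):
    """Fit a two-component beta mixture to per-sample losses with EM.

    Hypothetical helper: returns, for each sample, the posterior probability
    of belonging to the component with the larger mean (assumed here to
    correspond to noisy labels, since mislabeled samples tend to have
    higher loss early in training).
    """
    x = np.asarray(losses, dtype=float)
    # Assumption: rescale losses into (0, 1) so a beta density applies.
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    x = np.clip(x, eps, 1.0 - eps)

    # Soft initial responsibilities: split the samples at the median loss.
    above = x > np.median(x)
    resp = np.clip(np.stack([~above, above]).astype(float), 0.05, 0.95)

    for _ in range(n_iter):
        # M-step: mixing weights and method-of-moments beta parameters.
        weights = resp.sum(axis=1) / x.size
        params = []
        for r in resp:
            total = np.sum(r) + 1e-12
            mean = np.clip(np.sum(r * x) / total, eps, 1.0 - eps)
            var = np.sum(r * (x - mean) ** 2) / total
            common = max(mean * (1.0 - mean) / (var + 1e-12) - 1.0, 1e-2)
            params.append((mean * common, (1.0 - mean) * common))
        # E-step: posterior responsibility of each component for each sample.
        lik = np.stack(
            [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
        )
        resp = lik / (lik.sum(axis=0) + 1e-12)

    means = [a / (a + b) for a, b in params]
    return resp[int(np.argmax(means))]
```

Under these assumptions, `noisy_label_scores(losses)` would be called once on the losses recorded at an early epoch, and the returned scores could then weight or gate how each sample contributes to training the noise model and the classifier.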

Comments:	Accepted at CIKM'21 (30th ACM International Conference on Information & Knowledge Management). Accepted at ICLR 2021 RobustML and S2D-OLAD Workshops
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI:	10.1145/3459637.3482204
Cite as:	arXiv:2101.11214 [cs.CL]
	(or arXiv:2101.11214v3 [cs.CL] for this version)

Submission history

From: Goutham Ramakrishnan
[v1] Wed, 27 Jan 2021 05:41:57 GMT (7783kb,D)
[v2] Thu, 22 Apr 2021 02:48:02 GMT (1124kb,D)
[v3] Sun, 7 Nov 2021 23:35:19 GMT (659kb,D)
