Title: Towards Robustness to Label Noise in Text Classification via Noise Modeling

Abstract: Large datasets in NLP suffer from noisy labels due to errors in both automatic and human annotation procedures. We study the problem of text classification with label noise and aim to capture this noise through an auxiliary noise model placed over the classifier. We first assign each training sample a probability score of having a noisy label, using a beta mixture model fitted to the per-sample losses at an early epoch of training. We then use this score to selectively guide the learning of the noise model and the classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve on the baseline accuracy and prevent overfitting to the noise.
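The first step described in the abstract, scoring each training sample's chance of carrying a noisy label from its early-epoch loss, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the mixture is fitted, so the sketch assumes a two-component beta mixture fitted by expectation-maximization (EM) with a weighted method-of-moments M-step, min-max rescaling of the losses into (0, 1), and starting components Beta(1, 2) and Beta(2, 1). The function names `beta_pdf`, `weighted_beta_moments`, and `noisy_label_probability` are hypothetical.

```python
import math

import numpy as np


def beta_pdf(x, a, b):
    """Beta(a, b) density, evaluated in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x))


def weighted_beta_moments(x, w):
    """Fit Beta(a, b) to samples x with weights w by matching mean and variance."""
    w_sum = w.sum() + 1e-12
    mean = np.sum(w * x) / w_sum
    var = max(np.sum(w * (x - mean) ** 2) / w_sum, 1e-8)
    common = mean * (1.0 - mean) / var - 1.0
    # Clamp to keep both shape parameters strictly positive.
    return max(mean * common, 1e-2), max((1.0 - mean) * common, 1e-2)


def responsibilities(x, params, weights):
    """E-step: posterior probability of each mixture component for each sample."""
    lik = np.stack([weights[k] * beta_pdf(x, *params[k]) for k in range(len(params))])
    return lik / np.maximum(lik.sum(axis=0, keepdims=True), 1e-300)


def noisy_label_probability(losses, n_iter=50, eps=1e-4):
    """Hypothetical helper: per-sample probability of the high-loss component.

    Assumed fitting procedure (EM with method-of-moments updates);
    the paper's exact procedure may differ.
    """
    losses = np.asarray(losses, dtype=float)
    # Min-max rescale into (0, 1): the Beta density is only defined there.
    span = max(losses.max() - losses.min(), 1e-12)
    x = np.clip((losses - losses.min()) / span, eps, 1.0 - eps)

    # Assumed start: a low-loss Beta(1, 2) and a high-loss Beta(2, 1) component.
    params = [(1.0, 2.0), (2.0, 1.0)]
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = responsibilities(x, params, weights)
        # M-step: update mixing weights and per-component Beta parameters.
        weights = resp.mean(axis=1)
        params = [weighted_beta_moments(x, resp[k]) for k in range(2)]

    resp = responsibilities(x, params, weights)
    # The component with the larger mean a / (a + b) is treated as "noisy".
    noisy = int(np.argmax([a / (a + b) for a, b in params]))
    return resp[noisy]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic losses: 900 low-loss "clean" samples, 100 high-loss "noisy" ones.
    losses = np.concatenate([np.abs(rng.normal(0.2, 0.1, 900)), rng.normal(2.0, 0.3, 100)])
    p_noisy = noisy_label_probability(losses)
    print(p_noisy[:900].mean(), p_noisy[900:].mean())
```

The score returned here is only the per-sample posterior of the high-loss component. How that score then selectively guides the noise model and the classifier is not shown, since the abstract does not specify it.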
Comments: Accepted at CIKM'21 (30th ACM International Conference on Information & Knowledge Management). Also accepted at the ICLR 2021 RobustML and S2D-OLAD workshops.
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.1145/3459637.3482204
Cite as: arXiv:2101.11214 [cs.CL]
  (or arXiv:2101.11214v3 [cs.CL] for this version)

Submission history

From: Goutham Ramakrishnan
[v1] Wed, 27 Jan 2021 05:41:57 GMT (7783kb,D)
[v2] Thu, 22 Apr 2021 02:48:02 GMT (1124kb,D)
[v3] Sun, 7 Nov 2021 23:35:19 GMT (659kb,D)