We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Computation and Language

Title: Learning with Noisy Labels by Targeted Relabeling

Abstract: Crowdsourcing platforms are often used to collect datasets for training deep neural networks, despite higher levels of inaccurate labeling compared to expert labeling. There are two common strategies to manage the impact of this noise, the first involves aggregating redundant annotations, but comes at the expense of labeling substantially fewer examples. Secondly, prior works have also considered using the entire annotation budget to label as many examples as possible and subsequently apply denoising algorithms to implicitly clean up the dataset. We propose an approach which instead reserves a fraction of annotations to explicitly relabel highly probable labeling errors. In particular, we allocate a large portion of the labeling budget to form an initial dataset used to train a model. This model is then used to identify specific examples that appear most likely to be incorrect, which we spend the remaining budget to relabel. Experiments across three model variations and four natural language processing tasks show our approach outperforms both label aggregation and advanced denoising methods designed to handle noisy labels when allocated the same annotation budget.
Comments: 14 pages, 5 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2110.08355 [cs.CL]
  (or arXiv:2110.08355v1 [cs.CL] for this version)

Submission history

From: Derek Chen [view email]
[v1] Fri, 15 Oct 2021 20:37:29 GMT (418kb,D)

Link back to: arXiv, form interface, contact.