Analysing the Noise Model Error for Realistic Noisy Label Data

Hedderich, Michael A.; Zhu, Dawei; Klakow, Dietrich

(Submitted on 24 Jan 2021 (v1), last revised 1 Mar 2021 (this version, v2))

Abstract: Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain a large number of errors. A popular technique to overcome the negative effects of these noisy labels is noise modelling, where the underlying noise process is modelled. In this work, we study the quality of these estimated noise models from the theoretical side by deriving the expected error of the noise model. Apart from evaluating the theoretical results on commonly used synthetic noise, we also publish NoisyNER, a new noisy label dataset from the NLP domain that was obtained through a realistic distant supervision technique. It provides seven sets of labels with differing noise patterns to evaluate different noise levels on the same instances. Parallel clean labels are available, making it possible to study scenarios where a small amount of gold-standard data can be leveraged. Our theoretical results and the corresponding experiments give insights into the factors that influence the noise model estimation, such as the noise distribution and the sampling technique.
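
As a rough illustration of what estimating a noise model from a small amount of clean data can look like, the sketch below counts (clean, noisy) label pairs and normalises each row to approximate p(noisy = j | clean = i). This is a common count-based estimator and is only assumed here for illustration; the abstract does not specify the paper's exact procedure. The function name `estimate_noise_matrix`, the label tags, and the toy label lists are all hypothetical and are not taken from NoisyNER.

```python
from collections import Counter


def estimate_noise_matrix(clean_labels, noisy_labels, classes):
    """Estimate p(noisy = j | clean = i) by counting parallel label pairs.

    Hypothetical helper for illustration; not the paper's published code.
    """
    pair_counts = Counter(zip(clean_labels, noisy_labels))
    matrix = []
    for i in classes:
        row_total = sum(pair_counts[(i, j)] for j in classes)
        if row_total == 0:
            # Assumption: if no clean instance of class i was sampled,
            # fall back to a uniform row rather than dividing by zero.
            matrix.append([1.0 / len(classes)] * len(classes))
        else:
            matrix.append([pair_counts[(i, j)] / row_total for j in classes])
    return matrix


# Toy data (made up): eight instances with parallel clean and noisy labels.
classes = ["O", "PER", "LOC"]
clean = ["O", "O", "PER", "PER", "LOC", "LOC", "LOC", "O"]
noisy = ["O", "O", "PER", "O", "LOC", "O", "LOC", "O"]

noise_matrix = estimate_noise_matrix(clean, noisy, classes)
for label, row in zip(classes, noise_matrix):
    print(label, [round(p, 3) for p in row])
```

With so few sampled pairs per class, the estimated rows can differ noticeably from the true noise distribution, which is the kind of estimation error the abstract says the paper analyses.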

Comments:	Accepted at AAAI 2021, additional material at this https URL
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as:	arXiv:2101.09763 [cs.LG]
	(or arXiv:2101.09763v2 [cs.LG] for this version)

Submission history

From: Michael A. Hedderich
[v1] Sun, 24 Jan 2021 17:45:15 GMT (562kb,D)
[v2] Mon, 1 Mar 2021 11:14:54 GMT (562kb,D)
