Memorisation versus Generalisation in Pre-trained Language Models

Tänzer, Michael; Ruder, Sebastian; Rei, Marek

(Submitted on 16 Apr 2021 (v1), last revised 15 Mar 2022 (this version, v2))

Abstract: State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.

Comments:	15 pages, 25 figures. To be published in ACL2022
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2105.00828 [cs.CL]
	(or arXiv:2105.00828v2 [cs.CL] for this version)

Submission history

From: Michael Tänzer
[v1] Fri, 16 Apr 2021 18:53:19 GMT (2215kb,D)
[v2] Tue, 15 Mar 2022 01:14:16 GMT (2313kb,D)
