Title: Memorisation versus Generalisation in Pre-trained Language Models

Abstract: State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
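The abstract does not describe how the proposed prototypical-network extension is built, so the following is only a minimal sketch of the generic prototypical-network classification rule it builds on, not the authors' method. Each class prototype is the mean embedding of that class's labelled support examples, and a query is assigned a softmax over negative squared Euclidean distances to the prototypes. In a real NER system the vectors would presumably be token representations from the pre-trained encoder; here they are hand-written 2-D vectors, and the function names (`compute_prototypes`, `class_probabilities`, `classify`) and the `PER`/`LOC` labels are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def compute_prototypes(support: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Hypothetical helper: average the embeddings of each class's support examples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for emb, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        for i, x in enumerate(emb):
            sums[label][i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in vec] for label, vec in sums.items()}


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probabilities(query: Vector, prototypes: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative squared distances from the query to each prototype."""
    logits = {label: -squared_euclidean(query, proto) for label, proto in prototypes.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def classify(query: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is nearest to the query."""
    probs = class_probabilities(query, prototypes)
    return max(probs, key=probs.get)


# Toy support set with two hypothetical entity classes.
support = [
    ([1.0, 0.0], "PER"),
    ([0.9, 0.1], "PER"),
    ([0.0, 1.0], "LOC"),
    ([0.1, 0.9], "LOC"),
]
prototypes = compute_prototypes(support)
print(classify([0.8, 0.2], prototypes))  # nearest prototype is PER
```

Because a prototype needs only a handful of labelled examples per class, this rule is a natural fit for the few-shot and rare-entity settings the abstract identifies as weak points of standard fine-tuning.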
Comments: 15 pages, 25 figures. To be published in ACL 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2105.00828 [cs.CL]
  (or arXiv:2105.00828v2 [cs.CL] for this version)

Submission history

From: Michael Tänzer
[v1] Fri, 16 Apr 2021 18:53:19 GMT (2215kb,D)
[v2] Tue, 15 Mar 2022 01:14:16 GMT (2313kb,D)
