Language Modelling via Learning to Rank

Frydenlund, Arvid; Singh, Gagandeep; Rudzicz, Frank

(Submitted on 13 Oct 2021 (v1), last revised 10 Dec 2021 (this version, v2))

Abstract: We consider language modelling (LM) as a multi-label structured prediction task by re-framing training from solely predicting a single ground-truth word to ranking a set of words which could continue a given context. To avoid annotating top-$k$ ranks, we generate them using pre-trained LMs: GPT-2, BERT, and Born-Again models. This leads to a rank-based form of knowledge distillation (KD). We also develop a method using $N$-grams to create a non-probabilistic teacher which generates the ranks without the need for a pre-trained LM.
We confirm the hypotheses that language modelling can be treated as a ranking task and that this can be done without a pre-trained LM. We show that rank-based KD generally improves perplexity (PPL), often with statistical significance, when compared to Kullback-Leibler-based (KL-based) KD. Surprisingly, given the simplicity of the method, $N$-grams act as competitive teachers and achieve performance similar to that of BERT or Born-Again model teachers. GPT-2 always acts as the best teacher, though: with GPT-2 as the teacher and a Transformer-XL student on Wiki-02, rank-based KD reduces PPL from a cross-entropy baseline of 65.27 to 55.94, compared with 56.70 for KL-based KD.
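
The contrast between rank-based and KL-based KD can be illustrated with a small sketch. The abstract does not state which ranking loss is used, so the Plackett-Luce listwise loss below is an assumption chosen only for illustration, and the helper names (top_k_ranks, plackett_luce_nll, kl_divergence) and the toy numbers are hypothetical. The sketch extracts a top-$k$ ranking from a teacher distribution, scores a student against that ranking, and computes the KL objective for comparison.

```python
import math


def top_k_ranks(teacher_probs, k):
    """Return the indices of the k most probable words, best first."""
    order = sorted(range(len(teacher_probs)),
                   key=lambda i: teacher_probs[i], reverse=True)
    return order[:k]


def plackett_luce_nll(student_logits, ranked_ids):
    """Negative log-likelihood of a ranked list under a Plackett-Luce model.

    Assumption: this is a stand-in listwise loss, not necessarily the loss
    used in the paper. At each step the next-ranked word competes against
    every word that has not been placed yet.
    """
    remaining = set(range(len(student_logits)))
    nll = 0.0
    for idx in ranked_ids:
        # Log-sum-exp over the remaining words, shifted for stability.
        m = max(student_logits[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(student_logits[j] - m)
                                 for j in remaining))
        nll -= student_logits[idx] - log_z
        remaining.remove(idx)
    return nll


def kl_divergence(teacher_probs, student_logits):
    """KL(teacher || student): the probability-matching KD objective."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in student_logits))
    return sum(p * (math.log(p) - (s - log_z))
               for p, s in zip(teacher_probs, student_logits) if p > 0)


# Toy 5-word vocabulary; all numbers are made up for illustration.
teacher_probs = [0.05, 0.50, 0.10, 0.30, 0.05]
student_logits = [0.0, 2.0, 0.5, 1.0, -1.0]

ranks = top_k_ranks(teacher_probs, k=3)
rank_loss = plackett_luce_nll(student_logits, ranks)
kl_loss = kl_divergence(teacher_probs, student_logits)
```

The rank loss depends only on the order of the teacher's top-$k$ words, not on the probabilities attached to them, which is why a teacher that produces ranks without probabilities (such as the $N$-gram teacher described in the abstract) can still be used.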

Comments:	Accepted to AAAI22. Minor writing fixes
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
ACM classes:	I.2.7; I.2.6
Cite as:	arXiv:2110.06961 [cs.CL]
	(or arXiv:2110.06961v2 [cs.CL] for this version)

Submission history

From: Arvid Frydenlund
[v1] Wed, 13 Oct 2021 18:03:47 GMT (227kb)
[v2] Fri, 10 Dec 2021 19:49:23 GMT (226kb)
