Title: Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Abstract: Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is how to make full use of them to train the student model. Our preliminary study shows that: (1) not all of the knowledge is necessary for learning a good student model, and (2) knowledge distillation can benefit from certain knowledge at different training steps. In response to these findings, we propose an actor-critic approach to selecting appropriate knowledge to transfer during the process of knowledge distillation. In addition, we offer a refinement of the training algorithm to ease the computational burden. Experimental results on the GLUE datasets show that our method significantly outperforms several strong knowledge distillation baselines.
Comments: accepted by EMNLP (Findings) 2022
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2302.00444 [cs.CL]
  (or arXiv:2302.00444v1 [cs.CL] for this version)

Submission history

From: Chenglong Wang
[v1] Wed, 1 Feb 2023 13:40:19 GMT (305kb,D)
