We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:


Current browse context:


Change to browse by:

References & Citations

DBLP - CS Bibliography


(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: One Student Knows All Experts Know: From Sparse to Dense

Abstract: Human education system trains one student by multiple experts. Mixture-of-experts (MoE) is a powerful sparse architecture including multiple experts. However, sparse MoE model is hard to implement, easy to overfit, and not hardware-friendly. In this work, inspired by human education model, we propose a novel task, knowledge integration, to obtain a dense student model (OneS) as knowledgeable as one sparse MoE. We investigate this task by proposing a general training framework including knowledge gathering and knowledge distillation. Specifically, we first propose Singular Value Decomposition Knowledge Gathering (SVD-KG) to gather key knowledge from different pretrained experts. We then refine the dense student model by knowledge distillation to offset the noise from gathering. On ImageNet, our OneS preserves $61.7\%$ benefits from MoE. OneS can achieve $78.4\%$ top-1 accuracy with only $15$M parameters. On four natural language processing datasets, OneS obtains $88.2\%$ MoE benefits and outperforms SoTA by $51.7\%$ using the same architecture and training data. In addition, compared with the MoE counterpart, OneS can achieve $3.7 \times$ inference speedup due to the hardware-friendly architecture.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2201.10890 [cs.LG]
  (or arXiv:2201.10890v3 [cs.LG] for this version)

Submission history

From: Fuzhao Xue [view email]
[v1] Wed, 26 Jan 2022 12:11:02 GMT (737kb,D)
[v2] Wed, 16 Feb 2022 05:57:31 GMT (740kb,D)
[v3] Sat, 21 May 2022 05:44:41 GMT (1801kb,D)

Link back to: arXiv, form interface, contact.