Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models

Nityasya, Made Nindyatama; Wibowo, Haryo Akbarianto; Chevi, Rendi; Prasojo, Radityo Eko; Aji, Alham Fikri

(Submitted on 3 Jan 2022)

Abstract: We perform a knowledge distillation (KD) benchmark from task-specific BERT-base teacher models to various student models: BiLSTM, CNN, BERT-Tiny, BERT-Mini, and BERT-Small. Our experiments involve 12 datasets grouped into two tasks, text classification and sequence labeling, in the Indonesian language. We also compare various aspects of distillation, including the use of word embeddings and unlabeled data augmentation. Our experiments show that, despite the rising popularity of Transformer-based models, using BiLSTM and CNN student models provides the best trade-off between performance and computational resources (CPU, RAM, and storage) compared to pruned BERT models. We further propose some quick wins for performing KD to produce small NLP models via efficient KD training mechanisms involving simple choices of loss functions, word embeddings, and unlabeled data preparation.

Comments:	14 pages, 3 figures, submitted to Elsevier
Subjects:	Computation and Language (cs.CL)
MSC classes:	68T50
ACM classes:	I.2.7; I.2.6
Cite as:	arXiv:2201.00558 [cs.CL]
	(or arXiv:2201.00558v1 [cs.CL] for this version)

Submission history

From: Made Nindyatama Nityasya
[v1] Mon, 3 Jan 2022 10:07:13 GMT (138kb,D)
