Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

Yoon, Ji Won; Kim, Hyung Yong; Lee, Hyeonseung; Ahn, Sunghwan; Kim, Nam Soo

doi:10.1109/TASLP.2023.3297955

(Submitted on 5 Nov 2021 (v1), last revised 11 Aug 2023 (this version, v4))

Abstract: Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely Oracle Teacher, that leverages both the source inputs and the output labels as the teacher model's input. Since the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with better guidance. One potential risk of the proposed approach is a trivial solution in which the model's output simply copies the target input. Based on the many-to-one mapping property of the CTC algorithm, we present a training strategy that can effectively prevent the trivial solution and thus enables the use of both source and target inputs for model training. Extensive experiments are conducted on two sequence learning tasks: speech recognition and scene text recognition. The experimental results show that the proposed model improves the students across these tasks while achieving a considerable speed-up in the teacher model's training time.
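The many-to-one mapping mentioned in the abstract is the standard CTC collapsing rule: merge consecutive repeated symbols, then remove blanks, so that many frame-level alignments correspond to one label sequence. The sketch below illustrates only that property. It is not the authors' training strategy, and the blank symbol `-`, the function name `ctc_collapse`, and the example strings are assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, chosen for illustration only


def ctc_collapse(alignment, blank=BLANK):
    """Map a frame-level CTC alignment to its label sequence."""
    # Step 1: merge runs of the same consecutive symbol into one symbol.
    merged = [symbol for symbol, _ in groupby(alignment)]
    # Step 2: drop the blank symbols that remain.
    return "".join(symbol for symbol in merged if symbol != blank)


# Several different 7-frame alignments (hypothetical examples).
alignments = ["cc-aa-t", "c-a-t--", "-caatt-", "ca--t--"]

# All of them collapse to the same label sequence.
labels = {ctc_collapse(a) for a in alignments}
print(labels)
```

Because the target sequence is shorter than the frame-level output and many alignments are valid for it, a model that receives the target as input still has to decide where each label falls in time. A blank between repeated labels is what keeps them distinct: `ctc_collapse("hhe-l-lo")` keeps both `l` symbols, while `ctc_collapse("hello")` merges them.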

Comments:	Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing
Subjects:	Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
DOI:	10.1109/TASLP.2023.3297955
Cite as:	arXiv:2111.03664 [cs.LG]
	(or arXiv:2111.03664v4 [cs.LG] for this version)

Submission history

From: Ji Won Yoon
[v1] Fri, 5 Nov 2021 14:14:05 GMT (466kb,D)
[v2] Tue, 1 Feb 2022 16:21:44 GMT (323kb,D)
[v3] Mon, 24 Oct 2022 14:28:29 GMT (949kb,D)
[v4] Fri, 11 Aug 2023 16:15:45 GMT (2446kb,D)
