Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

Bai, Ye; Yi, Jiangyan; Tao, Jianhua; Tian, Zhengkun; Wen, Zhengqi

(Submitted on 13 Jul 2019)

Abstract: Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous work uses linear interpolation or a fusion network to integrate external language models. However, these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge-distillation-based training approach for integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the training of the sequence-to-sequence model. The language model thus plays the role of the teacher. This approach adds no external component to the sequence-to-sequence model at test time, and it can also be combined with shallow fusion for decoding. Experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, a relative reduction of 18.42% compared with the vanilla sequence-to-sequence model.
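
As an illustration of the soft-label idea described in the abstract, the following is a minimal sketch of a per-token distillation loss. The abstract does not state the exact loss, so the interpolation form, the weight `lam`, the `temperature` parameter, and all function names here are assumptions made for illustration, not the authors' implementation. The teacher logits stand in for the output of the external language model at one decoding step.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax(logits, temperature=1.0):
    """Softmax with an optional temperature (assumed, not from the abstract)."""
    scaled = [x / temperature for x in logits]
    return [math.exp(v) for v in log_softmax(scaled)]


def distillation_loss(student_logits, teacher_logits, target,
                      lam=0.5, temperature=1.0):
    """Per-token loss mixing the hard label with teacher soft labels.

    student_logits: sequence-to-sequence decoder outputs for one step.
    teacher_logits: language model outputs for the same step (the teacher).
    target: index of the ground-truth token.
    lam: hypothetical interpolation weight between the two terms.
    """
    student_log_probs = log_softmax(student_logits)
    # Standard cross-entropy against the transcript label.
    hard_loss = -student_log_probs[target]
    # Cross-entropy against the teacher distribution (the soft labels).
    soft_labels = softmax(teacher_logits, temperature)
    soft_loss = -sum(q * lp for q, lp in zip(soft_labels, student_log_probs))
    return (1.0 - lam) * hard_loss + lam * soft_loss
```

In this sketch the teacher is needed only to compute the training loss, which is consistent with the abstract's point that no external component is added to the model at test time.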

Comments:	5 pages, 3 figures, accepted by INTERSPEECH 2019
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
Cite as:	arXiv:1907.06017 [eess.AS]
	(or arXiv:1907.06017v1 [eess.AS] for this version)

Submission history

From: Ye Bai
[v1] Sat, 13 Jul 2019 06:27:24 GMT (619kb,D)
