Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Zhang, Bowen; Cao, Songjun; Zhang, Xiaoming; Zhang, Yike; Ma, Long; Shinozaki, Takahiro

(Submitted on 16 Jun 2022 (v1), last revised 27 Jun 2022 (this version, v2))

Abstract: Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. However, semi-supervised fine-tuning strategies under the pre-training framework remain insufficiently studied. Moreover, modern semi-supervised speech recognition algorithms either treat all unlabeled data indiscriminately or filter out noisy samples with a confidence threshold, ignoring the differences among unlabeled samples. In this paper, we propose Censer, a semi-supervised speech recognition algorithm based on self-supervised pre-training that aims to make the most of unlabeled data. The pre-training stage of Censer adopts wav2vec2.0, and the fine-tuning stage employs a semi-supervised learning algorithm improved from slimIPL, which leverages unlabeled data progressively according to the quality of their pseudo labels. We also incorporate a temporal pseudo-label pool and an exponential moving average to control how often pseudo labels are updated and to avoid model divergence. Experimental results on the Libri-Light and LibriSpeech datasets show that our proposed method achieves better performance than existing approaches while being more unified.
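
The three fine-tuning ideas named in the abstract (using unlabeled data progressively by pseudo-label quality, a pseudo-label pool with a controlled refresh rate, and an exponential moving average of model weights) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the per-utterance quality score, the linear stage schedule, the probabilistic replacement rule, and every name below (`PseudoLabeled`, `curriculum_subset`, `PseudoLabelPool`, `ema_update`, `refresh_prob`) are hypothetical assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class PseudoLabeled:
    utt_id: str
    label: str
    score: float  # hypothetical quality score, e.g. mean token confidence


def curriculum_subset(samples, stage, num_stages):
    """Keep the highest-scoring fraction of samples for a given stage.

    Assumed linear schedule: stage 1 keeps the top ceil(n / num_stages),
    and the final stage keeps every sample.
    """
    if not 1 <= stage <= num_stages:
        raise ValueError("stage must be in [1, num_stages]")
    ranked = sorted(samples, key=lambda s: s.score, reverse=True)
    # Integer ceiling division avoids floating-point rounding issues.
    keep = (len(ranked) * stage + num_stages - 1) // num_stages
    return ranked[:keep]


class PseudoLabelPool:
    """Hypothetical fixed-size pool of pseudo-labeled samples.

    Each draw returns a cached entry; with probability `refresh_prob`
    that entry is then replaced by a freshly pseudo-labeled one, which
    limits how often the pseudo labels used for training change.
    """

    def __init__(self, capacity, refresh_prob, seed=0):
        self.capacity = capacity
        self.refresh_prob = refresh_prob
        self.entries = []
        self.rng = random.Random(seed)

    def add(self, item):
        """Add an entry if there is room; return whether it was added."""
        if len(self.entries) < self.capacity:
            self.entries.append(item)
            return True
        return False

    def draw(self, make_replacement):
        """Return a random cached entry, possibly replacing it afterwards."""
        idx = self.rng.randrange(len(self.entries))
        item = self.entries[idx]
        if self.rng.random() < self.refresh_prob:
            # make_replacement stands in for "pseudo-label a new utterance
            # with the current model"; its behavior is an assumption.
            self.entries[idx] = make_replacement(item)
        return item


def ema_update(ema_params, params, decay):
    """Per-parameter EMA: ema <- decay * ema + (1 - decay) * current."""
    return {k: decay * ema_params[k] + (1.0 - decay) * params[k] for k in params}
```

Under these assumptions, a lower `refresh_prob` changes cached pseudo labels less often, and a `decay` closer to 1 makes the averaged weights move more slowly; both trade responsiveness for training stability.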

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2206.08189 [cs.SD]
	(or arXiv:2206.08189v2 [cs.SD] for this version)

Submission history

From: Bowen Zhang
[v1] Thu, 16 Jun 2022 14:02:20 GMT (1067kb,D)
[v2] Mon, 27 Jun 2022 09:30:43 GMT (1067kb,D)