Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech

Wilkinson, N.; Biswas, A.; Yılmaz, E.; de Wet, F.; van der Westhuizen, E.; Niesler, T. R.

(Submitted on 8 Apr 2020)

Abstract: This paper considers the impact of automatic segmentation on the fully-automatic, semi-supervised training of automatic speech recognition (ASR) systems for five-lingual code-switched (CS) speech. Four automatic segmentation techniques were evaluated in terms of the recognition performance of an ASR system trained on the resulting segments in a semi-supervised manner. The system's output was compared with the recognition rates achieved by a semi-supervised system trained on manually assigned segments. Three of the automatic techniques use a newly proposed convolutional neural network (CNN) model for framewise classification, and include a novel form of HMM smoothing of the CNN outputs. Automatic segmentation was applied in combination with automatic speaker diarization. The best-performing segmentation technique was also tested without speaker diarization. An evaluation based on 248 unsegmented soap opera episodes indicated that voice activity detection (VAD) based on a CNN followed by Gaussian mixture model-hidden Markov model smoothing (CNN-GMM-HMM) yields the best ASR performance. The semi-supervised system trained with the resulting segments achieved an overall WER improvement of 1.1% absolute over the system trained with manually created segments. Furthermore, we found that system performance improved even further when the automatic segmentation was used in conjunction with speaker diarization.

Comments:	SLTU 2020. arXiv admin note: text overlap with arXiv:2003.03135
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2004.06480 [eess.AS]
	(or arXiv:2004.06480v1 [eess.AS] for this version)

Submission history

From: Astik Biswas
[v1] Wed, 8 Apr 2020 04:36:25 GMT (1387kb,D)
