A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications

Dumpala, Sri Harsha; Sheikh, Imran; Chakraborty, Rupayan; Kopparapu, Sunil Kumar

Electrical Engineering and Systems Science > Audio and Speech Processing

(Submitted on 18 Dec 2019)

Abstract: Naturally introduced perturbations in the audio signal, caused by the emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on a Cycle-Consistent Generative Adversarial Network (CycleGAN) that transforms naturally perturbed speech into normal speech and hence improves the robustness of an ASR system. The CycleGAN model is trained on non-parallel examples of perturbed and normal speech. Experiments on spontaneous laughter-speech and creaky-speech datasets show that the performance of four different ASR systems improves when they are given speech produced by the CycleGAN-based front-end instead of the original perturbed speech. Visualization of the features of the laughter-perturbed speech and of those generated by the proposed front-end further demonstrates the effectiveness of our approach.
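The abstract does not spell out the training objective, so the standard CycleGAN objective (Zhu et al., 2017) is sketched below as background. The symbol assignments are assumptions for illustration: X denotes features of perturbed speech (e.g. laughter- or creaky-speech), Y denotes features of normal speech, G: X → Y is the mapping that would serve as the ASR front-end, F: Y → X is the reverse mapping, and D_X, D_Y are the domain discriminators. This page does not state which adversarial loss, weight λ, or extra terms (such as an identity-mapping loss) the authors actually use, so treat this as the generic formulation rather than the paper's exact loss.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y) &= \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{GAN}}(F, D_X) &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D_X(x)\big]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log\big(1 - D_X(F(y))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F) \\
G^{*}, F^{*} &= \arg\min_{G, F}\, \max_{D_X, D_Y}\, \mathcal{L}(G, F, D_X, D_Y)
\end{aligned}
```

Every expectation is taken over a single domain, and the cycle term compares each utterance only with its own reconstruction, so no paired perturbed/normal recordings of the same sentence are required. This is what makes training on non-parallel examples possible. Under these assumptions, only G would be applied at test time, mapping perturbed speech features to normal-like features before they reach the ASR system.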

Comments:	7 pages, 3 figures, ICASSP-2019
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:1912.11151 [eess.AS]
	(or arXiv:1912.11151v1 [eess.AS] for this version)

Submission history

From: Rupayan Chakraborty
[v1] Wed, 18 Dec 2019 12:26:38 GMT (345kb,D)