Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Gao, Teng; Zhou, Jian; Wang, Huabin; Tao, Liang; Kwan, Hon Keung

(Submitted on 2 Nov 2021)

Abstract: Whispered speech is a special mode of speaking in which the vocal cords do not vibrate. Whispered speech contains no fundamental frequency, and its energy is about 20 dB lower than that of normal speech. Converting whispered speech into normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model for whisper to normal speech conversion (AGAN-W2SC) is proposed, incorporating an autoencoder, a Siamese neural network, and an identity mapping loss function. The proposed method avoids the challenge of estimating the fundamental frequency of the normal voiced speech converted from whispered speech. Moreover, the proposed model is more amenable to practical applications because it does not require aligned speech features for training. Experimental results demonstrate that the proposed AGAN-W2SC achieves better speech quality and intelligibility than dynamic-time-warping-based methods.
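The abstract contrasts AGAN-W2SC with dynamic-time-warping (DTW) based methods, which need each whispered utterance and its normal counterpart to be aligned frame by frame before a feature mapping can be trained. As a rough illustration of that alignment step, here is a minimal sketch of textbook DTW over feature frames (for example, spectral or cepstral vectors). It assumes a Euclidean frame distance and the standard three-step recursion; the function name `dtw_align` is hypothetical, and this is not the authors' baseline implementation.

```python
import numpy as np


def dtw_align(x, y):
    """Hypothetical helper: align feature sequences x (N, D) and y (M, D) with classic DTW.

    Returns the accumulated alignment cost and the warping path as a list of
    (i, j) frame-index pairs. Euclidean frame distance is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Pairwise Euclidean distance between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    # acc[i, j] = minimal cost of aligning x[:i] with y[:j]; row/col 0 are padding.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # x advances, y frame is repeated
                acc[i, j - 1],      # y advances, x frame is repeated
                acc[i - 1, j - 1],  # both advance together
            )

    # Backtrack from the final cell to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): acc[i - 1, j - 1],  # listed first, so ties prefer the diagonal
            (i - 1, j): acc[i - 1, j],
            (i, j - 1): acc[i, j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return acc[n, m], path


# Toy example: the "normal" sequence holds its first frame one step longer.
whisper = np.array([[0.0], [1.0], [2.0]])
normal = np.array([[0.0], [0.0], [1.0], [2.0]])
cost, path = dtw_align(whisper, normal)
print(cost, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (2, 3)]
```

In a DTW-based pipeline, each (i, j) pair on the path links a whispered frame to a normal frame, and a frame-level mapping is then trained on those aligned pairs. The abstract states that AGAN-W2SC does not need this alignment step for training.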

Subjects:	Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2111.01342 [cs.SD]
	(or arXiv:2111.01342v1 [cs.SD] for this version)

Submission history

From: Jian Zhou
[v1] Tue, 2 Nov 2021 03:00:19 GMT (528kb,D)
