We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.SD

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Sound

Title: Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Abstract: Whispered speech is a special way of pronunciation without using vocal cord vibration. A whispered speech does not contain a fundamental frequency, and its energy is about 20dB lower than that of a normal speech. Converting a whispered speech into a normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model incorporating an autoencoder, a Siamese neural network, and an identity mapping loss function for whisper to normal speech conversion (AGAN-W2SC) is proposed. The proposed method avoids the challenge of estimating the fundamental frequency of the normal voiced speech converted from a whispered speech. Specifically, the proposed model is more amendable to practical applications because it does not need to align speech features for training. Experimental results demonstrate that the proposed AGAN-W2SC can obtain improved speech quality and intelligibility compared with dynamic-time-warping-based methods.
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2111.01342 [cs.SD]
  (or arXiv:2111.01342v1 [cs.SD] for this version)

Submission history

From: Jian Zhou [view email]
[v1] Tue, 2 Nov 2021 03:00:19 GMT (528kb,D)

Link back to: arXiv, form interface, contact.