Voice Conversion for Whispered Speech Synthesis

Cotescu, Marius; Drugman, Thomas; Huybrechts, Goeric; Lorenzo-Trueba, Jaime; Moinet, Alexis

doi:10.1109/LSP.2019.2961213

(Submitted on 11 Dec 2019 (v1), last revised 17 Jan 2020 (this version, v2))

Abstract: We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech. We investigate using Gaussian Mixture Models (GMM) and Deep Neural Networks (DNN) to model the mapping between acoustic features of normal speech and those of whispered speech. We evaluate naturalness and speaker similarity of the converted whisper on an internal corpus and on the publicly available wTIMIT corpus. We show that applying VC techniques is significantly better than using rule-based signal processing methods and achieves results that are indistinguishable from copy-synthesis of natural whisper recordings. We investigate the ability of the DNN model to generalize to unseen speakers when trained with data from multiple speakers. We show that excluding the target speaker from the training set has little or no impact on the perceived naturalness and speaker similarity of the converted whisper. The proposed DNN method is used in the newly released Whisper Mode of Amazon Alexa.

Comments:	Submitted to IEEE Signal Processing Letters
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
DOI:	10.1109/LSP.2019.2961213
Cite as:	arXiv:1912.05289 [cs.SD]
	(or arXiv:1912.05289v2 [cs.SD] for this version)

Submission history

From: Marius Cotescu
[v1] Wed, 11 Dec 2019 13:34:43 GMT (1001kb,D)
[v2] Fri, 17 Jan 2020 20:43:49 GMT (84kb)
