Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion

Du, Zongyang; Sisman, Berrak; Zhou, Kun; Li, Haizhou

(Submitted on 20 Oct 2021 (v1), last revised 21 Jul 2022 (this version, v2))

Abstract: Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Due to the hierarchical structure of speech emotion, it is challenging to disentangle the emotional style for different speakers. Inspired by the recent success of speaker disentanglement with the variational autoencoder (VAE), we propose an any-to-any expressive voice conversion framework called StyleVC. StyleVC is designed to disentangle linguistic content, speaker identity, pitch, and emotional style information. We study the use of a style encoder to model emotional style explicitly. At run-time, StyleVC converts both speaker identity and emotional style for arbitrary speakers. Experiments validate the effectiveness of our proposed framework in both objective and subjective evaluations.

Comments:	Accepted by Interspeech 2022
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2110.10326 [eess.AS]
	(or arXiv:2110.10326v2 [eess.AS] for this version)

Submission history

From: Zongyang Du
[v1] Wed, 20 Oct 2021 00:49:02 GMT (786kb,D)
[v2] Thu, 21 Jul 2022 05:10:48 GMT (2919kb,D)
