Multi-Target Emotional Voice Conversion With Neural Vocoders

Liu, Songxiang; Cao, Yuewen; Meng, Helen

(Submitted on 8 Apr 2020)

Abstract: Emotional voice conversion (EVC) is one way to generate expressive synthetic speech. Previous approaches have mainly focused on modeling one-to-one mappings, i.e., conversion from one emotional state to another, with Mel-cepstral vocoders. In this paper, we investigate building a multi-target EVC (MTEVC) architecture, which combines a deep bidirectional long short-term memory (DBLSTM)-based conversion model and a neural vocoder. Phonetic posteriorgrams (PPGs) containing rich linguistic information are incorporated into the conversion model as auxiliary input features, which boosts the conversion performance. To leverage the advantages of recently emerged neural vocoders, we investigate the conditional WaveNet and flow-based WaveNet (FloWaveNet) as speech generators. The vocoders take in additional speaker information and emotion information as auxiliary features and are trained with a multi-speaker and multi-emotion speech corpus. Objective metrics and subjective evaluation of the experimental results verify the efficacy of the proposed MTEVC architecture for EVC.
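
The abstract states that the vocoders receive speaker and emotion information as auxiliary features, but it does not specify how that information is encoded. The following is a minimal sketch, assuming one-hot speaker and emotion codes that are repeated on every frame and concatenated with the frame-level acoustic features. This is a common conditioning scheme and is not confirmed as the authors' method; the function names, dimensions, and toy values are all hypothetical.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_conditioning(
    frames: Sequence[Sequence[float]],
    speaker_id: int,
    num_speakers: int,
    emotion_id: int,
    num_emotions: int,
) -> List[List[float]]:
    """Append the same speaker and emotion codes to every acoustic frame.

    Assumption: the vocoder consumes one conditioning vector per frame,
    laid out as [acoustic features | speaker one-hot | emotion one-hot].
    """
    aux = one_hot(speaker_id, num_speakers) + one_hot(emotion_id, num_emotions)
    return [list(frame) + aux for frame in frames]


# Toy input: 3 frames of 4-dimensional acoustic features (hypothetical sizes).
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]

# Condition on speaker 1 of 2 and emotion 2 of 3 (zero-based indices).
cond = build_conditioning(
    frames, speaker_id=1, num_speakers=2, emotion_id=2, num_emotions=3
)
print(len(cond), len(cond[0]))
```

Under these assumptions, changing only `emotion_id` while keeping the acoustic frames and `speaker_id` fixed is what would let a single vocoder serve several target emotions.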

Comments:	7 pages
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2004.03782 [eess.AS]
	(or arXiv:2004.03782v1 [eess.AS] for this version)

Submission history

From: Songxiang Liu
[v1] Wed, 8 Apr 2020 03:00:27 GMT (506kb,D)
