MediumVC: Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

Gu, Yewei; Zhang, Zhenyu; Yi, Xiaowei; Zhao, Xianfeng

(Submitted on 6 Oct 2021)

Abstract: To realize any-to-any (A2A) voice conversion (VC), most methods perform symmetric self-supervised reconstruction tasks (X_i to X_i), which often yields poor performance because the features are not adequately disentangled, especially for unseen speakers. We propose a two-stage reconstruction task (X_i to Y_i to X_i) that uses synthetic specific-speaker speech as intermediate features, dividing A2A VC into two stages: any-to-one (A2O) and one-to-any (O2A). In the A2O stage, we propose a new A2O method, SingleVC, which employs a novel data augmentation strategy (pitch-shifted and duration-remained, PSDR) to accomplish X_i to Y_i. In the O2A stage, MediumVC is built on the pre-trained SingleVC to perform Y_i to X_i. Through these asymmetric reconstruction tasks (X_i to Y_i in SingleVC and Y_i to X_i in MediumVC), the models are deliberately driven to capture robust disentangled features. Experiments indicate that MediumVC can enhance the similarity of converted speech while maintaining a high degree of naturalness.
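The abstract names the PSDR (pitch-shifted and duration-remained) augmentation but does not describe how it is implemented. The sketch below only illustrates the idea of changing pitch while keeping duration fixed. It assumes a linear-frequency magnitude spectrogram and uses a simple per-frame rescaling of the frequency axis as a stand-in technique, which is not necessarily the authors' method. The names `psdr_shift` and `random_psdr` and the default range of ±4 semitones are hypothetical.

```python
import numpy as np


def psdr_shift(mag: np.ndarray, n_steps: float) -> np.ndarray:
    """Hypothetical PSDR-style augmentation (stand-in, not the paper's code).

    mag: shape (n_bins, n_frames), linear-frequency magnitudes.
    n_steps: pitch shift in semitones (positive raises pitch).
    Returns an array of the same shape, so the frame count (duration) is unchanged.
    """
    mag = np.asarray(mag, dtype=float)
    factor = 2.0 ** (n_steps / 12.0)
    n_bins, n_frames = mag.shape
    bins = np.arange(n_bins, dtype=float)
    # Output bin k reads the source spectrum at bin k / factor, so energy
    # at source bin b moves to bin b * factor.
    src = bins / factor
    shifted = np.empty_like(mag)
    for t in range(n_frames):
        # When lowering pitch, positions past the top bin are filled with 0.
        shifted[:, t] = np.interp(src, bins, mag[:, t], right=0.0)
    return shifted


def random_psdr(mag: np.ndarray, rng: np.random.Generator,
                max_steps: float = 4.0) -> np.ndarray:
    """Apply a shift drawn uniformly from [-max_steps, max_steps] semitones."""
    return psdr_shift(mag, rng.uniform(-max_steps, max_steps))


if __name__ == "__main__":
    spec = np.zeros((513, 80))
    spec[100, :] = 1.0                       # one steady component at bin 100
    up = psdr_shift(spec, 12.0)              # one octave up
    print(up.shape, int(up[:, 0].argmax()))  # (513, 80) 200
    aug = random_psdr(spec, np.random.default_rng(0))
    print(aug.shape)                         # (513, 80)
```

This approach resamples only the frequency axis, so the number of frames, and therefore the duration, stays the same. Unlike vocoder-based pitch shifting, this crude approximation also moves the spectral envelope (formants) along with the harmonics.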

Comments:	5 pages, 2 figures
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2110.02500 [eess.AS]
	(or arXiv:2110.02500v1 [eess.AS] for this version)

Submission history

From: Yewei Gu
[v1] Wed, 6 Oct 2021 04:29:31 GMT (52kb,D)
