PPG-based singing voice conversion with adversarial representation learning

Li, Zhonghao; Tang, Benlai; Yin, Xiang; Wan, Yuan; Xu, Ling; Shen, Chen; Ma, Zejun

(Submitted on 28 Oct 2020)

Abstract: Singing voice conversion (SVC) aims to convert the voice of one singer to that of other singers while preserving the sung content and melody. Building on recent voice conversion work, we propose a novel model that converts songs stably while preserving their naturalness and intonation. We build an end-to-end architecture that takes phonetic posteriorgrams (PPGs) as input and generates mel spectrograms. Specifically, we implement two separate encoders: one encodes PPGs as content, and the other compresses mel spectrograms to supply acoustic and musical information. To improve performance on timbre and melody, we design an adversarial singer confusion module and a mel-regressive representation learning module. Objective and subjective experiments are conducted on our private Chinese singing corpus. Compared with the baselines, our method significantly improves conversion performance in terms of naturalness, melody, and voice similarity. Moreover, our PPG-based method is shown to be robust to noisy sources.
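As a rough illustration of the data flow the abstract describes, the following NumPy sketch wires a content encoder over PPGs and a second encoder over mel spectrograms into a decoder that predicts mel frames, and adds a singer classifier whose loss enters the encoder objective with a negative sign. Everything concrete here is an assumption made for illustration: the single-layer encoders, all dimensions, the names `forward` and `adv_weight`, and in particular the choice to attach the singer classifier to the mel encoder output, which the abstract does not specify. The mel-regressive representation learning module and the target singer conditioning are not sketched.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not values from the paper.
T, PPG_DIM, MEL_DIM, HID, N_SINGERS = 50, 128, 80, 32, 4

def linear(in_dim, out_dim):
    """Return a randomly initialised affine layer as (weights, bias)."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

def apply(layer, x):
    w, b = layer
    return x @ w + b

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical single-layer stand-ins for the real networks.
content_enc = linear(PPG_DIM, HID)    # encodes PPGs as content
mel_enc = linear(MEL_DIM, HID)        # compresses mel frames
decoder = linear(2 * HID, MEL_DIM)    # predicts mel frames
singer_clf = linear(HID, N_SINGERS)   # adversary (placement assumed)

def forward(ppg, mel, singer_id, adv_weight=0.1):
    content = np.tanh(apply(content_enc, ppg))   # (T, HID)
    acoustic = np.tanh(apply(mel_enc, mel))      # (T, HID)
    joint = np.concatenate([content, acoustic], axis=-1)
    mel_hat = apply(decoder, joint)              # (T, MEL_DIM)
    recon_loss = np.mean((mel_hat - mel) ** 2)

    # The classifier tries to identify the singer from the pooled encoding.
    probs = softmax(apply(singer_clf, acoustic).mean(axis=0))
    clf_loss = -np.log(probs[singer_id])

    # The encoder is rewarded when the classifier fails, hence the minus sign.
    encoder_loss = recon_loss - adv_weight * clf_loss
    return mel_hat, recon_loss, clf_loss, encoder_loss

ppg = softmax(rng.normal(size=(T, PPG_DIM)))  # each frame sums to 1
mel = rng.normal(size=(T, MEL_DIM))
mel_hat, recon_loss, clf_loss, encoder_loss = forward(ppg, mel, singer_id=2)
print(mel_hat.shape, float(recon_loss), float(clf_loss))
```

In a real training loop the classifier would be updated to minimise `clf_loss` while the encoders minimise `encoder_loss`, for example through alternating updates or a gradient reversal layer; which scheme the authors use is not stated in the abstract.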

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2010.14804 [cs.SD]
	(or arXiv:2010.14804v1 [cs.SD] for this version)

Submission history

From: Benlai Tang
[v1] Wed, 28 Oct 2020 08:03:27 GMT (920kb,D)
