Electrical Engineering and Systems Science > Audio and Speech Processing
Title: Speech-to-Singing Conversion based on Boundary Equilibrium GAN
(Submitted on 28 May 2020 (v1), last revised 5 Aug 2020 (this version, v3))
Abstract: This paper investigates the use of generative adversarial network (GAN)-based models for converting the spectrogram of a speech signal into that of a singing signal, without reference to the phoneme sequence underlying the speech. This is achieved by viewing speech-to-singing conversion as a style transfer problem. Specifically, given a speech input, and optionally the F0 contour of the target singing, the proposed model generates a singing signal as its output, using a progressive-growing encoder/decoder architecture and boundary equilibrium GAN loss functions. Our quantitative and qualitative analyses show that the proposed model generates singing voices with much higher naturalness than an existing non-adversarially trained baseline. For reproducibility, the code will be publicly available at a GitHub repository upon paper publication.
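The abstract names boundary equilibrium GAN (BEGAN) loss functions without spelling them out. As a rough illustration only, the sketch below follows the original BEGAN formulation (Berthelot et al., 2017), in which the discriminator is an autoencoder and the losses are built from its reconstruction errors. The function name `began_step`, its arguments, and the default values of `gamma` and `lambda_k` are assumptions made for this example; the paper's actual losses, which are conditioned on the speech input, may differ.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN bookkeeping step on scalar reconstruction errors.

    loss_real: autoencoder reconstruction error on a real sample, L(x)
    loss_fake: autoencoder reconstruction error on a generated sample, L(G(.))
    k:         current balancing coefficient k_t, kept in [0, 1]
    gamma:     target ratio between fake and real reconstruction error (assumed default)
    lambda_k:  step size for the k update (assumed default)
    """
    # The discriminator lowers the error on real data and raises it on fakes.
    d_loss = loss_real - k * loss_fake
    # The generator lowers the discriminator's error on its own outputs.
    g_loss = loss_fake
    # Proportional control keeps loss_fake close to gamma * loss_real.
    balance = gamma * loss_real - loss_fake
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    # Global convergence measure proposed alongside BEGAN.
    convergence = loss_real + abs(balance)
    return d_loss, g_loss, k_next, convergence


if __name__ == "__main__":
    # Hypothetical reconstruction errors, for illustration only.
    print(began_step(loss_real=1.0, loss_fake=0.25, k=0.0))
```

In a real training loop the two reconstruction errors would come from the discriminator's forward passes, and `k_next` would be carried over to the next iteration.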
Submission history
From: Da Yi Wu
[v1] Thu, 28 May 2020 08:18:02 GMT (1407kb,D)
[v2] Sat, 30 May 2020 13:34:14 GMT (1407kb,D)
[v3] Wed, 5 Aug 2020 13:21:32 GMT (1722kb,D)