GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

Juvela, Lauri; Bollepalli, Bajibabu; Yamagishi, Junichi; Alku, Paavo

(Submitted on 8 Apr 2019 (v1), last revised 26 Jun 2019 (this version, v3))

Abstract: Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech. Present sequence-to-sequence models can directly map text to mel-spectrogram acoustic features, which are convenient for modeling but present additional challenges for vocoding (i.e., waveform generation from the acoustic features). High-quality synthesis can be achieved with neural vocoders, such as WaveNet, but such autoregressive models suffer from slow sequential inference. Meanwhile, their existing parallel inference counterparts are difficult to train and require increasingly large model sizes. In this paper, we propose an alternative training strategy for a parallel neural vocoder utilizing generative adversarial networks, and integrate a linear predictive synthesis filter into the model. Results show that the proposed model achieves a significant improvement in inference speed, while outperforming a WaveNet in copy-synthesis quality.
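The linear predictive synthesis filter mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `lp_synthesis` and `lp_analysis`, the sign convention A(z) = 1 + sum_k a_k z^-k, and the toy coefficients are all assumptions made for illustration. The excitation here is a plain list standing in for whatever signal a generator network would produce.

```python
def lp_synthesis(excitation, lpc):
    """All-pole synthesis filter 1/A(z).

    Assumed convention: A(z) = 1 + sum_k lpc[k-1] * z^-k, so
    s[n] = e[n] - sum_k lpc[k-1] * s[n-k].
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]  # feedback from past output samples
        out.append(acc)
    return out


def lp_analysis(signal, lpc):
    """Inverse filter A(z): recovers the excitation (residual) from a signal."""
    res = []
    for n, s in enumerate(signal):
        acc = s
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * signal[n - k]  # feedforward over past input samples
        res.append(acc)
    return res


if __name__ == "__main__":
    # Hypothetical single-pole envelope and a unit impulse as excitation.
    coeffs = [-0.5]
    impulse = [1.0, 0.0, 0.0, 0.0]
    speech_like = lp_synthesis(impulse, coeffs)
    print(speech_like)
    print(lp_analysis(speech_like, coeffs))
```

Filtering with `lp_analysis` undoes `lp_synthesis` for the same coefficients, which is why a model can work in the excitation domain and leave the spectral envelope to the filter. In practice the sample loop would be replaced by a vectorized or framewise implementation.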

Comments:	Interspeech 2019 accepted version
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:1904.03976 [eess.AS]
	(or arXiv:1904.03976v3 [eess.AS] for this version)

Submission history

From: Lauri Juvela
[v1] Mon, 8 Apr 2019 11:58:00 GMT (264kb,D)
[v2] Wed, 10 Apr 2019 12:44:57 GMT (264kb,D)
[v3] Wed, 26 Jun 2019 14:25:15 GMT (163kb,D)
