Towards Universal Neural Vocoding with a Multi-band Excited WaveNet

Roebel, Axel; Bous, Frederik

Full-text links:

Download:

Current browse context:

eess.AS

< prev | next >

new | recent | 2110

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: Towards Universal Neural Vocoding with a Multi-band Excited WaveNet

Authors: Axel Roebel, Frederik Bous

(Submitted on 7 Oct 2021)

Abstract: This paper introduces the Multi-Band Excited WaveNet a neural vocoder for speaking and singing voices. It aims to advance the state of the art towards an universal neural vocoder, which is a model that can generate voice signals from arbitrary mel spectrograms extracted from voice signals. Following the success of the DDSP model and following the development of the recently proposed excitation vocoders we propose a vocoder structure consisting of multiple specialized DNN that are combined with dedicated signal processing components. All components are implemented as differentiable operators and therefore allow joined optimization of the model parameters. To prove the capacity of the model to reproduce high quality voice signals we evaluate the model on single and multi speaker/singer datasets. We conduct a subjective evaluation demonstrating that the models support a wide range of domain variations (unseen voices, languages, expressivity) achieving perceptive quality that compares with a state of the art universal neural vocoder, however using significantly smaller training datasets and significantly less parameters. We also demonstrate remaining limits of the universality of neural vocoders e.g. the creation of saturated singing voices.

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2110.03329 [eess.AS]
	(or arXiv:2110.03329v1 [eess.AS] for this version)

Submission history

From: Axel Roebel [view email]
[v1] Thu, 7 Oct 2021 10:47:03 GMT (73kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> eess > arXiv:2110.03329

Download:

Current browse context:

Change to browse by:

References & Citations

Bookmark

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: Towards Universal Neural Vocoding with a Multi-band Excited WaveNet

Submission history