Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

Ai, Yang; Ling, Zhen-Hua

Full-text links:

Download:

Current browse context:

eess.AS

< prev | next >

new | recent | 2004

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

Authors: Yang Ai, Zhen-Hua Ling

(Submitted on 16 Apr 2020 (v1), last revised 18 May 2020 (this version, v2))

Abstract: In our previous work, we have proposed a neural vocoder called HiNet which recovers speech waveforms by predicting amplitude and phase spectra hierarchically from input acoustic features. In HiNet, the amplitude spectrum predictor (ASP) predicts log amplitude spectra (LAS) from input acoustic features. This paper proposes a novel knowledge-and-data-driven ASP (KDD-ASP) to improve the conventional one. First, acoustic features (i.e., F0 and mel-cepstra) pass through a knowledge-driven LAS recovery module to obtain approximate LAS (ALAS). This module is designed based on the combination of STFT and source-filter theory, in which the source part and the filter part are designed based on input F0 and mel-cepstra, respectively. Then, the recovered ALAS are processed by a data-driven LAS refinement module which consists of multiple trainable convolutional layers to get the final LAS. Experimental results show that the HiNet vocoder using KDD-ASP can achieve higher quality of synthetic speech than that using conventional ASP and the WaveRNN vocoder on a text-to-speech (TTS) task.

Comments:	Submitted to Interspeech 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2004.07832 [eess.AS]
	(or arXiv:2004.07832v2 [eess.AS] for this version)

Submission history

From: Yang Ai [view email]
[v1] Thu, 16 Apr 2020 12:20:37 GMT (581kb,D)
[v2] Mon, 18 May 2020 07:35:35 GMT (581kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> eess > arXiv:2004.07832

Download:

Current browse context:

Change to browse by:

References & Citations

Bookmark

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

Submission history