Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

Wu, Yi-Chiao; Hayashi, Tomoki; Tobing, Patrick Lumban; Kobayashi, Kazuhiro; Toda, Tomoki

doi:10.1109/TASLP.2021.3061245

(Submitted on 11 Jul 2020 (v1), last revised 27 Mar 2021 (this version, v3))

Abstract: In this paper, a pitch-adaptive waveform generative model named Quasi-Periodic WaveNet (QPNet) is proposed, which uses pitch-dependent dilated convolution neural networks (PDCNNs) to improve the limited pitch controllability of vanilla WaveNet (WN). As a probabilistic autoregressive generative model with stacked dilated convolution layers, WN achieves high-fidelity audio waveform generation. However, its purely data-driven nature and its lack of prior knowledge about audio signals degrade its pitch controllability. For instance, it is difficult for WN to precisely generate the periodic components of audio signals when the given auxiliary fundamental frequency ($F_{0}$) features lie outside the $F_{0}$ range observed in the training data. To address this problem, QPNet introduces two novel designs. First, the PDCNN component dynamically changes the network architecture of WN according to the given auxiliary $F_{0}$ features. Second, a cascaded network structure simultaneously models the long- and short-term dependencies of quasi-periodic signals such as speech. The performance of the model is evaluated on single-tone sinusoid generation and on speech generation. The experimental results show the effectiveness of the PDCNNs for unseen auxiliary $F_{0}$ features and the effectiveness of the cascaded structure for speech generation.
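The idea of a dilated convolution whose structure follows the auxiliary $F_{0}$ can be illustrated with a small pure-Python sketch. It assumes, as one plausible formulation that the abstract does not spell out, that each layer's fixed dilation d is scaled per sample by round(fs / (F0 * a)), where fs is the sampling rate and a is a hypothetical "dense factor" controlling how many taps cover one pitch period; unvoiced samples (F0 = 0) are assumed to fall back to the fixed dilation. The helper names (`pitch_dependent_dilation`, `pd_causal_conv`, `cascade_receptive_field`) are hypothetical and only show the mechanism, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def pitch_dependent_dilation(f0: float, base_dilation: int,
                             sample_rate: int, dense_factor: int) -> int:
    """Scale a fixed dilation by round(fs / (f0 * dense_factor)).

    Assumed formulation; dense_factor is a hypothetical parameter.
    """
    if f0 <= 0.0:
        # Unvoiced sample: keep the fixed dilation (an assumption).
        return base_dilation
    scale = max(1, round(sample_rate / (f0 * dense_factor)))
    return base_dilation * scale


def pd_causal_conv(x: Sequence[float], f0_per_sample: Sequence[float],
                   weights: Tuple[float, float], base_dilation: int,
                   sample_rate: int, dense_factor: int) -> List[float]:
    """Kernel-size-2 causal convolution whose dilation changes per sample.

    y[t] = w_past * x[t - d_t] + w_now * x[t], zero-padded when t - d_t < 0.
    """
    if len(x) != len(f0_per_sample):
        raise ValueError("x and f0_per_sample must have the same length")
    w_past, w_now = weights
    y: List[float] = []
    for t, (x_t, f0) in enumerate(zip(x, f0_per_sample)):
        d_t = pitch_dependent_dilation(f0, base_dilation, sample_rate, dense_factor)
        past = x[t - d_t] if t - d_t >= 0 else 0.0
        y.append(w_past * past + w_now * x_t)
    return y


def cascade_receptive_field(fixed_dilations: Sequence[int],
                            adaptive_dilations: Sequence[int],
                            scale: int, kernel_size: int = 2) -> int:
    """Receptive field (samples) of a fixed stack cascaded with a pitch-dependent stack."""
    # Fixed stack: short-term context that does not depend on F0.
    fixed_span = sum((kernel_size - 1) * d for d in fixed_dilations)
    # Pitch-dependent stack: every dilation is multiplied by the F0-derived scale.
    adaptive_span = sum((kernel_size - 1) * d * scale for d in adaptive_dilations)
    return 1 + fixed_span + adaptive_span


if __name__ == "__main__":
    fs, a = 16000, 4
    for f0 in (100.0, 200.0, 400.0):
        scale = pitch_dependent_dilation(f0, 1, fs, a)
        rf = cascade_receptive_field([1, 2, 4, 8], [1, 2, 4, 8], scale)
        print(f"F0 = {f0:5.1f} Hz -> scale {scale}, receptive field {rf} samples")
```

Under these assumptions, at fs = 16000 Hz and a = 4, an F0 of 200 Hz (an 80-sample period) gives a scale of 20, so a cascade of a fixed stack and a pitch-dependent stack, each with dilations 1, 2, 4, 8, covers 316 samples; halving F0 to 100 Hz doubles the pitch-dependent span (616 samples) while the fixed, short-term part is unchanged. This is one way the receptive field can track the pitch, including for $F_{0}$ values never seen in training. The order of the two stacks is also an assumption here; it does not change the summed receptive field.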

Comments:	15 pages, 12 figures, 11 tables
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Journal reference:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1134-1148, 2021
DOI:	10.1109/TASLP.2021.3061245
Cite as:	arXiv:2007.05663 [eess.AS]
	(or arXiv:2007.05663v3 [eess.AS] for this version)

Submission history

From: Yi-Chiao Wu
[v1] Sat, 11 Jul 2020 02:23:08 GMT (2315kb,D)
[v2] Wed, 11 Nov 2020 08:54:42 GMT (1934kb,D)
[v3] Sat, 27 Mar 2021 06:41:13 GMT (1519kb,D)