Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis

Wang, Fan-Lin; Hsu, Po-chun; Liu, Da-rong; Lee, Hung-yi

(Submitted on 1 Apr 2022 (v1), last revised 29 Oct 2022 (this version, v2))

Abstract: Most recent speech synthesis systems are composed of a synthesizer and a vocoder. However, existing synthesizers and vocoders can only be matched to acoustic features extracted with a specific configuration. Hence, we cannot combine arbitrary synthesizers and vocoders to form a complete system, let alone apply them to a newly developed model. In this paper, we propose Universal Adaptor, which takes a Mel-spectrogram parametrized by the source configuration and converts it into a Mel-spectrogram parametrized by the target configuration, given the source and target configurations as input. Experiments show that the quality of speech synthesized from the output of Universal Adaptor is comparable to that of speech synthesized from ground-truth Mel-spectrograms, in both single-speaker and multi-speaker scenarios. Moreover, Universal Adaptor can be applied to recent TTS systems and voice conversion systems without degrading quality.
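
To make the configuration mismatch concrete, the following is a minimal sketch, not the authors' method, of how two Mel-spectrogram configurations for the same utterance differ in both shape and frequency layout. The `MelConfig` class, its field names, the HTK-style mel formula, the centered-framing frame count, and all numeric values are illustrative assumptions and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MelConfig:
    """Hypothetical container for parameters that define a Mel-spectrogram."""
    sample_rate: int   # samples per second
    hop_length: int    # samples between successive frames
    n_mels: int        # number of mel frequency bands
    fmin: float        # lowest band edge in Hz
    fmax: float        # highest band edge in Hz


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale; other conventions exist (assumption for this sketch).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(cfg: MelConfig) -> list:
    """Center frequencies (Hz) of bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(cfg.fmin), hz_to_mel(cfg.fmax)
    step = (hi - lo) / (cfg.n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(cfg.n_mels)]


def spectrogram_shape(cfg: MelConfig, duration_s: float) -> tuple:
    """(n_mels, n_frames), assuming centered framing: 1 + samples // hop."""
    n_samples = int(duration_s * cfg.sample_rate)
    return (cfg.n_mels, 1 + n_samples // cfg.hop_length)


# Two illustrative configurations (values are assumptions, not from the paper).
source = MelConfig(sample_rate=22050, hop_length=256, n_mels=80, fmin=0.0, fmax=8000.0)
target = MelConfig(sample_rate=24000, hop_length=300, n_mels=100, fmin=80.0, fmax=7600.0)

if __name__ == "__main__":
    for name, cfg in (("source", source), ("target", target)):
        centers = mel_band_centers(cfg)
        print(name, spectrogram_shape(cfg, 2.0),
              "first band %.1f Hz, last band %.1f Hz" % (centers[0], centers[-1]))
```

Under these assumptions, the same two-second utterance yields arrays with different numbers of bands and frames, and band index i refers to a different frequency in each configuration. A vocoder trained on one layout therefore cannot directly consume the other, which is the gap a converter between configurations is meant to close.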

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2204.00170 [eess.AS]
	(or arXiv:2204.00170v2 [eess.AS] for this version)

Submission history

From: Po-Chun Hsu
[v1] Fri, 1 Apr 2022 02:43:13 GMT (1134kb,D)
[v2] Sat, 29 Oct 2022 13:25:23 GMT (734kb,D)
