Environment Aware Text-to-Speech Synthesis

Tan, Daxin; Zhang, Guangyan; Lee, Tan

(Submitted on 8 Oct 2021 (v1), last revised 6 Aug 2022 (this version, v4))

Abstract: This study aims to design an environment-aware text-to-speech (TTS) system that generates speech suited to specific acoustic environments. It is also motivated by the desire to leverage large amounts of speech audio from heterogeneous sources in TTS system development. The key idea is to model the acoustic environment in speech audio as a factor of data variability and to incorporate it as a condition in neural-network-based speech synthesis. Two embedding extractors are trained on two purposely constructed datasets to characterize and disentangle the speaker and environment factors in speech. A neural network model is trained to generate speech from the extracted speaker and environment embeddings. Objective and subjective evaluation results demonstrate that the proposed TTS system effectively disentangles speaker and environment factors and synthesizes speech audio that carries the designated speaker characteristics and environment attributes. Audio samples are available online for demonstration at this https URL.
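
The conditioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the embeddings enter the synthesis network, so concatenating them onto each text-encoder frame is an assumption, and the function name, vector sizes, and embedding values below are all hypothetical.

```python
from typing import List

Vector = List[float]


def condition_frames(text_frames: List[Vector],
                     speaker_emb: Vector,
                     env_emb: Vector) -> List[Vector]:
    """Append the speaker and environment embeddings to every text frame.

    Concatenation is an assumption made for illustration; the abstract only
    states that the embeddings are used as conditions for synthesis.
    """
    return [frame + speaker_emb + env_emb for frame in text_frames]


# Toy inputs: three text-encoder frames of size 2 (hypothetical values).
text_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
speaker_emb = [1.0, 0.0]       # hypothetical speaker embedding
clean_env = [0.0, 0.0, 1.0]    # hypothetical "clean" environment embedding
reverb_env = [0.0, 1.0, 0.0]   # hypothetical "reverberant" environment embedding

# Same text and speaker, two different target environments.
clean_input = condition_frames(text_frames, speaker_emb, clean_env)
reverb_input = condition_frames(text_frames, speaker_emb, reverb_env)
```

Because the speaker and environment embeddings are kept as separate vectors, swapping only the environment embedding changes the environment condition while leaving the text and speaker parts of each frame untouched, which is the kind of independent control that disentanglement is meant to provide.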

Comments:	Accepted by Interspeech 2022
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2110.03887 [eess.AS]
	(or arXiv:2110.03887v4 [eess.AS] for this version)

Submission history

From: Daxin Tan
[v1] Fri, 8 Oct 2021 04:19:19 GMT (776kb,D)
[v2] Mon, 11 Oct 2021 12:12:49 GMT (1118kb,D)
[v3] Sat, 2 Apr 2022 08:37:35 GMT (1129kb,D)
[v4] Sat, 6 Aug 2022 18:55:53 GMT (1128kb,D)
