Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

Tyagi, Shubhi; Nicolis, Marco; Rohnke, Jonas; Drugman, Thomas; Lorenzo-Trueba, Jaime

doi:10.21437/Interspeech.2020-1411

(Submitted on 2 Dec 2019 (v1), last revised 18 Nov 2020 (this version, v3))

Abstract: Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human levels for isolated sentences. What is still lacking for human-like communication is the dynamic variation and adaptability of human speech. This work aims to achieve more dynamic and natural intonation in TTS systems, particularly for stylistic speech such as the newscaster speaking style. We propose a novel embedding selection approach that exploits linguistic information and leverages the speech variability present in the training dataset. We analyze the contribution of both semantic and syntactic features. Our results show that the approach improves prosody and naturalness for complex utterances as well as in Long Form Reading (LFR).
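
The abstract does not say how an embedding is selected, so the following is only an illustrative sketch of one plausible reading: for an input sentence, pick the acoustic embedding of the training utterance whose linguistic representation is most similar. The cosine-similarity criterion, the function names, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_acoustic_embedding(
    query_linguistic: Sequence[float],
    train_linguistic: List[Sequence[float]],
    train_acoustic: List[Sequence[float]],
) -> Tuple[int, Sequence[float]]:
    """Return (index, acoustic embedding) of the most linguistically similar training utterance.

    Hypothetical helper: the similarity measure is an assumption, not the paper's method.
    """
    if not train_linguistic or len(train_linguistic) != len(train_acoustic):
        raise ValueError("training lists must be non-empty and of equal length")
    best_index = max(
        range(len(train_linguistic)),
        key=lambda i: cosine_similarity(query_linguistic, train_linguistic[i]),
    )
    return best_index, train_acoustic[best_index]


# Toy data: three training utterances, each with a linguistic vector
# (e.g. semantic or syntactic features) and a paired acoustic embedding.
train_linguistic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
train_acoustic = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]]

index, embedding = select_acoustic_embedding(
    [0.6, 0.8, 0.0], train_linguistic, train_acoustic
)
```

In this toy case the third training utterance is the closest match, so its acoustic embedding would be the one passed on to condition synthesis. Semantic and syntactic features could be compared separately or combined into the linguistic vector; which works better is what the paper's analysis addresses.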

Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Journal reference:	INTERSPEECH 2020: 4407-4411
DOI:	10.21437/Interspeech.2020-1411
Cite as:	arXiv:1912.00955 [cs.CL]
	(or arXiv:1912.00955v3 [cs.CL] for this version)

Submission history

From: Shubhi Tyagi
[v1] Mon, 2 Dec 2019 17:32:59 GMT (457kb,D)
[v2] Sat, 4 Jul 2020 17:54:56 GMT (1042kb,D)
[v3] Wed, 18 Nov 2020 16:47:30 GMT (1038kb,D)
