LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices

Coto-Jiménez, Marvin; Goddard-Close, John

(Submitted on 8 Feb 2016)

Abstract: Recent developments in speech synthesis have produced systems capable of generating intelligible speech, but researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents.
HMM-based speech synthesis is of great interest to many researchers because of its ability to produce sophisticated features with a small footprint. Despite such progress, its quality has not yet reached the level of the predominant unit-selection approaches, which choose and concatenate recordings of real speech. Recent efforts have aimed at improving these systems.
In this paper we present the application of Long Short-Term Memory (LSTM) deep neural networks as a postfiltering step in HMM-based speech synthesis, in order to obtain spectral characteristics closer to those of natural speech. The results show how HMM-based voices could be improved using this approach.
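The postfiltering idea can be pictured as a frame-by-frame regression. Each frame of spectral parameters produced by the HMM synthesizer is fed through an LSTM, which outputs a corrected frame of the same size. Training would then minimize the error against corresponding frames of natural speech. What follows is a minimal sketch under our own assumptions, not the authors' implementation. The single-layer architecture, the class name `LSTMPostfilter`, the feature and hidden sizes, the uniform weight initialization and the mean-squared-error objective are all hypothetical. The weights are left untrained, so the example only shows how data flows through the gates.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMPostfilter:
    """Hypothetical single-layer LSTM mapping synthetic spectral frames
    (e.g. mel-cepstral vectors, an assumption) to enhanced frames of the
    same dimension. Weights are random and untrained in this sketch."""

    def __init__(self, feat_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One input matrix W, recurrent matrix U and bias b per gate:
        # input (i), forget (f), output (o) and candidate cell update (g).
        self.W: Dict[str, Matrix] = {k: random_matrix(hidden_dim, feat_dim, rng) for k in "ifog"}
        self.U: Dict[str, Matrix] = {k: random_matrix(hidden_dim, hidden_dim, rng) for k in "ifog"}
        self.b: Dict[str, Vector] = {k: [0.0] * hidden_dim for k in "ifog"}
        # Linear output layer projecting the hidden state back to feature space.
        self.V = random_matrix(feat_dim, hidden_dim, rng)
        self.b_out = [0.0] * feat_dim

    def _preact(self, k: str, x: Vector, h: Vector) -> Vector:
        """Pre-activation W_k x + U_k h + b_k for gate k."""
        return [wx + uh + b for wx, uh, b in
                zip(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k])]

    def forward(self, frames: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # memory cell
        out: List[Vector] = []
        for x in frames:
            i = [sigmoid(z) for z in self._preact("i", x, h)]
            f = [sigmoid(z) for z in self._preact("f", x, h)]
            o = [sigmoid(z) for z in self._preact("o", x, h)]
            g = [math.tanh(z) for z in self._preact("g", x, h)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            out.append([v + b for v, b in zip(matvec(self.V, h), self.b_out)])
        return out


def mse(pred: List[Vector], target: List[Vector]) -> float:
    """Mean squared error over all frames and dimensions (assumed training loss)."""
    diffs = [(p - t) ** 2 for p_frame, t_frame in zip(pred, target)
             for p, t in zip(p_frame, t_frame)]
    return sum(diffs) / len(diffs)


# Usage: 5 synthetic frames of 4 coefficients each (toy values).
synthetic = [[0.1 * t + 0.01 * d for d in range(4)] for t in range(5)]
postfilter = LSTMPostfilter(feat_dim=4, hidden_dim=8)
enhanced = postfilter.forward(synthetic)
```

Because the LSTM carries its cell state across frames, each corrected frame can depend on the preceding context rather than on the current frame alone. That property is the usual motivation for choosing a recurrent model for this kind of sequence mapping. A practical postfilter would be trained in a deep-learning framework on paired synthetic and natural features. The pure-Python loop here only makes each gate of the cell explicit.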

Comments:	5 pages, 5 figures
Subjects:	Sound (cs.SD); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:1602.02656 [cs.SD]
	(or arXiv:1602.02656v1 [cs.SD] for this version)

Submission history

From: Marvin Coto
[v1] Mon, 8 Feb 2016 17:25:22 GMT (1939kb)