Title: Representation Mixing for TTS Synthesis

Abstract: Recent character- and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation. However, the choice between character and phoneme input can create serious limitations for practical deployment, as direct control of pronunciation is crucial in certain cases. We demonstrate a simple method for combining multiple types of linguistic information in a single encoder, named representation mixing, enabling flexible choice between character, phoneme, or mixed representations during inference. Experiments and user studies on a public audiobook corpus show the efficacy of our approach.
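The abstract does not spell out how the mixed input is built. The sketch below is one hypothetical way to feed character, phoneme, or mixed sequences to a single encoder. It is not the authors' implementation. It assumes that each word is spelled either as characters or as phonemes, chosen independently per word with probability p_phone. It also assumes that a per-symbol mask records which representation each symbol uses, so its embedding can be taken from the matching table. The toy lexicon, the function names (mix_representations, make_table, embed), and the random embedding tables are illustrative assumptions and do not come from the paper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (word -> ARPAbet-style phonemes); not from the paper.
LEXICON: Dict[str, List[str]] = {
    "read": ["R", "IY1", "D"],
    "the": ["DH", "AH0"],
    "book": ["B", "UH1", "K"],
}

CHAR, PHONE = 0, 1  # mask value recording which representation a symbol uses

Symbol = Tuple[str, int]


def mix_representations(words: List[str], p_phone: float,
                        rng: random.Random) -> List[Symbol]:
    """Spell each word as characters or phonemes, chosen independently per word.

    Words missing from the lexicon fall back to characters; the space between
    words is treated as a character symbol (an assumption of this sketch).
    """
    out: List[Symbol] = []
    for i, word in enumerate(words):
        if i > 0:
            out.append((" ", CHAR))
        if word in LEXICON and rng.random() < p_phone:
            out.extend((ph, PHONE) for ph in LEXICON[word])
        else:
            out.extend((ch, CHAR) for ch in word)
    return out


def make_table(symbols: List[str], dim: int,
               rng: random.Random) -> Dict[str, List[float]]:
    """Random stand-in for a learned embedding table."""
    return {s: [rng.gauss(0.0, 1.0) for _ in range(dim)] for s in symbols}


def embed(mixed: List[Symbol], char_table: Dict[str, List[float]],
          phone_table: Dict[str, List[float]]) -> List[List[float]]:
    """Pick each symbol's vector from the table named by its mask, so a single
    downstream encoder sees one sequence regardless of the input mix."""
    return [phone_table[s] if m == PHONE else char_table[s] for s, m in mixed]


rng = random.Random(0)
words = ["read", "the", "book"]
chars_only = mix_representations(words, p_phone=0.0, rng=rng)   # never phonemes
phones_only = mix_representations(words, p_phone=1.0, rng=rng)  # always phonemes
mixed = mix_representations(words, p_phone=0.5, rng=rng)        # per-word coin flip

char_table = make_table(list("abcdefghijklmnopqrstuvwxyz "), dim=4, rng=rng)
phone_table = make_table(sorted({p for ps in LEXICON.values() for p in ps}),
                         dim=4, rng=rng)
vectors = embed(mixed, char_table, phone_table)
```

Setting p_phone to 0.0 or 1.0 gives pure character or pure phoneme input. An intermediate value yields mixed sequences. For example, a user could pin an ambiguous word such as "read" to the phonemes for the intended pronunciation and leave the rest of the sentence as characters.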
Comments: 5 pages, 3 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Cite as: arXiv:1811.07240 [cs.LG]
  (or arXiv:1811.07240v2 [cs.LG] for this version)

Submission history

From: Kyle Kastner
[v1] Sat, 17 Nov 2018 22:45:15 GMT (63kb,D)
[v2] Sat, 24 Nov 2018 23:16:10 GMT (63kb,D)