We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.SD

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Sound

Title: Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

Authors: J. Yang, Lei He
Abstract: In cross-lingual speech synthesis, the speech in various languages can be synthesized for a monoglot speaker. Normally, only the data of monoglot speakers are available for model training, thus the speaker similarity is relatively low between the synthesized cross-lingual speech and the native language recordings. Based on the multilingual transformer text-to-speech model, this paper studies a multi-task learning framework to improve the cross-lingual speaker similarity. To further improve the speaker similarity, joint training with a speaker classifier is proposed. Here, a scheme similar to parallel scheduled sampling is proposed to train the transformer model efficiently to avoid breaking the parallel training mechanism when introducing joint training. By using multi-task learning and speaker classifier joint training, in subjective and objective evaluations, the cross-lingual speaker similarity can be consistently improved for both the seen and unseen speakers in the training set.
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2201.08124 [cs.SD]
  (or arXiv:2201.08124v1 [cs.SD] for this version)

Submission history

From: Jingzhou Yang [view email]
[v1] Thu, 20 Jan 2022 12:02:58 GMT (757kb,D)

Link back to: arXiv, form interface, contact.