XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

Nguyen, Linh The; Pham, Thinh; Nguyen, Dat Quoc

(Submitted on 31 May 2023)

Abstract: We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the same model architecture as BERT-base, trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encoder significantly boosts the performance of a strong neural TTS model in terms of naturalness and prosody and also helps produce fairly high-quality speech with limited training data. We publicly release our pre-trained XPhoneBERT with the hope that it would facilitate future research and downstream TTS applications for multiple languages. Our XPhoneBERT model is available at this https URL
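The RoBERTa pre-training approach named in the abstract is masked language modelling with dynamic masking. The following is a minimal sketch of that objective applied to a phoneme sequence, not the authors' implementation. The 15% masking rate and the 80/10/10 replacement split are the standard BERT/RoBERTa defaults and are assumed here rather than taken from the abstract; the phoneme inventory, the `▁` word-boundary symbol, the `<mask>` token string, and the function name `mask_phonemes` are all hypothetical choices made for illustration.

```python
import random

MASK = "<mask>"  # assumed mask token string, for illustration only


def mask_phonemes(phonemes, vocab, mask_prob=0.15, rng=None):
    """Build one masked-LM example from a list of phoneme tokens.

    Returns (corrupted, labels). labels[i] holds the original phoneme at
    positions the model would have to predict, and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    corrupted = list(phonemes)
    labels = [None] * len(phonemes)
    for i, phoneme in enumerate(phonemes):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = phoneme
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random phoneme
        # remaining 10%: leave the original phoneme in place
    return corrupted, labels


# Hypothetical phoneme-level sentence; "▁" marks a word boundary (assumption).
sentence = ["h", "ə", "l", "oʊ", "▁", "w", "ɜː", "l", "d"]
vocab = sorted(set(sentence))

# Dynamic masking: a new mask pattern is drawn each time the sentence is seen.
for epoch in range(3):
    corrupted, labels = mask_phonemes(sentence, vocab, rng=random.Random(epoch))
    print(epoch, corrupted)
```

In an actual pre-training run the corrupted sequence would be fed to the Transformer encoder and the loss computed only at positions where `labels` is not `None`.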

Comments:	In Proceedings of INTERSPEECH 2023 (to appear)
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.19709 [cs.CL]
	(or arXiv:2305.19709v1 [cs.CL] for this version)

Submission history

From: Dat Quoc Nguyen
[v1] Wed, 31 May 2023 10:05:33 GMT (2804kb,D)
