Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?

Li, Jialu; Hasegawa-Johnson, Mark

doi:10.21437/Interspeech.2020-1834


(Submitted on 28 Jul 2020)

Abstract: Phones, the segmental units of the International Phonetic Alphabet (IPA), are used for lexical distinctions in most human languages; tones, the suprasegmental units of the IPA, are used in perhaps 70% of languages. Many previous studies have explored cross-lingual adaptation of automatic speech recognition (ASR) phone models, but few have explored the multilingual and cross-lingual transfer of synchronization between phones and tones. In this paper, we test four Connectionist Temporal Classification (CTC)-based acoustic models that differ in the degree of synchrony they impose between phones and tones. Models are trained and tested multilingually in three languages, then adapted and tested cross-lingually in a fourth. Both synchronous and asynchronous models are effective in both multilingual and cross-lingual settings. Synchronous models achieve a lower error rate in the joint phone+tone tier, but asynchronous training results in a lower tone error rate.
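The distinction between the joint phone+tone tier and a separate tone tier can be illustrated with a small sketch. This is not the authors' models or code, and it does not implement CTC: it only shows, for a made-up utterance, how a fused (synchronous) label sequence and separate (asynchronous) phone and tone sequences might be built and scored with an edit-distance error rate. The helper names (joint_tier, separate_tiers, error_rate), the "phone_tone" label format, and the toy transcription are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy format: one (phone, tone) pair per phone;
# tone is None when the phone carries no tone.
Utterance = List[Tuple[str, Optional[str]]]


def joint_tier(utt: Utterance) -> List[str]:
    """Synchronous view: a single tier whose labels fuse phone and tone."""
    return [p if t is None else f"{p}_{t}" for p, t in utt]


def separate_tiers(utt: Utterance) -> Tuple[List[str], List[str]]:
    """Asynchronous view: independent phone and tone label sequences."""
    phones = [p for p, _ in utt]
    tones = [t for _, t in utt if t is not None]
    return phones, tones


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


# Made-up reference and hypothesis: phones agree, the last tone differs.
ref: Utterance = [("n", None), ("i", "3"), ("h", None), ("a", "3")]
hyp: Utterance = [("n", None), ("i", "3"), ("h", None), ("a", "2")]

joint_er = error_rate(joint_tier(ref), joint_tier(hyp))
ref_phones, ref_tones = separate_tiers(ref)
hyp_phones, hyp_tones = separate_tiers(hyp)
phone_er = error_rate(ref_phones, hyp_phones)
tone_er = error_rate(ref_tones, hyp_tones)

print(joint_er, phone_er, tone_er)  # 0.25 0.0 0.5
```

In this toy case a single tone mistake counts as one error out of four labels in the joint tier, but one out of two in the tone tier, which is why the two error rates are reported separately.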

Comments:	Accepted to Interspeech 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
DOI:	10.21437/Interspeech.2020-1834
Cite as:	arXiv:2007.14351 [eess.AS]
	(or arXiv:2007.14351v1 [eess.AS] for this version)

Submission history

From: Jialu Li
[v1] Tue, 28 Jul 2020 16:32:09 GMT (51kb,D)
