We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.SD

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Sound

Title: Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition

Abstract: Code-switching speech recognition (CSSR) transcribes speech that switches between multiple languages or dialects within a single sentence. The main challenge in this task is that different languages often have similar pronunciations, making it difficult for models to distinguish between them. In this paper, we propose a method for solving the CSSR task from the perspective of language-specific acoustic boundary learning. We introduce language-specific weight estimators (LSWE) to model acoustic boundary learning in different languages separately. Additionally, a non-autoregressive (NAR) decoder and a language change detection (LCD) module are employed to assist in training. Evaluated on the SEAME corpus, our method achieves a state-of-the-art mixed error rate (MER) of 16.29% and 22.81% on the test_man and test_sge sets. We also demonstrate the effectiveness of our method on a 9000-hour in-house meeting code-switching dataset, where our method achieves a relatively 7.9% MER reduction.
Subjects: Sound (cs.SD)
Cite as: arXiv:2306.05279 [cs.SD]
  (or arXiv:2306.05279v1 [cs.SD] for this version)

Submission history

From: Zhiyun Fan [view email]
[v1] Thu, 8 Jun 2023 15:27:40 GMT (626kb,D)

Link back to: arXiv, form interface, contact.