Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech

Gupta, Shashi Kant; Hiray, Sushant; Kukde, Prashant

doi:10.21437/Interspeech.2023-1335

Title: Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech

Authors: Shashi Kant Gupta, Sushant Hiray, Prashant Kukde

(Submitted on 1 Jun 2023)

Abstract: This work improves the Spoken Language Identification (LangId) system for a challenge on developing robust language identification systems that are reliable for non-standard, accented (Singaporean accent), spontaneous code-switched, and child-directed speech collected via Zoom. We propose a two-stage Encoder-Decoder-based E2E model. The encoder module consists of 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with a global context. The decoder module uses an attentive temporal pooling mechanism to obtain a fixed-length, time-independent feature representation. The model has around 22.1 M parameters in total, which is relatively light compared with large-scale pre-trained speech models. We achieved an EER of 15.6% in the closed track and 11.1% in the open track (baseline system: 22.1%). We also curated additional LangId data from YouTube videos (featuring Singaporean speakers), which will be released for public use.
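
To illustrate how an attentive temporal pooling step can turn a variable-length sequence of frame features into a fixed-length vector, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the single linear scorer (`attn_weights`, `attn_bias`) and the function name `attentive_temporal_pooling` are assumptions chosen for illustration, using a generic softmax-weighted mean over time.

```python
import math
from typing import List, Sequence


def attentive_temporal_pooling(
    frames: Sequence[Sequence[float]],
    attn_weights: Sequence[float],
    attn_bias: float = 0.0,
) -> List[float]:
    """Collapse a (T x D) sequence of frame features into one D-dim vector.

    Hypothetical sketch: a single linear layer scores each frame, and the
    softmax of those scores weights the average over time.
    """
    if not frames:
        raise ValueError("need at least one frame")

    # One scalar relevance score per frame (assumed linear scorer).
    scores = [
        sum(w * x for w, x in zip(attn_weights, frame)) + attn_bias
        for frame in frames
    ]

    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Attention-weighted mean over time: output length is D regardless of T.
    dim = len(frames[0])
    return [
        sum(a * frame[d] for a, frame in zip(alphas, frames))
        for d in range(dim)
    ]
```

With all-zero scorer weights the attention is uniform and the result reduces to a plain mean over time. In a trained model the scorer would instead be learned, so that more informative frames receive larger weights.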

Comments:	Accepted by Interspeech 2023, 5 pages, 1 figure, 4 tables
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Journal reference:	Proc. INTERSPEECH 2023, 4114-4118
DOI:	10.21437/Interspeech.2023-1335
Cite as:	arXiv:2306.00736 [eess.AS]
	(or arXiv:2306.00736v1 [eess.AS] for this version)

Submission history

From: Shashi Kant Gupta
[v1] Thu, 1 Jun 2023 14:30:28 GMT (370kb,D)
