Computer Science > Computation and Language
Title: Prevalence of code mixing in semi-formal patient communication in low resource languages of South Africa
(Submitted on 13 Nov 2019 (v1), last revised 10 Dec 2019 (this version, v3))
Abstract: In this paper we address the problem of code-mixing in resource-poor language settings. We examine data consisting of 182k unique questions generated by users of the MomConnect helpdesk, part of a national scale public health platform in South Africa. We show evidence of code-switching at the level of approximately 10% within this dataset -- a level that is likely to pose challenges for future services. We use a natural language processing library (Polyglot) that supports detection of 196 languages and attempt to evaluate its performance at identifying English, isiZulu and code-mixed questions.
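As an illustration only, the kind of per-question labelling the abstract describes could be sketched as below. This is not the authors' pipeline. It assumes each question has already been run through a language detector that returns candidate language codes with percentage confidences (Polyglot reports results in roughly this form). The function names, the `min_share` threshold of 20, and the label set are hypothetical choices made for this sketch.

```python
from typing import List, Tuple

# One detector result per question: (ISO 639-1 language code, confidence in percent).
# The input format is an assumption made for this sketch.
Detection = List[Tuple[str, float]]


def label_question(detections: Detection, min_share: float = 20.0) -> str:
    """Label a question as 'en', 'zu', 'mixed', or 'other'.

    A question counts as code-mixed when both English and isiZulu
    reach the (hypothetical) confidence threshold `min_share`.
    """
    scores = {code: conf for code, conf in detections if code in ("en", "zu")}
    strong = [code for code, conf in scores.items() if conf >= min_share]
    if len(strong) == 2:
        return "mixed"
    if len(strong) == 1:
        return strong[0]
    return "other"


def mixed_share(all_detections: List[Detection]) -> float:
    """Fraction of questions labelled as code-mixed (0.0 for an empty input)."""
    labels = [label_question(d) for d in all_detections]
    return labels.count("mixed") / len(labels) if labels else 0.0
```

Applied to a full set of questions, `mixed_share` would give a prevalence estimate comparable in kind to the approximately 10% figure reported above, though the result depends heavily on the detector and on the chosen threshold.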
Submission history
From: Charles Copley
[v1] Wed, 13 Nov 2019 17:12:40 GMT (12kb)
[v2] Thu, 14 Nov 2019 05:50:55 GMT (12kb)
[v3] Tue, 10 Dec 2019 14:55:37 GMT (31kb)