Title: Prevalence of code mixing in semi-formal patient communication in low resource languages of South Africa

Abstract: In this paper we address the problem of code-mixing in resource-poor language settings. We examine data consisting of 182k unique questions generated by users of the MomConnect helpdesk, part of a national-scale public health platform in South Africa. We show evidence of code-switching at a level of approximately 10% within this dataset, a level that is likely to pose challenges for future services. We use Polyglot, a natural language processing library that supports detection of 196 languages, and attempt to evaluate its performance at identifying English, isiZulu and code-mixed questions.
Comments: 3 pages, presented at the NeurIPS 2019 Workshop on Machine Learning for the Developing World
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:1911.05636 [cs.CL]
  (or arXiv:1911.05636v3 [cs.CL] for this version)

Submission history

From: Charles Copley
[v1] Wed, 13 Nov 2019 17:12:40 GMT (12kb)
[v2] Thu, 14 Nov 2019 05:50:55 GMT (12kb)
[v3] Tue, 10 Dec 2019 14:55:37 GMT (31kb)
