Title: A Digital Corpus of St. Lawrence Island Yupik

Abstract: St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family, indigenous to Alaska and Chukotka. This work presents a step-by-step pipeline for digitizing written texts, along with the first publicly available digital corpus for St. Lawrence Island Yupik, created using that pipeline. The corpus can support future linguistic inquiry and NLP research. It was also developed for use in Yupik language education and revitalization, with the primary goal of giving educators and members of the Yupik community easy access to Yupik texts. A secondary goal is to support the development of language technology for the Yupik community, such as spell-checkers, text-completion systems, interactive e-books, and language-learning apps.
Comments: ComputEL-4
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2101.10496 [cs.CL]
  (or arXiv:2101.10496v1 [cs.CL] for this version)

Submission history

From: Lane Schwartz
[v1] Tue, 26 Jan 2021 00:14:00 GMT
