
Title: Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Abstract: Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision - e.g., cross-lingual transfer, type-level supervision, or a combination thereof - have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets only less than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.
Comments: AAAI 2020
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2004.13305 [cs.CL]
  (or arXiv:2004.13305v1 [cs.CL] for this version)

Submission history

From: Katharina Kann
[v1] Tue, 28 Apr 2020 05:14:08 GMT (37kb)
