We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: Unknown Words Analysis in POS tagging of Sinhala Language

Abstract: Part of Speech (POS) is a very vital topic in Natural Language Processing (NLP) task in any language, which involves analysing the construction of the language, behaviours and the dynamics of the language, the knowledge that could be utilized in computational linguistics analysis and automation applications. In this context, dealing with unknown words (words do not appear in the lexicon referred as unknown words) is also an important task, since growing NLP systems are used in more and more new applications. One aid of predicting lexical categories of unknown words is the use of syntactical knowledge of the language. The distinction between open class words and closed class words together with syntactical features of the language used in this research to predict lexical categories of unknown words in the tagging process. An experiment is performed to investigate the ability of the approach to parse unknown words using syntactical knowledge without human intervention. This experiment shows that the performance of the tagging process is enhanced when word class distinction is used together with syntactic rules to parse sentences containing unknown words in Sinhala language.
Comments: 7 pages
Subjects: Computation and Language (cs.CL)
ACM classes: I.2.7
Cite as: arXiv:1501.01254 [cs.CL]
  (or arXiv:1501.01254v1 [cs.CL] for this version)

Submission history

From: Manoj Jayaweera [view email]
[v1] Tue, 6 Jan 2015 18:03:13 GMT (621kb)

Link back to: arXiv, form interface, contact.