Title: Universal Dependency Treebank for Odia Language

Abstract: This paper presents the first publicly available treebank of Odia, a morphologically rich, low-resource Indian language. The treebank contains approximately 1082 tokens (100 sentences) in Odia, selected from "Samantar", the largest available parallel corpora collection for Indic languages. All selected sentences are manually annotated following the Universal Dependencies (UD) guidelines. The morphological analysis of the Odia treebank was performed using machine learning techniques. The annotated treebank will enrich Odia language resources and will help in building language technology tools for cross-lingual learning and typological research. We also build a preliminary Odia parser using a machine learning approach. The parser achieves 86.6% accuracy on tokenization, 64.1% on UPOS, 63.78% on XPOS, 42.04% UAS, and 21.34% LAS. Finally, the paper briefly discusses the linguistic analysis of the Odia UD treebank.
Comments: To appear in the 6th Workshop on Indian Language Data: Resources and Evaluation (WILDRE-6) at LREC 2022
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2205.11976 [cs.CL]
  (or arXiv:2205.11976v1 [cs.CL] for this version)

Submission history

From: Shantipriya Parida
[v1] Tue, 24 May 2022 11:19:26 GMT (1929kb,D)
