Title: Creating and Managing a large annotated parallel corpora of Indian languages

Abstract: This paper presents the challenges in creating and managing large parallel corpora of 12 major Indian languages (soon to be extended to 23 languages) as part of a major consortium project funded by the Department of Information Technology (DIT), Govt. of India, and running in parallel at 10 different universities in India. To manage the creation and dissemination of these huge corpora efficiently, a web-based annotation tool, ILCIANN (Indian Languages Corpora Initiative Annotation Tool), has been developed, along with a reduced stand-alone version. It was developed primarily for POS annotation and for managing corpus annotation carried out by people with differing levels of competence at geographically distant locations. To maintain consistency and common standards in the creation of the corpora, everyone needed to work on a common platform, which this tool provided.
Subjects: Computation and Language (cs.CL)
Journal reference: Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-2012), 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 18-22, 2012
Cite as: arXiv:2112.01764 [cs.CL]
  (or arXiv:2112.01764v1 [cs.CL] for this version)

Submission history

From: Ritesh Kumar
[v1] Fri, 3 Dec 2021 07:44:22 GMT (106kb)