MULTEXT-East

Erjavec, Tomaž

doi:10.1007/978-94-024-0881-2_17

Title: MULTEXT-East

Authors: Tomaž Erjavec

(Submitted on 31 Mar 2020)

Abstract: The MULTEXT-East language resources are a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The dataset includes the EAGLES-based morphosyntactic specifications, morphosyntactic lexicons, and annotated multilingual corpora. The parallel corpus, the novel "1984" by George Orwell, is sentence-aligned and contains hand-validated morphosyntactic descriptions and lemmas. The resources are uniformly encoded in XML, following the Text Encoding Initiative Guidelines (TEI P5), and cover 16 languages: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian. The dataset is extensively documented and freely available for research purposes. This case study gives a history of the development of the MULTEXT-East resources, presents their encoding and components, discusses related work, and gives some conclusions.

Subjects:	Computation and Language (cs.CL)
ACM classes:	I.2.7
Journal reference:	Published in: Nancy Ide, James Pustejovsky, eds. 2017. Handbook of linguistic annotation. pp. 441-462. Springer
DOI:	10.1007/978-94-024-0881-2_17
Cite as:	arXiv:2003.14026 [cs.CL]
	(or arXiv:2003.14026v1 [cs.CL] for this version)

Submission history

From: Tomaž Erjavec
[v1] Tue, 31 Mar 2020 08:45:52 GMT (110kb,D)
