Title: Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography

Abstract: Texts written in Old Literary Finnish are the earliest literary works written in Finnish, dating back to the 16th century. Several projects in Finland have digitized old publications and made them available for research use. However, applying modern NLP methods to such data poses great challenges. In this paper, we propose an approach for simultaneously normalizing and lemmatizing Old Literary Finnish into modern spelling. Our best model reaches 96.3% accuracy on texts written by Agricola and 87.7% accuracy on other contemporary out-of-domain texts. Our method has been made freely available on Zenodo and GitHub.
Comments: 28th Conférence sur le Traitement Automatique des Langues Naturelles (TALN)
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2107.03266 [cs.CL]
  (or arXiv:2107.03266v1 [cs.CL] for this version)

Submission history

From: Mika Hämäläinen
[v1] Wed, 7 Jul 2021 15:01:13 GMT (37kb)