
Title: Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore

Abstract: The Book of the Dean of Lismore (BDL) is a 16th-century Scottish Gaelic manuscript written in a non-standard orthography. In this work, we outline the problem of transliterating the text of the BDL into a standardised orthography, and perform exploratory experiments using Transformer-based models for this task. In particular, we focus on the task of word-level transliteration, and achieve a character-level BLEU score of 54.15 with our best model, a BART architecture pre-trained on the text of Scottish Gaelic Wikipedia and then fine-tuned on around 2,000 word-level parallel examples. Our initial experiments give promising results, but we highlight the shortcomings of our model, and discuss directions for future work.
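The abstract reports a character-level BLEU score but does not state how it was computed: n-gram order, smoothing, and tooling are not given here. The sketch below is only an illustration under stated assumptions. It uses standard corpus-level BLEU-4 with uniform weights and no smoothing, and treats every character of a predicted or reference word as one token. It is not the authors' evaluation code, and `char_bleu` is a hypothetical helper name.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU over characters, scaled to 0-100.

    Assumption: each hypothesis/reference is a single transliterated word,
    each character is one token, weights are uniform, and there is no
    smoothing (any zero n-gram precision yields a score of 0).
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = list(hyp), list(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts = ngrams(h, n)
            r_counts = ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: outputs shorter than the references are penalised.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


# Illustrative strings only, not data from the BDL.
print(char_bleu(["abcdef"], ["abcdxf"]))
```

Scoring at the character level gives partial credit to a transliteration that is close to the standardised form but not identical. Under word-level exact match, a single wrong letter would count as a complete miss.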
Comments: 4th Celtic Language Technology Workshop
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2205.11370 [cs.CL]
  (or arXiv:2205.11370v2 [cs.CL] for this version)

Submission history

From: Edward Gow-Smith
[v1] Mon, 23 May 2022 15:04:26 GMT (23kb)
[v2] Tue, 31 May 2022 12:09:52 GMT (23kb,D)
