Title: Using CollGram to Compare Formulaic Language in Human and Neural Machine Translation

Authors: Yves Bestgen
Abstract: A comparison of formulaic sequences in human and neural machine translations of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences and more high-frequency formulaic sequences. These differences were statistically significant, and the effect sizes were almost always medium or large. These observations can be related to the differences between second language learners of various levels and between translated and untranslated texts. The comparison between the neural machine translation systems indicates that some systems produce more formulaic sequences of both types than others.
Comments: Accepted at Translation and Interpreting Technology Online - TRITON 2021, two figures added
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2107.03625 [cs.CL]
  (or arXiv:2107.03625v2 [cs.CL] for this version)

Submission history

From: Yves Bestgen
[v1] Thu, 8 Jul 2021 06:30:35 GMT (87kb)
[v2] Fri, 23 Jul 2021 09:59:18 GMT (248kb,D)