Title: Translator2Vec: Understanding and Representing Human Post-Editors

Abstract: The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch. To take full advantage of this combination, we need a fine-grained understanding of how human translators work, and which post-editing styles are more effective than others. In this paper, we release and analyze a new dataset with document-level post-editing action sequences, including edit operations from keystrokes, mouse actions, and waiting times. Our dataset comprises 66,268 full document sessions post-edited by 332 humans, the largest of its kind released to date. We show that action sequences are informative enough to identify post-editors accurately, compared to baselines that only look at the initial and final text. We build on this to learn and visualize continuous representations of post-editors, and we show that these representations improve the downstream task of predicting post-editing time.
Comments: Accepted at MT Summit 2019; dataset available here: this https URL; please cite as: @article{gois2019translator2vec, title={Translator2Vec: Understanding and Representing Human Post-Editors}, author={G\'ois, Ant\'onio and F. T. Martins, Andr\'e}, year={2019}, publisher={European Association for Machine Translation} }
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:1907.10362 [cs.CL]
  (or arXiv:1907.10362v1 [cs.CL] for this version)

Submission history

From: António Góis
[v1] Wed, 24 Jul 2019 11:01:24 GMT (310kb,D)