Title: Volctrans Parallel Corpus Filtering System for WMT 2020

Abstract: In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions. The task requires participants to align potential parallel sentence pairs from the given document pairs and to score them so that low-quality pairs can be filtered out. Our system, Volctrans, consists of two modules: a mining module and a scoring module. Based on a word alignment model, the mining module adopts an iterative mining strategy to extract latent parallel sentences. In the scoring module, an XLM-based scorer provides scores, which are then refined by reranking mechanisms and ensembling. Our submissions outperform the baseline by 3.x/2.x and 2.x/2.x for km-en and ps-en, respectively, under the From Scratch/Fine-Tune conditions; these margins are the highest among all submissions.
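The abstract does not say how the scorer outputs are combined. As a purely illustrative sketch, assuming each scorer (for example, an XLM-based one) returns one float per sentence pair, the scores below are rank-normalized, averaged with optional weights, and used to keep the top fraction of pairs. The helpers `rank_normalize`, `ensemble`, and `filter_top` are hypothetical and are not the Volctrans implementation.

```python
from typing import List, Optional, Sequence, Tuple


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map raw scores to [0, 1] by rank; tied scores share their average rank."""
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next item ties with the current one.
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank / (n - 1)
        start = end + 1
    return ranks


def ensemble(score_lists: Sequence[Sequence[float]],
             weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of rank-normalized scores from several (hypothetical) scorers."""
    if weights is None:
        weights = [1.0] * len(score_lists)
    normalized = [rank_normalize(s) for s in score_lists]
    total = sum(weights)
    n = len(score_lists[0])
    return [sum(w * norm[i] for w, norm in zip(weights, normalized)) / total
            for i in range(n)]


def filter_top(pairs: Sequence[Tuple[str, str]], scores: Sequence[float],
               keep_fraction: float) -> List[Tuple[str, str]]:
    """Keep the highest-scoring fraction of sentence pairs (at least one)."""
    k = max(1, round(len(pairs) * keep_fraction))
    ranked = sorted(zip(pairs, scores), key=lambda ps: ps[1], reverse=True)
    return [pair for pair, _ in ranked[:k]]


pairs = [("src a", "tgt x"), ("src b", "tgt y"), ("src c", "tgt z")]
combined = ensemble([[0.1, 0.9, 0.5], [1.0, 3.0, 2.0]])
print(combined)                          # [0.0, 1.0, 0.5]
print(filter_top(pairs, combined, 0.5))  # [('src b', 'tgt y'), ('src c', 'tgt z')]
```

Rank normalization is used here only because scorers on different scales (probabilities, log-likelihoods) cannot be averaged directly; a real system might instead calibrate or learn the combination.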
Comments: WMT 2020
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2010.14029 [cs.CL]
  (or arXiv:2010.14029v1 [cs.CL] for this version)

Submission history

From: Runxin Xu
[v1] Tue, 27 Oct 2020 03:20:04 GMT (145kb,D)
