Aligning Cross-lingual Sentence Representations with Dual Momentum Contrast

Wang, Liang; Zhao, Wei; Liu, Jingming

(Submitted on 1 Sep 2021)

Abstract: In this paper, we propose to align sentence representations from different languages into a unified embedding space, where semantic similarities (both cross-lingual and monolingual) can be computed with a simple dot product. Pre-trained language models are fine-tuned with the translation ranking task. Existing work (Feng et al., 2020) uses sentences within the same batch as negatives, which can suffer from easy negatives. We adapt MoCo (He et al., 2020) to further improve the quality of alignment. As the experimental results show, the sentence representations produced by our model achieve new state-of-the-art results on several tasks, including Tatoeba en-zh similarity search (Artetxe and Schwenk, 2019b), BUCC en-zh bitext mining, and semantic textual similarity on 7 datasets.
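
The contrast the abstract draws between in-batch negatives and MoCo-style negatives can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`info_nce`, `momentum_update`, `enqueue`), the temperature of 0.05, the momentum of 0.999, and the random toy embeddings are all assumptions chosen for illustration. The sketch only shows the general idea of scoring translation pairs with a dot product and drawing extra negatives from a queue of earlier keys.

```python
import numpy as np

def l2_normalize(x):
    # Unit-length rows, so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(queries, keys, negatives=None, temperature=0.05):
    """Translation-ranking loss: queries[i] should rank keys[i] first.

    negatives: optional (K, d) array of extra negatives, e.g. a
    MoCo-style queue of keys from earlier batches (hypothetical here).
    """
    q = l2_normalize(queries)
    k = l2_normalize(keys)
    if negatives is None:
        candidates = k  # in-batch negatives only
    else:
        candidates = np.concatenate([k, l2_normalize(negatives)], axis=0)
    logits = q @ candidates.T / temperature            # shape (B, B + K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    # The positive for query i sits at column i, because keys come first.
    return -log_probs[idx, idx].mean()

def momentum_update(key_params, query_params, m=0.999):
    # The key encoder slowly tracks the query encoder (exponential moving average).
    return m * key_params + (1.0 - m) * query_params

def enqueue(queue, new_keys, max_size):
    # Append the newest keys and keep only the most recent max_size rows.
    return np.concatenate([queue, new_keys], axis=0)[-max_size:]

# Toy data standing in for encoder outputs of four en-zh translation pairs.
rng = np.random.default_rng(0)
en = rng.normal(size=(4, 8))
zh = en + 0.1 * rng.normal(size=(4, 8))
queue = rng.normal(size=(16, 8))

loss_in_batch = info_nce(en, zh)
loss_with_queue = info_nce(en, zh, negatives=queue)
queue = enqueue(queue, zh, max_size=16)
```

In this sketch the queue simply enlarges the candidate set beyond the batch, so each query is ranked against more negatives than a single batch provides; the momentum update keeps the queued keys reasonably consistent with the current encoder.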

Comments:	Accepted to EMNLP 2021 main conference
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2109.00253 [cs.CL]
	(or arXiv:2109.00253v1 [cs.CL] for this version)

Submission history

From: Liang Wang
[v1] Wed, 1 Sep 2021 08:48:34 GMT (216kb,D)
