Generalizing Multimodal Pre-training into Multilingual via Language Acquisition

Zhang, Liang; Hu, Anwen; Jin, Qin

(Submitted on 29 May 2022)

Abstract: English-based Vision-Language Pre-training (VLP) has achieved great success in various downstream tasks. Some efforts have been made to extend this success to non-English languages through Multilingual Vision-Language Pre-training (M-VLP). However, because of the large number of languages involved, M-VLP models often require huge computing resources and cannot be flexibly extended to new languages. In this work, we propose a MultiLingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into a multilingual one. Specifically, we design a lightweight language acquisition encoder based on state-of-the-art monolingual VLP models. We further propose a two-stage training strategy to optimize the language acquisition encoder, namely the Native Language Transfer stage and the Language Exposure stage. With much less multilingual training data and computing resources, our model achieves state-of-the-art performance on multilingual image-text and video-text retrieval benchmarks.

Comments:	14 pages, 5 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2206.11091 [cs.CL]
	(or arXiv:2206.11091v1 [cs.CL] for this version)

Submission history

From: Liang Zhang
[v1] Sun, 29 May 2022 08:53:22 GMT (1099kb,D)
