MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Monajatipoor, Masoud; Li, Liunian Harold; Rouhsedaghat, Mozhdeh; Yang, Lin F.; Chang, Kai-Wei

(Submitted on 2 Jun 2023)

Abstract: Large-scale language models have shown the ability to adapt to a new task via conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); then we transfer this model to perform VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves in-context learning capability on VL tasks and can even significantly compensate for model size. On VQA, OK-VQA, and GQA, our method could outperform the baseline model while having 20 times fewer parameters.
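As an illustration of what "conditioning on a few demonstrations" means in practice, the following is a minimal text-only sketch of how a few-shot prompt might be assembled. The function name `build_icl_prompt` and the `Input:`/`Output:` template are hypothetical choices made for this example and are not taken from the paper; in the VL setting each input would also carry image features from the attached visual encoder, which this sketch does not model.

```python
from typing import List, Tuple


def build_icl_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Concatenate k (input, output) demonstrations, then the unanswered query.

    The "Input:/Output:" template is an assumption for illustration only.
    """
    # One block per demonstration, each showing an input with its answer.
    parts = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    # The query uses the same template but leaves the output empty,
    # so the model is expected to continue with the answer.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demos = [("great movie", "positive"), ("dull plot", "negative")]
    print(build_icl_prompt(demos, "loved it"))
```

No model parameters are updated here: the task is specified entirely by the demonstrations placed in the context, which is the ability the abstract proposes to transfer from the language model to the VL model.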

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.01311 [cs.CL]
	(or arXiv:2306.01311v1 [cs.CL] for this version)

Submission history

From: Masoud Monajatipoor
[v1] Fri, 2 Jun 2023 07:21:03 GMT (12775kb,D)
