Learning to Select: A Fully Attentive Approach for Novel Object Captioning

Cagrandi, Marco; Cornia, Marcella; Stefanini, Matteo; Baraldi, Lorenzo; Cucchiara, Rita

doi:10.1145/3460426.3463587

Computer Science > Computer Vision and Pattern Recognition

(Submitted on 2 Jun 2021)

Abstract: Image captioning models have lately shown impressive results when applied to standard datasets. Switching to real-life scenarios, however, constitutes a challenge due to the larger variety of visual concepts that are not covered in existing training sets. For this reason, novel object captioning (NOC) has recently emerged as a paradigm to test captioning models on objects that are unseen during the training phase. In this paper, we present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of whether they appear in the training set, and to constrain the generative process of a language model accordingly. Our architecture is fully attentive and end-to-end trainable, even when incorporating constraints. We perform experiments on the held-out COCO dataset, where we demonstrate improvements over the state of the art, both in terms of adaptability to novel objects and caption quality.
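
The select-then-constrain idea in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the abstract describes a learned, end-to-end trainable selector and constraints applied during generation, whereas the sketch below swaps in a fixed top-k pick over detector scores and a post-hoc re-ranking of finished candidate captions. All function names, scores, and captions here are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def select_objects(detections: List[Tuple[str, float]], k: int = 2) -> List[str]:
    """Keep the k highest-scoring distinct labels.

    Assumption: a fixed top-k rule stands in for the learned selector
    mentioned in the abstract.
    """
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    selected: List[str] = []
    for label, _score in ranked:
        if label not in selected:
            selected.append(label)
        if len(selected) == k:
            break
    return selected


def satisfies_constraints(caption: str, constraints: List[str]) -> bool:
    """True if every constraint word appears as a token in the caption."""
    words = caption.lower().split()
    return all(c.lower() in words for c in constraints)


def pick_caption(candidates: List[Tuple[str, float]], constraints: List[str]) -> str:
    """Return the highest-scoring candidate that mentions all constraints.

    Falls back to the best candidate overall if none satisfies them.
    Assumption: re-ranking finished captions replaces constrained decoding.
    """
    valid = [c for c in candidates if satisfies_constraints(c[0], constraints)]
    pool = valid or candidates
    return max(pool, key=lambda c: c[1])[0]


# Hypothetical detector output: (label, confidence).
detections = [("zebra", 0.9), ("grass", 0.4), ("tree", 0.7), ("zebra", 0.8)]
constraints = select_objects(detections, k=2)

# Hypothetical candidate captions with log-probabilities from a language model.
candidates = [
    ("a horse standing in a field", -1.0),
    ("a zebra standing near a tree", -1.5),
]
best = pick_caption(candidates, constraints)
```

In this toy run the unconstrained language model would prefer the caption with the seen object ("horse"), while the constraint step steers the choice toward the caption that names the selected objects, including one the captioner may never have seen in training.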

Comments:	ICMR 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
DOI:	10.1145/3460426.3463587
Cite as:	arXiv:2106.01424 [cs.CV]
	(or arXiv:2106.01424v1 [cs.CV] for this version)

Submission history

From: Marcella Cornia [view email]
[v1] Wed, 2 Jun 2021 19:11:21 GMT (245kb,D)
