A Self-Explainable Stylish Image Captioning Framework via Multi-References

Li, Chengxi; Harrison, Brent

Authors: Chengxi Li, Brent Harrison

(Submitted on 20 Oct 2021 (v1), last revised 18 Nov 2021 (this version, v2))

Abstract: In this paper, we propose building a stylish image captioning model through a Multi-style Multi-modality mechanism (2M). We demonstrate that 2M yields an effective stylish captioner, and that the multi-references produced by the model can also help explain the model by identifying erroneous input features on faulty examples.

Comments:	This paper is under consideration at Computer Vision and Image Understanding. arXiv admin note: substantial text overlap with arXiv:2103.11186
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2110.10704 [cs.CL]
	(or arXiv:2110.10704v2 [cs.CL] for this version)

Submission history

From: Brent Harrison
[v1] Wed, 20 Oct 2021 18:00:40 GMT (10328kb,D)
[v2] Thu, 18 Nov 2021 18:39:15 GMT (10328kb,D)
