
Title: 3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model

Abstract: In this paper, we build a multi-style generative model for stylish image captioning that uses multi-modality features: ResNeXt image features and text features generated by DenseCap. We propose the 3M model, a Multi-UPDOWN caption model that encodes multi-modality features and decodes them into captions. We demonstrate the effectiveness of our model in generating human-like captions by examining its performance on two datasets, PERSONALITY-CAPTIONS and FlickrStyle10K. We compare against a variety of state-of-the-art baselines on automatic NLP metrics such as BLEU, ROUGE-L, CIDEr, and SPICE. We also conduct a qualitative study to verify that the 3M model can generate captions in different styles.
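The abstract describes Multi-UPDOWN only at a high level. The following is a toy sketch, not the authors' implementation, of one plausible way an UPDOWN-style decoder step could attend over two modalities. It applies a separate additive attention to a set of image region vectors (standing in for ResNeXt features) and to a set of text vectors (standing in for encoded DenseCap phrases), both conditioned on a decoder hidden state, and then concatenates the two attended context vectors. The names `AdditiveAttention` and `fuse_modalities`, the toy dimensions, the random weights, and concatenation as the fusion step are all assumptions for illustration. The LSTM layers of the original UPDOWN decoder are omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class AdditiveAttention:
    """Hypothetical UPDOWN-style attention: a_i = w^T tanh(W_v v_i + W_h h)."""

    def __init__(self, feat_dim: int, hidden_dim: int, att_dim: int, rng: random.Random):
        self.w_v = rand_matrix(att_dim, feat_dim, rng)
        self.w_h = rand_matrix(att_dim, hidden_dim, rng)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(att_dim)]

    def __call__(self, feats: List[Vector], h: Vector) -> Tuple[Vector, Vector]:
        proj_h = matvec(self.w_h, h)
        scores = []
        for v in feats:
            proj_v = matvec(self.w_v, v)
            hidden = [math.tanh(a + b) for a, b in zip(proj_v, proj_h)]
            scores.append(sum(wi * xi for wi, xi in zip(self.w, hidden)))
        alpha = softmax(scores)  # one weight per feature vector, summing to 1
        dim = len(feats[0])
        # Context vector: attention-weighted average of the feature vectors.
        context = [sum(alpha[i] * feats[i][d] for i in range(len(feats))) for d in range(dim)]
        return context, alpha


def fuse_modalities(image_feats, text_feats, h, img_att, txt_att):
    """Attend over each modality separately, then concatenate (assumed fusion)."""
    img_ctx, img_alpha = img_att(image_feats, h)
    txt_ctx, txt_alpha = txt_att(text_feats, h)
    return img_ctx + txt_ctx, img_alpha, txt_alpha


rng = random.Random(0)
HIDDEN = 8
# Toy stand-ins: 5 image regions of 16 dims, 3 text-phrase vectors of 6 dims.
image_feats = [[rng.random() for _ in range(16)] for _ in range(5)]
text_feats = [[rng.random() for _ in range(6)] for _ in range(3)]
h = [0.0] * HIDDEN  # decoder hidden state at the first step

img_att = AdditiveAttention(16, HIDDEN, 4, rng)
txt_att = AdditiveAttention(6, HIDDEN, 4, rng)
fused, img_alpha, txt_alpha = fuse_modalities(image_feats, text_feats, h, img_att, txt_att)
print(len(fused))  # 22
```

Keeping one attention module per modality lets feature sets with different sizes and dimensionalities be weighted independently before fusion; whether 3M fuses its features this way is not stated in the abstract.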
Comments: To be published at FLAIRS-34
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2103.11186 [cs.CV]
  (or arXiv:2103.11186v1 [cs.CV] for this version)

Submission history

From: Brent Harrison
[v1] Sat, 20 Mar 2021 14:12:13 GMT (8730kb,D)
