Computer Science > Computer Vision and Pattern Recognition
Title: IRGen: Generative Modeling for Image Retrieval
(Submitted on 17 Mar 2023 (v1), last revised 27 Mar 2023 (this version, v2))
Abstract: While generative modeling has been ubiquitous in natural language processing and computer vision, its application to image retrieval remains unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, contributing to the current unified theme. Our framework, IRGen, is a unified model that enables end-to-end differentiable search, thus achieving superior performance thanks to direct optimization. While developing IRGen, we tackle the key technical challenge of converting an image into a short sequence of semantic units in order to enable efficient and effective retrieval. Experiments demonstrate that our model yields significant improvements on three commonly used benchmarks; for example, precision@10 on the In-shop dataset is 22.9% higher than that of the best baseline method, with a comparable recall@10 score.
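To make the two metrics named in the abstract concrete, here is a minimal Python sketch of precision@k and recall@k for a single query. The abstract does not define either metric, so the definitions below are assumptions: precision@k is taken as the fraction of the top-k results that are relevant, and recall@k as the hit-based variant common in image retrieval benchmarks (1 if any relevant item appears in the top k, else 0). The function names and the toy data are hypothetical and do not come from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant (assumed definition)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Hit-based recall@k (assumed definition): 1.0 if any relevant item is in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


# Toy query: the ranked list returned by a retrieval system and the
# ground-truth set of relevant gallery items (both hypothetical).
ranked = ["a", "x", "b", "y", "z"]
relevant = {"a", "b", "c"}

p5 = precision_at_k(ranked, relevant, 5)  # 2 relevant items among the top 5
r5 = recall_at_k(ranked, relevant, 5)     # at least one relevant item in the top 5
```

Under these assumed definitions, two systems can reach a similar recall@k while differing widely in precision@k: a single relevant item in the top k saturates the hit-based recall, whereas precision rewards every additional relevant result.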
Submission history
From: Ting Zhang
[v1] Fri, 17 Mar 2023 17:07:36 GMT (8928kb,D)
[v2] Mon, 27 Mar 2023 02:21:31 GMT (8928kb,D)