Title: Transformer-based Dual Relation Graph for Multi-label Image Recognition

Abstract: Recognizing multiple objects in a single image remains a challenging task that involves several difficulties, such as varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent research mainly resorts to statistical label co-occurrences and linguistic word embeddings to enhance the unclear semantics. In contrast, in this paper we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two effective relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, i.e., the MS-COCO and VOC 2007 datasets.
Comments: 10 pages, 5 figures. Published in ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Journal reference: In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021 (pp. 163-172)
Cite as: arXiv:2110.04722 [cs.CV]
  (or arXiv:2110.04722v1 [cs.CV] for this version)

Submission history

From: Yifan Zhao
[v1] Sun, 10 Oct 2021 07:14:52 GMT (12580kb,D)
[v2] Tue, 12 Oct 2021 02:09:17 GMT (12580kb,D)
