We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:


Current browse context:


Change to browse by:


References & Citations

DBLP - CS Bibliography


(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Computer Vision and Pattern Recognition

Title: Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Abstract: Modern object detectors have taken the advantages of pre-trained vision transformers by using them as backbone networks. However, except for the backbone networks, other detector components, such as the detector head and the feature pyramid network, remain randomly initialized, which hinders the consistency between detectors and pre-trained models. In this study, we propose to integrally migrate the pre-trained transformer encoder-decoders (imTED) for object detection, constructing a feature extraction-operation path that is not only "fully pre-trained" but also consistent with pre-trained models. The essential improvements of imTED over existing transformer-based detectors are twofold: (1) it embeds the pre-trained transformer decoder to the detector head; and (2) it removes the feature pyramid network from the feature extraction path. Such improvements significantly reduce the proportion of randomly initialized parameters and enhance the generation capability of detectors. Experiments on MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by ~2.8% AP. Without bells and whistles, imTED improves the state-of-the-art of few-shot object detection by up to 7.6% AP, demonstrating significantly higher generalization capability. Code will be made publicly available.
Comments: 12 pages,5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2205.09613 [cs.CV]
  (or arXiv:2205.09613v1 [cs.CV] for this version)

Submission history

From: Feng Liu [view email]
[v1] Thu, 19 May 2022 15:11:20 GMT (1290kb,D)

Link back to: arXiv, form interface, contact.