Computer Science > Computer Vision and Pattern Recognition
Title: HindSight: A Graph-Based Vision Model Architecture For Representing Part-Whole Hierarchies
(Submitted on 8 Apr 2021)
Abstract: This paper presents a model architecture for encoding the representations of part-whole hierarchies in images in the form of a graph. The idea is to divide the image into patches at several levels and then treat all of these patches as the nodes of a fully connected graph. A dynamic feature extraction module extracts feature representations from these patches in each graph iteration. This enables us to learn a rich graph representation of the image that captures its inherent part-whole hierarchical information. With suitable self-supervised training techniques, such a model can be trained as a general-purpose vision encoder, which can then be used for various vision-related downstream tasks (e.g., image classification, object detection, image captioning).
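The graph construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The image size, the patch sizes per level (64, 32, 16), and the helper names `multi_level_patches`, `fully_connected_edges`, and `mean_aggregation_step` are hypothetical. The aggregation step is a generic mean-aggregation stand-in for one graph iteration, swapped in for illustration; it is not the paper's dynamic feature extraction module.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical node descriptor: (level, top, left, size) of one square patch.
Node = Tuple[int, int, int, int]


def multi_level_patches(height: int, width: int, patch_sizes: List[int]) -> List[Node]:
    """Tile the image at each level with non-overlapping square patches.

    Level 0 uses the first (largest) patch size for coarse "wholes"; later
    levels use smaller patches for finer "parts". Sizes are assumptions.
    """
    nodes: List[Node] = []
    for level, size in enumerate(patch_sizes):
        if height % size or width % size:
            raise ValueError(f"patch size {size} does not tile a {height}x{width} image")
        for top in range(0, height, size):
            for left in range(0, width, size):
                nodes.append((level, top, left, size))
    return nodes


def fully_connected_edges(num_nodes: int) -> List[Tuple[int, int]]:
    """Undirected edges of a complete graph over all patch nodes (no self-loops)."""
    return list(combinations(range(num_nodes), 2))


def mean_aggregation_step(features: List[float]) -> List[float]:
    """One illustrative graph iteration (assumed, generic message passing):
    each node averages its own feature with the mean of every other node's
    feature, since in a fully connected graph all nodes are neighbours."""
    n = len(features)
    if n < 2:
        return list(features)
    total = sum(features)
    return [(f + (total - f) / (n - 1)) / 2 for f in features]
```

For a 64x64 image with patch sizes [64, 32, 16], `multi_level_patches(64, 64, [64, 32, 16])` yields 1 + 4 + 16 = 21 nodes, and `fully_connected_edges(21)` yields 21 * 20 / 2 = 210 edges. Because every patch at every level is connected to every other, a coarse patch can exchange information directly with the finer patches it contains, which is one plausible reading of how the graph exposes part-whole relations.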
Submission history
From: Muhammad AbdurRafae
[v1] Thu, 8 Apr 2021 12:17:54 GMT (2511kb,D)