KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

Wang, Xuanhan; Song, Jingkuan; Chen, Xiaojia; Cheng, Lechao; Gao, Lianli; Shen, Heng Tao

Full-text links:

Download:

Current browse context:

cs.CV

< prev | next >

new | recent | 2206

Change to browse by:

Computer Science > Computer Vision and Pattern Recognition

Title: KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

Authors: Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen

(Submitted on 21 Jun 2022)

Abstract: Part-level attribute parsing is a fundamental but challenging task, which requires the region-level visual understanding to provide explainable details of body parts. Most existing approaches address this problem by adding a regional convolutional neural network (RCNN) with an attribute prediction head to a two-stage detector, in which attributes of body parts are identified from local-wise part boxes. However, local-wise part boxes with limit visual clues (i.e., part appearance only) lead to unsatisfying parsing results, since attributes of body parts are highly dependent on comprehensive relations among them. In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledges, including implicit knowledge (e.g., the attribute ``above-the-hip'' for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part of ``shorts'' cannot have the attribute of ``hoodie'' or ``lining''). Specifically, the KE-RCNN consists of two novel components, i.e., Implicit Knowledge based Encoder (IK-En) and Explicit Knowledge based Decoder (EK-De). The former is designed to enhance part-level representation by encoding part-part relational contexts into part boxes, and the latter one is proposed to decode attributes with a guidance of prior knowledge about \textit{part-attribute} relations. In this way, the KE-RCNN is plug-and-play, which can be integrated into any two-stage detectors, e.g., Attribute-RCNN, Cascade-RCNN, HRNet based RCNN and SwinTransformer based RCNN. Extensive experiments conducted on two challenging benchmarks, e.g., Fashionpedia and Kinetics-TPS, demonstrate the effectiveness and generalizability of the KE-RCNN. In particular, it achieves higher improvements over all existing methods, reaching around 3% of AP on Fashionpedia and around 4% of Acc on Kinetics-TPS.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2206.10146 [cs.CV]
	(or arXiv:2206.10146v1 [cs.CV] for this version)

Submission history

From: Xuanhan Wang [view email]
[v1] Tue, 21 Jun 2022 07:05:14 GMT (2030kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2206.10146

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Computer Vision and Pattern Recognition

Title: KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

Submission history