Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

Ruan, Ludan; Chen, Jieting; Song, Yuqing; Chen, Shizhe; Jin, Qin

Title: Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization

Authors: Ludan Ruan (1), Jieting Chen (1), Yuqing Song (1), Shizhe Chen (2), Qin Jin (1) ((1) Renmin University of China, (2) INRIA)

(Submitted on 11 Jun 2021)

Abstract: Entities Object Localization (EOL) aims to evaluate how grounded, or faithful, a description is. The task consists of caption generation and object grounding. Previous works tackle this problem by jointly training the two modules in a single framework, which limits the complexity of each module. In this work, we therefore propose to separate the two modules into two stages and improve each independently to boost overall system performance. For caption generation, we propose a Unified Multi-modal Pre-training Model (UMPM) that generates event descriptions with rich objects for better localization. For object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post-processing method to make the grounding results more faithful. Our overall system achieves state-of-the-art performance on both sub-tasks of the Entities Object Localization challenge at ActivityNet 2021, with 72.57 localization accuracy on the testing set of sub-task I and 0.2477 F1_all_per_sent on the hidden testing set of sub-task II.

Comments:	6 pages, 4 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.06138 [cs.CV]
	(or arXiv:2106.06138v1 [cs.CV] for this version)

Submission history

From: Jieting Chen
[v1] Fri, 11 Jun 2021 02:50:25 GMT (2368kb,D)
