Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning

Song, Yuqing; Chen, Shizhe; Zhao, Yida; Jin, Qin

(Submitted on 14 Jun 2020)

Abstract: Detecting meaningful events in an untrimmed video is essential for dense video captioning. In this work, we propose a novel and simple model for event sequence generation and explore the temporal relationships among events in the video. The proposed model omits inefficient two-stage proposal generation and directly generates event boundaries in a single pass, conditioned on bi-directional temporal dependency. Experimental results show that the proposed event sequence generation model produces more accurate and diverse events with a small number of proposals. For event captioning, we follow our previous work and incorporate intra-event captioning models into our pipeline system. The overall system achieves state-of-the-art performance on the dense-captioning events in video task, with a METEOR score of 9.894 on the challenge test set.

Comments:	Winner solution in CVPR 2020 Activitynet Dense Video Captioning challenge
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2006.07896 [cs.CV]
	(or arXiv:2006.07896v1 [cs.CV] for this version)

Submission history

From: Yuqing Song
[v1] Sun, 14 Jun 2020 13:21:37 GMT (2529kb,D)