Progressive Video Summarization via Multimodal Self-supervised Learning

Haopeng Li; Qiuhong Ke; Mingming Gong; Tom Drummond

Computer Science > Computer Vision and Pattern Recognition

(Submitted on 7 Jan 2022 (v1), last revised 19 Oct 2022 (this version, v4))

Abstract: Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, which easily leads to overfitting of the deep models. Considering that annotating large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task. Specifically, the self-supervised learning is conducted by exploring the semantic consistency between videos and text in both coarse-grained and fine-grained fashions, as well as by recovering masked frames in the videos. The multimodal framework is trained on a newly collected dataset of video-text pairs. Additionally, we introduce a progressive video summarization method, in which the important content in a video is pinpointed progressively to generate better summaries. Extensive experiments demonstrate the effectiveness and superiority of our method in terms of rank correlation coefficients and F-score.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2201.02494 [cs.CV]
	(or arXiv:2201.02494v4 [cs.CV] for this version)

Submission history

From: Haopeng Li
[v1] Fri, 7 Jan 2022 15:21:46 GMT (696kb,D)
[v2] Mon, 10 Jan 2022 02:23:06 GMT (695kb,D)
[v3] Tue, 13 Sep 2022 04:48:28 GMT (695kb,D)
[v4] Wed, 19 Oct 2022 05:01:13 GMT (6363kb,D)
