

Title: VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization

Abstract: Video transcript summarization is a fundamental task for video understanding. Conventional approaches to transcript summarization are usually built on summarization data for written language, such as news articles, and this domain discrepancy may degrade model performance on spoken text. In this paper, we present VT-SSum, a benchmark dataset of spoken language for video transcript segmentation and summarization, which includes 125K transcript-summary pairs from 9,616 videos. VT-SSum draws on videos from VideoLectures.NET and uses the slide content as weak supervision to generate extractive summaries of the video transcripts. Experiments with a state-of-the-art deep learning approach show that a model trained with VT-SSum achieves a significant improvement on the AMI spoken text summarization benchmark. VT-SSum is publicly available at this https URL to support future research on video transcript segmentation and summarization.
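The abstract does not specify how slide content is turned into extractive labels. The sketch below illustrates one simple form of slide-based weak supervision. It assumes a unigram-overlap score and a hypothetical threshold of 0.5. The function names (`tokenize`, `overlap_score`, `label_extractive`) are illustrative only, and this is not the VT-SSum construction procedure.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric word tokens; punctuation is discarded.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(sentence: str, slide: str) -> float:
    # Fraction of the sentence's distinct tokens that also appear on the slide.
    sent_tokens = tokenize(sentence)
    if not sent_tokens:
        return 0.0
    return len(sent_tokens & tokenize(slide)) / len(sent_tokens)


def label_extractive(sentences: list[str], slide: str, threshold: float = 0.5) -> list[int]:
    # Hypothetical weak labels: 1 = keep in the extractive summary, 0 = drop.
    # The 0.5 threshold is an assumption, not a value from the paper.
    return [int(overlap_score(s, slide) >= threshold) for s in sentences]


if __name__ == "__main__":
    slide = "Gradient descent: update weights using the learning rate"
    sentences = [
        "So today we talk about gradient descent.",
        "Um, can everyone hear me?",
        "We update the weights using a learning rate.",
    ]
    print(label_extractive(sentences, slide))  # prints [0, 0, 1]
```

Under this toy scoring, filler speech such as "Um, can everyone hear me?" shares no tokens with the slide and is dropped, while the sentence that paraphrases the slide bullet is kept. A real pipeline would likely need sentence-level alignment across many slides and a more robust similarity measure.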
Comments: Work in progress
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2106.05606 [cs.CL]
  (or arXiv:2106.05606v2 [cs.CL] for this version)

Submission history

From: Lei Cui
[v1] Thu, 10 Jun 2021 09:19:58 GMT (5785kb,D)
[v2] Thu, 15 Jul 2021 06:13:31 GMT (7090kb,D)
