Title: Unsupervised Semantic Parsing of Video Collections

Abstract: Human communication typically has an underlying structure. This is reflected in the fact that in many user generated videos, a starting point, ending, and certain objective steps between these two can be identified. In this paper, we propose a method for parsing a video into such semantic steps in an unsupervised way. The proposed method is capable of providing a semantic "storyline" of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. The proposed method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate this method on a large number of complex YouTube videos and show results of unprecedented quality for this intricate and impactful problem.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:1506.08438 [cs.CV]
  (or arXiv:1506.08438v3 [cs.CV] for this version)

Submission history

From: Ozan Sener
[v1] Sun, 28 Jun 2015 19:16:38 GMT (8986kb,D)
[v2] Tue, 7 Jul 2015 23:45:17 GMT (8986kb,D)
[v3] Mon, 10 Aug 2015 23:57:10 GMT (8986kb,D)
[v4] Wed, 27 Jan 2016 12:54:15 GMT (8685kb,D)