Computer Science > Computer Vision and Pattern Recognition
Title: Unsupervised Semantic Parsing of Video Collections
(Submitted on 28 Jun 2015 (this version), latest version 27 Jan 2016 (v4))
Abstract: Human communication typically has an underlying structure. This is reflected in the fact that many user-generated videos have an identifiable starting point, an ending, and certain objective steps between the two. In this paper, we propose a method for parsing a video into such semantic steps in an unsupervised way. The proposed method is capable of providing a semantic "storyline" of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. The proposed method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate this method on a large number of complex YouTube videos and show results of unprecedented quality for this intricate and impactful problem.
Submission history
From: Ozan Sener
[v1] Sun, 28 Jun 2015 19:16:38 GMT (8986kb,D)
[v2] Tue, 7 Jul 2015 23:45:17 GMT (8986kb,D)
[v3] Mon, 10 Aug 2015 23:57:10 GMT (8986kb,D)
[v4] Wed, 27 Jan 2016 12:54:15 GMT (8685kb,D)