Generating Long Videos of Dynamic Scenes

Brooks, Tim; Hellsten, Janne; Aittala, Miika; Wang, Ting-Chun; Aila, Timo; Lehtinen, Jaakko; Liu, Ming-Yu; Efros, Alexei A.; Karras, Tero

(Submitted on 7 Jun 2022 (v1), last revised 9 Jun 2022 (this version, v2))

Abstract: We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never change due to over-reliance on inductive biases to provide temporal consistency, such as a single latent code that dictates content for the entire video. On the other extreme, without long-term consistency, generated videos may morph unrealistically between different scenes. To address these limitations, we prioritize the time axis by redesigning the temporal latent representation and learning long-term consistency from data by training on longer videos. To this end, we leverage a two-phase training strategy, where we separately train using longer videos at a low resolution and shorter videos at a high resolution. To evaluate the capabilities of our model, we introduce two new benchmark datasets with explicit focus on long-term temporal dynamics.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2206.03429 [cs.CV]
	(or arXiv:2206.03429v2 [cs.CV] for this version)

Submission history

From: Tim Brooks
[v1] Tue, 7 Jun 2022 16:29:51 GMT (10718kb,D)
[v2] Thu, 9 Jun 2022 06:24:12 GMT (10709kb,D)
