Video Autoencoder: self-supervised disentanglement of static 3D structure and motion

Lai, Zihang; Liu, Sifei; Efros, Alexei A.; Wang, Xiaolong

(Submitted on 6 Oct 2021)

Abstract: A video autoencoder is proposed for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner. Relying on temporal continuity in videos, our work assumes that the 3D scene structure in nearby video frames remains static. Given a sequence of video frames as input, the video autoencoder extracts a disentangled representation of the scene including: (i) a temporally-consistent deep voxel feature to represent the 3D structure and (ii) a 3D trajectory of camera pose for each frame. These two representations will then be re-entangled for rendering the input video frames. This video autoencoder can be trained directly using a pixel reconstruction loss, without any ground truth 3D or camera pose annotations. The disentangled representation can be applied to a range of tasks, including novel view synthesis, camera pose estimation, and video generation by motion following. We evaluate our method on several large-scale natural video datasets, and show generalization results on out-of-domain images.
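
The data flow described in the abstract (encode one static structure, encode one pose per frame, re-entangle them to render, score with a pixel reconstruction loss) can be illustrated with a deliberately tiny sketch. This is not the authors' model: a 1-D list of pixels stands in for the deep voxel feature, an integer circular shift stands in for the 3D camera pose, and a brute-force search stands in for the learned pose encoder. All function names are hypothetical and chosen only for this illustration.

```python
# Toy illustration only: a 1-D "scene" and an integer camera shift stand in for
# the paper's deep voxel feature and 3D camera pose. No neural networks are used,
# and every function name here is a hypothetical placeholder.

def encode_structure(first_frame):
    """Stand-in structure encoder: the static scene is taken from one frame."""
    return list(first_frame)

def render(structure, pose):
    """Stand-in decoder: re-entangle structure and pose via a circular shift."""
    n = len(structure)
    return [structure[(i - pose) % n] for i in range(n)]

def reconstruction_loss(predicted, target):
    """Mean absolute pixel error between a rendered frame and an input frame."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)

def encode_pose(structure, frame):
    """Stand-in pose encoder: pick the shift with the lowest reconstruction loss."""
    candidates = range(len(structure))
    return min(candidates,
               key=lambda s: reconstruction_loss(render(structure, s), frame))

def autoencode(frames):
    """Disentangle a clip into (structure, trajectory), then re-render each frame."""
    structure = encode_structure(frames[0])
    trajectory = [encode_pose(structure, f) for f in frames]
    rendered = [render(structure, p) for p in trajectory]
    loss = sum(reconstruction_loss(r, f)
               for r, f in zip(rendered, frames)) / len(frames)
    return structure, trajectory, rendered, loss

# A static scene observed by a camera that moves one pixel per frame.
scene = [0, 1, 5, 2, 0, 0, 3, 0]
clip = [render(scene, shift) for shift in (0, 1, 2, 3)]
structure, trajectory, rendered, loss = autoencode(clip)
```

In this toy setting, calling `render(structure, pose)` with a pose that never occurred in the clip loosely mirrors novel view synthesis, and applying a recovered `trajectory` to a different structure loosely mirrors video generation by motion following. The only supervision signal is the pixel reconstruction loss, as in the abstract; no pose labels are supplied.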

Comments:	Accepted to ICCV 2021.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2110.02951 [cs.CV]
	(or arXiv:2110.02951v1 [cs.CV] for this version)

Submission history

From: Zihang Lai
[v1] Wed, 6 Oct 2021 17:57:42 GMT (3957kb,D)
