Delving into the Frequency: Temporally Consistent Human Motion Transfer in the Fourier Space

Yang, Guang; Liu, Wu; Liu, Xinchen; Gu, Xiaoyan; Cao, Juan; Li, Jintao

(Submitted on 1 Sep 2022 (v1), last revised 7 Sep 2022 (this version, v2))

Abstract: Human motion transfer refers to synthesizing photo-realistic and temporally coherent videos that enable one person to imitate the motion of others. However, current synthetic videos suffer from temporal inconsistency across sequential frames, which significantly degrades video quality and is far from solved by existing methods in the pixel domain. Recently, some works on DeepFake detection have tried to distinguish natural from synthetic images in the frequency domain, exploiting the frequency insufficiency of image synthesis methods. Nonetheless, no work has studied the temporal inconsistency of synthetic videos in terms of the frequency-domain gap between natural and synthetic videos. In this paper, we propose to delve into the frequency space for temporally consistent human motion transfer. First, we make the first comprehensive analysis of natural and synthetic videos in the frequency domain to reveal the frequency gap in both the spatial dimension of individual frames and the temporal dimension of the video. To close the frequency gap between natural and synthetic videos, we propose a novel Frequency-based human MOtion TRansfer framework, named FreMOTR, which can effectively mitigate the spatial artifacts and the temporal inconsistency of the synthesized videos. FreMOTR explores two novel frequency-based regularization modules: 1) Frequency-domain Appearance Regularization (FAR), to improve the appearance of the person in individual frames, and 2) Temporal Frequency Regularization (TFR), to guarantee temporal consistency between adjacent frames. Finally, comprehensive experiments demonstrate that FreMOTR not only yields superior performance on temporal consistency metrics but also improves the frame-level visual quality of synthetic videos. In particular, the temporal consistency metrics improve by nearly 30% over the state-of-the-art model.

Comments:	Accepted to ACM MM 2022, 9 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2209.00233 [cs.CV]
	(or arXiv:2209.00233v2 [cs.CV] for this version)

Submission history

From: Guang Yang
[v1] Thu, 1 Sep 2022 05:30:23 GMT (8323kb,D)
[v2] Wed, 7 Sep 2022 06:15:32 GMT (8324kb,D)
