Computer Science > Computer Vision and Pattern Recognition
Title: LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering
(Submitted on 1 Aug 2021 (v1), last revised 29 Jan 2022 (this version, v5))
Abstract: A key challenge in human pose and shape estimation is occlusion, including self-occlusions, object-human occlusions, and inter-person occlusions. The lack of diverse and accurate pose and shape training data is a major bottleneck, especially for in-the-wild scenes with occlusions. In this paper, we focus on estimating human pose and shape under inter-person occlusions, while also handling object-human occlusions and self-occlusions. We propose a novel framework that synthesizes occlusion-aware silhouette and 2D keypoint data and directly regresses the SMPL pose and shape parameters. A neural 3D mesh renderer is exploited to enable silhouette supervision on the fly, which contributes to great improvements in shape estimation. In addition, keypoints-and-silhouette-driven training data in panoramic viewpoints are synthesized to compensate for the lack of viewpoint diversity in existing datasets. Experimental results show that our method is among the state of the art on the 3DPW and 3DPW-Crowd datasets in terms of pose estimation accuracy. The proposed method clearly outperforms Mesh Transformer, 3DCrowdNet and ROMP in terms of shape estimation. Top performance is also achieved on SSP-3D in terms of shape prediction accuracy. Demo and code will be available at this https URL
Submission history
From: Kaibing Yang
[v1] Sun, 1 Aug 2021 02:09:16 GMT (27072kb,D)
[v2] Sun, 12 Dec 2021 13:18:59 GMT (42331kb,D)
[v3] Wed, 15 Dec 2021 09:29:57 GMT (42331kb,D)
[v4] Sun, 2 Jan 2022 05:18:50 GMT (42331kb,D)
[v5] Sat, 29 Jan 2022 12:36:13 GMT (41245kb,D)