Learning Sequential Contexts using Transformer for 3D Hand Pose Estimation

Khaleghi, Leyla; Marshall, Joshua; Etemad, Ali

(Submitted on 1 Jun 2022 (v1), last revised 11 Jun 2022 (this version, v2))

Abstract: 3D hand pose estimation (HPE) is the process of locating the joints of the hand in 3D from any visual input. HPE has recently received an increased amount of attention due to its key role in a variety of human-computer interaction applications. Recent HPE methods have demonstrated the advantages of employing videos or multi-view images, allowing for more robust HPE systems. Accordingly, in this study, we propose a new method to perform Sequential learning with Transformer for Hand Pose (SeTHPose) estimation. Our SeTHPose pipeline begins by extracting visual embeddings from individual hand images. We then use a transformer encoder to learn the sequential context along time or viewing angles and generate accurate 2D hand joint locations. Then, a graph convolutional neural network with a U-Net configuration is used to convert the 2D hand joint locations to 3D poses. Our experiments show that SeTHPose performs well on both hand sequence varieties, temporal and angular. Also, SeTHPose outperforms other methods in the field to achieve new state-of-the-art results on two publicly available sequential datasets, STB and MuViHand.
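
The sequential-context step described in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention, the core operation of a transformer encoder. This is not the authors' implementation: the function names, the toy 4-dimensional frame embeddings, the single attention head, and the use of identity query/key/value projections are all assumptions made to keep the example short. It only shows how each frame's output becomes a weighted mix of every frame in the sequence, whether the sequence runs over time or over viewing angles.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections (a real encoder
    learns these, and adds feed-forward layers and normalization).

    seq: list of T embeddings, each a list of d floats.
    Returns a list of T context vectors of length d.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Context vector: attention-weighted average of all frame embeddings.
        ctx = [sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)]
        out.append(ctx)
    return out


# Hypothetical per-frame visual embeddings for a 3-frame sequence.
frames = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
context = self_attention(frames)
```

In a full pipeline of the kind the abstract outlines, a regression head would map each context vector to 2D joint locations, and a separate network would lift those to 3D; neither stage is sketched here.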

Comments:	Accepted to ICPR'22
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2206.00171 [cs.CV]
	(or arXiv:2206.00171v2 [cs.CV] for this version)

Submission history

From: Leyla Khaleghi
[v1] Wed, 1 Jun 2022 01:22:29 GMT (971kb,D)
[v2] Sat, 11 Jun 2022 20:46:14 GMT (971kb,D)
