Effective Fusion of Deep Multitasking Representations for Robust Visual Tracking

Marvasti-Zadeh, Seyed Mojtaba; Ghanei-Yakhdan, Hossein; Kasaei, Shohreh; Nasrollahi, Kamal; Moeslund, Thomas B.

(Submitted on 3 Apr 2020 (v1), last revised 20 Sep 2021 (this version, v2))

Abstract: Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs based on different residual neural networks (ResNets) has not previously been investigated. This paper aims to evaluate the performance of twelve state-of-the-art ResNet-based FENs in a DCF-based framework to determine the best for visual tracking purposes. First, it ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method. Then, the proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters. Finally, it introduces a new and efficient semantic weighting method (using semantic segmentation feature maps on each video frame) to reduce the drift problem. Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128 and VOT-2018 visual tracking datasets demonstrate that the proposed method effectively outperforms state-of-the-art methods in terms of precision and robustness of visual tracking.

Comments:	To appear in The Visual Computer (International Journal of Computer Graphics), Springer, 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Cite as:	arXiv:2004.01382 [cs.CV]
	(or arXiv:2004.01382v2 [cs.CV] for this version)

Submission history

From: Seyed Mojtaba Marvasti-Zadeh
[v1] Fri, 3 Apr 2020 05:33:59 GMT (6191kb,D)
[v2] Mon, 20 Sep 2021 09:24:50 GMT (6191kb,D)
