ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

Chen, Dave Zhenyu; Chang, Angel X.; Nießner, Matthias

(Submitted on 18 Dec 2019 (v1), last revised 11 Nov 2020 (this version, v3))

Abstract: We introduce the task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, learning a fused descriptor from 3D object proposals and encoded sentence embeddings. This fused descriptor correlates language expressions with geometric features, enabling regression of the 3D bounding box of a target object. We also introduce the ScanRefer dataset, containing 51,583 descriptions of 11,046 objects from 800 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D.
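
The fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fuse`, `interaction_score`, `localize`), the two-dimensional toy features, and the box layout `(cx, cy, cz, dx, dy, dz)` are all assumptions made for illustration. In particular, the dot-product scorer below is a hand-written stand-in for the learned network that the abstract says operates on the fused descriptor.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float, float, float]  # assumed layout: center + size


def fuse(proposal_feat: Sequence[float], sentence_emb: Sequence[float]) -> List[float]:
    """Build a fused descriptor by concatenating one proposal's features
    with the sentence embedding (assumed fusion; both must have equal length here)."""
    if len(proposal_feat) != len(sentence_emb):
        raise ValueError("toy sketch expects equal-length feature vectors")
    return list(proposal_feat) + list(sentence_emb)


def interaction_score(fused: Sequence[float]) -> float:
    """Stand-in for a learned scorer: dot product between the geometric half
    and the language half of the fused descriptor."""
    half = len(fused) // 2
    return sum(g * l for g, l in zip(fused[:half], fused[half:]))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over proposal scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def localize(
    proposal_feats: Sequence[Sequence[float]],
    proposal_boxes: Sequence[Box],
    sentence_emb: Sequence[float],
) -> Tuple[Box, List[float]]:
    """Score every proposal against the description and return the box of the
    most confident proposal together with all confidences."""
    fused = [fuse(feat, sentence_emb) for feat in proposal_feats]
    confidences = softmax([interaction_score(v) for v in fused])
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return proposal_boxes[best], confidences


if __name__ == "__main__":
    # Hypothetical toy scene: two proposals with made-up features and boxes.
    feats = [[1.0, 0.0], [0.0, 1.0]]
    boxes = [(0.0, 0.0, 0.5, 0.5, 0.5, 1.0), (2.0, 1.0, 0.4, 1.5, 0.8, 0.8)]
    description_emb = [0.0, 2.0]  # made-up embedding of a description
    box, conf = localize(feats, boxes, description_emb)
    print(box, conf)
```

In this toy setup the description embedding aligns with the second proposal's features, so that proposal's box is returned. A real system would replace the made-up features with learned proposal and sentence encoders and regress or refine the box rather than only selecting it.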

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Cite as:	arXiv:1912.08830 [cs.CV]
	(or arXiv:1912.08830v3 [cs.CV] for this version)

Submission history

From: Dave Zhenyu Chen
[v1] Wed, 18 Dec 2019 19:00:49 GMT (5926kb,D)
[v2] Tue, 21 Jul 2020 21:41:53 GMT (7681kb,D)
[v3] Wed, 11 Nov 2020 09:33:31 GMT (7682kb,D)
