Title: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling

Abstract: Beyond their success in classification, neural networks have recently shown strong results on pixel-wise prediction tasks such as semantic segmentation of RGBD images. However, the deconvolutional layers commonly used to upsample intermediate representations to the full-resolution output still exhibit several failure modes, such as imprecise segmentation boundaries and label mistakes, particularly on large, weakly textured objects (e.g. fridge, whiteboard, door). We attribute these errors in part to the rigid way in which current networks aggregate information, which can be either too local (missing context) or too global (inaccurate boundaries). We therefore propose a data-driven pooling layer that integrates with fully convolutional architectures and utilizes boundary detection from RGBD image segmentation approaches. We extend our approach to leverage region-level correspondence across images with an additional temporal pooling stage. We evaluate our approach on the NYU-Depth-V2 dataset, which consists of indoor RGBD video sequences, and compare it against various state-of-the-art baselines. We improve on the state of the art, in particular in the accuracy of the predicted boundaries and on previously problematic classes.
Comments: 16 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:1604.02388 [cs.CV]
  (or arXiv:1604.02388v1 [cs.CV] for this version)

Submission history

From: Yang He
[v1] Fri, 8 Apr 2016 16:01:34 GMT (1377kb,D)
[v2] Thu, 9 Jun 2016 19:52:02 GMT (1373kb,D)
[v3] Wed, 26 Apr 2017 13:13:02 GMT (9173kb,D)