Computer Science > Computer Vision and Pattern Recognition
Title: Geometric Scene Parsing with Hierarchical LSTM
(Submitted on 7 Apr 2016 (v1), last revised 8 Apr 2016 (this version, v2))
Abstract: This paper addresses the problem of geometric scene parsing, i.e. simultaneously labeling geometric surfaces (e.g. sky, ground and vertical plane) and determining the interaction relations (e.g. layering, supporting, siding and affinity) between main regions. This problem is more challenging than traditional semantic scene labeling, as recovering geometric structures requires rich and diverse contextual information. To achieve these goals, we propose a novel recurrent neural network model, named Hierarchical Long Short-Term Memory (H-LSTM). It contains two coupled sub-networks: the Pixel LSTM (P-LSTM), which handles surface labeling, and the Multi-scale Super-pixel LSTM (MS-LSTM), which handles relation prediction. The two sub-networks provide complementary information to each other to exploit hierarchical scene contexts, and they are jointly optimized to boost performance. Our extensive experiments show that our model is capable of parsing scene geometric structures and outperforms several state-of-the-art methods by large margins. In addition, we show promising 3D reconstruction results from still images based on the geometric parsing.
Submission history
From: Zhanglin Peng
[v1] Thu, 7 Apr 2016 09:20:51 GMT (5135kb)
[v2] Fri, 8 Apr 2016 06:21:57 GMT (5135kb)