Title: Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders

Abstract: Current perception models in autonomous driving rely heavily on large-scale labelled 3D data, which is both costly and time-consuming to annotate. This work reduces the dependence on labelled 3D training data by pre-training on large-scale unlabelled outdoor LiDAR point clouds using masked autoencoders (MAE). Existing masked point autoencoding methods mainly focus on small-scale indoor point clouds or pillar-based large-scale outdoor LiDAR data. Our approach introduces a new self-supervised masked occupancy pre-training method, Occupancy-MAE, designed specifically for voxel-based large-scale outdoor LiDAR point clouds. Occupancy-MAE exploits the gradually sparse voxel occupancy structure of outdoor LiDAR point clouds, combining a range-aware random masking strategy with a pretext task of occupancy prediction. By randomly masking voxels based on their distance to the LiDAR and predicting the masked occupancy structure of the entire surrounding 3D scene, Occupancy-MAE encourages the extraction of high-level semantic information, since the masked voxels must be reconstructed from only a small number of visible voxels. Extensive experiments demonstrate the effectiveness of Occupancy-MAE across several downstream tasks. For 3D object detection, Occupancy-MAE halves the labelled data required for car detection on the KITTI dataset and improves small object detection by approximately 2% in AP on the Waymo dataset. For 3D semantic segmentation, Occupancy-MAE outperforms training from scratch by around 2% in mIoU. For multi-object tracking, Occupancy-MAE improves on training from scratch by approximately 1% in terms of AMOTA and AMOTP. Code is publicly available at this https URL
Comments: Accepted by TIV
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2206.09900 [cs.CV]
  (or arXiv:2206.09900v7 [cs.CV] for this version)
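
The two ingredients named in the abstract, range-aware random masking and an occupancy prediction target, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the range bin edges (30 m and 50 m), the per-bin masking ratios, and the use of horizontal distance from a sensor at the origin are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import math
import random


def mask_ratio_for(distance, range_edges, mask_ratios):
    """Pick the masking ratio of the range bin that contains `distance`.

    `mask_ratios` has one more entry than `range_edges`; the last entry
    applies to everything beyond the final edge.
    """
    for edge, ratio in zip(range_edges, mask_ratios):
        if distance < edge:
            return ratio
    return mask_ratios[-1]


def range_aware_mask(voxel_centers, range_edges=(30.0, 50.0),
                     mask_ratios=(0.9, 0.7, 0.5), seed=0):
    """Return one boolean per voxel; True means the voxel is masked.

    The bin edges and ratios are hypothetical defaults, not values taken
    from the abstract. The sensor is assumed to sit at the origin.
    """
    rng = random.Random(seed)
    masked = []
    for x, y, _z in voxel_centers:
        distance = math.hypot(x, y)  # horizontal distance to the sensor
        ratio = mask_ratio_for(distance, range_edges, mask_ratios)
        masked.append(rng.random() < ratio)
    return masked


def occupancy_targets(occupied_indices, grid_shape):
    """Build a dense binary occupancy grid: 1 where a voxel is occupied.

    In a masked-occupancy setup the target would cover every occupied
    voxel, masked or visible, while the encoder sees only the visible ones.
    """
    nx, ny, nz = grid_shape
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for i, j, k in occupied_indices:
        grid[i][j][k] = 1
    return grid
```

In this sketch the masking decision is drawn independently per voxel, so the realised masked fraction in each range bin only approximates the configured ratio. A pre-training loop would feed the unmasked voxels to an encoder and score the decoder's per-voxel predictions against the grid from `occupancy_targets`, for example with a binary classification loss; that part is omitted here.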

Submission history

From: Chen Min
[v1] Mon, 20 Jun 2022 17:15:50 GMT (2834kb,D)
[v2] Fri, 24 Jun 2022 06:46:02 GMT (3068kb,D)
[v3] Mon, 27 Jun 2022 09:01:51 GMT (2834kb,D)
[v4] Tue, 16 Aug 2022 14:16:21 GMT (1431kb,D)
[v5] Wed, 23 Nov 2022 06:15:30 GMT (1309kb,D)
[v6] Sat, 29 Apr 2023 00:54:33 GMT (1540kb,D)
[v7] Mon, 9 Oct 2023 12:34:02 GMT (5018kb,D)