M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

Qiu, Qibo; Yang, Honghui; Wang, Wenxiao; Zhang, Shun; Gao, Haiming; Ying, Haochao; Hua, Wei; He, Xiaofei

arXiv:2309.13235 (cs)

[Submitted on 23 Sep 2023]

Abstract: Masked point modeling has become a promising scheme for self-supervised pre-training on point clouds. Existing methods reconstruct either the original points or related features as the pre-training objective. However, given the diversity of downstream tasks, the model needs both low- and high-level representation modeling capabilities to capture geometric details and semantic contexts during pre-training. To this end, M$^3$CS is proposed to equip the model with both abilities. Specifically, with a masked point cloud as input, M$^3$CS introduces two decoders to predict masked representations and the original points simultaneously. Because an extra decoder doubles the parameters of the decoding process and may lead to overfitting, we propose siamese decoders that keep the number of learnable parameters unchanged. Further, we propose an online codebook that projects continuous tokens into discrete ones before reconstructing masked points. In this way, the decoder is forced to work through combinations of tokens rather than memorizing each token. Comprehensive experiments show that M$^3$CS achieves superior performance on both classification and segmentation tasks, outperforming existing methods.
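The two mechanisms named in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation. It assumes that "siamese decoders" means two decoding passes that share one set of weights, and that the codebook lookup is a nearest-neighbour search under L2 distance. The names `quantize`, `decode`, `shared_w`, and `point_head`, as well as all sizes, are hypothetical.

```python
import numpy as np


def quantize(tokens, codebook):
    """Replace each continuous token with its nearest codebook entry (L2)."""
    # Pairwise squared distances, shape (num_tokens, num_codes).
    d2 = ((tokens[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return codebook[indices], indices


def decode(x, weights):
    """Toy one-layer decoder: a linear map followed by tanh."""
    return np.tanh(x @ weights)


rng = np.random.default_rng(0)
dim, num_codes, num_masked = 8, 16, 5

# Random stand-ins for parameters that would be learned in a real model.
codebook = rng.normal(size=(num_codes, dim))
shared_w = rng.normal(size=(dim, dim))   # one weight set used by both branches
point_head = rng.normal(size=(dim, 3))   # hypothetical projection to xyz

masked_tokens = rng.normal(size=(num_masked, dim))

# Branch 1: predict masked representations from the continuous tokens.
pred_features = decode(masked_tokens, shared_w)

# Branch 2: discretize the tokens first, then reuse the same decoder weights
# and project to 3D coordinates to reconstruct points.
discrete_tokens, code_ids = quantize(masked_tokens, codebook)
pred_points = decode(discrete_tokens, shared_w) @ point_head
```

Because both branches call `decode` with `shared_w`, adding the second reconstruction target adds no decoder parameters in this sketch. Only the small `point_head` and the codebook are extra.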

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.13235 [cs.CV]
	(or arXiv:2309.13235v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.13235

Submission history

From: Qibo Qiu
[v1] Sat, 23 Sep 2023 02:19:21 UTC (1,775 KB)
