A New Perspective of Weakly Supervised 3D Instance Segmentation via Bounding Boxes

Yu, Qingtao; Du, Heming; Yu, Xin

doi:10.1007/978-981-99-8388-9_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14471)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Existing fully supervised 3D point cloud segmentation methods rely heavily on carefully annotated point labels. In this work, we study weakly supervised 3D instance segmentation using bounding-box supervision. Bounding boxes are much easier to annotate than dense point-wise labels. Moreover, they have demonstrated high potential for instance-level segmentation compared to other types of weak annotations. However, existing bounding-box supervised techniques have struggled to keep pace with the development of fully supervised methods. To tackle this issue, we propose a simple yet effective approach that directly leverages the network architectures of fully supervised methods in such weak supervision scenarios. We found that accurate instance labels for each point can be generated from the given bounding boxes by leveraging 3D geometric priors. This process is efficient and does not require any additional training or fine-tuning. The generated point-wise labels can be fed to any advanced fully supervised model without re-designing specific networks for bounding-box supervision. In this fashion, our approach achieves performance on par with fully supervised methods in terms of AP, AP50 and AP25. Remarkably, we outperform the state-of-the-art bounding-box supervised method by 21%. Compared with existing methods, our method is extremely simple and involves only two small heuristics in the data preprocessing step. In addition, our experiments show that the method is robust to noisy bounding boxes.
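To make the idea of turning bounding boxes into point-wise instance labels concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the two heuristics, so the rule used here (a point inside several boxes is assigned to the box with the smallest volume) is purely an illustrative assumption, as are the names `pseudo_labels`, `contains`, `box_volume` and `UNLABELED`. Boxes are assumed to be axis-aligned and given as (min corner, max corner).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner), assumed axis-aligned

UNLABELED = -1  # hypothetical marker for points that fall outside every box


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box."""
    lo, hi = box
    return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])


def contains(box: Box, p: Point) -> bool:
    """True if point p lies inside the box (boundaries included)."""
    lo, hi = box
    return all(lo[i] <= p[i] <= hi[i] for i in range(3))


def pseudo_labels(points: Sequence[Point], boxes: Sequence[Box]) -> List[int]:
    """Assign each point the index of one enclosing box, or UNLABELED."""
    labels = []
    for p in points:
        hits = [i for i, b in enumerate(boxes) if contains(b, p)]
        if not hits:
            labels.append(UNLABELED)
        else:
            # Assumed tie-break (not taken from the paper): smallest box wins.
            labels.append(min(hits, key=lambda i: box_volume(boxes[i])))
    return labels


if __name__ == "__main__":
    boxes = [((0, 0, 0), (4, 4, 4)), ((1, 1, 1), (2, 2, 2))]
    points = [(0.5, 0.5, 0.5), (1.5, 1.5, 1.5), (9.0, 9.0, 9.0)]
    print(pseudo_labels(points, boxes))  # → [0, 1, -1]
```

Labels produced this way could then be passed to a fully supervised segmentation model as if they were ground-truth annotations, which is the workflow the abstract describes. A real pipeline would likely need a more careful treatment of overlapping boxes than this single tie-break.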

Acknowledgements

This research is funded in part by ARC-Discovery grant (DP220200800 to XY) and ARC-DECRA grant (DE230100477 to XY). We thank all anonymous reviewers and ACs for their constructive suggestions.

Author information

Authors and Affiliations

University of Queensland, Brisbane, Australia
Qingtao Yu, Heming Du & Xin Yu
Australian National University, Canberra, Australia
Qingtao Yu & Heming Du

Authors

Qingtao Yu
Heming Du
Xin Yu

Corresponding author

Correspondence to Xin Yu.

Editor information

Editors and Affiliations

The University of Sydney, Darlington, NSW, Australia
Tongliang Liu
Monash University, Clayton, VIC, Australia
Geoff Webb
The University of Newcastle, Callaghan, NSW, Australia
Lin Yue
CSIRO Data61, Sydney, NSW, Australia
Dadong Wang

About this paper

Cite this paper

Yu, Q., Du, H., Yu, X. (2024). A New Perspective of Weakly Supervised 3D Instance Segmentation via Bounding Boxes. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14471. Springer, Singapore. https://doi.org/10.1007/978-981-99-8388-9_9

DOI: https://doi.org/10.1007/978-981-99-8388-9_9
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8387-2
Online ISBN: 978-981-99-8388-9
eBook Packages: Computer Science, Computer Science (R0)
