A New Perspective of Weakly Supervised 3D Instance Segmentation via Bounding Boxes

  • Conference paper
AI 2023: Advances in Artificial Intelligence (AI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14471)

Abstract

Existing fully supervised 3D point cloud segmentation methods rely heavily on carefully annotated point labels. In this work, we study weakly supervised 3D instance segmentation using bounding-box supervision. Bounding boxes are much easier to annotate than dense point-wise labels. Moreover, they have shown high potential for instance-level segmentation compared with other types of weak annotation. However, existing bounding-box supervised techniques have struggled to keep pace with the development of fully supervised methods. To tackle this issue, we propose a simple yet effective approach that directly leverages the network architectures of fully supervised methods in such weak supervision scenarios. We find that accurate instance labels for each point can be generated from the given bounding boxes by leveraging 3D geometric priors. This process is efficient and does not require any additional training or fine-tuning. The generated point-wise labels can be fed to any advanced fully supervised model without redesigning networks specifically for bounding-box supervision. In this way, our approach achieves performance on par with fully supervised methods in terms of AP, AP50 and AP25. Remarkably, we outperform the state-of-the-art bounding-box supervised method by 21%. Compared with existing methods, our method is extremely simple and involves only two small heuristics in the data preprocessing step. In addition, experiments show that our method is robust to noisy bounding boxes.
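The label-generation idea described above can be illustrated with a minimal sketch that turns box annotations into point-wise instance labels before any training. The abstract does not spell out the two heuristics, so everything specific here is an assumption for illustration only: the function name `boxes_to_point_labels`, the axis-aligned `[xmin, ymin, zmin, xmax, ymax, zmax]` box layout, and the rule that a point covered by several boxes is given to the smallest one. It is not the authors' implementation.

```python
import numpy as np


def boxes_to_point_labels(points, boxes):
    """Assign each point the index of a box that contains it (hypothetical helper).

    points: (N, 3) array of xyz coordinates.
    boxes:  (M, 6) array of [xmin, ymin, zmin, xmax, ymax, zmax] (assumed layout).
    Returns an (N,) int array holding a box index per point, or -1 for background.
    """
    points = np.asarray(points, dtype=float)
    boxes = np.asarray(boxes, dtype=float)
    if len(boxes) == 0:
        return np.full(len(points), -1, dtype=int)

    lo, hi = boxes[:, :3], boxes[:, 3:]
    # inside[i, j] is True when point i lies within box j on all three axes.
    inside = np.all(
        (points[:, None, :] >= lo[None]) & (points[:, None, :] <= hi[None]),
        axis=2,
    )

    # Assumed tie-break (not taken from the paper): when a point falls in
    # several boxes, give it to the box with the smallest volume.
    volumes = np.prod(hi - lo, axis=1)
    cost = np.where(inside, volumes[None, :], np.inf)
    labels = np.argmin(cost, axis=1)

    # Points that fall in no box are marked as background.
    labels[~inside.any(axis=1)] = -1
    return labels


if __name__ == "__main__":
    pts = [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5], [5.0, 5.0, 5.0]]
    bxs = [[0, 0, 0, 2, 2, 2], [0, 0, 0, 1, 1, 1]]
    print(boxes_to_point_labels(pts, bxs).tolist())  # [1, 0, -1]
```

Labels produced this way are ordinary per-point instance ids, so in principle they could be passed to an unmodified fully supervised pipeline in place of dense annotations.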

Acknowledgements

This research is funded in part by ARC-Discovery grant (DP220200800 to XY) and ARC-DECRA grant (DE230100477 to XY). We thank all anonymous reviewers and ACs for their constructive suggestions.

Author information

Corresponding author

Correspondence to Xin Yu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yu, Q., Du, H., Yu, X. (2024). A New Perspective of Weakly Supervised 3D Instance Segmentation via Bounding Boxes. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14471. Springer, Singapore. https://doi.org/10.1007/978-981-99-8388-9_9

  • DOI: https://doi.org/10.1007/978-981-99-8388-9_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8387-2

  • Online ISBN: 978-981-99-8388-9

  • eBook Packages: Computer Science (R0)
