
Learning representative viewpoints in 3D shape recognition

  • Original article
  • Published in: The Visual Computer

Abstract

3D shape recognition that infers an object's category from 2D rendered images has proven effective when many viewpoints are adopted and the relationships between them are mined. However, forming a reasonable expression of an object from a limited number of generally representative viewpoints is a task of both practical and theoretical significance. This paper proposes a multi-view CNN architecture with independent viewpoint feature extraction and unified importance weighting, which can dramatically decrease the number of viewpoints required by learning the representative ones. First, view-based and viewpoint-independent features are extracted by a deep neural network. Second, the network automatically learns the relationships between viewpoints and outputs an importance weight for each view. Finally, the view features are aggregated to predict the object's category. By iteratively learning these importance weights over instances, globally representative viewpoints are selected. We assess our method on two challenging datasets, ModelNet and ShapeNet. Rigorous experiments show that our strategy is competitive with the latest methods using only six viewpoints and RGB information as input; with 20 viewpoints as input, it achieves state-of-the-art performance. Specifically, the proposed approach achieves 99.34% and 97.49% accuracy on ModelNet10 and ModelNet40, respectively, and 80.0% mAP on ShapeNet.
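The final aggregation step described above can be illustrated with a minimal sketch: per-view features are scored for importance, the scores are normalized, and the features are combined into one shape descriptor by a weighted sum. This is not the paper's actual network; the linear scoring vector `score_weights` is a hypothetical stand-in for the learned importance-weight branch, and softmax normalization is an assumption.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the view axis
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_views(view_features, score_weights):
    """Importance-weighted aggregation of per-view features.

    view_features: (V, D) array, one D-dim feature per rendered view.
    score_weights: (D,) hypothetical scoring vector mapping each view
                   feature to a scalar importance score.
    Returns a (D,) aggregated shape descriptor.
    """
    scores = view_features @ score_weights   # (V,) raw importance scores
    weights = softmax(scores)                # (V,) normalized, sums to 1
    return weights @ view_features           # (D,) weighted sum of views

# toy example: 6 views (as in the low-viewpoint setting), 4-dim features
rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4))
w = rng.standard_normal(4)
desc = aggregate_views(feats, w)
```

In training, averaging the per-view weights over many instances would indicate which viewpoints are consistently important, which is the intuition behind selecting globally representative viewpoints.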





Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. U20B2062), the Beijing Municipal Science & Technology Project (No. Z191100007419001), the Beijing National Research Center for Information Science and Technology, and the Key Laboratory of Opto-Electronic Information Processing, CAS (No. JGA202004027).

Author information


Corresponding author

Correspondence to Huimin Ma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chu, H., Le, C., Wang, R. et al. Learning representative viewpoints in 3D shape recognition. Vis Comput 38, 3703–3718 (2022). https://doi.org/10.1007/s00371-021-02203-5


  • Accepted:

  • Published:

  • Issue Date:


Keywords
