Abstract
3D shape recognition, which infers an object's category from 2D rendered images, has proven effective when many viewpoints are adopted and the relationships between them are mined. However, forming a reasonable representation of an object from a small number of generally representative viewpoints remains a task of both practical and theoretical significance. This paper proposes a multi-view CNN architecture that combines independent viewpoint feature extraction with unified importance weighting, dramatically decreasing the number of required viewpoints by learning which ones are representative. First, view-based and view-independent features are extracted by a deep neural network. Second, the network automatically learns the relationships between viewpoints and outputs an importance weight for each view. Finally, the view features are aggregated to predict the object's category. By iteratively learning these importance weights across instances, globally representative viewpoints are selected. We evaluate our method on two challenging datasets, ModelNet and ShapeNet. Rigorous experiments show that our strategy is competitive with the latest methods while using only six viewpoints and RGB information as input, and achieves state-of-the-art performance when using 20 viewpoints. Specifically, the proposed approach achieves 99.34% and 97.49% accuracy on ModelNet10 and ModelNet40, respectively, and 80.0% mAP on ShapeNet.
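The three-step pipeline the abstract describes (per-view feature extraction, learned importance weights, weighted aggregation into a single shape descriptor) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's actual network: the linear scorer `score_weights`, the feature dimensionality, and the softmax normalization are all illustrative assumptions standing in for the learned CNN components.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_views(view_features, score_weights):
    """Fuse per-view features into one shape descriptor.

    view_features : (V, D) array, one feature vector per rendered viewpoint
                    (standing in for the CNN's per-view features).
    score_weights : (D,) hypothetical linear scorer producing one scalar
                    importance score per view.
    """
    scores = view_features @ score_weights      # (V,) raw importance scores
    weights = softmax(scores)                   # normalized importance weights
    shape_feature = weights @ view_features     # (D,) weighted aggregation
    return shape_feature, weights

# Toy example: 6 viewpoints (as in the paper's low-view setting), 4-D features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
w = rng.normal(size=4)
shape_feat, view_w = aggregate_views(feats, w)
```

In the paper's setting the weights are learned jointly with the classifier, so views that consistently receive high weight across training instances become the globally representative viewpoints; the sketch above shows only the forward aggregation step.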
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. U20B2062), the Beijing Municipal Science & Technology Project (No. Z191100007419001), the Beijing National Research Center for Information Science and Technology, and the Key Laboratory of Opto-Electronic Information Processing, CAS (No. JGA202004027).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Chu, H., Le, C., Wang, R. et al. Learning representative viewpoints in 3D shape recognition. Vis Comput 38, 3703–3718 (2022). https://doi.org/10.1007/s00371-021-02203-5