
Generalization to unseen viewpoint images of objects via alleviated pose attentive capsule agreement

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Despite their achievements in object recognition, Convolutional Neural Networks (CNNs) fail in particular to generalize to unseen viewpoints of a learned object, even when trained on substantial numbers of samples. Recently emerged capsule networks, on the other hand, outperform CNNs on novel viewpoint generalization tasks with significantly fewer parameters. Capsule networks group neuron activations to represent higher-level attributes and model their interactions to achieve equivariance to visual transformations. However, learning the interactions of capsules in consecutive layers via the so-called routing algorithm incurs a high computational cost. To address these issues, we propose a novel routing algorithm, Alleviated Pose Attentive Capsule Agreement (ALPACA), tailored for capsules that jointly carry pose, feature, and existence-probability information, to enhance the novel viewpoint generalization of capsules on 2D images. For this purpose, we have created the Novel ViewPoint Dataset (NVPD), a viewpoint-controlled, texture-free dataset with 8 different setups in which training and test samples are formed from different viewpoints. In addition to NVPD, we have conducted experiments on the iLab2M dataset, split in terms of object instances. Experimental results show that ALPACA outperforms its capsule network counterparts and state-of-the-art CNNs on both the iLab2M and NVPD datasets. Moreover, ALPACA is 10 times faster than routing-based capsule networks, and it outperforms attention-based routing algorithms in this domain while keeping inference and training times comparable. Lastly, our code, the NVPD dataset, test setups, and implemented models are freely available at https://github.com/Boazrciasn/ALPACA.
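For readers unfamiliar with capsule routing, the sketch below illustrates, in plain NumPy, a generic attention-style agreement step between two capsule layers in which each capsule carries pose, feature, and existence-probability components. It is a hypothetical illustration of the general idea only, not the ALPACA algorithm: the names (attention_routing, lower_poses, transforms), shapes, and the particular agreement rule are our own assumptions; the actual implementation is in the linked repository.

```python
# Hypothetical sketch of a generic attention-style capsule routing step.
# This is NOT the ALPACA algorithm; shapes, names and the agreement rule
# are illustrative assumptions only.
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention_routing(lower_poses, lower_feats, transforms):
    """One routing step between consecutive capsule layers.

    lower_poses : (N_in, d_pose)                pose vectors of lower capsules
    lower_feats : (N_in, d_feat)                feature vectors of lower capsules
    transforms  : (N_in, N_out, d_pose, d_pose) learned pose transformations
    Returns pose, feature and existence probability of the higher capsules.
    """
    # Each lower capsule casts a pose "vote" for every higher capsule.
    votes = np.einsum('iokd,id->iok', transforms, lower_poses)            # (N_in, N_out, d_pose)

    # Agreement: scaled dot product between each vote and the mean vote
    # (a single attention-like pass instead of an iterative routing loop).
    mean_vote = votes.mean(axis=0, keepdims=True)                         # (1, N_out, d_pose)
    scores = (votes * mean_vote).sum(axis=-1) / np.sqrt(votes.shape[-1])  # (N_in, N_out)
    weights = softmax(scores, axis=0)                                     # normalize over inputs

    # Higher-level pose and feature are agreement-weighted combinations.
    higher_poses = np.einsum('io,iok->ok', weights, votes)                # (N_out, d_pose)
    higher_feats = np.einsum('io,id->od', weights, lower_feats)           # (N_out, d_feat)

    # Existence probability grows with the strongest agreement received.
    existence = 1.0 / (1.0 + np.exp(-scores.max(axis=0)))                 # (N_out,)
    return higher_poses, higher_feats, existence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = rng.normal(size=(32, 4))        # 32 lower capsules, 4-D pose
    feats = rng.normal(size=(32, 8))        # 8-D feature per capsule
    W = rng.normal(size=(32, 10, 4, 4))     # transforms toward 10 higher capsules
    p, f, a = attention_routing(poses, feats, W)
    print(p.shape, f.shape, a.shape)        # (10, 4) (10, 8) (10,)
```

The single-pass structure, in place of the iterative loop used by dynamic or EM routing, is the usual reason attention-style routing schemes are cheaper; this matches the efficiency concern raised in the abstract, though the concrete mechanism above is only a stand-in for the one described in the paper.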


Data Availability

The code and the dataset used in this research are freely available at https://github.com/Boazrciasn/ALPACA.


Author information

Corresponding author

Correspondence to Barış Özcan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Özcan, B., Kınlı, F. & Kıraç, F. Generalization to unseen viewpoint images of objects via alleviated pose attentive capsule agreement. Neural Comput & Applic 35, 3521–3536 (2023). https://doi.org/10.1007/s00521-022-07900-3
