
Generalization to unseen viewpoint images of objects via alleviated pose attentive capsule agreement

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Despite their achievements in object recognition, Convolutional Neural Networks (CNNs) fail in particular to generalize to unseen viewpoints of a learned object, even when trained on substantial numbers of samples. Recently emerged capsule networks, on the other hand, outperform CNNs on novel viewpoint generalization tasks with significantly fewer parameters. Capsule networks group neuron activations to represent higher-level attributes and model their interactions to achieve equivariance to visual transformations. However, learning the interactions of capsules in consecutive layers via the so-called routing algorithm incurs a high computational cost. To address these issues, we propose a novel routing algorithm, Alleviated Pose Attentive Capsule Agreement (ALPACA), tailored for capsules that jointly carry pose, feature, and existence-probability information, to enhance the novel viewpoint generalization of capsules on 2D images. For this purpose, we have created the Novel ViewPoint Dataset (NVPD), a viewpoint-controlled, texture-free dataset with 8 different setups in which training and test samples are formed from different viewpoints. In addition to NVPD, we have conducted experiments on the iLab2M dataset, split in terms of object instances. Experimental results show that ALPACA outperforms its capsule network counterparts and state-of-the-art CNNs on both the iLab2M and NVPD datasets. Moreover, ALPACA is 10 times faster than routing-based capsule networks, and it outperforms attention-based routing algorithms in this domain while keeping inference and training times comparable. Lastly, our code, the NVPD dataset, test setups, and implemented models are freely available at https://github.com/Boazrciasn/ALPACA.
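For readers unfamiliar with capsule routing, the sketch below illustrates, in plain NumPy, a generic attention-style agreement step between two capsule layers in which each capsule carries pose, feature, and existence-probability components. It is a hypothetical illustration of the general idea only, not the ALPACA algorithm: the names (attention_routing, lower_poses, transforms), shapes, and the particular agreement rule are our own assumptions; the actual implementation is in the linked repository.

```python
# Hypothetical sketch of a generic attention-style capsule routing step.
# This is NOT the ALPACA algorithm; shapes, names and the agreement rule
# are illustrative assumptions only.
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention_routing(lower_poses, lower_feats, transforms):
    """One routing step between consecutive capsule layers.

    lower_poses : (N_in, d_pose)                pose vectors of lower capsules
    lower_feats : (N_in, d_feat)                feature vectors of lower capsules
    transforms  : (N_in, N_out, d_pose, d_pose) learned pose transformations
    Returns pose, feature and existence probability of the higher capsules.
    """
    # Each lower capsule casts a pose "vote" for every higher capsule.
    votes = np.einsum('iokd,id->iok', transforms, lower_poses)            # (N_in, N_out, d_pose)

    # Agreement: scaled dot product between each vote and the mean vote
    # (a single attention-like pass instead of an iterative routing loop).
    mean_vote = votes.mean(axis=0, keepdims=True)                         # (1, N_out, d_pose)
    scores = (votes * mean_vote).sum(axis=-1) / np.sqrt(votes.shape[-1])  # (N_in, N_out)
    weights = softmax(scores, axis=0)                                     # normalize over inputs

    # Higher-level pose and feature are agreement-weighted combinations.
    higher_poses = np.einsum('io,iok->ok', weights, votes)                # (N_out, d_pose)
    higher_feats = np.einsum('io,id->od', weights, lower_feats)           # (N_out, d_feat)

    # Existence probability grows with the strongest agreement received.
    existence = 1.0 / (1.0 + np.exp(-scores.max(axis=0)))                 # (N_out,)
    return higher_poses, higher_feats, existence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = rng.normal(size=(32, 4))        # 32 lower capsules, 4-D pose
    feats = rng.normal(size=(32, 8))        # 8-D feature per capsule
    W = rng.normal(size=(32, 10, 4, 4))     # transforms toward 10 higher capsules
    p, f, a = attention_routing(poses, feats, W)
    print(p.shape, f.shape, a.shape)        # (10, 4) (10, 8) (10,)
```

The single-pass structure, in place of the iterative loop used by dynamic or EM routing, is the usual reason attention-style routing schemes are cheaper; this matches the efficiency concern raised in the abstract, though the concrete mechanism above is only a stand-in for the one described in the paper.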


Data Availability

The code and the dataset used in this research are freely available at https://github.com/Boazrciasn/ALPACA.


Author information

Corresponding author

Correspondence to Barış Özcan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Özcan, B., Kınlı, F. & Kıraç, F. Generalization to unseen viewpoint images of objects via alleviated pose attentive capsule agreement. Neural Comput & Applic 35, 3521–3536 (2023). https://doi.org/10.1007/s00521-022-07900-3
