Why Capsule Neural Networks Do Not Scale: Challenging the Dynamic Parse-Tree Assumption

Authors

  • Matthias Mitterreiter (Friedrich-Schiller-University, Jena, Germany; Data Assessment Solutions GmbH, Hannover, Germany)
  • Marcel Koch (Ernst Abbe University of Applied Sciences, Jena, Germany)
  • Joachim Giesen (Friedrich-Schiller-University, Jena, Germany)
  • Sören Laue (Technical University Kaiserslautern, Germany)

DOI:

https://doi.org/10.1609/aaai.v37i8.26104

Keywords:

ML: Representation Learning, CV: Representation Learning for Vision, ML: Deep Neural Architectures, ML: Scalability of ML Systems

Abstract

Capsule neural networks replace simple, scalar-valued neurons with vector-valued capsules. They are motivated by the pattern recognition system in the human brain, where complex objects are decomposed into a hierarchy of simpler object parts. Such a hierarchy is referred to as a parse-tree. Conceptually, capsule neural networks have been defined to mimic this behavior. The capsule neural network (CapsNet) by Sabour, Frosst, and Hinton is the first concrete implementation of this idea. CapsNets achieved state-of-the-art performance on simple image recognition tasks with fewer parameters and greater robustness to affine transformations than comparable approaches, which sparked extensive follow-up research. However, despite major efforts, no work has been able to scale the CapsNet architecture to reasonably sized datasets. Here, we provide a reason for this failure and argue that it is most likely not possible to scale CapsNets beyond toy examples. In particular, we show that the concept of a parse-tree, the main idea behind capsule neural networks, is not present in CapsNets. We also show, theoretically and experimentally, that CapsNets suffer from a vanishing gradient problem that results in the starvation of many capsules during training.
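
The vanishing-gradient claim can be illustrated with a small numerical sketch. The Python snippet below is not taken from the paper; it assumes only the standard squash nonlinearity of Sabour, Frosst, and Hinton, v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||), and estimates its Jacobian by finite differences (the helper squash_jacobian is hypothetical). For small ||s|| the squash output behaves like ||s|| * s, so the spectral norm of the Jacobian scales roughly linearly with ||s||: capsules that receive weak input also receive vanishing gradients, consistent with the capsule starvation described above.

import numpy as np

def squash(s):
    # CapsNet squash nonlinearity (Sabour, Frosst, and Hinton, 2017):
    # v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
    norm2 = np.sum(s ** 2)
    return (norm2 / (1.0 + norm2)) * (s / np.sqrt(norm2 + 1e-12))

def squash_jacobian(s, eps=1e-6):
    # Central finite-difference estimate of the Jacobian of squash at s.
    n = s.size
    J = np.zeros((n, n))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        J[:, i] = (squash(s + d) - squash(s - d)) / (2.0 * eps)
    return J

rng = np.random.default_rng(0)
for scale in (1.0, 0.1, 0.01):
    s = scale * rng.normal(size=8)
    J = squash_jacobian(s)
    # The spectral norm of the Jacobian bounds how much gradient can
    # flow back through the capsule; it shrinks together with ||s||.
    print(f"||s|| = {np.linalg.norm(s):.4f}  ->  ||J||_2 = {np.linalg.norm(J, 2):.4f}")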

Published

2023-06-26

How to Cite

Mitterreiter, M., Koch, M., Giesen, J., & Laue, S. (2023). Why Capsule Neural Networks Do Not Scale: Challenging the Dynamic Parse-Tree Assumption. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9209-9216. https://doi.org/10.1609/aaai.v37i8.26104

Section

AAAI Technical Track on Machine Learning III