The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks

  • Conference paper
  • First Online:
Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12663)

Abstract

Overfitting is one of the fundamental challenges when training convolutional neural networks and is usually identified by diverging training and test losses. The underlying dynamics of how the flow of activations induces overfitting are, however, poorly understood. In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures. These novel explainable AI strategies reveal a surprising relationship between activation sparsity and overfitting, namely an increase in sparsity in the feature extraction layers shortly before the test loss starts to rise. This tendency is preserved across network architectures and regularisation strategies, so that our measures can be used as a reliable indicator for overfitting while decoupling the network’s generalisation capabilities from its loss-based definition. Moreover, our differentiable sparsity formulation can be used to explicitly penalise the emergence of sparsity during training, so that the impact of reduced sparsity on overfitting can be studied in real time. Applying this penalty and analysing activation sparsity for well-known regularisers and in common network architectures supports the hypothesis that reduced activation sparsity can effectively improve the generalisation and classification performance. In line with other recent work on this topic, our methods reveal novel insights into the seemingly contradictory concepts of activation sparsity and network capacity by demonstrating that dense activations can enable discriminative feature learning while efficiently exploiting the capacity of deep models without suffering from overfitting, even when trained excessively.
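The perplexity-based sparsity measure and the differentiable penalty mentioned in the abstract are not spelled out on this page. The sketch below illustrates one plausible reading: layer-wise sparsity is derived from the perplexity (exponentiated entropy) of a normalised per-channel activation distribution, and the penalty simply adds these values to the training loss. The function names, the per-channel normalisation, and the penalty weight are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch (PyTorch), NOT the authors' exact formulation:
# a perplexity-based, differentiable layer-wise activation sparsity measure
# and a penalty that discourages the emergence of sparsity during training.
import torch


def perplexity_sparsity(activations: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Sparsity in [0, 1) for one feature map of shape (batch, channels, H, W).

    Assumption: post-ReLU activations, aggregated to a per-channel
    probability-like distribution whose perplexity acts as the
    'effective number' of active channels.
    """
    mass = activations.clamp(min=0).mean(dim=(0, 2, 3))   # per-channel activation mass
    p = mass / (mass.sum() + eps)                          # normalise to a distribution
    entropy = -(p * (p + eps).log()).sum()                 # Shannon entropy
    perplexity = entropy.exp()                             # effective number of active channels
    n_channels = activations.shape[1]
    # -> 0 when all channels are equally active, -> 1 - 1/C when one channel dominates
    return 1.0 - perplexity / n_channels


def sparsity_penalty(feature_maps, weight: float = 1e-3) -> torch.Tensor:
    """Differentiable term that penalises sparse activations across layers."""
    return weight * torch.stack([perplexity_sparsity(f) for f in feature_maps]).sum()
```

Adding such a term to the classification loss, e.g. `loss = cross_entropy + sparsity_penalty(intermediate_feature_maps)`, would explicitly penalise rising activation sparsity during training, which is the mechanism the abstract describes for studying its effect on overfitting in real time.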



Acknowledgements

BR would like to thank the Ministerium für Kultur und Wissenschaft des Landes Nordrhein-Westfalen for the AI Starter support (ID 005-2010-005). Moreover, the authors would like to sincerely thank Sören Klemm for his valuable ideas and input throughout this project and the WWU IT for the use of the PALMA2 supercomputer. This work was partially supported by the Deutsche Forschungsgemeinschaft (DFG) under contract LI 1530/21-2.

Author information

Corresponding author

Correspondence to Benjamin Risse.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Huesmann, K., Rodriguez, L.G., Linsen, L., Risse, B. (2021). The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12663. Springer, Cham. https://doi.org/10.1007/978-3-030-68796-0_10

  • DOI: https://doi.org/10.1007/978-3-030-68796-0_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68795-3

  • Online ISBN: 978-3-030-68796-0

  • eBook Packages: Computer Science, Computer Science (R0)
