Abstract
We argue that many properties of fully-connected feedforward neural networks (FCNNs), also called multi-layer perceptrons (MLPs), can be explained by analyzing a single pair of operations: a random projection into a space of higher dimension than the input, followed by a sparsification operation. For convenience, we call this pair of successive operations expand-and-sparsify, following the terminology of Dasgupta. We show how expand-and-sparsify can explain phenomena that have been discussed in the literature, such as the so-called Lottery Ticket Hypothesis, the surprisingly good performance of randomly initialized but untrained neural networks, the efficacy of Dropout in training, and, most importantly, the mysterious generalization ability of overparameterized models, first highlighted by Zhang et al. and subsequently identified even in non-neural-network models by Belkin et al.
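To make the pair of operations concrete, here is a minimal sketch of expand-and-sparsify, assuming a Gaussian random projection and winner-take-all (top-k) sparsification; the dimensions, the value of k, and the NumPy implementation are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch: expand (random projection to a higher dimension), then sparsify (keep top-k).
import numpy as np

rng = np.random.default_rng(0)

input_dim, expand_dim, k = 16, 256, 8   # expand_dim >> input_dim; keep k "winners"

# "Expand": a fixed random projection into a higher-dimensional space.
W = rng.standard_normal((expand_dim, input_dim)) / np.sqrt(input_dim)

def expand_and_sparsify(x: np.ndarray) -> np.ndarray:
    """Project x to expand_dim dimensions, then zero out all but the k largest entries."""
    y = W @ x                                # expand
    winners = np.argpartition(y, -k)[-k:]    # indices of the k largest entries
    z = np.zeros_like(y)
    z[winners] = y[winners]                  # sparsify: keep only the winners
    return z

x = rng.standard_normal(input_dim)
z = expand_and_sparsify(x)
print(np.count_nonzero(z))                   # -> 8 non-zero entries out of 256
```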
Notes
1. Subsequently, this property was also reported in non-neural-network ML models by Belkin et al. [7].
2. The latter result, however, is proved for a sparsification yielding the same number of significant entries as the dimension of the input, and it relies on an approximation of the Binomial distribution by the Normal distribution whose accuracy was not validated in the paper.
3. For example, the threshold entries may be selected so that, on average, k of the d entries of the projected vector exceed their corresponding thresholds [1]; a sketch of such a calibration follows these notes.
4. In [1], the authors say that this is to ensure that every input in the submanifold excites at least one neuron in the hidden layer.
5.
6. The observation that a trained neural network can be pruned aggressively without significant impact on performance has been noted previously in the literature and is described as part of the Lottery Ticket Hypothesis [2].
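As a companion to note 3, here is a minimal sketch of one way to calibrate per-unit thresholds so that, on average, k of the d entries of the projected vector exceed them; the quantile rule and the calibration sample are illustrative assumptions, not the construction used in [1].

```python
# Minimal sketch: choose per-unit thresholds so that, on average, k of d projected entries fire.
import numpy as np

rng = np.random.default_rng(1)

input_dim, d, k = 16, 256, 8        # d projected entries, k active on average
W = rng.standard_normal((d, input_dim)) / np.sqrt(input_dim)

# Calibrate: take each unit's threshold as the (1 - k/d) quantile of its projected
# values over a sample, so on average k of the d entries exceed their thresholds.
X_cal = rng.standard_normal((10_000, input_dim))
Y_cal = X_cal @ W.T                              # projected calibration sample, shape (10000, d)
thresholds = np.quantile(Y_cal, 1 - k / d, axis=0)

def sparsify_by_threshold(x: np.ndarray) -> np.ndarray:
    """Project x, then keep only the entries that exceed their per-unit thresholds."""
    y = W @ x
    return np.where(y > thresholds, y, 0.0)

active = np.count_nonzero(sparsify_by_threshold(rng.standard_normal(input_dim)))
print(active)                                    # approximately k on average over random inputs
```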
References
1. Dasgupta, S., Tosh, C.: Expressivity of expand-and-sparsify representations (2020). arXiv:2006.03741
2. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable neural networks. In: ICLR (2019). https://openreview.net/forum?id=rJl-b3RcF7
3. Frankle, J., Schwab, D.J., Morcos, A.S.: Training BatchNorm and only BatchNorm: on the expressive power of random features in CNNs (2020). arXiv:2003.00152
4. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)
5. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. In: Proceedings of the 5th International Conference on Learning Representations (ICLR 2017). https://openreview.net/forum?id=Sy8gdB9xx
6. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)
7. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849–15854 (2019)
8. Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized margin bounds for neural networks. In: Advances in Neural Information Processing Systems 30 (NeurIPS 2017) (2017)
9. Chizat, L., Oyallon, E., Bach, F.: On lazy training in differentiable programming. In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (2019)
10. Roberts, D.A., Yaida, S., Hanin, B.: The Principles of Deep Learning Theory. Cambridge University Press (2022)
11. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
12. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
13. Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems 20 (NeurIPS 2007) (2007)
14. Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)
15. Wikipedia: Johnson–Lindenstrauss lemma. https://en.wikipedia.org/wiki/Johnson-Lindenstrauss_lemma
16. Dasgupta, S., Stevens, C.F., Navlakha, S.: A neural algorithm for a fundamental computing problem. Science 358, 793–796 (2017)
17. Papadimitriou, C.H., Vempala, S.S.: Random projection in the brain and computation with assemblies of neurons. In: 10th Innovations in Theoretical Computer Science (ITCS 2019), vol. 57, pp. 1–19 (2019)
18. Dasgupta, S., Sheehan, T.C., Stevens, C.F., Navlakha, S.: A neural data structure for novelty detection. Proc. Natl. Acad. Sci. 115(51), 13093–13098 (2018)
19. Kainen, P.C., Kůrková, V.: Quasiorthogonal dimension of Euclidean spaces. Appl. Math. Lett. 6(3), 7–10 (1993)
20. Gorban, A.N., Tyukin, I.Y., Prokhorov, D.V., Sofeikov, K.I.: Approximation with random bases: Pro et Contra. Inf. Sci. 364–365, 129–145 (2016)
21. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 9, pp. 249–256 (2010)
22. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV, pp. 1026–1034 (2015)
23. LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
24. Bartlett, P., Montanari, A., Rakhlin, A.: Deep learning: a statistical viewpoint. Acta Numer. 30, 87–201 (2021)
25. Hara, K., Saitoh, D., Shouno, H.: Analysis of dropout learning regarded as ensemble learning. In: Proceedings of the 25th International Conference on Artificial Neural Networks (ICANN 2016)
26. Burkholz, R., Laha, N., Mukherjee, R., Gotovos, A.: On the existence of universal lottery tickets. In: Proceedings of the 10th International Conference on Learning Representations (ICLR 2022). http://openreview.net/forum?id=SYB4WrJql1n
27. Burkholz, R.: Convolutional and residual networks provably contain lottery tickets. In: ICML 2022, Proceedings of Machine Learning Research, vol. 162, pp. 2414–2433 (2022)