Abstract
We argue that many properties of fully-connected feedforward neural networks (FCNNs), also called multi-layer perceptrons (MLPs), can be explained by analyzing a single pair of operations: a random projection into a space of higher dimension than the input, followed by a sparsification operation. For convenience, we call this pair of successive operations expand-and-sparsify, following the terminology of Dasgupta. We show how expand-and-sparsify can explain phenomena that have been discussed in the literature, such as the so-called Lottery Ticket Hypothesis, the surprisingly good performance of randomly initialized but untrained neural networks, the efficacy of Dropout in training, and, most importantly, the mysterious generalization ability of overparameterized models, first highlighted by Zhang et al. and subsequently identified even in non-neural-network models by Belkin et al.
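To make the pair of operations concrete, here is a minimal sketch of expand-and-sparsify, assuming a Gaussian random projection and winner-take-all (top-k) sparsification; the dimensions, the value of k, and the NumPy implementation are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch: expand (random projection to a higher dimension), then sparsify (keep top-k).
import numpy as np

rng = np.random.default_rng(0)

input_dim, expand_dim, k = 16, 256, 8   # expand_dim >> input_dim; keep k "winners"

# "Expand": a fixed random projection into a higher-dimensional space.
W = rng.standard_normal((expand_dim, input_dim)) / np.sqrt(input_dim)

def expand_and_sparsify(x: np.ndarray) -> np.ndarray:
    """Project x to expand_dim dimensions, then zero out all but the k largest entries."""
    y = W @ x                                # expand
    winners = np.argpartition(y, -k)[-k:]    # indices of the k largest entries
    z = np.zeros_like(y)
    z[winners] = y[winners]                  # sparsify: keep only the winners
    return z

x = rng.standard_normal(input_dim)
z = expand_and_sparsify(x)
print(np.count_nonzero(z))                   # -> 8 non-zero entries out of 256
```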
Notes
1. Subsequently, this property was also reported in non-neural-network ML models by Belkin et al. [7].
2. The latter result, however, is proved for a sparsification yielding the same number of significant entries as the dimension of the input, and it relies on an approximation of the Binomial distribution by the Normal distribution whose accuracy was not validated in the paper.
3. For example, the threshold entries may be selected so that, on average, k of the d entries of the projected vector exceed their corresponding thresholds [1]; a sketch of such a calibration follows these notes.
4. In [1], the authors say that this is to ensure that every input in the submanifold excites at least one neuron in the hidden layer.
5.
6. The observation that a trained neural network can be pruned aggressively without significant impact on performance has been noted previously in the literature and is described as part of the Lottery Ticket Hypothesis [2].
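As a companion to note 3, here is a minimal sketch of one way to calibrate per-unit thresholds so that, on average, k of the d entries of the projected vector exceed them; the quantile rule and the calibration sample are illustrative assumptions, not the construction used in [1].

```python
# Minimal sketch: choose per-unit thresholds so that, on average, k of d projected entries fire.
import numpy as np

rng = np.random.default_rng(1)

input_dim, d, k = 16, 256, 8        # d projected entries, k active on average
W = rng.standard_normal((d, input_dim)) / np.sqrt(input_dim)

# Calibrate: take each unit's threshold as the (1 - k/d) quantile of its projected
# values over a sample, so on average k of the d entries exceed their thresholds.
X_cal = rng.standard_normal((10_000, input_dim))
Y_cal = X_cal @ W.T                              # projected calibration sample, shape (10000, d)
thresholds = np.quantile(Y_cal, 1 - k / d, axis=0)

def sparsify_by_threshold(x: np.ndarray) -> np.ndarray:
    """Project x, then keep only the entries that exceed their per-unit thresholds."""
    y = W @ x
    return np.where(y > thresholds, y, 0.0)

active = np.count_nonzero(sparsify_by_threshold(rng.standard_normal(input_dim)))
print(active)                                    # approximately k on average over random inputs
```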
References
1. Dasgupta, S., Tosh, C.: Expressivity of expand-and-sparsify representations (2020). arXiv:2006.03741
2. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable neural networks. In: ICLR (2019). https://openreview.net/forum?id=rJl-b3RcF7
3. Frankle, J., Schwab, D.J., Morcos, A.S.: Training BatchNorm and only BatchNorm: on the expressive power of random features in CNNs (2020). arXiv:2003.00152
4. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)
5. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. In: Proceedings of the 5th International Conference on Learning Representations (ICLR 2017). https://openreview.net/forum?id=Sy8gdB9xx
6. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)
7. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849–15854 (2019)
8. Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized margin bounds for neural networks. In: Advances in Neural Information Processing Systems 30 (NeurIPS 2017) (2017)
9. Chizat, L., Oyallon, E., Bach, F.: On lazy training in differentiable programming. In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (2019)
10. Roberts, D.A., Yaida, S., Hanin, B.: The Principles of Deep Learning Theory. Cambridge University Press (2022)
11. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
12. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
13. Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems 20 (NeurIPS 2007) (2007)
14. Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)
15. Wikipedia: Johnson–Lindenstrauss lemma. https://en.wikipedia.org/wiki/Johnson-Lindenstrauss_lemma
16. Dasgupta, S., Stevens, C.F., Navlakha, S.: A neural algorithm for a fundamental computing problem. Science 358, 793–796 (2017)
17. Papadimitriou, C.H., Vempala, S.S.: Random projection in the brain and computation with assemblies of neurons. In: 10th Innovations in Theoretical Computer Science (ITCS 2019), vol. 57, pp. 1–19 (2019)
18. Dasgupta, S., Sheehan, T.C., Stevens, C.F., Navlakha, S.: A neural data structure for novelty detection. Proc. Natl. Acad. Sci. 115(51), 13093–13098 (2018)
19. Kainen, P.C., Kůrková, V.: Quasiorthogonal dimension of Euclidean spaces. Appl. Math. Lett. 6(3), 7–10 (1993)
20. Gorban, A.N., Tyukin, I.Y., Prokhorov, D.V., Sofeikov, K.I.: Approximation with random bases: Pro et Contra. Inf. Sci. 364–365, 129–145 (2016)
21. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 9, pp. 249–256 (2010)
22. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV, pp. 1026–1034 (2015)
23. LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
24. Bartlett, P., Montanari, A., Rakhlin, A.: Deep learning: a statistical viewpoint. Acta Numer. 30, 87–201 (2021)
25. Hara, K., Saitoh, D., Shouno, H.: Analysis of dropout learning regarded as ensemble learning. In: Proceedings of the 25th International Conference on Artificial Neural Networks (ICANN 2016)
26. Burkholz, R., Laha, N., Mukherjee, R., Gotovos, A.: On the existence of universal lottery tickets. In: Proceedings of the 10th International Conference on Learning Representations (ICLR 2022). http://openreview.net/forum?id=SYB4WrJql1n
27. Burkholz, R.: Convolutional and residual networks provably contain lottery tickets. In: ICML 2022, Proceedings of Machine Learning Research, vol. 162, pp. 2414–2433 (2022)