
Why Neural Networks Work

Conference paper in Intelligent Systems and Applications (IntelliSys 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 823)


Abstract

We argue that many properties of fully-connected feedforward neural networks (FCNNs), also called multi-layer perceptrons (MLPs), are explainable from the analysis of a single pair of operations: a random projection into a space of higher dimension than the input, followed by a sparsification operation. For convenience, we call this pair of successive operations expand-and-sparsify, following the terminology of Dasgupta. We show how expand-and-sparsify can explain observed phenomena discussed in the literature, such as the so-called Lottery Ticket Hypothesis, the surprisingly good performance of randomly-initialized untrained neural networks, the efficacy of Dropout in training, and, most importantly, the mysterious generalization ability of overparameterized models, first highlighted by Zhang et al. and subsequently identified even in non-neural-network models by Belkin et al.
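
To make the expand-and-sparsify pair of operations concrete, the following is a minimal NumPy sketch. It is not taken from the paper: the Gaussian projection matrix, the top-k sparsification rule, and all dimensions are illustrative assumptions (a thresholded variant of the sparsification step is discussed in the notes below).

    import numpy as np

    def expand_and_sparsify(x, d, k, seed=None):
        """Illustrative expand-and-sparsify map (names and sizes are assumptions).

        x : (n,) input vector
        d : expanded dimension, typically d >> n
        k : number of entries kept after sparsification, k << d
        """
        rng = np.random.default_rng(seed)
        n = x.shape[0]
        # Expand: random projection into a d-dimensional space with d >> n.
        W = rng.standard_normal((d, n)) / np.sqrt(n)
        z = W @ x
        # Sparsify: keep the k largest activations and zero out the rest
        # (a top-k rule; per-neuron thresholding is another common choice).
        out = np.zeros(d)
        top = np.argpartition(z, -k)[-k:]
        out[top] = z[top]
        return out

    # Example: a 784-dimensional input (e.g. a flattened MNIST image) mapped to
    # a sparse 8000-dimensional representation with 200 active entries.
    x = np.random.default_rng(0).standard_normal(784)
    y = expand_and_sparsify(x, d=8000, k=200, seed=1)
    print(np.count_nonzero(y))  # 200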


Notes

  1. Subsequently this property was also reported in non-neural-network ML models by Belkin et al. [7].

  2. The latter result, however, is proved for a sparsification yielding the same number of significant entries as the dimension of the input, and it relies on an approximation of the Binomial by the Normal distribution whose accuracy was not validated in the paper.

  3. For example, the threshold entries may be selected such that, on average, k of the d entries of the projected vector exceed their corresponding thresholds [1]; see the sketch after these notes.

  4. In [1], the authors say that this is to ensure that every input in the submanifold excites at least one neuron in the hidden layer.

  5. These perturbations have been analyzed asymptotically by Chizat et al. in [9] and, in great detail and under more realistic assumptions, in the book-length treatment of Roberts and Yaida [10].

  6. The observation that a trained neural network can be pruned aggressively without significant impact on performance has been previously noted in the literature and described as part of the Lottery Ticket Hypothesis [2].
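
To complement note 3, here is a minimal sketch of one way such thresholds could be calibrated; it is our own illustration under a Gaussian assumption, not the construction of [1]. If the input x has unit norm and the projection matrix W has i.i.d. N(0, 1/n) entries, then each entry of Wx is approximately N(0, 1/n), so setting every threshold to the (1 - k/d) quantile of that Gaussian gives each of the d projected entries probability k/d of exceeding its threshold, hence about k exceed on average.

    import numpy as np
    from scipy.stats import norm

    def common_threshold(n, d, k):
        """Threshold t such that, for entries z_i ~ N(0, 1/n), on average k of
        the d entries exceed t (assumes a Gaussian projection of a unit-norm
        input; purely illustrative)."""
        return norm.ppf(1.0 - k / d, loc=0.0, scale=1.0 / np.sqrt(n))

    # Simulation check of the calibration.
    rng = np.random.default_rng(0)
    n, d, k = 784, 8000, 200
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                        # unit-norm input
    W = rng.standard_normal((d, n)) / np.sqrt(n)  # entries ~ N(0, 1/n)
    z = W @ x                                     # each entry ~ N(0, 1/n)
    t = common_threshold(n, d, k)
    print(int((z > t).sum()))                     # close to k = 200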

References

  1. Dasgupta, S., Tosh, C.: Expressivity of expand-and-sparsify representations (2020). arXiv:2006.03741

  2. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable neural networks. In: ICLR (2019). https://openreview.net/forum?id=rJl-b3RcF7

  3. Frankle, J., Schwab, D.J., Morcos, A.S.: Training BatchNorm and only BatchNorm: on the expressive power of random features in CNNs (2020). arXiv:2003.00152

  4. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929–1958 (2014)

  5. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. In: Proceedings of the 5th International Conference on Learning Representations (ICLR 2017). https://openreview.net/forum?id=Sy8gdB9xx

  6. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)

  7. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. 116(32), 15849–15854 (2019)

  8. Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized margin bounds for neural networks. In: Advances in Neural Information Processing Systems 30 (NeurIPS 2017) (2017)

  9. Chizat, L., Oyallon, E., Bach, F.: On lazy training in differentiable programming. In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (2019)

  10. Roberts, D.A., Yaida, S., Hanin, B.: The Principles of Deep Learning Theory. Cambridge University Press (2022)

  11. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)

  12. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  13. Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Advances in Neural Information Processing Systems 20 (NeurIPS 2007) (2007)

  14. Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)

  15. Wikipedia: Johnson-Lindenstrauss lemma. https://en.wikipedia.org/wiki/Johnson-Lindenstrauss_lemma

  16. Dasgupta, S., Stevens, C.F., Navlakha, S.: A neural algorithm for a fundamental computing problem. Science 358, 793–796 (2017)

  17. Papadimitriou, C.H., Vempala, S.S.: Random projection in the brain and computation with assemblies of neurons. In: 10th Innovations in Theoretical Computer Science (ITCS 2019), vol. 57, pp. 1–19 (2019)

  18. Dasgupta, S., Sheehan, T.C., Stevens, C.F., Navlakha, S.: A neural data structure for novelty detection. Proc. Natl. Acad. Sci. 115(51), 13093–13098 (2018)

  19. Kainen, P.C., Kurková, V.: Quasiorthogonal dimension of Euclidean spaces. Appl. Math. Lett. 6(3), 7–10 (1993)

  20. Gorban, A.N., Tyukin, I.Y., Prokhorov, D.V., Sofeikov, K.I.: Approximation with random bases: Pro et Contra. Inf. Sci. 364–365, 129–145 (2016)

  21. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 9, pp. 249–256 (2010)

  22. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV, pp. 1026–1034 (2015)

  23. LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

  24. Bartlett, P., Montanari, A., Rakhlin, A.: Deep learning: a statistical viewpoint. Acta Numer. 30, 87–201 (2021)

  25. Hara, K., Saitoh, D., Shouno, H.: Analysis of dropout learning regarded as ensemble learning. In: Proceedings of the 25th International Conference on Artificial Neural Networks (ICANN 2016)

  26. Burkholz, R., Laha, N., Mukherjee, R., Gotovos, A.: On the existence of universal lottery tickets. In: Proceedings of the 10th International Conference on Learning Representations (ICLR 2022). http://openreview.net/forum?id=SYB4WrJql1n

  27. Burkholz, R.: Convolutional and residual networks provably contain lottery tickets. In: ICML 2022, Proceedings of Machine Learning Research, vol. 162, pp. 2414–2433 (2022)


Author information

Correspondence to Sayandev Mukherjee.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Mukherjee, S., Huberman, B.A. (2024). Why Neural Networks Work. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 823. Springer, Cham. https://doi.org/10.1007/978-3-031-47724-9_24
