Revisiting spatial dropout for regularizing convolutional neural networks

Multimedia Tools and Applications

Abstract

Overfitting is one of the most challenging problems in deep neural networks with a large number of trainable parameters. To prevent networks from overfitting, dropout, a strong regularization technique, has been widely used in fully-connected neural networks. In several state-of-the-art convolutional neural network architectures for object classification, however, dropout was applied only partially or not at all, since its accuracy gain was relatively insignificant in most cases. In addition, batch normalization reduced the need for dropout because of its own regularization effect. In this paper, we show that conventional element-wise dropout can be ineffective for convolutional layers. We found that dropping entire channels in a CNN is functionally similar to dropping units in a fully-connected network, and that spatial dropout can therefore be an effective way to exploit dropout for regularizing convolutional layers. To support these points, we conducted several experiments on the CIFAR-10 and CIFAR-100 datasets. For a fair comparison, we only replaced the dropout layers with spatial dropout layers and kept all other hyperparameters and methods intact. DenseNet-BC with spatial dropout showed promising results (a 3.32% error rate on CIFAR-10 with 3.0 M parameters) compared to other existing competitive methods.
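The channel-wise behaviour described above can be illustrated with a short sketch. The snippet below is not the authors' implementation; it uses PyTorch's nn.Dropout and nn.Dropout2d with an illustrative drop rate of p = 0.2 and an arbitrary feature-map size, purely to contrast element-wise dropout with spatial (channel-wise) dropout.

```python
import torch
import torch.nn as nn

# Conventional element-wise dropout: each activation is zeroed independently.
elementwise = nn.Dropout(p=0.2)

# Spatial (channel-wise) dropout: entire feature maps are zeroed together, so
# spatially correlated activations within a channel are dropped as one unit,
# analogous to dropping a single unit in a fully-connected layer.
spatial = nn.Dropout2d(p=0.2)

x = torch.randn(8, 64, 32, 32)  # (batch, channels, height, width)

elementwise.train()
spatial.train()

y_elem = elementwise(x)   # zeros scattered over all four dimensions
y_spat = spatial(x)       # whole (sample, channel) feature maps set to zero

# With spatial dropout, a dropped channel is zero everywhere in its 32x32 map;
# the fraction of dropped maps is roughly the drop rate p.
dropped = (y_spat.abs().sum(dim=(2, 3)) == 0).float().mean()
print(dropped.item())
```

In the experiments reported in the paper, only the dropout layers of DenseNet-BC were swapped for spatial dropout in this fashion, with all other hyperparameters left unchanged.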

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2017R1E1A2A01079495).

Author information

Corresponding author

Correspondence to Chulhee Lee.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lee, S., Lee, C. Revisiting spatial dropout for regularizing convolutional neural networks. Multimed Tools Appl 79, 34195–34207 (2020). https://doi.org/10.1007/s11042-020-09054-7
