Abstract
Overfitting is one of the most challenging problems in deep neural networks with large numbers of trainable parameters. To prevent networks from overfitting, dropout, a strong regularization technique, has been widely used in fully-connected neural networks (FCNNs). In several state-of-the-art convolutional neural network (CNN) architectures for object classification, however, dropout was applied only partially or not at all, since its accuracy gain was relatively insignificant in most cases. The batch normalization technique has also reduced the need for dropout because of its own regularization effect. In this paper, we show that conventional element-wise dropout can be ineffective for convolutional layers. We found that dropout between channels in a CNN is functionally similar to dropout in an FCNN, and that spatial dropout can be an effective way to exploit the dropout technique for regularization. To support these points, we conducted several experiments on the CIFAR-10 and CIFAR-100 databases. For comparison, we replaced only the dropout layers with spatial dropout layers and kept all other hyperparameters and methods intact. DenseNet-BC with spatial dropout showed promising results (a 3.32% error rate on CIFAR-10 with 3.0 M parameters) compared with other existing competitive methods.
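The core distinction the abstract draws is between element-wise dropout, which masks individual activations, and spatial dropout, which drops entire feature-map channels. A minimal NumPy sketch of inverted spatial dropout (an illustrative helper, not the paper's actual implementation) makes the difference concrete: one Bernoulli variable is sampled per (sample, channel) pair, and surviving channels are rescaled by 1/(1-p).

```python
import numpy as np

def spatial_dropout(x, p, rng):
    """Zero out entire channels of a (N, C, H, W) activation tensor.

    Unlike element-wise dropout, which samples a mask per activation,
    spatial dropout samples one keep/drop decision per (sample, channel)
    pair, so strongly correlated activations within a channel are
    dropped together. Survivors are scaled by 1/(1-p) (inverted dropout),
    so no rescaling is needed at test time.
    """
    n, c, _, _ = x.shape
    keep = (rng.random((n, c, 1, 1)) >= p).astype(x.dtype)
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((2, 8, 4, 4))          # toy activations, all ones
y = spatial_dropout(x, p=0.5, rng=rng)
# Every channel of y is either all zeros (dropped) or uniformly 2.0 (kept).
```

In framework terms, this corresponds to replacing an element-wise dropout layer with a channel-wise one (e.g., PyTorch's `nn.Dropout2d`) while leaving the rest of the network unchanged, which is the substitution the experiments perform.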
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2017R1E1A2A01079495).
Cite this article
Lee, S., Lee, C. Revisiting spatial dropout for regularizing convolutional neural networks. Multimed Tools Appl 79, 34195–34207 (2020). https://doi.org/10.1007/s11042-020-09054-7