
Deep network compression with teacher latent subspace learning and LASSO


Abstract

Deep neural networks have been shown to excel at understanding multimedia by using latent representations to learn complex and useful abstractions. However, they remain impractical for embedded devices due to memory constraints, high latency, and considerable power consumption at runtime. In this paper, we propose compressing deep models by learning lower-dimensional subspaces from their latent representations while incurring only a minimal loss of performance. We leverage the premise that deep convolutional neural networks extract many redundant features to learn new subspaces for feature representation. We construct a compressed (student) model by reconstructing the representations captured by an already trained large (teacher) model. Unlike state-of-the-art approaches, the proposed approach does not rely on labeled data. Moreover, it allows the sparsity-inducing LASSO parameter penalty to achieve better compression results than when it is used to train models from scratch. We perform extensive experiments using VGG-16 and wide ResNet models on the CIFAR-10, CIFAR-100, MNIST and SVHN datasets. For instance, a VGG-16 model with 8.96M parameters trained on CIFAR-10 was pruned by 81.03% with only a 0.26% loss in generalization performance. Correspondingly, its size is reduced from 35 MB to 6.72 MB, facilitating compact storage. Furthermore, its inference time is reduced from 1.1 s to 0.6 s, accelerating inference. Notably, the proposed student models outperform state-of-the-art approaches and the same models trained from scratch.
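To make the described approach concrete, below is a minimal sketch, in PyTorch, of learning a lower-dimensional subspace from a teacher's latent representations with a sparsity-inducing LASSO (L1) penalty. The layer sizes, the encode/decode projections, the penalty strength, and the random stand-in for the teacher activations are all illustrative assumptions and are not taken from the paper, which trains on representations captured from an already trained teacher network.

    import torch
    import torch.nn as nn

    # Hypothetical sizes: a 512-dimensional teacher latent representation is
    # compressed into a 128-dimensional student subspace.
    teacher_dim, student_dim, batch_size = 512, 128, 64
    lasso_lambda = 1e-4  # illustrative strength of the L1 (LASSO) penalty

    encode = nn.Linear(teacher_dim, student_dim)  # projects teacher features onto the learned subspace
    decode = nn.Linear(student_dim, teacher_dim)  # reconstructs teacher features from the subspace
    optimizer = torch.optim.Adam(list(encode.parameters()) + list(decode.parameters()), lr=1e-3)

    # Stand-in for latent activations captured from an already trained teacher model;
    # note that no class labels are required, only the teacher's representations.
    teacher_latent = torch.randn(batch_size, teacher_dim)

    for step in range(200):
        optimizer.zero_grad()
        reconstruction = decode(encode(teacher_latent))
        # Reconstruction loss against the teacher's latent representation ...
        loss = nn.functional.mse_loss(reconstruction, teacher_latent)
        # ... plus the sparsity-inducing LASSO penalty on the student parameters.
        loss = loss + lasso_lambda * sum(p.abs().sum() for p in encode.parameters())
        loss.backward()
        optimizer.step()

In this sketch the L1 term drives many entries of the student projection toward zero, which is what enables pruning of the compressed model.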



Notes

  1. Mapping from input data space to output (softmax) space

  2. Mapping from a hypothetical hidden layer to the adjoining one

  3. LASSO and L1-norm are used interchangeably (the penalty is written out after this list)

  4. Code will be made publicly available upon paper acceptance
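For reference, the LASSO penalty mentioned in note 3 is the standard L1 regularizer added to a training objective; in the illustrative notation below, $\mathcal{L}(w)$ is any base loss, $w$ the model parameters, and $\lambda$ the penalty strength (these symbols are for exposition and are not taken from the paper):

  $\mathcal{L}_{\mathrm{total}}(w) = \mathcal{L}(w) + \lambda \lVert w \rVert_1, \qquad \lVert w \rVert_1 = \textstyle\sum_i \lvert w_i \rvert$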


Author information

Corresponding author

Correspondence to Oyebade K. Oyedotun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was funded by the National Research Fund (FNR), Luxembourg, under the project references R-AGR-0424-05-D/Björn Ottersten and CPPP17/IS/11643091/IDform/Aouada.


Cite this article

Oyedotun, O.K., Shabayek, A.E.R., Aouada, D. et al. Deep network compression with teacher latent subspace learning and LASSO. Appl Intell 51, 834–853 (2021). https://doi.org/10.1007/s10489-020-01858-2

