sqFm: a novel adaptive optimization scheme for deep learning model

  • Research Paper
  • Published in: Evolutionary Intelligence

Abstract

Training a deep model requires an optimization technique that minimizes loss and maximizes accuracy, and the design of effective optimization methods remains an important research area. The diffGrad optimizer uses the change in gradient across optimization steps but does not update the second-order moment based on the first-order moment, while the AngularGrad optimizer uses the angular value of the gradient, which requires additional computation. As a result, both approaches follow zigzag trajectories that take a long time, and extra computation, to reach a global minimum. To overcome these limitations, a novel adaptive deep learning optimization method based on the square of the first momentum (sqFm) is proposed. By adjusting the second-order moment according to the first-order moment and changing the step size according to the present gradient through a non-negative function, the proposed sqFm delivers a smoother trajectory and better image classification accuracy. Empirical comparisons of sqFm with Adam, diffGrad, and AngularGrad on non-convex functions show that the proposed method achieves the best convergence and parameter values. On the Rosenbrock function, sqFm reaches the global minimum gradually and with less overshoot than SGD, Adam, diffGrad, RAdam, and AngularGrad(tan). In addition, sqFm gives consistently good classification accuracy when training CNNs (VGG16, ResNet18, ResNet34, ResNet50, and DenseNet121) on the CIFAR10, CIFAR100, and MNIST datasets, in contrast to SGDM, diffGrad, Adam, AngularGrad(Cos), and AngularGrad(Tan). It also achieves the best classification accuracy among SGD, Adam, AdaBelief, Yogi, RAdam, and AngularGrad on the ImageNet dataset with the ResNet18 network. Source code: https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.
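The abstract only names the ingredients of the update (an Adam-family scheme whose second-order moment is tied to the first-order moment) and the Rosenbrock function as a test case; the exact update rule is defined in the paper and the linked repository. The NumPy snippet below is a minimal illustrative sketch under that assumption, not the authors' implementation: it runs an Adam-style loop on the Rosenbrock function in which the second moment is fed by the square of the first moment.

```python
import numpy as np

def rosenbrock_grad(xy, a=1.0, b=100.0):
    """Gradient of the Rosenbrock function f(x, y) = (a - x)^2 + b*(y - x^2)^2."""
    x, y = xy
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return np.array([dx, dy])

def sqfm_like_update(grad_fn, x0, lr=1e-2, beta1=0.9, beta2=0.999,
                     eps=1e-8, steps=5000):
    """Illustrative Adam-family loop in which the second moment is driven by the
    square of the first moment (m * m) instead of the raw squared gradient.
    This is an assumption about what 'square of first momentum' means; the
    exact sqFm rule is given in the paper and its GitHub repository."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-order moment (EMA of gradients)
    v = np.zeros_like(x)  # second-order moment (here fed by m * m)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * (m * m)      # tie 2nd moment to 1st moment
        m_hat = m / (1.0 - beta1 ** t)               # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive parameter update
    return x

if __name__ == "__main__":
    # Start away from the minimum; the global minimum of Rosenbrock is at (1, 1).
    print(sqfm_like_update(rosenbrock_grad, x0=[-1.5, 2.0]))
```

With these default settings the iterate drifts toward the Rosenbrock minimum at (1, 1); for the actual optimizer, use the reference implementation in the repository linked above.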

Availability of data and materials

Data will be made available on reasonable request.

Code availability

Custom code is available at https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.

Acknowledgements

We would like to thank the Department of Computer Science, Vidyasagar University, Paschim Medinipur, Midnapore 721102, West Bengal, India, for providing the infrastructure to carry out our experiments.

Funding

Not applicable.

Author information

Contributions

Shubhankar Bhakta: Conceptualization, Implementation and Drafting; Utpal Nandi: Conceptualization, Investigation, Methodology, Analysis, and Supervision; Others: Review and Editing.

Corresponding author

Correspondence to Utpal Nandi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest or competing interests.

Ethical approval

The authors affirm that the research presented in this paper was conducted in accordance with the principles of ethical and professional conduct.

Consent to participate

Not applicable.

Consent for publication

Not applicable; the authors used only publicly available data and provide the corresponding references.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bhakta, S., Nandi, U., Mondal, M. et al. sqFm: a novel adaptive optimization scheme for deep learning model. Evol. Intel. (2024). https://doi.org/10.1007/s12065-023-00897-1
