sqFm: a novel adaptive optimization scheme for deep learning model

  • Research Paper
  • Published in: Evolutionary Intelligence

Abstract

Training a deep model requires an optimization technique that minimizes loss and maximizes accuracy, and the design of effective optimization methods remains an important research area. The diffGrad optimizer uses the change in gradient across optimization steps but does not update the second-order moment based on the first-order moment, while the AngularGrad optimizer uses the angular value of the gradient, which requires additional computation. As a result, both approaches follow zigzag trajectories that take a long time, and extra computation, to reach a global minimum. To overcome these limitations, a novel adaptive deep learning optimization method based on the square of the first momentum (sqFm) is proposed. By adjusting the second-order moment according to the first-order moment and changing the step size according to the present gradient through a non-negative function, the proposed sqFm delivers a smoother trajectory and better image classification accuracy. Empirical comparisons of sqFm with Adam, diffGrad, and AngularGrad on non-convex functions show that the proposed method achieves the best convergence and parameter values. On the Rosenbrock function, sqFm reaches the global minimum gradually and with less overshoot than SGD, Adam, diffGrad, RAdam, and AngularGrad(tan). In addition, sqFm gives consistently good classification accuracy when training CNNs (VGG16, ResNet18, ResNet34, ResNet50, and DenseNet121) on the CIFAR10, CIFAR100, and MNIST datasets, in contrast to SGDM, diffGrad, Adam, AngularGrad(Cos), and AngularGrad(Tan). It also achieves the best classification accuracy among SGD, Adam, AdaBelief, Yogi, RAdam, and AngularGrad on the ImageNet dataset with the ResNet18 network. Source code: https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.
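The abstract only names the ingredients of the update (an Adam-family scheme whose second-order moment is tied to the first-order moment) and the Rosenbrock function as a test case; the exact update rule is defined in the paper and the linked repository. The NumPy snippet below is a minimal illustrative sketch under that assumption, not the authors' implementation: it runs an Adam-style loop on the Rosenbrock function in which the second moment is fed by the square of the first moment.

```python
import numpy as np

def rosenbrock_grad(xy, a=1.0, b=100.0):
    """Gradient of the Rosenbrock function f(x, y) = (a - x)^2 + b*(y - x^2)^2."""
    x, y = xy
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return np.array([dx, dy])

def sqfm_like_update(grad_fn, x0, lr=1e-2, beta1=0.9, beta2=0.999,
                     eps=1e-8, steps=5000):
    """Illustrative Adam-family loop in which the second moment is driven by the
    square of the first moment (m * m) instead of the raw squared gradient.
    This is an assumption about what 'square of first momentum' means; the
    exact sqFm rule is given in the paper and its GitHub repository."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-order moment (EMA of gradients)
    v = np.zeros_like(x)  # second-order moment (here fed by m * m)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * (m * m)      # tie 2nd moment to 1st moment
        m_hat = m / (1.0 - beta1 ** t)               # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive parameter update
    return x

if __name__ == "__main__":
    # Start away from the minimum; the global minimum of Rosenbrock is at (1, 1).
    print(sqfm_like_update(rosenbrock_grad, x0=[-1.5, 2.0]))
```

With these default settings the iterate drifts toward the Rosenbrock minimum at (1, 1); for the actual optimizer, use the reference implementation in the repository linked above.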

Availability of data and materials

Data will be made available on reasonable request.

Code availability

Custom code is available at https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.

Acknowledgements

We would like to thank the Department of Computer Science, Vidyasagar University, Paschim Medinipur, Midnapore 721102, West Bengal, India, for providing the infrastructure to carry out our experiments.

Funding

Not applicable.

Author information

Contributions

Shubhankar Bhakta: Conceptualization, Implementation and Drafting; Utpal Nandi: Conceptualization, Investigation, Methodology, Analysis, and Supervision; Others: Review and Editing.

Corresponding author

Correspondence to Utpal Nandi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest or competing interests.

Ethical approval

The authors affirm that the research presented in this paper was conducted in accordance with the principles of ethical and professional conduct.

Consent to participate

Not applicable.

Consent for publication

Not applicable; the authors used only publicly available data and provide the corresponding references.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bhakta, S., Nandi, U., Mondal, M. et al. sqFm: a novel adaptive optimization scheme for deep learning model. Evol. Intel. (2024). https://doi.org/10.1007/s12065-023-00897-1
