Abstract
In this paper we propose an adaptive trust-region method for smooth unconstrained optimization. The update rule for the trust-region radius relies only on gradient evaluations. Assuming that the gradient of the objective function is Lipschitz continuous, we establish worst-case complexity bounds for the number of gradient evaluations required by the proposed method to generate approximate stationary points. As a corollary, we establish a global convergence result. We also present numerical results on benchmark problems. In terms of the number of oracle calls, the proposed method compares favorably with trust-region methods that use evaluations of the objective function.
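The abstract describes a trust-region loop whose radius is updated from gradient information alone, with no function evaluations. The sketch below is purely illustrative and is *not* the paper's AdaTrust rules (2.2)/(2.4), which are not reproduced in this excerpt: it takes a Cauchy-like step of length \(\varDelta_k\) and drives the radius with a hypothetical gradient-norm accumulator \(b_k\).

```python
import numpy as np

def adaptive_tr_sketch(grad, x0, b0=1.0, max_iter=200, tol=1e-6):
    """Illustrative function-evaluation-free trust-region loop.

    This is a generic sketch under assumed update rules; the actual
    AdaTrust updates (2.2) and (2.4) are not shown in this excerpt.
    """
    x, b = np.asarray(x0, dtype=float).copy(), b0
    for k in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:
            return x, k
        delta = gnorm / b            # radius driven only by gradients
        x = x - (delta / gnorm) * g  # Cauchy-like step of length delta
        b = b + gnorm**2 / b         # hypothetical accumulator update
    return x, max_iter
```

On the quadratic \(f(x)=\tfrac{1}{2}\Vert x\Vert^{2}\) (gradient \(g(x)=x\)) the loop reduces to gradient descent with step \(1/b_k\), so it converges without ever evaluating \(f\).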
Notes
The performance profiles were generated using the code perf.m, freely available at http://www.mcs.anl.gov/~more/cops/.
Looking closely at the problems Powell badly scaled, Brown badly scaled and Meyer, we see that for these problems \(\Vert \nabla f(x_{0})\Vert \ge 10^{3}\). Due to (2.4) and (2.2), large values of \(\Vert \nabla f(x_{0})\Vert\) make \(\varDelta _{k}\) become extremely small very quickly, which severely slows down the progress of the iterates towards stationary points. This remark suggests that, when initializing AdaTrust2, starting points with a very large gradient norm should be avoided.
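The collapse of \(\varDelta_k\) for large \(\Vert \nabla f(x_{0})\Vert\) can be illustrated numerically. The snippet below uses a *hypothetical* accumulator rule \(b_{1}=b_{0}+\Vert \nabla f(x_{0})\Vert^{2}/b_{0}\), \(\varDelta_{1}=\Vert \nabla f(x_{0})\Vert /b_{1}\), chosen only for illustration; the paper's actual updates (2.2) and (2.4) are not reproduced in this excerpt.

```python
def radius_after_one_step(g0_norm, b0=1.0):
    """Radius after one iteration under an illustrative, assumed rule."""
    b1 = b0 + g0_norm**2 / b0  # accumulator grows with the squared gradient norm
    return g0_norm / b1        # Delta_1 under the illustrative rule

moderate = radius_after_one_step(1.0)    # moderate gradient: Delta_1 = 0.5
tiny = radius_after_one_step(1.0e3)      # ||grad f(x0)|| >= 10^3: Delta_1 ~ 1e-3
```

Under this assumed rule, a starting gradient norm of \(10^{3}\) already shrinks the radius by three orders of magnitude after a single iteration, consistent with the remark above.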
Acknowledgements
The authors are very grateful to the two anonymous referees, whose comments helped to improve the paper.
Funding
G. N. Grapiglia was partially supported by the National Council for Scientific and Technological Development (CNPq) - Brazil (Grant 312777/2020-5). G.F.D. Stella was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES) - Brazil.
Appendix
1.1 Proof of Lemma 4
By definition, \(k_{i}\le q\). If \(k_{i}\ge q-1\), then \(|I(k_{i},q)|\le 2\) and so (2.22) holds. Now, suppose that \(k_{i}<q-1\). By (2.4) we have
Summing up these equalities, it follows from (2.11) and (2.21) that
and so
Since \(b_q<{\tilde{L}}\), we obtain
which gives
Therefore, (2.22) also holds in this case. \(\square\)
1.2 Proof of Lemma 5
Let \(k\in I(p+1,k_{i+1}-1)\). Then, by (2.4), \(b_{j}\ge {\tilde{L}}\) for \(j=p,\ldots ,k-1\). Consequently, by Lemma 3, we have
Summing up these inequalities we get
and so, by A2,
On the other hand, by (2.4) and A1, we also have
By (2.2),
Then, combining (4.3) and (4.4) and using \(b_{j}\ge {\tilde{L}}\), it follows that
Now, combining (4.5) and (4.2), we obtain
and so
Our first goal is to refine the upper bound for \(b_{k}\) in (4.6). For that, we will break the analysis into a few cases and subcases related to the position of p in the set \(I(k_{i},k_{i+1}-2)\).
Case I \(p=k_{i}\).
In this case, it follows from (4.6) that
Case II \(p\in I(k_{i}+1,k_{i+1}-2)\).
By A1 and the trust-region constraint, we have
Moreover, given \(j\in I(k_{i},p-1)\), we also have
Case II(a) \(p=k_{i}+1\).
In this case, by (4.8) we have
On the other hand, by (4.9) with \(j=k_{i}\), we get
Thus, combining (4.6), (4.10) and (4.11), it follows that
for all \(k\in I(p,k_{i+1}-1)\).
Case II(b) \(p\in I(k_{i}+2,k_{i+1}-2)\).
In this case, \(b_{p-1}\ge b_{p-2}\) and so, by (4.8), we get
Since \(b_{\min }\le b_{p-1}<{\tilde{L}}\), it follows that
On the other hand, by (4.9) we have
and so
Thus, combining (4.6), (4.13) and (4.14), it follows that
for all \(k\in I(p,k_{i+1}-1)\).
Summarizing all cases and subcases above, it follows from (4.7), (4.12), and (4.15) that
for all \(k\in I(p,k_{i+1}-1)\), regardless of the position of p in the set \(I(k_{i},k_{i+1}-2)\). Finally, by (2.4) and Lemma 3,
Summing up these inequalities, it follows from A2, (2.11) and (4.16) that
By (4.11) and (4.14), we also have
Thus, combining (4.17), (4.18) and using \(\Vert \nabla f(x_{k_{i}})\Vert \le \Vert \nabla f(x_{0})\Vert\) (by Lemma 1), we get
\(\square\)
Grapiglia, G.N., Stella, G.F.D. An adaptive trust-region method without function evaluations. Comput Optim Appl 82, 31–60 (2022). https://doi.org/10.1007/s10589-022-00356-0