An adaptive trust-region method without function evaluations

Published in Computational Optimization and Applications

Abstract

In this paper we propose an adaptive trust-region method for smooth unconstrained optimization in which the update rule for the trust-region radius relies only on gradient evaluations. Assuming that the gradient of the objective function is Lipschitz continuous, we establish worst-case complexity bounds on the number of gradient evaluations the proposed method requires to generate approximate stationary points; as a corollary, we obtain a global convergence result. We also present numerical results on benchmark problems. In terms of the number of oracle calls, the proposed method compares favorably with trust-region methods that rely on evaluations of the objective function.
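The flavor of a gradient-only radius update of the kind described in the abstract (and used in the proofs via (2.2) and (2.4), where the step satisfies \(\Vert d_k\Vert \le \Vert \nabla f(x_k)\Vert /b_k\) and \(b_{k+1}=b_k+\Vert \nabla f(x_{k+1})\Vert ^2/b_k\)) can be sketched as follows. This is a simplified reading, not the authors' exact algorithm: the subproblem solver is replaced by a plain steepest-descent step, and the function name, parameters, and defaults are illustrative.

```python
import numpy as np

def adatrust_sketch(grad, x0, b0=1.0, eps=1e-6, max_iter=10_000):
    """Gradient-only adaptive trust-region iteration (simplified sketch).

    Takes a scaled steepest-descent step of length ||grad||/b, then grows
    the scaling b using only the newly evaluated gradient; the objective f
    itself is never evaluated.
    """
    x, b = np.asarray(x0, dtype=float), float(b0)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:            # approximate stationarity
            return x, k
        x = x - g / b                           # step with ||d_k|| = ||g_k|| / b_k
        b = b + np.linalg.norm(grad(x))**2 / b  # gradient-only update, as in (2.4)
    return x, max_iter
```

Because every step is accepted and no ratio test is performed, each iteration costs exactly one new gradient evaluation, which is the point of the complexity analysis in terms of oracle calls.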



Data availability

The data used to form the test problems in Subsection 3.2 are freely available in [12, 28].

Notes

  1. The performance profiles were generated using the code perf.m, freely available on the website http://www.mcs.anl.gov/~more/cops/.

  2. Looking closely at the problems Powell badly scaled, Brown badly scaled and Meyer, we see that for these problems \(\Vert \nabla f(x_{0})\Vert \ge 10^{3}\). Due to (2.4) and (2.2), large values of \(\Vert \nabla f(x_{0})\Vert\) make \(\varDelta _{k}\) become extremely small very quickly, which severely slows down the progress of the iterates towards stationary points. This remark suggests that, when initializing AdaTrust2, starting points with a very large gradient norm should be avoided.
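Note 2 can be checked numerically. Under an update in the spirit of (2.4) and (2.2) (radius bound \(\Vert \nabla f(x_k)\Vert /b_k\) and \(b_{k+1}=b_k+\Vert \nabla f(x_{k+1})\Vert ^2/b_k\)), a single update with a large gradient norm already drives the radius down to roughly \(1/\Vert \nabla f(x_{0})\Vert\). The helper below is a hypothetical sketch of this effect, assuming the gradient norm barely changes over the first step.

```python
def radius_after_one_step(g0_norm: float, b0: float = 1.0) -> float:
    """Radius bound after one update of the sketched rule, assuming
    ||grad f(x_1)|| is still roughly ||grad f(x_0)||."""
    b1 = b0 + g0_norm**2 / b0  # update in the spirit of (2.4)
    return g0_norm / b1        # radius bound in the spirit of (2.2)

# For ||grad f(x_0)|| = 10^3, as observed for Powell badly scaled,
# Brown badly scaled and Meyer, the radius collapses to about 10^-3
# after a single iteration, versus 0.5 for a unit gradient norm:
print(radius_after_one_step(1e3))  # ~9.99999e-04
print(radius_after_one_step(1.0))  # 0.5
```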

References

  1. Abdelhamid, N., Ayesh, A., Thabtah, F.: Phishing detection based Associative Classification data mining. Expert Syst. Appl. 41, 5948–5959 (2014)

  2. Aeberhard, S., Coomans, D., de Vel, O.: Comparison of Classifiers in High Dimensional Settings. Tech. Rep. 92-02, Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (1992)

  3. Balima, O., Boulanger, J., Charette, A., Marceau, D.: New developments in frequency domain optical tomography. Part II: application with a L-BFGS associated to an inexact line search. J. Quant. Spectrosc. Radiat. Transf. 112, 1235–1240 (2011)

  4. Bartholomew-Biggs, M., Brown, S., Christianson, B., Dixon, L.: Automatic differentiation of algorithms. J. Comput. Appl. Math. 124, 171–190 (2000)

  5. Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18, 1–43 (2018)

  6. Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A.: On the use of third-order models with fourth-order regularization for unconstrained optimization. Optim. Lett. 14, 815–838 (2020)

  7. Charytanowicz, M., Niewczas, J., Kulczycki, P., Kowalski, P.A., Lukasik, S., Zak, S.: A complete gradient clustering algorithm for features analysis of X-ray images. In: Pietka, E., Kawa, J. (eds.) Information Technologies in Biomedicine, pp. 15–24. Springer, Berlin (2010)

  8. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. SIAM, Philadelphia (2000)

  9. Dietterich, T.G., Lathrop, R.H., Lozano-Perez, T.: Solving the multiple-instance problem with axis-parallel rectangles. Artif. Intell. 89, 31–71 (1997)

  10. Ding, J., Pan, Z., Chen, L.: Parameter identification of multibody systems based on second order sensitivity analysis. Int. J. Non-Linear Mech. 47, 1105–1110 (2012)

  11. Dolan, E., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)

  12. Dua, D., Graff, C.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2019). http://archive.ics.uci.edu/ml

  13. Fan, J., Yuan, Y.: A new trust region algorithm with trust region radius converging to zero. In: Li, D. (ed.) Proceedings of the 5th International Conference on Optimization: Techniques and Applications, pp. 786–794. Hong Kong (2001)

  14. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(Part II), 179–188 (1936)

  15. Fletcher, R.: An efficient, globally convergent algorithm for unconstrained and linearly constrained optimization problems. Technical Report TP 431, AERE, Harwell Laboratory, Oxfordshire, England (1970)

  16. Fletcher, R.: Practical Methods of Optimization, Volume 1: Unconstrained Optimization. Wiley, Chichester, England (1980)

  17. Gorman, R.P., Sejnowski, T.J.: Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw. 1, 75–89 (1988)

  18. Grapiglia, G.N., Yuan, J., Yuan, Y.: On the convergence and worst-case complexity of trust-region and regularization methods for unconstrained optimization. Math. Program. 152, 491–520 (2015)

  19. Grapiglia, G.N., Yuan, J., Yuan, Y.: Nonlinear stepsize control algorithms: complexity bounds for first-and second-order optimality. J. Optim. Theory Appl. 171, 980–997 (2016)

  20. Gratton, S., Sartenaer, A., Toint, Ph.L.: Recursive trust-region methods for multiscale nonlinear optimization. SIAM J. Optim. 19, 414–444 (2008)

  21. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, Philadelphia (2008)

  22. Hebden, M.D.: An algorithm for minimization using exact second order derivatives. Technical Report TP 515, AERE, Harwell Laboratory, Oxfordshire, England (1973)

  23. von Heusinger, A., Kanzow, C.: Optimization reformulations of the generalized Nash equilibrium problem using Nikaido-Isoda-type functions. Comput. Optim. Appl. 43, 353–377 (2009)

  24. Koziel, S., Mosler, F., Reitzinger, S., Thoma, P.: Robust microwave design optimization using adjoint sensitivity and trust regions. Int. J. RF Microwave Comput. Aided Eng. 22, 10–19 (2012)

  25. Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7, 17–41 (1981)

  26. Powell, M.J.D.: A new algorithm for unconstrained optimization. In: Rosen, J.B., Mangasarian, O.L., Ritter, K. (eds.) Nonlinear Programming, pp. 31–66. Academic Press, New York (1970)

  27. Powell, M.J.D.: Convergence properties of a class of minimization algorithms. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming, pp. 1–27 (1975)

  28. Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization (2015). http://networkrepository.com

  29. Sigillito, V.G., Wing, S.P., Hutton, L.V., Baker, K.B.: Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Tech. Digest 10, 262–266 (1989)

  30. Steihaug, T.: The conjugate gradient and trust regions in large scale optimization. SIAM J. Numer. Anal. 20, 626–637 (1983)

  31. Street, W.N., Wolberg, W.H., Mangasarian, O.L.: Nuclear feature extraction for breast tumor diagnosis. In: IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, vol. 1905, pp. 861–870. San Jose, CA (1993)

  32. Toint, Ph.L.: Towards an efficient sparsity exploiting Newton method for minimization. In: Duff, I.S. (ed.) Sparse Matrices and Their Uses, pp. 57–88. Academic Press, London (1981)

  33. Walmag, J.M.B., Delhez, E.J.M.: A trust-region method applied to parameter identification of a simple prey-predator model. Appl. Math. Model. 29, 289–307 (2005)

  34. Wu, X., Ward, R., Bottou, L.: WNGrad: learn the learning rate in gradient descent. arXiv:1803.02865, November 2020

  35. Yuan, Y.: Recent advances in trust region algorithms. Math. Program. Ser. B 151, 249–281 (2015)

  36. Zhang, H., Li, X., Song, H., Liu, S.: An adaptive subspace trust-region method for frequency-domain seismic full waveform inversion. Comput. Geosci. 78, 1–14 (2015)

Acknowledgements

The authors are very grateful to the two anonymous referees, whose comments helped to improve the paper.

Funding

G. N. Grapiglia was partially supported by the National Council for Scientific and Technological Development (CNPq) - Brazil (Grant 312777/2020-5). G.F.D. Stella was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES) - Brazil.

Corresponding author

Correspondence to Geovani N. Grapiglia.

Appendix

1.1 Proof of Lemma 4

By definition, \(k_{i}\le q\). If \(k_{i}\ge q-1\), then \(|I(k_{i},q)|\le 2\) and so (2.22) holds. Now, suppose that \(k_{i}<q-1\). By (2.4) we have

$$\begin{aligned} b_{j+1}-b_{j}=\dfrac{\Vert \nabla f(x_{j+1})\Vert ^{2}}{b_{j}}\quad \text {for}\quad j=k_{i},\ldots ,q-1. \end{aligned}$$

Summing up these equalities, it follows from (2.11) and (2.21) that

$$\begin{aligned} b_{q}-b_{k_{i}}=\sum _{j\in I(k_{i},q-1)}\dfrac{\Vert \nabla f(x_{j+1})\Vert ^{2}}{b_{j}}> |I(k_{i},q-1)|{\tilde{L}}^{-1}\epsilon ^{2}, \end{aligned}$$

and so

$$\begin{aligned} |I(k_{i},q-1)|<{\tilde{L}}\left( b_{q}-b_{k_{i}}\right) \epsilon ^{-2}<{\tilde{L}}b_{q}\epsilon ^{-2}. \end{aligned}$$

Since \(b_q<{\tilde{L}}\), we obtain

$$\begin{aligned} |I(k_{i},q-1)|< {\tilde{L}}^{2}\epsilon ^{-2}, \end{aligned}$$

which gives

$$\begin{aligned} |I(k_{i},q)|<{\tilde{L}}^{2}\epsilon ^{-2}+1. \end{aligned}$$

Therefore, (2.22) also holds in this case. \(\square\)

1.2 Proof of Lemma 5

Let \(k\in I(p+1,k_{i+1}-1)\). Then, by (2.4), \(b_{j}\ge {\tilde{L}}\) for \(j=p,\ldots ,k-1\). Consequently, by Lemma 3, we have

$$\begin{aligned} f(x_{j})-f(x_{j+1})\ge \dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{4b_{j}}\quad \text {for all}\quad j=p,\ldots ,k-1. \end{aligned}$$

Summing up these inequalities we get

$$\begin{aligned} f(x_{p})-f(x_{k})\ge \dfrac{1}{4}\sum _{j=p}^{k-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}, \end{aligned}$$
(4.1)

and so, by A2,

$$\begin{aligned} \sum _{j=p}^{k-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\le 4\left( f(x_{p})-f(x_{k})\right) \le 4\left( f(x_{p})-f_{low}\right) . \end{aligned}$$
(4.2)

On the other hand, by (2.4) and A1, we also have

$$\begin{aligned} b_{k}-b_{p}= & {} \sum _{j=p}^{k-1}\left( b_{j+1}-b_{j}\right) =\sum _{j=p}^{k-1}\dfrac{\Vert \nabla f(x_{j+1})\Vert ^{2}}{b_{j}}\nonumber \\\le & {} \sum _{j=p}^{k-1}\dfrac{\left( \Vert \nabla f(x_{j})-\nabla f(x_{j+1})\Vert +\Vert \nabla f(x_{j})\Vert \right) ^{2}}{b_{j}}\nonumber \\\le & {} 2\sum _{j=p}^{k-1}\dfrac{\Vert \nabla f(x_{j})-\nabla f(x_{j+1})\Vert ^{2}+\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\nonumber \\\le & {} 2\sum _{j=p}^{k-1}\dfrac{L^{2}\Vert x_{j}-x_{j+1}\Vert ^{2}+\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}. \end{aligned}$$
(4.3)

By (2.2),

$$\begin{aligned} \Vert x_{j}-x_{j+1}\Vert =\Vert d_{j}\Vert \le \delta _{j}\Vert \nabla f(x_{j})\Vert =\dfrac{\Vert \nabla f(x_{j})\Vert }{b_{j}}. \end{aligned}$$
(4.4)

Then, combining (4.3) and (4.4) and using \(b_{j}\ge {\tilde{L}}\), it follows that

$$\begin{aligned} b_{k}-b_{p}\le & {} 2\sum _{j=p}^{k-1}\dfrac{L^{2}\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}^{3}}+2\sum _{j=p}^{k-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\nonumber \\\le & {} 2\sum _{j=p}^{k-1}\dfrac{{\tilde{L}}^{2}\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}^{2}b_{j}}+2\sum _{j=p}^{k-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\nonumber \\\le & {} 4\sum _{j=p}^{k-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}. \end{aligned}$$
(4.5)

Now, combining (4.5) and (4.2), we obtain

$$\begin{aligned} b_{k}-b_{p}\le 16\left( f(x_{p})-f_{low}\right) , \end{aligned}$$

and so

$$\begin{aligned} b_{k}\le b_{p}+16\left( f(x_{p})-f_{low}\right) ,\quad \forall k\in I(p,k_{i+1}-1). \end{aligned}$$
(4.6)

Our first goal is to refine the upper bound for \(b_{k}\) in (4.6). For that, we will break the analysis into a few cases and subcases related to the position of p in the set \(I(k_{i},k_{i+1}-2)\).

Case I \(p=k_{i}\).

In this case, it follows from (4.6) that

$$\begin{aligned} b_{k}\le b_{k_{i}}+16(f(x_{k_{i}})-f_{low})\quad \forall k\in I(p,k_{i+1}-1). \end{aligned}$$
(4.7)

Case II \(p\in I(k_{i}+1,k_{i+1}-2)\).

By A1 and the trust-region constraint, we have

$$\begin{aligned} b_{p}= & {} b_{p-1}+\dfrac{\Vert \nabla f(x_{p})\Vert ^{2}}{b_{p-1}}\nonumber \\\le & {} b_{p-1}+\dfrac{\left( \Vert \nabla f(x_{p-1})-\nabla f(x_{p})\Vert +\Vert \nabla f(x_{p-1})\Vert \right) ^{2}}{b_{p-1}}\nonumber \\\le & {} b_{p-1}+2\dfrac{\Vert \nabla f(x_{p-1})-\nabla f(x_{p})\Vert ^{2}+\Vert \nabla f(x_{p-1})\Vert ^{2}}{b_{p-1}}\nonumber \\\le & {} b_{p-1}+2\dfrac{L^{2}\Vert x_{p-1}-x_{p}\Vert ^{2}+\Vert \nabla f(x_{p-1})\Vert ^{2}}{b_{p-1}}\nonumber \\\le & {} b_{p-1}+\dfrac{2L^{2}\Vert \nabla f(x_{p-1})\Vert ^{2}}{b_{p-1}^{3}}+\dfrac{2\Vert \nabla f(x_{p-1})\Vert ^{2}}{b_{p-1}}. \end{aligned}$$
(4.8)

Moreover, given \(j\in I(k_{i},p-1)\), we also have

$$\begin{aligned} f(x_{j+1})\le & {} f(x_{j})+\nabla f(x_{j})^{T}(x_{j+1}-x_{j})+\dfrac{L}{2}\Vert x_{j+1}-x_{j}\Vert ^{2}\nonumber \\\le & {} f(x_{j})+\Vert \nabla f(x_{j})\Vert \Vert d_{j}\Vert +\dfrac{L}{2}\Vert d_{j}\Vert ^{2}\nonumber \\\le & {} f(x_{j})+\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}+\dfrac{L}{2b_{j}}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\nonumber \\= & {} f(x_{j})+\left( 1 + \dfrac{L}{2b_j} \right) \dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\nonumber \\\le & {} f(x_{j})+\left( 1 + \dfrac{L}{2b_{\min }} \right) \dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}. \end{aligned}$$
(4.9)

Case II(a) \(p=k_{i}+1\).

In this case, by (4.8) we have

$$\begin{aligned} b_{p}\le b_{k_{i}}+\dfrac{2L^{2}\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}^{3}}+\dfrac{2\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}. \end{aligned}$$
(4.10)

On the other hand, by (4.9) with \(j=k_{i}\), we get

$$\begin{aligned} f(x_{p})\le f(x_{k_{i}})+\left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}. \end{aligned}$$
(4.11)

Thus, combining (4.6), (4.10) and (4.11), it follows that

$$\begin{aligned} b_{k}\le & {} b_{k_{i}}+\dfrac{2L^{2}\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}^{3}}+\dfrac{2\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}\nonumber \\&+16\left[ \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}+f(x_{k_{i}})-f_{low}\right] \nonumber \\= & {} b_{k_{i}}+\left( \dfrac{4L^{2}+36b_{\min }^{2}+16b_{\min }L}{2b_{\min }^{3}}\right) \Vert \nabla f(x_{k_{i}})\Vert ^{2}+16\left( f(x_{k_{i}})-f_{low}\right) \end{aligned}$$
(4.12)

for all \(k\in I(p,k_{i+1}-1)\).

Case II(b) \(p\in I(k_{i}+2,k_{i+1}-2)\).

In this case, \(b_{p-1}\ge b_{p-2}\) and so, by (4.8), we get

$$\begin{aligned} b_{p}\le & {} b_{p-1}+\dfrac{2L^{2}\Vert \nabla f(x_{p-1})\Vert ^{2}}{b_{p-1}^{2}b_{p-2}}+\dfrac{2\Vert \nabla f(x_{p-1})\Vert ^{2}}{b_{p-2}}\\= & {} b_{p-1}+\dfrac{2L^{2}(b_{p-1}-b_{p-2})}{b_{p-1}^{2}}+\dfrac{2\Vert \nabla f(x_{p-1})\Vert ^{2}}{b_{p-2}}\\< & {} b_{p-1}+\dfrac{2L^{2}}{b_{p-1}}+2(b_{p-1}-b_{p-2})\\< & {} 3b_{p-1}+\dfrac{2L^{2}}{b_{p-1}}. \end{aligned}$$

Since \(b_{\min }\le b_{p-1}<{\tilde{L}}\), it follows that

$$\begin{aligned} b_{p}\le 3{\tilde{L}}+\dfrac{2L^{2}}{b_{\min }}. \end{aligned}$$
(4.13)

On the other hand, by (4.9) we have

$$\begin{aligned} f(x_{p})-f(x_{k_{i}})= & {} \sum _{j=k_{i}}^{p-1}f(x_{j+1})-f(x_{j})\\\le & {} \left( 1+\dfrac{L}{2b_{\min }}\right) \sum _{j=k_{i}}^{p-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\\= & {} \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}+\left( 1+\dfrac{L}{2b_{\min }}\right) \sum _{j=k_{i}+1}^{p-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j}}\\\le & {} \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}+\left( 1+\dfrac{L}{2b_{\min }}\right) \sum _{j=k_{i}+1}^{p-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{b_{j-1}}\\= & {} \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}+\left( 1+\dfrac{L}{2b_{\min }}\right) \sum _{j=k_{i}+1}^{p-1}(b_{j}-b_{j-1})\\= & {} \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}+\left( 1+\dfrac{L}{2b_{\min }}\right) (b_{p-1}-b_{k_{i}})\\< & {} \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}+\left( 1+\dfrac{L}{2b_{\min }}\right) b_{p-1}\\< & {} \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{k_{i}}}+\left( 1+\dfrac{L}{2b_{\min }}\right) {\tilde{L}}. \end{aligned}$$

Hence,

$$\begin{aligned} f(x_{p})\le f(x_{k_{i}})+\left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{\min }}+\left( 1+\dfrac{L}{2b_{\min }}\right) {\tilde{L}}. \end{aligned}$$
(4.14)

Thus, combining (4.6), (4.13) and (4.14), it follows that

$$\begin{aligned} b_{k}\le & {} 3{\tilde{L}}+\dfrac{2L^{2}}{b_{\min }}+16\left[ \left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{\min }}+\left( 1+\dfrac{L}{2b_{\min }}\right) {\tilde{L}} + f(x_{k_{i}})-f_{low}\right] \nonumber \\\le & {} 19{\tilde{L}}+\dfrac{10{\tilde{L}}^{2}}{b_{\min }}+16\left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{\min }}+16(f(x_{k_{i}})-f_{low}), \end{aligned}$$
(4.15)

for all \(k\in I(p,k_{i+1}-1)\).

Summarizing all cases and subcases above, it follows from (4.7), (4.12), and (4.15) that

$$\begin{aligned} b_{k}\le b_{k_{i}}+19{\tilde{L}}+\dfrac{10{\tilde{L}}^{2}}{b_{\min }}+\left( \dfrac{4L^{2}+36b_{\min }^{2}+16b_{\min }L}{2b_{\min }^{3}}\right) \Vert \nabla f(x_{k_{i}})\Vert ^{2}+16(f(x_{k_{i}})-f_{low}) \end{aligned}$$
(4.16)

for all \(k\in I(p,k_{i+1}-1)\), regardless of the position of p in the set \(I(k_{i},k_{i+1}-2)\). Finally, by (2.4) and Lemma 3,

$$\begin{aligned} f(x_{j})-f(x_{j+1})\ge \dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{4b_{j}}\quad \text {for}\,\,j=p,\ldots ,k_{i+1}-1. \end{aligned}$$

Summing up these inequalities, it follows from A2, (2.11) and (4.16) that

$$\begin{aligned} f(x_{p})-f_{low}\ge & {} f(x_{p})-f(x_{k_{i+1}})\nonumber \\\ge & {} \sum _{j=p}^{k_{i+1}-1}\dfrac{\Vert \nabla f(x_{j})\Vert ^{2}}{4b_{j}}\nonumber \\\ge & {} \dfrac{|I(p,k_{i+1}-1)|\epsilon ^{2}}{4\left[ b_{k_{i}}+19{\tilde{L}}+\dfrac{10{\tilde{L}}^{2}}{b_{\min }}+\left( \dfrac{4L^{2}+36b_{\min }^{2}+16b_{\min }L}{2b_{\min }^{3}}\right) \Vert \nabla f(x_{k_{i}})\Vert ^{2}+16(f(x_{k_{i}})-f_{low})\right] }\nonumber \\&\end{aligned}$$
(4.17)

By (4.11) and (4.14), we also have

$$\begin{aligned} f(x_{p})-f_{low}\le & {} \left( 1+\dfrac{L}{2b_{\min }}\right) {\tilde{L}}+\left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{\min }}+(f(x_{k_{i}})-f_{low})\nonumber \\\le & {} {\tilde{L}}+\dfrac{{\tilde{L}}^{2}}{2b_{\min }}+\left( 1+\dfrac{L}{2b_{\min }}\right) \dfrac{\Vert \nabla f(x_{k_{i}})\Vert ^{2}}{b_{\min }}+(f(x_{k_{i}})-f_{low})\nonumber \\\le & {} b_{k_{i}}+19{\tilde{L}}+\dfrac{10{\tilde{L}}^{2}}{b_{\min }}+\left( \dfrac{4L^{2}+36b_{\min }^{2}+16b_{\min }L}{2b_{\min }^{3}}\right) \Vert \nabla f(x_{k_{i}})\Vert ^{2}+16(f(x_{k_{i}})-f_{low}). \end{aligned}$$
(4.18)

Thus, combining (4.17), (4.18) and using \(\Vert \nabla f(x_{k_{i}})\Vert \le \Vert \nabla f(x_{0})\Vert\) (by Lemma 1), we get

$$\begin{aligned} |I(p,k_{i+1}-1)|\le 4\left[ b_{k_{i}}+19{\tilde{L}}+\dfrac{10{\tilde{L}}^{2}}{b_{\min }}+\left( \dfrac{4L^{2}+36b_{\min }^{2}+16b_{\min }L}{2b_{\min }^{3}}\right) \Vert \nabla f(x_{k_{i}})\Vert ^{2}+16(f(x_{k_{i}})-f_{low})\right] ^{2}\epsilon ^{-2}. \end{aligned}$$

\(\square\)

About this article

Cite this article

Grapiglia, G.N., Stella, G.F.D. An adaptive trust-region method without function evaluations. Comput Optim Appl 82, 31–60 (2022). https://doi.org/10.1007/s10589-022-00356-0
