Abstract
In this paper we propose an adaptive trust-region method for smooth unconstrained optimization. The update rule for the trust-region radius relies only on gradient evaluations. Assuming that the gradient of the objective function is Lipschitz continuous, we establish worst-case complexity bounds for the number of gradient evaluations required by the proposed method to generate approximate stationary points. As a corollary, we establish a global convergence result. We also present numerical results on benchmark problems. In terms of the number of oracle calls, the proposed method compares favorably with trust-region methods that use evaluations of the objective function.
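The abstract describes a trust-region loop whose radius is updated from gradient information alone, with no function evaluations. The sketch below is purely illustrative and is *not* the paper's AdaTrust rules (2.2)/(2.4), which are not reproduced in this excerpt: it takes a Cauchy-like step of length \(\varDelta_k\) and drives the radius with a hypothetical gradient-norm accumulator \(b_k\).

```python
import numpy as np

def adaptive_tr_sketch(grad, x0, b0=1.0, max_iter=200, tol=1e-6):
    """Illustrative function-evaluation-free trust-region loop.

    This is a generic sketch under assumed update rules; the actual
    AdaTrust updates (2.2) and (2.4) are not shown in this excerpt.
    """
    x, b = np.asarray(x0, dtype=float).copy(), b0
    for k in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:
            return x, k
        delta = gnorm / b            # radius driven only by gradients
        x = x - (delta / gnorm) * g  # Cauchy-like step of length delta
        b = b + gnorm**2 / b         # hypothetical accumulator update
    return x, max_iter
```

On the quadratic \(f(x)=\tfrac{1}{2}\Vert x\Vert^{2}\) (gradient \(g(x)=x\)) the loop reduces to gradient descent with step \(1/b_k\), so it converges without ever evaluating \(f\).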
Notes
The performance profiles were generated using the code perf.m, freely available at http://www.mcs.anl.gov/~more/cops/.
Looking closely at the problems Powell badly scaled, Brown badly scaled and Meyer, we see that for these problems \(\Vert \nabla f(x_{0})\Vert \ge 10^{3}\). Due to (2.4) and (2.2), large values of \(\Vert \nabla f(x_{0})\Vert\) make \(\varDelta _{k}\) become extremely small very quickly, which severely slows down the progress of the iterates towards stationary points. This remark suggests that, when initializing AdaTrust2, starting points with a very large gradient norm should be avoided.
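The collapse of \(\varDelta_k\) for large \(\Vert \nabla f(x_{0})\Vert\) can be illustrated numerically. The snippet below uses a *hypothetical* accumulator rule \(b_{1}=b_{0}+\Vert \nabla f(x_{0})\Vert^{2}/b_{0}\), \(\varDelta_{1}=\Vert \nabla f(x_{0})\Vert /b_{1}\), chosen only for illustration; the paper's actual updates (2.2) and (2.4) are not reproduced in this excerpt.

```python
def radius_after_one_step(g0_norm, b0=1.0):
    """Radius after one iteration under an illustrative, assumed rule."""
    b1 = b0 + g0_norm**2 / b0  # accumulator grows with the squared gradient norm
    return g0_norm / b1        # Delta_1 under the illustrative rule

moderate = radius_after_one_step(1.0)    # moderate gradient: Delta_1 = 0.5
tiny = radius_after_one_step(1.0e3)      # ||grad f(x0)|| >= 10^3: Delta_1 ~ 1e-3
```

Under this assumed rule, a starting gradient norm of \(10^{3}\) already shrinks the radius by three orders of magnitude after a single iteration, consistent with the remark above.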
Acknowledgements
The authors are very grateful to the two anonymous referees, whose comments helped to improve the paper.
Funding
G. N. Grapiglia was partially supported by the National Council for Scientific and Technological Development (CNPq) - Brazil (Grant 312777/2020-5). G.F.D. Stella was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES) - Brazil.
Appendix
1.1 Proof of Lemma 4
By definition, \(k_{i}\le q\). If \(k_{i}\ge q-1\), then \(|I(k_{i},q)|\le 2\) and so (2.22) holds. Now, suppose that \(k_{i}<q-1\). By (2.4) we have
Summing up these equalities, it follows from (2.11) and (2.21) that
and so
Since \(b_q<{\tilde{L}}\), we obtain
which gives
Therefore, (2.22) also holds in this case. \(\square\)
1.2 Proof of Lemma 5
Let \(k\in I(p+1,k_{i+1}-1)\). Then, by (2.4), \(b_{j}\ge {\tilde{L}}\) for \(j=p,\ldots ,k-1\). Consequently, by Lemma 3, we have
Summing up these inequalities we get
and so, by A2,
On the other hand, by (2.4) and A1, we also have
By (2.2),
Then, combining (4.3) and (4.4) and using \(b_{j}\ge {\tilde{L}}\), it follows that
Now, combining (4.5) and (4.2), we obtain
and so
Our first goal is to refine the upper bound for \(b_{k}\) in (4.6). For that, we will break the analysis into a few cases and subcases related to the position of p in the set \(I(k_{i},k_{i+1}-2)\).
Case I \(p=k_{i}\).
In this case, it follows from (4.6) that
Case II \(p\in I(k_{i}+1,k_{i+1}-2)\).
By A1 and the trust-region constraint, we have
Moreover, given \(j\in I(k_{i},p-1)\), we also have
Case II(a) \(p=k_{i}+1\).
In this case, by (4.8) we have
On the other hand, by (4.9) with \(j=k_{i}\), we get
Thus, combining (4.6), (4.10) and (4.11), it follows that
for all \(k\in I(p,k_{i+1}-1)\).
Case II(b) \(p\in I(k_{i}+2,k_{i+1}-2)\).
In this case, \(b_{p-1}\ge b_{p-2}\) and so, by (4.8), we get
Since \(b_{\min }\le b_{p-1}<{\tilde{L}}\), it follows that
On the other hand, by (4.9) we have
and so
Thus, combining (4.6), (4.13) and (4.14), it follows that
for all \(k\in I(p,k_{i+1}-1)\).
Summarizing all cases and subcases above, it follows from (4.7), (4.12), and (4.15) that
for all \(k\in I(p,k_{i+1}-1)\), regardless of the position of p in the set \(I(k_{i},k_{i+1}-2)\). Finally, by (2.4) and Lemma 3,
Summing up these inequalities, it follows from A2, (2.11) and (4.16) that
By (4.11) and (4.14), we also have
Thus, combining (4.17), (4.18) and using \(\Vert \nabla f(x_{k_{i}})\Vert \le \Vert \nabla f(x_{0})\Vert\) (by Lemma 1), we get
\(\square\)
Grapiglia, G.N., Stella, G.F.D. An adaptive trust-region method without function evaluations. Comput Optim Appl 82, 31–60 (2022). https://doi.org/10.1007/s10589-022-00356-0