A variation of Broyden class methods using Householder adaptive transforms

Abstract

In this work we introduce and study novel quasi-Newton minimization methods based on a Broyden class-type updating scheme for the Hessian approximation, in which a suitable matrix \(\tilde{B}_k\) is updated instead of the current Hessian approximation \(B_k\). We identify conditions which imply the convergence of the algorithm and, if an exact line search is chosen, its quadratic termination. Thanks to a remarkable connection between the projection operation and Krylov spaces, such conditions can be ensured using low-complexity matrices \(\tilde{B}_k\) obtained by projecting \(B_k\) onto algebras of matrices diagonalized by products of two or three Householder matrices, adaptively chosen step by step. Experimental tests show that the introduction of the adaptive criterion, which theoretically guarantees convergence, considerably improves the robustness of the minimization schemes when compared with a non-adaptive choice; moreover, they show that the proposed methods could be particularly suitable for large-scale problems where L-BFGS is not able to deliver satisfactory performance.

References

  1. Al-Baali, M.: Analysis of a family of self-scaling quasi-Newton methods. Technical Report, Department of Mathematics and Computer Science, United Arab Emirates University (1993)

  2. Al-Baali, M.: Global and superlinear convergence of a restricted class of self-scaling methods with inexact line searches, for convex functions. Comput. Optim. Appl. 9(2), 191–203 (1998)

  3. Andrei, N.: A double-parameter scaling Broyden-Fletcher-Goldfarb-Shanno method based on minimizing the measure function of Byrd and Nocedal for unconstrained optimization. J. Optim. Theory Appl. 178(1), 191–218 (2018)

  4. Bortoletti, A., Di Fiore, C., Fanelli, S., Zellini, P.: A new class of quasi-Newtonian methods for optimal learning in MLP-networks. IEEE Trans. Neural Netw. 14(2), 263–273 (2003)

  5. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)

  6. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)

  7. Byrd, R.H., Nocedal, J.: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 26(3), 727–739 (1989)

  8. Byrd, R.H., Nocedal, J., Yuan, Y.X.: Global convergence of a class of Quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24(5), 1171–1190 (1987)

  9. Cai, J.F., Chan, R.H., Di Fiore, C.: Minimization of a detail-preserving regularization functional for impulse noise removal. J. Math. Imaging Vis. 29(1), 79–91 (2007)

  10. Caliciotti, A., Fasano, G., Roma, M.: Novel preconditioners based on quasi-Newton updates for nonlinear conjugate gradient methods. Optim. Lett. 11(4), 835–853 (2017)

  11. Cipolla, S., Di Fiore, C., Tudisco, F.: Euler-Richardson method preconditioned by weakly stochastic matrix algebras: a potential contribution to Pagerank computation. Electron. J. Linear Algebra 32, 254–272 (2017)

  12. Cipolla, S., Di Fiore, C., Tudisco, F., Zellini, P.: Adaptive matrix algebras in unconstrained minimization. Linear Algebra Appl. 471, 544–568 (2015)

  13. Cipolla, S., Di Fiore, C., Zellini, P.: Low complexity matrix projections preserving actions on vectors. Calcolo 56(2), 8 (2019)

  14. Cipolla, S., Durastante, F.: Fractional PDE constrained optimization: an optimize-then-discretize approach with L-BFGS and approximate inverse preconditioning. Appl. Numer. Math. 123, 43–57 (2018)

  15. Di Fiore, C.: Structured matrices in unconstrained minimization methods. In: Contemporary Mathematics, pp. 205–219 (2003)

  16. Di Fiore, C., Fanelli, S., Lepore, F., Zellini, P.: Matrix algebras in Quasi-Newton methods for unconstrained minimization. Numer. Math. 94(3), 479–500 (2003)

  17. Di Fiore, C., Fanelli, S., Zellini, P.: Low-complexity minimization algorithms. Numer. Linear Algebra Appl. 12(8), 755–768 (2005)

  18. Di Fiore, C., Fanelli, S., Zellini, P.: Low complexity secant quasi-Newton minimization algorithms for nonconvex functions. J. Comput. Appl. Math. 210(1–2), 167–174 (2007)

  19. Di Fiore, C., Lepore, F., Zellini, P.: Hartley-type algebras in displacement and optimization strategies. Linear Algebra Appl. 366, 215–232 (2003)

  20. Di Fiore, C., Zellini, P.: Matrix algebras in optimal preconditioning. Linear Algebra Appl. 335(1–3), 1–54 (2001)

  21. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)

  22. Dunlavy, D.M., Kolda, T.G., Acar, E.: Poblano v1.0: a MATLAB toolbox for gradient-based optimization. Technical Report SAND2010-1422, Sandia National Laboratories, Albuquerque, NM and Livermore, CA (2010)

  23. Ebrahimi, A., Loghmani, G.: B-spline curve fitting by diagonal approximation BFGS methods. Iran. J. Sci. Technol. Trans. A Sci. 1–12

  24. Eldén, L.: Numerical linear algebra in data mining. Acta Numer. 15, 327–384 (2006)

  25. Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)

  26. Jiang, L., Byrd, R.H., Eskow, E., Schnabel, R.B.: A preconditioned L-BFGS algorithm with application to molecular energy minimization. Technical Report, University of Colorado at Boulder, Department of Computer Science (2004)

  27. Kolda, T.G., O’Leary, D.P., Nazareth, L.: BFGS with update skipping and varying memory. SIAM J. Optim. 8(4), 1060–1083 (1998)

  28. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  29. Li, D.H., Fukushima, M.: A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 129(1), 15–35 (2001)

  30. Liu, C., Vander Wiel, S.A.: Statistical Quasi-Newton: a new look at least change. SIAM J. Optim. 18(4), 1266–1285 (2007)

  31. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)

  32. Nazareth, L.: A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J. Numer. Anal. 16(5), 794–800 (1979)

  33. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)

  34. Oren, S.S., Luenberger, D.G.: Self-scaling variable metric (SSVM) algorithms: Part I: Criteria and sufficient conditions for scaling a class of algorithms. Manag. Sci. 20(5), 845–862 (1974)

  35. Powell, M.J.D.: Some global convergence properties of a variable metric algorithm for minimization without exact line searches. Nonlinear Program SIAM-AMS Proc. 9, 53–72 (1976)

  36. Saad, Y.: Analysis of some Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal. 29(1), 209–228 (1992)

  37. Saad, Y.: Numerical Methods for Large Eigenvalue Problems. SIAM, Philadelphia (2011). https://doi.org/10.1137/1.9781611970739

Acknowledgements

The authors thank the anonymous referees for their thorough reading of the manuscript and for the many suggestions they gave to improve its readability. Moreover, they thank the Associate Editor for his/her valuable comments and for suggesting the introduction of the scaling factor as in Sect. 4. S.C. and C.D.F. are members of the INdAM Research group GNCS, which partially supported this work. C.D.F. acknowledges the partial support of the Italian mathematics research institute INdAM-GNCS and of the MIUR Excellence Department Project awarded to the Department of Mathematics, University of Rome “Tor Vergata”, CUP E83C18000100006.

Corresponding author

Correspondence to S. Cipolla.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Householder matrices

The results in this section are borrowed from [13]; we refer the interested reader there for further details.

Definition 1

(Householder Orthogonal Matrix) Given a vector \({\mathbf {p}}\in \mathbb {R}^n\), \({\mathbf {p}}\ne \varvec{0}\), define

$$\begin{aligned} {\mathcal {H}}({\mathbf {p}}):=I_n-\frac{2}{\Vert {\mathbf {p}}\Vert ^2}{\mathbf {p}}{\mathbf {p}}^T. \end{aligned}$$

Consider two vectors \({\mathbf {v}},\, {\mathbf {z}}\in \mathbb {R}^n\) with \({\mathbf {z}}\ne 0\). A direct computation shows that, defining \({\mathbf {p}}= {\mathbf {v}}- \frac{\Vert {\mathbf {v}}\Vert }{\Vert {\mathbf {z}}\Vert }{\mathbf {z}}\), we have

$$\begin{aligned} {\mathcal {H}}({\mathbf {p}}){\mathbf {v}}=\frac{\Vert {\mathbf {v}}\Vert }{\Vert {\mathbf {z}}\Vert }{\mathbf {z}}. \end{aligned}$$
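As a quick numerical sanity check (a minimal sketch in Python/NumPy, not taken from the paper), one can form \(\mathcal {H}({\mathbf {p}})\) explicitly and verify both the mapping property above and the orthogonality of the reflector:

```python
import numpy as np

def householder(p):
    """H(p) = I - (2/||p||^2) p p^T, as in Definition 1."""
    p = np.asarray(p, dtype=float)
    return np.eye(p.size) - (2.0 / (p @ p)) * np.outer(p, p)

rng = np.random.default_rng(0)
n = 6
v, z = rng.standard_normal(n), rng.standard_normal(n)

# p = v - (||v||/||z||) z, so that H(p) maps v onto the direction of z
p = v - (np.linalg.norm(v) / np.linalg.norm(z)) * z
H = householder(p)

assert np.allclose(H @ v, (np.linalg.norm(v) / np.linalg.norm(z)) * z)
assert np.allclose(H @ H, np.eye(n))   # H(p) is orthogonal and involutory
```

Note that in practice \(\mathcal {H}({\mathbf {p}})\) need never be formed explicitly: the product \(\mathcal {H}({\mathbf {p}}){\mathbf {x}} = {\mathbf {x}}-\frac{2({\mathbf {p}}^T{\mathbf {x}})}{\Vert {\mathbf {p}}\Vert ^2}{\mathbf {p}}\) costs O(n) operations, which is what makes these transforms attractive for low-complexity schemes.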

Lemma 5

([13]) Consider \(W=[{\mathbf {w}}_1|\dots |{\mathbf {w}}_s] \in \mathbb {R}^{n\times s}\) and \(V=[{\mathbf {v}}_1|\dots |{\mathbf {v}}_s] \in \mathbb {R}^{n\times s}\) of full rank, with \(s \le n\) and \(W^TW=V^TV\). Then there exist \({\mathbf {h}}_1, \dots ,{\mathbf {h}}_{s} \in \mathbb {R}^n\), \(\Vert {\mathbf {h}}_i\Vert =\sqrt{2}\), such that the orthogonal matrix \(U=\mathcal {H}({\mathbf {h}}_s)\cdots \mathcal {H}({\mathbf {h}}_1)\), a product of s Householder matrices, satisfies the following identities

$$\begin{aligned} U {\mathbf {w}}_{i}= {\mathbf {v}}_i \hbox { for all } i \in \{1, \dots , s\}. \end{aligned}$$

The vectors \({\mathbf {h}}_i\) for \(i\in \{1,\dots ,s\}\) can be obtained by setting:

$$\begin{aligned} \begin{aligned}&{ \tilde{{\mathbf {h}}}_i := \mathcal {H}({\mathbf {h}}_{i-1}) \cdots \mathcal {H}({\mathbf {h}}_1) ({\mathbf {w}}_{i}-{\mathbf {w}}_{i-1}) - ({\mathbf {v}}_{i}-{\mathbf {v}}_{i-1}),}\\&{\mathbf {h}}_i:= ( \sqrt{2}/\Vert \tilde{{\mathbf {h}}}_i\Vert )\tilde{{\mathbf {h}}}_i \end{aligned} \end{aligned}$$
(65)

(where we set \({\mathbf {h}}_0={\mathbf {w}}_0={\mathbf {v}}_0=\varvec{0}\)). If \(s=n\), we have \({\mathbf {h}}_n=\mathbf {0}\) or \({\mathbf {h}}_n=\frac{\sqrt{2}}{\Vert {\mathbf {v}}_n\Vert }{\mathbf {v}}_n\). The cost of computing the \({\mathbf {h}}_i\) for \(i=1,\dots ,s\) is:

$$\begin{aligned} \begin{aligned}&[s(s-1)n+ s(2n+1)] \hbox { multiplications } \\&\quad +[(s(s+2) - 2)n + s(n-1)] \hbox { additions } \\&\quad + s \hbox { square roots.} \end{aligned} \end{aligned}$$

Observe that when \({\mathbf {w}}_i={\mathbf {e}}_{k_i}\) for \(i=1,\dots ,s\), that is, when \({\mathbf {v}}_1, \dots , {\mathbf {v}}_s\) are orthonormal and we are interested in constructing an orthogonal U with s columns fixed as \({\mathbf {v}}_1, \dots , {\mathbf {v}}_s\), it is possible to save \((s-1)n\) multiplications and \((3s-2)n\) additions.

Proof

The explicit expression of the \({\mathbf {h}}_i\) in (65) is obtained by applying the techniques for their construction introduced in [13]. \(\square \)
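To make the construction in Lemma 5 concrete, the following Python/NumPy sketch implements the recursion (65) (an illustrative transcription under the lemma's hypotheses, assuming additionally \(\tilde{{\mathbf {h}}}_i\ne \varvec{0}\) at every step; it is not the authors' code):

```python
import numpy as np

def apply_H(h, x):
    """Apply H(h) = I - (2/||h||^2) h h^T to x without forming the matrix."""
    return x - (2.0 / (h @ h)) * (h @ x) * h

def householder_chain(W, V):
    """Vectors h_1,...,h_s of recursion (65), each with ||h_i|| = sqrt(2).
    Assumes W, V (n x s) are full rank, W^T W = V^T V, and no h_tilde vanishes."""
    n, s = W.shape
    hs, w_prev, v_prev = [], np.zeros(n), np.zeros(n)
    for i in range(s):
        d = W[:, i] - w_prev                  # w_i - w_{i-1}
        for h in hs:                          # H(h_{i-1}) ... H(h_1), h_1 applied first
            d = apply_H(h, d)
        h_tilde = d - (V[:, i] - v_prev)      # subtract v_i - v_{i-1}
        hs.append((np.sqrt(2.0) / np.linalg.norm(h_tilde)) * h_tilde)
        w_prev, v_prev = W[:, i], V[:, i]
    return hs

# Example with W^T W = V^T V = I_s (orthonormal columns)
rng = np.random.default_rng(1)
n, s = 8, 3
W, _ = np.linalg.qr(rng.standard_normal((n, s)))
V, _ = np.linalg.qr(rng.standard_normal((n, s)))
hs = householder_chain(W, V)
for i in range(s):
    u = W[:, i]
    for h in hs:                              # U = H(h_s) ... H(h_1)
        u = apply_H(h, u)
    assert np.allclose(u, V[:, i])            # U w_i = v_i
```

Since each application of \(\mathcal {H}({\mathbf {h}})\) costs O(n), applying U (or \(U^T\)) to a vector costs O(sn) operations, consistent with the operation counts reported above.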

Appendix 2: Details on Theorem 1

In order to prove inequality (34) it is enough to prove the following:

Lemma 6

There exists \(c_3\), constant with respect to j and depending only on s and M, such that

$$\begin{aligned} \gamma ((j+1-s)+1)^{n} \le c_3^{j+1-s} \hbox { for all } j \ge s, \hbox { where } \gamma :=\left( \frac{c_1}{n}\right) ^n\frac{1}{\det B_s} \end{aligned}$$

(of course, such \(c_3\) turns out to be greater than 1).

In fact, once Lemma 6 is proved, the constant \(c_2\) (constant with respect to j) for which (34) is verified will be \(c_2=2c_1c_3/(1-\beta )\) (note that \(c_2\) depends only on s, M and \(\beta \), but not on j).

Proof

Fix \(\tilde{c}_3>1\). Note that the sequence of positive numbers

$$\begin{aligned} \frac{\gamma ((j+1-s)+1)^n}{\tilde{c}_3^{j+1-s}} \hbox { for } j= s,s+1,\dots \end{aligned}$$

converges to zero as \(j \rightarrow +\infty \), since the polynomial numerator is eventually dominated by the exponential denominator; thus there exists \(j^{*}\ge s\) (depending on s, M and \(\tilde{c}_3\)) such that

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le {\tilde{c}_3^{j+1-s}} \hbox { for all } j \ge j^{*}. \end{aligned}$$

Note also that for all \(j \in \{s, s+1, \dots , j^{*}-1\}\) we have

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le \gamma (j^{*}-s+1)^n \end{aligned}$$
(66)

and consider \(\hat{j} \ge j^{*}\) such that \(\gamma (\hat{j}-s+1)^n>1\) (\(\hat{j}\) depends on s, M, \(\gamma \) and \(\tilde{c}_3\)). From (66) we have

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le \gamma (\hat{j}-s+1)^n \le (\gamma (\hat{j}-s+1)^n)^{j+1-s} \end{aligned}$$

for all \(j \in \{s,s+1, \dots , j^{*}-1\}\).

Collecting the above results, we can conclude that

$$\begin{aligned} \gamma ((j+1-s)+1)^n \le c_3^{j+1-s} \hbox { for all } j \ge s \end{aligned}$$
(67)

where \(c_3:=\max \{ \tilde{c}_3, \gamma (\hat{j}-s+1)^n\}\) (\(c_3>1\) and depends on s, M and \(\tilde{c}_3\)).

Finally note that, once \(\tilde{c}_3\) is fixed, it is clear that \(c_3\) depends only on s and M. \(\square \)
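The proof is constructive, and a short numeric experiment illustrates it. In this sketch (with arbitrary illustrative values for \(\gamma \), n, s and \(\tilde{c}_3\), not constants from the paper) we locate the threshold \(j^{*}\), build \(c_3\) following the proof's recipe, and verify (67) in log space to avoid overflow:

```python
import numpy as np

gamma, n, s = 5.0, 4, 3      # illustrative values only, NOT from the paper
c3_tilde = 1.5

# With m = j + 1 - s >= 1, the claim gamma*(m+1)^n <= c^m reads, in log space,
# log(gamma) + n*log(m+1) <= m*log(c).
m = np.arange(1, 20000)
fail = np.log(gamma) + n * np.log(m + 1) > m * np.log(c3_tilde)
m_star = int(m[fail][-1]) + 1 if fail.any() else 1
j_star = m_star + s - 1      # bound with c3_tilde holds for all j >= j*

# Proof's recipe: pick j_hat >= j* with gamma*(j_hat - s + 1)^n > 1, then take the max.
j_hat = j_star
while gamma * (j_hat - s + 1) ** n <= 1:
    j_hat += 1
c3 = max(c3_tilde, gamma * (j_hat - s + 1) ** n)

# (67) now holds for ALL j >= s, i.e. for all m >= 1:
assert np.all(np.log(gamma) + n * np.log(m + 1) <= m * np.log(c3))
print(f"j* = {j_star}, c3 = {c3:.4g}")
```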

In order to prove inequality (\(34_{1}\)), define \(a_k:=(1-\phi -\psi _k\phi )\Vert {\mathbf {g}}_k\Vert ^2/{\mathbf {s}}_k^{T}(-{\mathbf {g}}_k)>0\). We know that \(\lim _{k \rightarrow +\infty }a_k=+\infty \), and we must show that there exists \(j^{*}\ge s\) such that

$$\begin{aligned} \prod _{k=s}^{j} a_k > c_2^{j+1-s} \hbox { for all } j \ge j^{*}. \qquad \qquad \qquad \qquad \qquad (34_{1}\hbox {bis}) \end{aligned}$$

If \(a_k\ge c_2\) for all \(k \ge s\), the claim is immediate: since \(a_k>c_2\) for infinitely many indices k, the product is eventually strictly larger than \(c_2^{j+1-s}\). So assume that there exists some index k such that \(a_k < c_2\), and let \(r \ge s\) be such that \(a_k>c_2\) for all \(k>r\) (such an r exists because \(a_k \rightarrow +\infty \)). Note that \(c_2>\min _{k=s, \dots , r}a_k\). Set

$$\begin{aligned} t:=\left( \frac{c_2}{\min _{k=s, \dots , r}a_k} \right) ^{r+1-s} >1. \end{aligned}$$

Let \(j^{*}>r+1\) be such that \(a_k \ge t c_2\) for all \(k\ge j^{*}\) (again possible because \(a_k \rightarrow +\infty \)). Then we have

$$\begin{aligned} \begin{aligned} \prod _{k=s}^{j^{*}}a_k&=\left( \prod _{k=s}^{r}a_k\right) \left( \prod _{k=r+1}^{j^{*}-1}a_k\right) a_{j^{*}} \\&>\left( \min _{k=s,\dots ,r}a_k\right) ^{r-s+1}c_2^{j^{*}-r-1}tc_2\\&=\left( \min _{k=s,\dots ,r}a_k\right) ^{r-s+1}\left( \frac{c_2}{\min _{k=s, \dots , r}a_k} \right) ^{r-s+1}c_2^{j^{*}-r}=c_2^{j^*-s+1}, \end{aligned} \end{aligned}$$

i.e., \(\prod _{k=s}^{j^{*}}a_k>c_2^{j^*-s+1}\). Thus we obtain (\(34_{1}\hbox {bis}\)), since \(a_k\ge tc_2>c_2\) for all \(k> j^{*}\).
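The same argument can be traced numerically. The following sketch (with an arbitrary synthetic sequence \(a_k \rightarrow +\infty \) whose first terms fall below \(c_2\); the values are illustrative, not from the paper) reproduces the choice of r, t and \(j^{*}\) and checks (\(34_{1}\hbox {bis}\)) in log space:

```python
import math

s, c2 = 2, 3.0
a = lambda k: 0.5 * 2.0 ** (k - s)   # a_s = 0.5 < c2 and a_k -> +infinity

r = s                                # a_k > c2 for all k > r
while a(r + 1) <= c2:
    r += 1
m = min(a(k) for k in range(s, r + 1))
t = (c2 / m) ** (r + 1 - s)          # t > 1, as in the text

j_star = r + 2                       # a_k >= t*c2 for all k >= j*
while a(j_star) < t * c2:
    j_star += 1

# (34_1 bis) in log space: sum_{k=s}^{j} log a_k > (j+1-s) log c2 for j >= j*.
log_prod, ok = 0.0, True
for j in range(s, j_star + 200):
    log_prod += math.log(a(j))
    if j >= j_star and not log_prod > (j + 1 - s) * math.log(c2):
        ok = False
print(f"r = {r}, t = {t:.0f}, j* = {j_star}; bound verified: {ok}")
```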

Cite this article

Cipolla, S., Di Fiore, C. & Zellini, P. A variation of Broyden class methods using Householder adaptive transforms. Comput Optim Appl 77, 433–463 (2020). https://doi.org/10.1007/s10589-020-00209-8
