On Limited Memory Self-Scaling VM-Algorithms for Unconstrained Optimization

In this paper we investigate a new scaling parameter for the standard memoryless L-BFGS algorithm. The new method is compared with the standard L-BFGS method with memory m = 3.


Introduction
BFGS quasi-Newton methods are reliable and efficient for the unconstrained minimization of a smooth nonlinear function f : R^n -> R. However, the need to store an n x n approximate Hessian has limited their application to problems with a small to medium number of variables.
For large n it is necessary to use methods that do not require the storage of a full n x n matrix. Sparse quasi-Newton updates can be applied if the Hessian has a significant number of zero entries (see, e.g., Powell and Toint [13], Fletcher [5]). In nonlinearly constrained optimization, other methods must be used. Such methods include conjugate-gradient methods, limited-memory quasi-Newton methods, and limited-memory reduced-Hessian quasi-Newton methods (see Gill et al. [7]).

1.1: Variable Metric Methods:
We have seen that, in order to obtain a superlinearly convergent method, the search directions must approximate the Newton direction. How can we achieve this without actually evaluating the Hessian matrix at every iteration? The answer was discovered by Davidon (1959), and was subsequently developed and popularized by Fletcher and Powell [6]. It consists of starting with any approximation to the Hessian matrix and, at each iteration, updating this matrix by incorporating the curvature of the problem measured along the step. If this update is done appropriately, one obtains some remarkably robust and efficient methods, called variable metric methods. They revolutionized nonlinear optimization by providing an alternative to Newton's method, which is too costly for many applications. There are many variable metric methods, but since 1970 the BFGS method has been generally considered the most effective. It is implemented in all major subroutine libraries and is currently used to solve optimization problems arising in a wide spectrum of applications.
The theory of variable metric methods is beautiful. The more we study them, the more remarkable they seem. We now have a fairly good understanding of their properties. Much of this knowledge has been obtained recently, and we will discuss it in this section. We will see that the BFGS method has interesting self-correcting properties, which account for its robustness. We will also discuss some open questions that have resisted an answer for many years.
The BFGS method is a line search method. At the k-th iteration, a symmetric and positive definite matrix B_k is given, and a search direction is computed by

    d_k = -B_k^{-1} g_k.                                        (1)

The next iterate is given by

    x_{k+1} = x_k + lambda_k d_k,                               (2)

where the step size lambda_k satisfies the following Wolfe conditions:

    f(x_k + lambda_k d_k) <= f(x_k) + c_1 lambda_k g_k^T d_k,   (3)
    g(x_k + lambda_k d_k)^T d_k >= c_2 g_k^T d_k.               (4)

It has been found that it is best to implement BFGS with a very loose line search: typical values for the parameters in (3), (4) are c_1 = 10^{-4} and c_2 = 0.9. The Hessian approximation is then updated by the BFGS formula

    B_{k+1} = B_k - (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k),   (5)

where s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k. Note that the two correction matrices on the right-hand side of (5) have rank one. Therefore, by the interlacing eigenvalue theorem (Wilkinson, 1965), the first rank-one correction, which is subtracted, decreases the eigenvalues: we will say that it "shifts the eigenvalues to the left". The second rank-one correction, which is added, shifts the eigenvalues to the right. There must be a balance between these eigenvalue shifts, for otherwise the Hessian approximation could either approach singularity or become arbitrarily large, causing a failure of the method.
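The structure of the update (5), with its subtracted and added rank-one corrections, can be illustrated with a minimal NumPy sketch. This is an illustration in our own notation with hypothetical data, not the paper's code; on a quadratic objective the curvature pair (s, y) is exact, and the secant equation B_{k+1} s_k = y_k holds by construction.

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian approximation B, as in (5).

    The subtracted rank-one term decreases eigenvalues ("shifts left");
    the added rank-one term increases them ("shifts right").
    """
    Bs = B @ s
    left = np.outer(Bs, Bs) / (s @ Bs)   # subtracted rank-one correction
    right = np.outer(y, y) / (y @ s)     # added rank-one correction
    return B - left + right

# On a quadratic f(x) = 0.5 x^T A x the curvature pair is exact: y = A s.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # hypothetical SPD model Hessian
s = np.array([1.0, -2.0])
y = A @ s
B_new = bfgs_update(np.eye(2), s, y)

# The update always satisfies the secant equation B_{k+1} s = y, and since
# y^T s > 0 the updated matrix stays positive definite.
assert np.allclose(B_new @ s, y)
assert np.all(np.linalg.eigvalsh(B_new) > 0)
```

Note that the balance between the two eigenvalue shifts is visible here: removing either correction term breaks the secant equation.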
A global convergence result for the BFGS method can be obtained by careful consideration of these eigenvalue shifts. This was done by Powell [12], who uses the trace and the determinant to measure the effect of the two rank-one corrections on B_k. He is able to show that if f is convex, then for any positive definite starting matrix B_1 and any starting point x_1, the BFGS method gives lim inf ||g_k|| = 0. If in addition the sequence {x_k} converges to a solution point at which the Hessian matrix is positive definite, then the rate of convergence is superlinear.
This analysis has been extended by Byrd, Nocedal and Yuan [4] to the restricted Broyden class of quasi-Newton methods, in which (5) is replaced by a combination of the BFGS and DFP updates,

    B_{k+1} = (1 - phi) B_{k+1}^{BFGS} + phi B_{k+1}^{DFP},   phi in [0, 1),

and their approach breaks down for the DFP method (phi = 1), leaving that case unresolved. Indeed, the following question has remained unanswered since 1976, when Powell published his study on the BFGS method [11].

The Limited Memory BFGS Method.
Quasi-Newton methods are a class of numerical methods that are similar to Newton's method, except that the inverse of the Hessian is replaced by an n x n symmetric matrix H_k which satisfies the quasi-Newton equation

    H_k y_{k-1} = s_{k-1},                                      (8)

(see [8]), where s_{k-1} = x_k - x_{k-1}, y_{k-1} = g_k - g_{k-1}, and lambda is a step length which satisfies some line search conditions. Assuming H_k nonsingular, we define B_k = H_k^{-1}. It is easy to see that the quasi-Newton step

    x_k - H_k g_k

is a stationary point of the quadratic model

    q_k(x) = f(x_k) + g_k^T (x - x_k) + (1/2)(x - x_k)^T B_k (x - x_k),

which is an approximation to problem (12), and the quasi-Newton condition (8) is equivalent to the model satisfying the interpolation conditions (11)-(12) at x_k and x_{k-1}. The matrix B_k (or H_k) can be updated so that the quasi-Newton equation is satisfied. One well-known update formula is the BFGS formula, which updates B_k using s_k and y_k as in (5). Yuan [9] modified the approximating function (13); this change was inspired by the fact that, for one-dimensional problems, using (15) gives slightly faster local convergence if we assume lambda_k = 1 for all k. Equation (15) can be rewritten as (16), and in order to satisfy (16) the BFGS formula is modified by scaling y_k with a parameter t_k, chosen by an exact line search when such a search is available. For a uniformly convex function, it can easily be shown that there exists a constant delta > 0 such that t_k >= delta for all k, and consequently the global convergence proof of the BFGS method for convex functions with inexact line searches, given by Powell [12], carries over. However, for a general nonlinear function f, inexact line searches do not imply the positivity of t_k; hence Yuan [15] derived t_k by considering Hermite interpolation on the line between x_{k-1} and x_k. It is therefore reasonable to require that the new approximate Hessian satisfies condition (22) instead of (18). Biggs [2], [3] gives the update (19) with the value t_k chosen so that (22) holds; the resulting value of t_k is given by (23). For one-dimensional problems, Wang and Yuan [14] showed that (17) with (23) and without line searches (that is, lambda_k = 1 for all k) implies R-quadratic convergence, and, except for some special cases, (17) with (23) also gives Q-convergence. It is well known that the convergence rate of the secant method is (1 + sqrt(5))/2, which is approximately 1.618 and less than 2.
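The claim that the quasi-Newton step x_k - H_k g_k is a stationary point of the quadratic model can be checked numerically. The following is a minimal sketch with hypothetical data (the matrices and gradient below are ours, not from the paper): the gradient of the model, g_k + B_k (x - x_k), vanishes exactly at the quasi-Newton step.

```python
import numpy as np

B = np.array([[5.0, 2.0], [2.0, 4.0]])   # hypothetical SPD Hessian approximation B_k
H = np.linalg.inv(B)                     # H_k = B_k^{-1}
g = np.array([1.0, -3.0])                # hypothetical gradient g_k
x_k = np.zeros(2)

x_qn = x_k - H @ g                       # the quasi-Newton step
grad_q = g + B @ (x_qn - x_k)            # gradient of q_k at the step
assert np.allclose(grad_q, 0.0)          # stationary point of the model
```

The check is immediate algebraically as well: g + B(-B^{-1} g) = g - g = 0.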
The limited memory BFGS method is described by Nocedal [10], where it is called the SQN method. The user specifies the number m of BFGS corrections that are to be kept, and provides a sparse symmetric and positive definite matrix H_0, which approximates the inverse Hessian of f. During the first m iterations the method is identical to the BFGS method. For k > m, H_k is obtained by applying m BFGS updates to H_0 using information from the m previous iterations. The method uses the inverse BFGS formula in the form

    H_{k+1} = V_k^T H_k V_k + rho_k s_k s_k^T,   rho_k = 1/(y_k^T s_k),   V_k = I - rho_k y_k s_k^T

(see [3]).
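In practice the m updates to H_0 are never formed as matrices; the product H_k g_k is computed directly by the standard two-loop recursion (Nocedal [10]). The sketch below uses our own naming and hypothetical data, and is checked against the explicit one-pair inverse update.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list, h0_diag):
    """Two-loop recursion: compute d_k = -H_k g from the m stored (s, y) pairs.

    h0_diag holds the diagonal of the sparse initial matrix H_0.
    """
    q = g.astype(float).copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * (s @ q)            # first loop: newest pair first
        q -= alpha * y
        alphas.append(alpha)
    r = h0_diag * q                      # apply the initial matrix H_0
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)             # second loop: oldest pair first
        r += (alpha - beta) * s
    return -r

# Check against the explicit one-pair update H_1 = V^T H_0 V + rho s s^T:
s0, y0 = np.array([1.0, 0.0]), np.array([2.0, 1.0])
g = np.array([1.0, 1.0])
rho = 1.0 / (y0 @ s0)
V = np.eye(2) - rho * np.outer(y0, s0)
H1 = V.T @ V + rho * np.outer(s0, s0)
d = lbfgs_direction(g, [s0], [y0], np.ones(2))
assert np.allclose(d, -H1 @ g)
```

Only the 2m vectors (and the diagonal of H_0) are stored, which is what makes the method practical for large n.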

2-1: Non-Convex Functions
All the results for the BFGS method discussed so far depend on the assumption that the objective function f is convex. At present, few results are available for the case in which f is a more general nonlinear function. Even though the numerical experience of many years suggests that the BFGS method always converges to a solution point, this has not been proved.
Consider the BFGS method with a line search satisfying the Wolfe conditions (3), (4). Assume that f is twice continuously differentiable and bounded below. Do the iterates satisfy lim inf ||g_k|| = 0 for any starting point x_1 and any positive definite starting matrix B_1? This is one of the most fundamental questions in the theory of unconstrained optimization, for BFGS is perhaps the most commonly used method for solving nonlinear optimization problems. It is remarkable that the answer to this question has not yet been found. Nobody has been able to construct an example in which the BFGS method fails, and the most general result available to us is due to Powell [12].

2-2: Outline of the Limited Memory BFGS Algorithm
Step 1: Choose a starting point x_1, a memory size m, and an initial symmetric positive definite matrix H_0; set k = 1.
Step 2: Compute the search direction d_k = -H_k g_k, where H_k is obtained by updating H_0 at most m times by using the pairs {s_i, y_i} from the most recent iterations.
Step 3: Set x_{k+1} = x_k + lambda_k d_k, where lambda_k satisfies the Wolfe conditions (3), (4).
Step 4: Store the newest pair (s_k, y_k); if more than m pairs are held, discard the oldest.
Step 5: Compute g_{k+1}; if the convergence criterion is satisfied, stop; otherwise set k = k + 1 and go to Step 2.
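The loop structure above can be sketched in a few lines. In this minimal, hypothetical Python sketch the true L-BFGS direction is replaced by a steepest-descent placeholder so as to focus on the pair-storage discipline: only the last m pairs (s_k, y_k) are ever kept, so memory stays O(mn) rather than the O(n^2) of full BFGS.

```python
import numpy as np
from collections import deque

def f(x):    return 0.5 * (x @ x)        # toy convex objective (hypothetical)
def grad(x): return x                    # its exact gradient

m = 3
pairs = deque(maxlen=m)                  # only the last m (s_k, y_k) pairs kept
x = np.array([4.0, -2.0, 1.0])
g = grad(x)

for k in range(20):
    d = -g                               # placeholder for d_k = -H_k g_k
    lam = 1.0
    # Backtracking: halve lam until the sufficient-decrease condition holds.
    while f(x + lam * d) > f(x) + 1e-4 * lam * (g @ d):
        lam *= 0.5
    x_new = x + lam * d
    g_new = grad(x_new)
    pairs.append((x_new - x, g_new - g)) # store newest pair, drop the oldest
    x, g = x_new, g_new
    if np.linalg.norm(g) < 1e-8:
        break

# Memory stays bounded by m regardless of the iteration count.
assert len(pairs) <= m
assert np.linalg.norm(g) < 1e-8
```

The deque with maxlen=m implements the discard step automatically; a real implementation would compute d from the stored pairs via the two-loop recursion.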

Derivation of a new Scaling parameter
From section (2) above we have observed that taking

Computational Results:
In this section we present and discuss some numerical experiments that were conducted in order to test the performance of limited memory quasi-Newton methods for unconstrained optimization, using the standard BFGS formula and again using the modified BFGS update.
The algorithms used are of the limited memory L-BFGS form, which provides the line search strategy for calculating the global step. The line search is based on backtracking, using quadratic and cubic modeling of f in the direction of search. Ten test functions, with variable dimensions, have been chosen from the optimization literature. Descriptions of these test problems can be found, for instance, in More et al. [9]. Each function is tested with seven different dimensions, namely n = 2, 4, 10, 100, 500, 1000 and 10000, with m = 3. All test functions are tested with a single standard starting point.
All algorithms are implemented in FORTRAN. The runs were performed in double precision arithmetic, for which the unit round-off is approximately 10^-16. In all cases, convergence is assumed when the stopping criterion is satisfied. From the numerical results in Tables (4.1)-(4.10), we take NOI (number of iterations) as the standard tool for comparison, neglecting NOF (number of function evaluations) because it depends on NOI under the condition of using the cubic fitting technique as a line search subprogram. The improvement percentage of the new method is between 13% and 41%.

Table (4.1): Comparison between the standard L-BFGS method and the modified L-BFGS method using the Dixon test function (2 <= n <= 10000)
(The DFP method, the first variable metric method, was proposed by Davidon and developed by Fletcher and Powell. Byrd, Nocedal and Yuan prove global and superlinear convergence on convex problems for all methods in the restricted Broyden class except DFP; their approach breaks down when phi = 1.)

Yuan [15] truncated t_k to the interval [0.01, 100], and showed that the global convergence of the modified BFGS algorithm is preserved for convex functions. If the objective function f is cubic along the line segment between x_{k-1} and x_k, then a corresponding relation holds between the function values and gradients at the two points.
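The safeguard itself is simple to state in code. Whatever formula produces the raw scaling value (that formula is not reproduced here), it is clamped to the interval [0.01, 100] before use; this is a minimal sketch with hypothetical input values.

```python
def truncate_t(t_raw, lo=0.01, hi=100.0):
    """Clamp the raw scaling value t_raw to the safeguard interval [lo, hi]."""
    return min(max(t_raw, lo), hi)

# Values inside the interval pass through; values outside are clamped.
assert truncate_t(0.001) == 0.01
assert truncate_t(7.5) == 7.5
assert truncate_t(1e6) == 100.0
```

The clamping is what rules out both near-zero and arbitrarily large scalings, the two failure modes discussed for the eigenvalue shifts of the update.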

3-1:
Taking the scaling parameter equal to one yields the standard BFGS method. Now, taking the scalar as the Al-Bayati [1] parameter, and with our consideration that, in order to avoid the storage of the matrix H, we set H_k = I, we obtain a new scalar parameter whose computation does not require forming the matrix H_k at every iteration. Suppose that f is differentiable and bounded below, and consider the BFGS method with a line search in which lambda satisfies the Wolfe conditions (3), (4).