Non-recursive equivalent of the conjugate gradient method without the need to restart

A simple alternative to the conjugate gradient(CG) method is presented; this method is developed as a special case of the more general iterated Ritz method (IRM) for solving a system of linear equations. This novel algorithm is not based on conjugacy, i.e. it is not necessary to maintain overall orthogonalities between various vectors from distant steps. This method is more stable than CG, and restarting techniques are not required. As in CG, only one matrix-vector multiplication is required per step with appropriate transformations. The algorithm is easily explained by energy considerations without appealing to the A-orthogonality in n-dimensional space. Finally, relaxation factor and preconditioning-like techniques can be adopted easily.


Introduction
Let be a real linear system with a symmetric positive definite (SPD) matrix of order n. By IRM, the solution is sought through successive minimisation of the corresponding energy functional, or the quadratic form inside a small subspace formed at each iteration step [1]. After the convergence criterion is reached, a solution is found that is close to the unique minimiser of f (x). Geometrically, this is the point close to the centre of the hyper-ellipsoids f (x) = c, where c are arbitrary real constants.

Briefly about IRM
The main idea here is to present the solution increment by the discretised Ritz method: where Φ (i) = [ φ 1,(i) φ 2,(i) . . . φ m,(i) ] is a matrix of linearly independent coordinate vectors, and a (i) is the vector of corresponding coefficients. The energy decrement associated with (3) can be also expressed as the quadratic function ∆f (a (i) ) = 1 2 a T (i) A (i) a (i) − a T (i) r (i) (4) where A (i) = Φ T (i) AΦ (i) and r (i) = Φ T (i) r (i) are the SPD generalised (Ritz) matrix and the generalised residual vector, respectively, and both terms are of order m. After minimising (4), we obtain the system of equations that should be solved at each step: The solution is used to find the increment in (3), and x (i+1) = x (i) + p (i) is updated afterwards. The residual is defined in a standard manner as Obviously, IRM represents an iterative procedure, where a discrete Ritz method is applied at each step, and a suitable set of coordinate vectors which span a subspace are generated. A local energy minimum is sought within that subspace (therefore (5) should be solved at each step), thereby decreasing the total energy of the system, which eventually converges to the required minimum. The subspace dimension, or the size of (5), is not limited. Rather, it aims to be small-much smaller than the number of unknowns (m n), because every iteration must be as fast as possible. Such a small system (though A (i) is usually full) can be solved by any direct method. Simple pseudocode, with input data and sequence of instructions common for the iterative solution methods, is given by the Algorithm 1.

14: end while {end Iterated Ritz method}
The central problem involves quickly generating a small and efficient subspace, such that the energy reduction per step is as large as possible and the number of steps is extremely reduced. Usually, one coordinate vector is p (i) , and others are generated as P j r (i+1) , where P j is fast approximation of A −1 . Non-residual based generation ideas are also possible [2]. Because many strategies to construct P j (or generally φ j,(i) ) exist, the algorithm also allows for any new routine that generate subspaces (e.g. those suggested by other independent researchers) to be easily implemented in (line 8, Alg. 1). Potentially, this may make the method even faster.
It should be noted that the conjugacy property is not explicitly taken into account in IRM, and coordinate vectors may become (almost) linearly dependent. Therefore, routines for subspace generation which prevent such a scenario are preferred and may even change between steps. Nevertheless, if this dependence arises, some pivots approach zero during the decomposition of A (i) , which can be recognised and used for discarding corresponding equations from the small system. The subspace dimension is reduced in such cases, but A (i) becomes more regular and better conditioned. This strategy is faster than orthogonalisation, rejection, or replacement of dependent vectors [3].
IRM can also be considered as a generalisation of some iterative methods [1,2]. Depending on the choice of coordinate vectors, some solvers can be represented or interpreted as special cases of this approach. Furthermore, it is possible to combine good properties of several methods simultaneously. If appropriate vectors are selected, convergence should proceed faster than using any single method considered. Here, an improved CG algorithm (IRM-CG) is presented due to the popularity of SPD systems.

IRM-CG as a special case of IRM
The algorithm presented here also starts with the steepest descent (SD) step. Other steps are executed using a CG-like algorithm simulated by IRM with two coordinate vectors. The first vector is the current residual r (i+1) , and the second vector is previous solution increment p (i) . Vectors span a two-dimensional subspace. At each step, a system of two equations is solved and a new energy minimum within that plane is found (Algorithm 2). This approach has three matrix-vector multiplications per step: one in line 7 and two in line 8. Applying two 'induced' recursive relations ('inherent' A-orthogonalisation is not exploited here), only one such multiplication remains (as in CG). If line 11 (Alg. 2) is multiplied by A, then Substituting α (i) = Ar (i+1) and β (i) = Ap (i) into the first recursion yields Second, the frequently used residual recursion Now, after the line 4 (Alg. 2), the new initialisation should be inserted, and the pseudocode inside the while loop becomes: 8: : 10:

12:
i ← i + 1 13: end while {end IRM-CG method} lines 9 -11 remain unchanged Due to round-off errors, as in CG, the residual is periodically (after i max steps) updated from the equilibrium equation, i.e. the following should be used instead of the second line from (10):

Equivalence between CG and IRM-CG
The proof of equivalence between CG and IRM-CG is very simple, so it will be discussed only briefly. Initialisation is practically identical for both methods. In other steps, the minimum of the energy function inside the plane spanned by r (i+1) and p (i) is determined. In CG this is realised by A-orthogonalisation and by solving a linear system of two equations in IRM-CG. If exact arithmetic is considered, IRM-CG and CG have an identical sequence of intermediate results. The exact solution is obtained after m steps, where m is the number of different 'active' eigenvalues. If b is represented as a sum of eigenvectors ϕ j , i.e. b = a j ϕ j , eigenvectors (and their corresponding eigenvalues) with a j = 0 may be called 'active' (or 'inactive' otherwise). Of course, m can be found only if all n eigenpairs are detected. Multiple eigenvalues should be counted as one, and 'inactive' eigenvalues are not counted at all. This comment is only important for theoretical considerations, as IRM-CG is interesting as an iterative, not direct solution method.

Advantages of IRM-CG over CG
During real calculations (with round-off errors), IRM-CG is more stable and behaves better than CG. First, restarting of this algorithm is not needed because A-orthogonality is not exploited. Namely, error in A-orthogonality also exists if the IRM-CG formulation is used, but it is not accumulated during calculation. Therefore, inherited errors from the A-orthogonality decrease, although non-exact arithmetic (as in every numerical process) causes new errors and affects convergence.
Consider simple example with diagonal A (a 1,1 = 1 and a 2,2 = κ, where κ is a condition number of A). If b = [ 1 1 ] T , using rational arithmetic exact solution x = [ 1 1/κ ] T is obtained in two steps by both methods. To check stability of the methods, after the initialisation phase (for i = 0) a small disturbance δ to the second term of p (1) is added (Fig. 1a). The second step of IRM-CG still gives exact result, but CG gives only s perturbed approximate solution as a function of δ and κ: Notice complexity of the CG solution, even with a diagonal matrix of order two. For better explanation of the expressions, over domain δ ∈ [−10 −2 , 10 −2 ] functions x − x = f (δ) for three values of κ, and x − x = f δ, κ for κ = 10 a (a ∈ [0, 4]) are given (Figs. 1b and c). Only if δ = 0 exact solution is recovered from (12). Also, functions are non-symmetric, so CG behaves differently for ±δ.
When large equation systems are considered, δ is accumulated primarily due to the loss of A-orthogonality, which is inherited recursively. The main reason is approximate CG solution at each step (in the current plane); in practice more complicated then (12). On the contrary, IRM-CG finds numerically 'exact' solution at each plane. Therefore system of two equations is repeatedly solved. Roughly, if δ is split into two components at each step, the one inside and the other orthogonal to the plane, the first component is 'exactly' resolved and does not produce inherited error. In CG, both components cause propagation of error. In other words, if δ lies on the plane, the behaviour of IRM-CG is as if δ = 0, but if it is orthogonal to the plane the solution is disturbed.
It is possible to interchange methods because the two approaches are equivalent. Each step may be executed by CG or IRM-CG, no matter how the earlier steps were performed. If CG is used as a solution method, it is suggested that one equivalent IRM-CG step be executed after some number of steps, but before orthogonality error becomes too large. This may be called "refresh" instead of traditionally "restart".
The second advantage of this formulation is the natural adoption of the relaxation factor ω ∈ (0, 2), known from the Successive overrelaxation method [4], and which may even change at each step. Line 6 (Alg. 2) is simply and the second term of r (i) is not zero (line 9, Alg. 2); it is rather ω (i) r T (i+1) p (i) . Also, recurrence relation in (11) becomes r (i+1) ← r (i) − ω (i) β (i) . Obviously, using ω = 1, A-orthogonality is lost, but convergence is improved in many cases [5]. The third advantage is the very natural adoption of (multi) preconditioning-like techniques [6]. In such cases, only line 8 (Alg. 2) is reformulated as The coordinate vectors are where M j is a matrix according to the standard approach [7], which is used to produce a better conditioned system M −1 j Ax = M −1 j b equivalent to (1). However, transformations of the CG algorithm required for such strategies are not needed here. According to IRM, that is just another way of generating coordinate vector(s). During the solution process, they can also become (exactly or approximately) linearly dependent, and one or several of them should be excluded.
Many possibilities to rapidly construct M −1 exist, such as (not always robust) incomplete Cholesky factorisations with different fill-ins [8], algebraic multigrid methods [9,10], and sparse approximate inverses [11]. It is even possible to use methods that are not useful as standalone solvers, as they are neither convergent nor numerically stable. For smoothing purposes, forward and backward techniques or any other promising order of unknowns may be useful.

Illustrative example
Consider a simple linear FEM benchmark: a cube discretised by 8-node solid elements, supported by the corner springs of stiffnesses k, and loaded with a vertical force at the top. This model has 3 993 unknowns (Fig. 2). The condition number is κ ≈ 8 · 10 4 , which is calculated as the ratio of extreme eigenvalues. CG and IRM-CG behave almost identically (curves practically collide) for such a well-conditioned system. If the spring stiffness A is greatly reduced to 10 −10 k, then κ ≈ 3 · 10 13 and IRM-CG behaves much better than CG. Processes are terminated after ε = 10 −10 is reached. Of course, CG may be improved by preconditioning and restarting techniques. However, IRM-CG may also be enhanced by using additional coordinate vectors, while restarting strategies are not needed at all, as previously mentioned.

Conclusion
Although general theorems and proofs about algorithm convergence rate and stability are not given here, according to the results of numerical experiments with exact and floating-point arithmetic, IRM-CG should be an interesting replacement for a standard or preconditioned CG. Recursive A-orthogonalisation, restarting recommendations, and transformations (necessary for preconditioning) are not required, hence the method should be very useful for solving non-well-posed problems. Finally, the property of conjugacy, which underlies many iterative procedures and is valid only for linear systems, is not absolutely necessary here. Therefore, IRM-CG can be successfully applied to nonlinear problems (including optimisation), where conjugacy is not even defined. This issue is vitally important, as iterative methods are used exclusively in these cases [12].