Residual Symmetry for Linear Algebraic Equations and the Lorentz-Group Algorithm

Abstract In the iterative solution of n linear algebraic equations Bx = b by the steepest descent method, i.e., x_{k+1} = x_k − α_k Bᵀr_k, it is known that the steplength α_k := ∥r_k∥²/(r_kᵀAr_k) causes slow convergence, where r = Bx − b is the residual vector and A = BBᵀ. In this paper we study the residual symmetry of the residual dynamics for a scaled residual vector y ∈ S^{n−1}_{∥r_0∥}, which, as expressed in the augmented space, is a nonlinear Lorentzian dynamical system endowed with a cone structure in the Minkowski space, with the Lorentz group SO_o(n, 1) being the internal symmetry group. Consequently, we can modify the steplength to α_k = ∥y_k∥²/(y_kᵀAy_k), with y_k computed by a Lorentz-group algorithm (LGA) based on SO_o(n, 1), which significantly improves the convergence speed and enhances the stability. Several linear inverse problems are used to assess the numerical performance of the LGA.


Introduction
In this paper we study the internal symmetry of the residual dynamics, abbreviated as the residual symmetry, which is defined in terms of the residual vector r(x) := Bx − b for a system of linear algebraic equations (LAEs): Bx = b, where x ∈ Rⁿ is an unknown vector to be determined from a given coefficient matrix B ∈ R^{n×n} and the input vector b ∈ Rⁿ. Equation (2) is in general an ill-posed system when it is derived from the numerical solution of linear inverse problems.
(ii) For k = 0, 1, 2, . . ., we repeat the following computations. If ∥R_{k+1}∥ < ε for a prescribed convergence criterion ε, then stop; otherwise, go to step (ii). In the above, C = BᵀB, b₁ = Bᵀb, R_k = Bᵀr_k = Bᵀr(x_k), 0 ≤ γ < 1 is a relaxation parameter, and Equation (5) gives the steplength for the steepest descent method (SDM), where A = BBᵀ.
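As a concrete sketch, the relaxed SDM loop described above can be written as follows. This is a minimal illustration in our own notation, not the author's code: the function name and the zero initial guess are assumptions, and the steplength is the exact line-search value (1 − γ)∥R∥²/(RᵀCR) implied by Equation (5).

```python
import numpy as np

def relaxed_sdm(B, b, gamma=0.0, eps=1e-10, max_iter=10000):
    """Relaxed steepest descent for B x = b via the normal equations,
    with C = B^T B and the steepest descent vector R_k = B^T r_k."""
    C = B.T @ B
    x = np.zeros(B.shape[1])
    for k in range(max_iter):
        r = B @ x - b
        R = B.T @ r                                  # steepest descent vector
        if np.linalg.norm(R) < eps:
            break
        alpha = (1.0 - gamma) * (R @ R) / (R @ (C @ R))
        x = x - alpha * R
    return x, k

B = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([4.0, 3.0])
x, iters = relaxed_sdm(B, b)
```

For a well-conditioned symmetric positive definite C this converges to the solution of Bx = b, although the iteration count grows with Cond(C), as discussed below.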
To solve ill-posed linear problems, several methods have been developed after the pioneering work of Tikhonov & Arsenin (1977). To account for the sensitivity to noise, a regularization method is often used to solve the ill-posed problem [Kunisch & Zou (1998); Wang & Xiao (2001); Xie & Zou (2002); Resmerita (2005)], where a suitable regularization parameter is used to suppress the bias in the computed solution by a better balance of the approximation error and the propagated data error. Liu has developed an optimally scaled vector regularization method [Liu (2012b)], the dynamical Tikhonov regularization method [Liu (2013a)], a two-side conditioning method [Liu (2013b)], as well as the Krylov subspace double-optimization method [Liu (2013c); Liu (2014)] to solve ill-posed linear problems.
There are many methods that converge significantly faster than the SDM; unlike the conjugate gradient method (CGM), they insist that the search direction be the gradient vector at each iteration [Barzilai & Borwein (1988); Friedlander et al. (1999); Raydan & Svaiter (2002); Dai & Yuan (2003); Dai et al. (2006)]. The SDM performs poorly, yielding iteration counts that grow linearly with Cond(C) [Akaike (1959); Forsythe (1968); Nocedal et al. (2002)]. Several modifications of the SDM have been made, and they have stimulated a new interest in the SDM, because it is recognized that the gradient vector itself is not a bad choice of search direction; rather, the steplength in Equation (5) originally used by the SDM is to blame for the slow convergence.
An interesting problem is whether there exists a certain symmetry of the iterative dynamics of the LAEs when we solve them by an iterative method: what is the underlying space of the residual dynamics, and what sort of Lie group acts on that space? In this paper we answer these questions and propose a theoretical foundation for modifying the steplength in Equation (5) by means of the Lorentz-group symmetry, which acts on a future cone in the Minkowski space.
The remainder of this paper is arranged as follows. In Section 2 we start from a future cone in the Minkowski space to derive a system of nonlinear ordinary differential equations (ODEs) for the numerical solution of Equation (2). Then, the residual dynamics on the future cone is constructed in Section 3, resulting in a Jordan dynamics and a nonlinear Lorentzian dynamical system, of which the residual symmetry is developed. Accordingly, the Lorentz-group algorithm (LGA) used to solve Equation (2) is developed in Section 4. Numerical examples of linear inverse problems are given in Section 5 to display some advantages of the LGA. Finally, conclusions are drawn in Section 6.

Nonlinear ODEs for x
For Equation (1) we can introduce an invariant manifold: h(x, t) := Q(t)∥r(x)∥² = ∥r₀∥², where we let x be a function of a time-like variable t, with the initial values x(0) = x₀ and r₀ = r(x₀) being given, and Q(t) > 0, with Q(0) = 1, is a monotonically increasing function of t. In terms of the augmented vector X := [r/∥r₀∥; 1/√Q(t)], Equation (6) represents a positive cone: XᵀgX = 0 in the Minkowski space M^{n+1}, which is endowed with the indefinite Minkowski metric tensor g = diag(I_n, −1), where I_n is the n × n identity matrix. Because the last component 1/√Q(t) of X is positive, the cone in Equation (8) is a future cone [Liu (2001)], as shown in Fig. 1.

Vol. 7, No. 4; 2015

Figure 1. The construction of a future cone in the Minkowski space and the Lorentzian residual dynamics for solving the ill-posed linear system, which signifies a conceptual breakthrough.
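A quick numerical check of the cone structure, based on our reconstruction of Equations (6)-(9) with arbitrary numbers: any residual r satisfying the manifold condition ∥r∥² = ∥r₀∥²/Q gives an augmented vector X = [r/∥r₀∥; 1/√Q] that is null with respect to g = diag(I_n, −1).

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
r0 = rng.standard_normal(n)          # initial residual r0 = r(x0)
Q = 2.5                              # any Q(t) > 0 with Q(0) = 1

# Pick a residual r on the manifold ||r||^2 = ||r0||^2 / Q
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
r = (np.linalg.norm(r0) / np.sqrt(Q)) * u

# Augmented vector and Minkowski metric (reconstructed Eqs. (7) and (9))
X = np.concatenate([r / np.linalg.norm(r0), [1.0 / np.sqrt(Q)]])
g = np.diag(np.concatenate([np.ones(n), [-1.0]]))
cone = X @ g @ X                     # the cone condition X^T g X = 0
```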
The cone structure of the residual vector was first pointed out by Liu (2012c); however, the residual dynamics was not clearly developed in that paper. In Section 3 we will derive a nonlinear Lorentzian dynamical system for X.
When Q > 0, the manifold defined by Equation (6) is continuous and differentiable, and from the consistency condition we obtain Equation (10), where R = Bᵀr is the steepest descent vector. Equation (10) is obtained by taking the differential of Equation (6) with respect to t and considering x = x(t) and h(x, t) = 0 for all t.
We suppose that the evolution of x is driven by R, as in the SDM; inserting this into Equation (10), we can derive a nonlinear ODEs system for x: ẋ = −q(t) (∥r∥²/(rᵀAr)) R, where q(t) := Q̇(t)/(2Q(t)). Hence, if Q(t) can be guaranteed to be a monotonically increasing function of t, we have an absolutely convergent algorithm to solve Equation (2), wherein we can observe that the path of X gradually moves down to the vertex along the cone defined by Equation (8), as schematically shown in Fig. 1. However, it is still a great challenge to develop a suitable numerical integrator for the nonlinear ODEs in Equation (13), such that the orbit of x really remains on the future cone in the Minkowski space. This important issue is addressed in the next two sections.

Discretizing and Keeping x on the Manifold
From Equation (13) we can observe that it is of utmost importance to study the residual dynamics of r in order to better understand the solution behavior of x. In order to keep x on the manifold (16), we consider the evolution of r(t) := r(x(t)) along the path x(t), t ≥ 0, generated from Equation (13):

ṙ = Bẋ = −q(t) (∥r∥²/(rᵀAr)) Ar. (17)

Now we discretize both Equations (13) and (17) into discrete dynamics by applying the forward Euler scheme, where β = q(t)∆t is a time-varying steplength.
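A one-step sketch of the forward Euler discretization of Equations (18)-(19); the variable names are ours, and β plays the role of q(t)∆t. Because the r-update is exactly B times the x-update, the identity r_{k+1} = Bx_{k+1} − b is preserved by the discretization:

```python
import numpy as np

def euler_step(B, b, x, beta):
    """One forward Euler step for x and r (a sketch of Eqs. (18)-(19))."""
    A = B @ B.T
    r = B @ x - b
    R = B.T @ r                          # steepest descent vector R = B^T r
    eta = (r @ r) / (r @ (A @ r))        # the factor ||r||^2 / (r^T A r)
    x_new = x - beta * eta * R
    r_new = r - beta * eta * (A @ r)
    return x_new, r_new

B = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([4.0, 3.0])
x1, r1 = euler_step(B, b, np.zeros(2), beta=0.5)
```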
In order to keep x on the invariant manifold (16), we can take the square-norm of both sides of Equation (19) and use Equation (16) to obtain Equation (21). Thus, dividing both sides by C/Q(t) and recalling that Q(t) > 0, the following scalar equation can be derived. Upon introducing a relaxation, the solution of β in Equation (22) is found, where 0 ≤ γ < 1 is a relaxation parameter and 0 < β ≤ 1.

The Residual Dynamics on the Future Cone
In this section we disclose some essential structures of the residual dynamics in terms of a scaled residual vector.

A Perpendicular Operator
Let y := √Q(t) r (26) be a scaled residual vector; from Equation (16) we know that y ∈ S^{n−1}_{∥r₀∥}, a sphere of radius ∥r₀∥, i.e., ∥y∥ = ∥r₀∥. (27) This equation is an invariant condition of the residual dynamical system. At the same time, by using Equation (26), Equations (18) and (23) become Equations (28) and (29), where we do not modify the steepest descent vector, keeping it as R = Bᵀr, but we modify ∥r∥²/(rᵀAr) to ∥y∥²/(yᵀAy).

Now we derive the governing ODEs for y. From Equations (17) and (26) we can derive

ẏ = q(t)[y − (∥y∥²/(yᵀAy)) Ay], (30)

which is the first form of the residual dynamics. If we define the operator

D := I_n − (∥y∥²/(yᵀAy)) A, (31)

we have ẏ = q(t)Dy, (32) where D satisfies D = Dᵀ and yᵀDy = 0. (33) Indeed, D is a symmetric perpendicular operator, because it maps y to a new vector Dy which is perpendicular to y itself, in view of the second equation in Equation (33).
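The perpendicularity property of the reconstructed operator D = I_n − (∥y∥²/(yᵀAy))A can be checked numerically; in this sketch a random matrix stands in for any A = BBᵀ:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T                              # symmetric positive definite
y = rng.standard_normal(5)

# D is symmetric and maps y to a vector perpendicular to y
D = np.eye(5) - ((y @ y) / (y @ (A @ y))) * A
perp = y @ (D @ y)                       # should vanish: y^T D y = 0
```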

The Jordan Dynamics
Here we give a new aspect of the residual dynamics (30), derived for Equation (2), from the Jordan structure [Iordanescu (2007)]. There exist three kinds of Jordan structures: algebras, triple systems, and pairs. We will use the Jordan algebra developed by Liu (2000), which, with the use of triplet vectors, was formulated for a general dynamical system that includes both a conservative force and a dissipative force on the right-hand side. Since the creation of Jordan algebras by Pascual Jordan, intended to improve the mathematical foundation of quantum mechanics, Jordan structures have been intensively studied by many mathematicians, and many important results have been obtained. The study of Jordan structures and their applications is at present a widespread and active field of mathematical research [Iordanescu (2009)]. Liu (2000) has derived a system of ODEs based on the Jordan algebra, given by Equation (34), in which the triplet vectors w, z and u are functions of x and t.
In terms of the Jordan dynamics in Equation (34) we can rewrite Equation (30) as Equation (35), which is the second form of the residual dynamics.

The Lorentz Group SO_o(n, 1)
Obviously, by letting

q := −q(t) Ay/(yᵀAy), (36)

Equation (35) can be rearranged to

ẏ + (q • y)y = ∥y∥² q. (37)

By using Equation (27) we can further write

ẏ + (q • y)y = ∥r₀∥² q. (38)

Defining the integrating factor

Y₀ := exp(∫₀ᵗ q(τ) • y(τ) dτ), (39)

from Equation (38) we have

d/dt (Y₀ y) = ∥r₀∥² Y₀ q. (40)

On the other hand, by taking the time differential of Equation (39) we have

Ẏ₀ = (q • y) Y₀. (41)

Equations (40) and (41) can be written together in the matrix form

Ẏ = CY, (42)

where

Y := [Y_s; Y₀], with Y_s := Y₀ y/∥r₀∥, (43)

is an augmented vector, and the system coefficient matrix

C = [0_{n×n}, ∥r₀∥q; ∥r₀∥qᵀ, 0] (44)

belongs to the Lie algebra of the proper orthochronous Lorentz group SO_o(n, 1), satisfying

Cᵀg + gC = 0, (45)

with g defined by Equation (9). In this sense, Equation (42) is a nonlinear Lorentzian dynamical system [Liu (2002)], which is the third form of the residual dynamics.
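That the reconstructed coefficient matrix C of Equation (44) lies in the Lorentz algebra so(n, 1) can be verified directly, since Cᵀg + gC = 0 for any vector q; the data in this sketch are arbitrary:

```python
import numpy as np

n = 4
rng = np.random.default_rng(2)
q = rng.standard_normal(n)               # the vector q of Eq. (36)
nr0 = 3.0                                # stands for ||r0||

C = np.zeros((n + 1, n + 1))
C[:n, n] = nr0 * q                       # upper-right block: ||r0|| q
C[n, :n] = nr0 * q                       # lower-left block:  ||r0|| q^T
g = np.diag(np.concatenate([np.ones(n), [-1.0]]))
lie = C.T @ g + g @ C                    # = 0 for an element of so(n, 1)
```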

The Cone Condition
Now we prove that Y = X, where X is defined by Equation (7).
From Equations (39), (36) and (15) it follows that Y₀ = 1/√Q(t), where we have used Q(0) = 1. Then, inserting it into Equation (43) and using Equations (26) and (7), we obtain Y = X. Inserting Y into the cone condition (8), we can prove that YᵀgY = (∥y∥² − ∥r₀∥²)/(Q(t)∥r₀∥²) = 0, and thus ∥y∥ = ∥r₀∥. This means that the cone condition in the Minkowski space M^{n+1} and the invariant condition ∥y∥ = ∥r₀∥ in the Euclidean space Eⁿ are equivalent. The cone condition is further shown in Fig. 1 by the augmented vector Y. Consequently, we have derived a genuine residual dynamics, as expressed by Equations (42) and (48), on the future cone in the Minkowski space with Y ∈ M^{n+1}.

The Fixed Point
We can prove that an eigenvector y of the matrix A, i.e., Ay = λy, is a fixed point of the dynamical system (35). Inserting the above equation into Equation (35) and using Equation (34), we can derive ẏ = 0. Thus, when the orbit of y tends to an eigenvector, we have ẏ = 0, which means that the eigenvector is a critical point of the dynamical system (35).
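The fixed-point property can be confirmed numerically: substituting an eigenvector of A into the right-hand side of the reconstructed residual dynamics (30) gives ẏ = 0. The matrix in this sketch is arbitrary:

```python
import numpy as np

B = np.array([[2.0, 0.5], [0.5, 1.0]])
A = B @ B.T
lam, V = np.linalg.eigh(A)
y = 7.0 * V[:, 0]                        # an eigenvector of A, at any scale
qt = 1.0                                 # q(t) > 0; its value is irrelevant here
ydot = qt * (y - ((y @ y) / (y @ (A @ y))) * (A @ y))   # rhs of Eq. (30)
```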

The Lorentz-Group Algorithm Based on S O o (n, 1)
As done by Liu (2001), we can develop a group-preserving scheme for the nonlinear Lorentzian system (42). The numerical scheme provides a means to calculate the value of Y at time t = t_{k+1} when Y is already known at time t = t_k.
The evolution of Y is governed by the dynamical law (42) with the coefficient matrix C given by Equation (44). Unfortunately, due to the presence of q in the coefficient matrix C, which, as shown in Equation (36), is a function of t and y = ∥r₀∥Y_s/Y₀, we need to approximate the solution of the dynamical law (42) by considering Y₀ and Y_s to be constant within each single time step of stepsize ∆t = t_{k+1} − t_k. The one-step solution of Equation (42) is then given by Equations (51) and (52). A numerical algorithm for solving Equation (42) is called a group-preserving scheme (GPS) if for every time increment the mapping G(k) from Y(k) to Y(k + 1) preserves the following group properties [Liu (2001)]: Gᵀ(k) g G(k) = g, det G(k) = 1, G⁰₀(k) > 0, where det is shorthand for the determinant and G⁰₀ is the 00-th mixed component of G. In Equation (54), G⁰₀ = a(k) > 1. Based on the above results and Equations (28), (29) and (25), we can develop the Lorentz-group algorithm (LGA) to solve Equation (2):

(i) Give 0 ≤ γ < 1 and an initial guess x₀.
(ii) Calculate r₀ = Bx₀ − b and y₀ = r₀.
(iii) For k = 1, 2, . . ., repeat the computations in Equation (57). If x_k converges according to a given stopping criterion ∥r_k∥ < ε₁ (or ∥R_k∥ < ε₂), then stop; otherwise, go to step (iii).

From Liu (2011a), β_k = q(t_k)∆t is given by Equation (58), where 0 ≤ γ < 1 is a relaxation parameter.
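The following is a minimal sketch of how such an LGA iteration might look, under several assumptions on our part: q(t) = 1, a fixed β_k = ∆t in place of Equation (58), the boost G = exp(∆t C(k)) evaluated in closed form, and a mildly conditioned test system. It illustrates the structure (x advanced with the steplength ∥y∥²/(yᵀAy), y advanced by a Lorentz boost that preserves ∥y∥ = ∥r₀∥ exactly), and is not the author's code:

```python
import numpy as np

def lga(B, b, gamma=0.0, dt=0.1, eps=1e-10, max_iter=10000):
    """Sketch of the Lorentz-group algorithm (LGA) for B x = b."""
    n = B.shape[1]
    A = B @ B.T
    x = np.zeros(n)
    r0 = B @ x - b
    nr0 = np.linalg.norm(r0)
    y = r0.copy()
    for k in range(max_iter):
        r = B @ x - b
        R = B.T @ r                                  # steepest descent vector
        if np.linalg.norm(R) < eps:
            break
        # x-update with the modified steplength ||y||^2 / (y^T A y)
        alpha = (1.0 - gamma) * (y @ y) / (y @ (A @ y))
        x = x - alpha * R
        # y-update: one boost exp(dt*C) in SO_o(n,1) applied to Y = [y/||r0||; 1]
        q = -(A @ y) / (y @ (A @ y))                 # the vector q of Eq. (36)
        c = dt * nr0 * q
        nc = np.linalg.norm(c)
        Ys, Y0 = y / nr0, 1.0
        Ys_new = Ys + ((np.cosh(nc) - 1.0) * (c @ Ys) / nc**2) * c \
                    + (np.sinh(nc) / nc) * Y0 * c
        Y0_new = (np.sinh(nc) / nc) * (c @ Ys) + np.cosh(nc) * Y0
        y = nr0 * Ys_new / Y0_new                    # keeps ||y|| = ||r0|| exactly
    return x, k

B = np.array([[2.0, 0.2], [0.2, 1.8]])
b = B @ np.array([1.0, 1.0])
x, iters = lga(B, b)
```

Because the boost is an exact group element of SO_o(n, 1), the augmented vector stays on the cone ∥Y_s∥ = Y₀, so the invariant ∥y∥ = ∥r₀∥ holds to machine precision at every step.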
When the iterative sequence y_k approaches an eigenvector of A with Ay = λy, the steplength tends to ∥y∥²/(yᵀAy) = 1/λ. Below we use two numerical examples to demonstrate this phenomenon, which is quite different from the behavior of the adaptive steplength of the RSDM.

Numerical Examples
In order to assess the performance of the newly developed Lorentz-group algorithm (LGA), let us investigate the following examples. In particular, we emphasize the numerical solutions of linear inverse problems.

Example 1
The following example is used to illustrate the fixed-point behavior of the LGA. The exact solution is (x, y) = (1, 1). Denote the coefficient matrix in Equation (60) by B; hence, Cond(A) = Cond(BBᵀ) = 100, and the eigenvalues of A are λ₁ = 100 and λ₂ = 1.
In the computation of y_k we have discretized it by the iterative sequence shown in Equation (57), which is the discretized model of the LGA residual dynamics. Alternatively, we can integrate the residual dynamics with a time stepsize ∆t, which is the continuous model; setting β_{k−1} = q_{k−1}∆t in Equation (57) yields the continuous model.
Then we compare the residual errors obtained by the LGA with γ = 0.05 and by the continuous model with q = 1 and ∆t = 0.02 in Fig. 2(a). Under the convergence criterion ε₂ = 10⁻¹⁰, the LGA converges in 227 iterations, while the continuous model converges in 101 iterations. As shown in Fig. 2(b), starting from the same point (y₁, y₂) = (−10, −2), the continuous model approaches the fixed point (y₁, y₂) = (0, −10), while the LGA does not exactly converge to the fixed point but oscillates near it. Because the LGA is obtained by discretization into a purely iterative algorithm with β_k given by Equation (58), it does not fully obey the dynamical law of the continuous-time dynamical system. Both methods give very accurate solutions, with maximum errors smaller than 10⁻¹¹. The continuous model has two main drawbacks: one needs to specify the function q(t) and to determine the stepsize ∆t. In the following, we only use the discretized model of the LGA to solve linear problems.

Example 2
This example is used to illustrate that the presented LGA is better than the RSDM on a simple problem. The exact solution is (x, y) = (1, 1). It is interesting to note that the condition number is Cond(A) = Cond(BBᵀ) = 1.5957 × 10¹¹, where B denotes the coefficient matrix in Equation (61). The eigenvalues are λ₁ = 80.0012 and λ₂ = 5.01352 × 10⁻¹⁰.
Then, we add a random noise with intensity σ = 0.05 to the data (4, 12.0001)ᵀ. Using the LGA with γ = 0, we obtain the solution (x, y) = (0.99981, 0.99945) under the convergence criterion ε₂ = 10⁻⁷ or a maximum of 1000 iterations. In Fig. 3 we display the iterative orbit of (y₁, y₂), starting from (−4.01326, −11.9907) and ending at (−11.9958, 3.99819), which is an eigenvector of A corresponding to the eigenvalue λ₁. Then we apply the RSDM with γ = 0.01 to solve this problem, whose solution is found to be (x, y) = (1.0007, 0.9986). It can be seen that the LGA is more accurate than the RSDM by one order of magnitude. In Fig. 4(a) we compare the residual errors of the RSDM and the LGA, while the steplengths are compared in Fig. 4(b). The two thick solid red lines composed of squares present the iterative steplengths of the LGA, of which the lower one is just the inverse of the first eigenvalue, 1/λ₁ = 0.0124998. Because the matrix A is highly ill-conditioned, the calculated steplengths jump between the two values 1.53160 and 0.0124998, and are sensitive to slight differences in the computed values of (y₁, y₂). On the other hand, the steplengths of the RSDM, shown by the blue dashed line, vary quite regularly over a large range, drastically different from those of the LGA. The existence of a fixed point of the residual dynamics is very important for the LGA, since it enhances the long-term stability. For example, if we increase the maximum number of iterations to 30000, the above RSDM leads to the incorrect solution (x, y) = (2.0427, −0.0434), while the LGA with γ = 0.05 gives the rather accurate solution (x, y) = (0.9952, 1.004), which is more accurate by three orders of magnitude.

Example 4
We solve the Cauchy problem of the Laplace equation under the boundary conditions (70), where h(θ) and g(θ) are given functions and ρ = ρ(θ) is a given contour describing the boundary shape. The contour in polar coordinates is specified by Γ = {(r, θ) | r = ρ(θ), 0 ≤ θ ≤ 2π}, which is the boundary of the problem domain Ω, and n denotes the normal direction.
In potential theory, it is well known that the method of fundamental solutions (MFS) can be used to solve Laplacian problems when a fundamental solution is known [Kupradze (1964)]. In the MFS, the trial solution of u at the field point z = (r cos θ, r sin θ) is expressed as a linear combination of the fundamental solutions U(z, s_j), Equation (71), where n is the number of source points, c_j are the unknown coefficients, s_j are the source points, and Ω^c is the complementary set of Ω. For the Laplace equation (70), the fundamental solutions are given by Equation (72). In the practical application of the MFS, the source points are frequently located uniformly on a circle with radius R, such that, after imposing the boundary conditions of Equation (70) on Equation (71), we obtain a linear system of the form (2), in which n = 2m. We fix n = 38 and employ a circle with constant radius R = 15 to distribute the source points. Then we apply the LGA with γ = 0.02 to solve the linear system (2) under the convergence criterion ε₂ = 10⁻³, where a noise with intensity σ = 10% is imposed on the given data. The residual error is plotted in Fig. 6(a); the LGA converges in 4624 iterations.
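A small self-contained MFS sketch in the spirit of Equations (71)-(72). This is our own illustration, not the paper's setup: a unit-disk domain with pure Dirichlet data from the harmonic function u = r² cos 2θ, sources on a circle of radius R = 2 rather than the paper's Cauchy-data problem with R = 15, and a least-squares solve in place of the LGA:

```python
import numpy as np

m = 38                                         # number of source/boundary points
theta = 2 * np.pi * np.arange(m) / m
bx, by = np.cos(theta), np.sin(theta)          # collocation points on unit circle
R = 2.0
sx, sy = R * np.cos(theta), R * np.sin(theta)  # source points outside the domain

# MFS matrix with U(z, s_j) = ln ||z - s_j||, the 2D Laplace fundamental solution
K = np.log(np.hypot(bx[:, None] - sx[None, :], by[:, None] - sy[None, :]))

u_exact = lambda x, y: x**2 - y**2             # harmonic test solution r^2 cos 2θ
c, *_ = np.linalg.lstsq(K, u_exact(bx, by), rcond=None)

# Evaluate the MFS solution at an interior point
zx, zy = 0.3, 0.2
u_num = np.log(np.hypot(zx - sx, zy - sy)) @ c
```

Despite the notorious ill-conditioning of the MFS matrix, the least-squares solution reproduces smooth harmonic data to high accuracy inside the domain.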
Along the lower half contour ρ(θ) = √(10 − 6 cos 2θ), π ≤ θ < 2π, we show in Fig. 6(b) the numerical error of the LGA with respect to the exact solution u = ρ² cos 2θ, π ≤ θ < 2π. For comparison with the regularized MFS proposed by Wei et al. (2007), we also display its numerical error in Fig. 6(b), whose maximum error is found to be 0.39. We can observe that the result obtained by the LGA, with a maximum error of 0.111, is better than that of the MFS.

Example 5
When we apply a central difference scheme to the two-point boundary value problem (75), we can derive a linear equations system (76), where ∆x = 1/(n + 1) is the spatial grid length, and u_i = u(i∆x), i = 1, . . ., n, are the unknown values of u(x) at the grid points x_i = i∆x; u₀ = a and u_{n+1} = b are the given boundary conditions.
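A sketch of the discretization, under our reading of Equations (75)-(76): the model problem u″(x) = f(x) with u(0) = a, u(1) = b (the sign convention is an assumption). The central difference u_{i−1} − 2u_i + u_{i+1} = ∆x² f_i yields a tridiagonal system with the boundary values folded into the right-hand side:

```python
import numpy as np

n = 30
dx = 1.0 / (n + 1)
xg = dx * np.arange(1, n + 1)                 # interior grid points
a_bc, b_bc = 1.0, 2.0                         # boundary values u(0), u(1)
f = np.sin(np.pi * xg)

# Tridiagonal coefficient matrix of the central-difference scheme
T = (np.diag(-2.0 * np.ones(n))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1))
rhs = dx**2 * f
rhs[0] -= a_bc                                # fold boundary values into the rhs
rhs[-1] -= b_bc
u = np.linalg.solve(T, rhs)

# Exact solution of this model problem: u = 1 + x - sin(pi x) / pi^2
u_ex = 1.0 + xg - np.sin(np.pi * xg) / np.pi**2
```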
In this numerical test we fix n = 30. Let us consider the boundary value problem in Equation (75) with f(x) = sin πx. The exact solution is given by Equation (77), where we fix a = 1 and b = 2.
A relative random noise with intensity σ = 0.01 is added to the data on the right-hand side of Equation (76). Under a moderate convergence criterion with ε₂ = 10⁻⁸, we find that the RSDM with γ = 0.15 converges in 6111 iterations, as shown in Fig. 7(a) by the dashed line, and the maximum error, as shown in Fig. 7(b), is 5.3 × 10⁻⁵. The presented LGA with γ = 0.04 converges in 4140 iterations with a maximum error of 5.04 × 10⁻⁵, which is better than the RSDM.

Example 6
Let us consider the following inverse problem of recovering the external force F(t) for an ODE, Equation (78). In a time interval t ∈ [0, t_f], the discretized data y_i = y(t_i) are supposed to be measurable, and are contaminated by random noise with intensity σ = 0.01. It is usually very difficult to recover the external force F(t_i) from Equation (78) by direct differentiation of the noisy displacement data, because differentiation is an ill-posed linear operation.
To approach this inverse problem by polynomial interpolation, we begin with Equation (79). The coefficient c_k is split into two coefficients a_k and b_k to absorb more interpolation points; meanwhile, cos(kθ_k) and sin(kθ_k) are introduced to reduce the condition number of the coefficient matrix [Liu (2011c)]. We suppose Equations (80) and (81). The considered problem domain is [a, b], and the interpolating points are given by Equation (82). Substituting Equation (80) into Equation (79), we obtain Equation (83), where we let c₀ = a₀, and a_k and b_k are unknown coefficients. In order to obtain them, we impose the n interpolation conditions (84). Thus, we obtain a linear equations system (85) to determine a_k and b_k.
We note that the norm of the first column of the above coefficient matrix is √(2m + 1). According to the concept of an equilibrated matrix [Liu (2012d)], we can derive the optimal scales for the current interpolation with a half-order technique, as given by Equations (86) and (87), k = 1, 2, . . ., m, where β₀ is a stability factor. The improved method uses an mth-order polynomial to interpolate n = 2m + 1 data nodes, while the regular full-order method can only interpolate m + 1 data points. Now we fix m = 10 and t_f = 5 and consider the exact solution F(t) = cos t, which is obtained by inserting the exact y(t) = sin t into Equation (78). The parameters used are β₀ = 1.2 and γ = 0.025 for the LGA, and β₀ = 2.2 and γ = 0.35 for the RSDM. Under the same convergence criterion ε₂ = 10⁻⁵, the LGA does not converge within 10000 iterations, and the RSDM does not converge within 50000 iterations; the residual errors are compared in Fig. 8(a). We compare the numerical solutions with the exact solution in Fig. 8(b), where the maximum error of the RSDM is as large as 0.452, while that of the LGA is 0.067. Obviously, the LGA is much better than the RSDM. The result calculated by the LGA is also better than that calculated by Liu (2012a).
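The equilibration idea can be illustrated generically: scaling the columns of an ill-conditioned interpolation matrix by their norms leaves the interpolant unchanged while typically improving the conditioning. The Vandermonde matrix below is only an illustrative stand-in for the paper's trigonometric-polynomial matrix, and the recovery identity V(z/s) = V_eq z is exact by construction:

```python
import numpy as np

m = 10
t = np.linspace(-1.0, 1.0, m + 1)
V = np.vander(t, m + 1, increasing=True)      # stand-in interpolation matrix
f = np.cos(2.0 * t)                           # data to interpolate

scales = np.linalg.norm(V, axis=0)            # column norms (first one is sqrt(m+1))
V_eq = V / scales                             # equilibrated matrix
z = np.linalg.solve(V_eq, f)                  # solve the scaled system
coef = z / scales                             # undo the scaling

resid = np.max(np.abs(V @ coef - f))          # the interpolant is unchanged
```

For this stand-in matrix, `np.linalg.cond(V_eq)` is typically well below `np.linalg.cond(V)`, which is the motivation for the optimal scales of Equations (86)-(87).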

Figure 2. For Example 1, (a) comparing the residual errors, and (b) comparing the iterative orbits of y, which tend to a fixed point for the continuous model but oscillate near the fixed point for the discretized model.

Figure 3. For Example 2, showing the iterative orbit of y, which tends to a fixed point.

Figure 5. For Example 3, (a) comparing the residual errors of the RSDM and the LGA, and (b) showing the numerical errors.

Figure 6. For Example 4, (a) the residual error of the LGA, and (b) the numerical errors of the LGA and the MFS.

Figure 8. For Example 6, (a) comparing the residual errors of the RSDM and the LGA, and (b) comparing the numerical results with the exact one.