On iterative methods based on Sherman-Morrison-Woodbury splitting

We consider a new splitting based on the Sherman-Morrison-Woodbury formula, which is particularly effective with iterative methods for the numerical solution of large linear systems. These systems involve matrices that are perturbations of circulant or block circulant matrices, which commonly arise in the discretization of differential equations using finite element or finite difference methods. We prove the convergence of the new iteration without making any assumptions regarding the symmetry or diagonal dominance of the matrix. To illustrate the efficacy of the new iteration we present various applications, including extensions of the new iteration to block matrices that arise in certain saddle point problems as well as two-dimensional finite difference discretizations. The new method exhibits fast convergence in all of the test cases we used. It has minimal storage requirements, a straightforward implementation, and compatibility with nearly circulant matrices via the Fast Fourier Transform. For these reasons it can be a valuable tool for the solution of various finite element and finite difference discretizations of differential equations.


Introduction
Classical iterative methods for approximating solutions of linear systems are based on a splitting of the matrix A = M − N and converge whenever the iteration matrix G = M^{-1}N has spectral radius ρ(G) < 1. The spectral radius ρ(G) also characterizes the speed of convergence of the iterative method: the smaller it is, the faster the corresponding iterative method converges. Such splittings rely heavily on the invertibility of M. On the other hand, the most commonly used methods for the numerical solution of differential equations, such as finite difference and finite element methods, lead to discretizations with large and sparse linear systems. When structured rectangular grids are used, the coefficient matrix A of such linear systems is usually a perturbation of a circulant matrix M, i.e. we can write A = M − N with N a sparse matrix. Such systems are usually solved with the help of direct methods for band matrices, such as the banded LU or Cholesky decomposition, and in the case of circulant matrices with the Sherman-Morrison-Woodbury algorithm [12] or with direct inversion of their diagonalization via the Fast Fourier Transform (FFT) [3]. In addition to the previously mentioned classical discretizations, H^1-Galerkin/mixed finite element discretizations lead to saddle point problems with block nearly circulant matrices [7, 15]. General saddle point problems can be very challenging, and various methods have been proposed for the numerical approximation of their solutions [1, 2, 10, 11]. Among these methods we mention the Uzawa-type iterative methods and Krylov methods such as the conjugate gradient and GMRES methods [2, 17].
In this paper we study iterative methods based on a new splitting A = M − N. The analysis is not limited to perturbations of circulant or symmetric matrices: we consider general perturbations of a matrix M, which we call nearly M matrices. The definition of a nearly M matrix is based on the Sherman-Morrison-Woodbury formula [6]:

Definition 1.1. A matrix A ∈ R^{n×n} is near a matrix M (we say A is a nearly M matrix) if there is an invertible matrix M and a matrix N = UV^T, for appropriate U, V, such that A = M − N and the spectral radius satisfies ρ(V^T M^{-1} U) < 1.
Note that if the spectral radius ρ(V^T M^{-1} U) < 1, then det(I − V^T M^{-1} U) ≠ 0 and the Sherman-Morrison-Woodbury theorem ensures that A is invertible [6]. An example of significant interest is the case where M is circulant, as we mentioned before. In such a case we will call the matrix A a nearly circulant matrix. An invertible circulant matrix can be inverted as follows: suppose that we want to solve the system Mx = b where M is circulant. The diagonalization of M is M = FDF^{-1}, where F is the so-called Fourier matrix and D a diagonal matrix whose entries are the discrete Fourier transform of the first row (or column) of M. Therefore, the solution of the linear system can be expressed as x = F(D^{-1}F^{-1}b), which has a trivial implementation via the FFT [3]. Some of the main advantages of the proposed iterative method are the following:
• There is no requirement for the matrix to be symmetric, sparse or diagonally dominant.
• It can become faster using sparse matrix-vector multiplication or parallel computing.
• In the case of nearly circulant matrices it can be implemented seamlessly via the FFT and it can be very fast.
In Section 2 we study the new iteration and its convergence. Applications with various finite element and finite difference discretizations are presented in Section 3. The numerical experiments were performed using our implementations in the Python programming language, and thus the times reported in this paper are indicative. Our implementations can be found in [14].
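The FFT-based inversion described above can be sketched in a few lines of NumPy (a sketch assuming an invertible circulant M given by its first column; for a symmetric circulant the first row and first column coincide):

```python
import numpy as np

def solve_circulant(c, b):
    """Solve M x = b, where M is the circulant matrix with first column c.

    M diagonalizes as M = F D F^{-1} with D holding the DFT of c, so
    x = F(D^{-1} F^{-1} b) costs O(n log n) via the FFT.
    Assumes M is invertible, i.e. the DFT of c has no zero entries.
    """
    d = np.fft.fft(c)                          # eigenvalues of M
    return np.real(np.fft.ifft(np.fft.fft(b) / d))
```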

The new splitting
Let A be near M. Using the standard notation of [19] for iterative methods, we consider the splitting A = M − N where N = UV^T. For this particular splitting, we define the iterative method as usual,

(1)  x^{(k+1)} = M^{-1}(N x^{(k)} + b),  k = 0, 1, . . .,

for x^{(0)} a given initial guess of the solution. We will call this new method the Sherman-Morrison-Woodbury (SMW) iterative method because of the particular splitting we used. Before proving the convergence of the new iterative scheme, and for the sake of completeness, we present a generalization of the so-called matrix determinant lemma of [4, 5] to more general rank-r modifications:

Lemma 2.1. Let M ∈ R^{n×n} be invertible, let U, V ∈ R^{n×r}, and let λ_1, . . ., λ_r denote the eigenvalues of V^T M^{-1} U. Then

(2)  det(M − UV^T) = det(M) ∏_{i=1}^{r} (1 − λ_i).

Proof. The proof follows from the identity
\begin{pmatrix} M & U\\ V^T & I \end{pmatrix} = \begin{pmatrix} I & U\\ 0 & I \end{pmatrix}\begin{pmatrix} M-UV^T & 0\\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0\\ V^T & I \end{pmatrix} = \begin{pmatrix} I & 0\\ V^T M^{-1} & I \end{pmatrix}\begin{pmatrix} M & 0\\ 0 & I-V^T M^{-1}U \end{pmatrix}\begin{pmatrix} I & M^{-1}U\\ 0 & I \end{pmatrix}.
In each factorization the determinants of the first and last matrix on the right-hand side are equal to 1. Hence det(M − UV^T) = det(M) det(I − V^T M^{-1}U), and the last determinant can be expressed as the desired product.
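As a quick numerical sanity check of the lemma (a sketch with randomly generated M, U, V; the seed and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # a generic invertible M
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))

# eigenvalues of the small r x r matrix V^T M^{-1} U
lam = np.linalg.eigvals(V.T @ np.linalg.solve(M, U))

lhs = np.linalg.det(M - U @ V.T)
rhs = np.linalg.det(M) * np.prod(1.0 - lam)         # det(M) * prod(1 - lam_i)
```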
Using the previous matrix determinant lemma we can prove the convergence of the new iterative method:

Theorem 2.1. If A is nearly M, then the SMW iteration converges.
Proof. In order to prove that the SMW iteration converges we will show that the spectral radius of the iteration matrix G = M^{-1}N = (M^{-1}U)V^T is less than one. Applying the matrix determinant lemma to the characteristic polynomial of G, we see that the nonzero eigenvalues of M^{-1}N coincide with the eigenvalues of V^T M^{-1}U, while the remaining eigenvalues of M^{-1}N are zero. By Definition 1.1 these eigenvalues satisfy |λ| < 1. Thus, the new iterative method converges.
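The key fact in the proof, that the nonzero eigenvalues of M^{-1}UV^T coincide with those of the small r × r matrix V^T M^{-1}U, can be verified numerically (a sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 7, 3
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # a generic invertible M
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))

G = np.linalg.solve(M, U @ V.T)          # iteration matrix M^{-1} N, rank r
S = V.T @ np.linalg.solve(M, U)          # the r x r matrix V^T M^{-1} U

eig_G = np.linalg.eigvals(G)
eig_S = np.linalg.eigvals(S)
# every eigenvalue of S appears among the eigenvalues of G;
# the remaining n - r eigenvalues of G are (numerically) zero
```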
Remark 2.1. One can define a matrix A to be nearly M as the perturbation A = M − uv^T, where u, v ∈ R^n with |v^T M^{-1} u| < 1. In such a case the previous theorem again holds true, and the unique nonzero eigenvalue of the iteration matrix G is λ = v^T M^{-1} u.
In the previous analysis M does not need to be a circulant matrix, but can be any matrix that is easily inverted. On the other hand, the inversion of a circulant matrix can be implemented trivially using the Fast Fourier Transform. Specifically, if M is circulant, then the iterative method (1) can be written as

(3)  x^{(k+1)} = F D^{-1} F^{-1}(N x^{(k)} + b),

where M = FDF^{-1}. Recall that the diagonal of D is the FFT of the first row of M. The computational cost of the inversion of M is O(n log(n)), while the matrix-vector multiplications on the right-hand side of (3) can be performed using sparse matrix multiplication algorithms. The minimal storage requirements of circulant matrices and the almost linear complexity of the FFT for very large values of n make the iteration (3) very useful in demanding situations. We can improve the speed of the iteration (3) by introducing an extrapolation parameter ω ∈ C \ {0} and writing the extrapolated SMW (eSMW) iteration in the form

(4)  x^{(k+1)} = (1 − ω)x^{(k)} + ω M^{-1}(N x^{(k)} + b).

It is known that the extrapolated iteration converges for max_{λ_i∈σ(G)} |1 + ω(λ_i − 1)| < 1, [9, 8]. The estimation of such a parameter ω requires knowledge of the eigenvalues λ_i of G = M^{-1}N. To simplify the calculations we will search for a real optimal parameter ω_opt that maximizes the speed of convergence, and we also assume that the eigenvalues λ_i are real, with λ_min and λ_max the minimum and maximum eigenvalues respectively. Since max_i |λ_i| < 1, the convergence of the extrapolated iteration requires that the eigenvalues 1 + ω(λ_i − 1) of the iteration matrix of (4) lie strictly inside the unit disc. Following [16] we compute the optimal value ω_opt to be

(5)  ω_opt = 2 / (2 − λ_min − λ_max).

This value is the optimal real value of ω that solves the minimization problem min_{ω∈R} max_i |1 + ω(λ_i − 1)|. Recognizing the difficulty of finding the optimal relaxation parameter, in most of the experiments we estimated the parameter by experimentation, while in some cases we estimated the eigenvalues numerically and then computed the optimal value ω_opt.
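The iterations above can be sketched compactly as follows; here `solve_M` applies M^{-1} (a dense solve, or the FFT when M is circulant), and ω = 1 recovers the plain SMW iteration. This is a sketch, not the implementation of [14]:

```python
import numpy as np

def smw(solve_M, N, b, x0, omega=1.0, tol=1e-12, maxit=10_000):
    """(Extrapolated) SMW iteration for the splitting A = M - N:
    x^{k+1} = (1 - omega) x^k + omega * M^{-1}(N x^k + b)."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(1, maxit + 1):
        x_new = (1.0 - omega) * x + omega * solve_M(N @ x + b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, maxit

def omega_opt(lam_min, lam_max):
    """Optimal real extrapolation parameter for real eigenvalues in
    [lam_min, lam_max] with lam_max < 1 (formula (5))."""
    return 2.0 / (2.0 - lam_min - lam_max)
```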

Applications
Although we have let the definition of nearly M matrices be very generic, we focus here on linear systems that occur in finite element and finite difference discretizations of differential equations with nearly circulant matrices. In all cases we can write N = UV^T, and we can verify numerically that the assumptions on the matrices U and V are satisfied. In some cases we present an indicative comparison of the performance of the new iteration with the classical Gauss-Seidel and GMRES methods [17]. In all the experiments we considered the zero vector as the initial guess of the solution.
3.1. Applications to finite element methods. First, we demonstrate the applicability of the previous iterations by solving the classical boundary value problem −u″(x) + u(x) = f(x) for x ∈ (0, 1) with u(0) = u(1) = 0, using finite element methods with spline elements. In particular, we consider a uniform partition 0 = x_0 < x_1 < · · · < x_n = 1 with x_i = ih, h = 1/n, and the finite element space S_h = {v ∈ C([0, 1]) : v|_{[x_i, x_{i+1}]} ∈ P_1 for i = 0, 1, . . ., n − 1}, where P_r denotes the space of polynomials of degree r. Using the standard basis splines φ_i of continuous piecewise linear elements [13], the discrete problem is based on the computation of u_h ∈ S_h such that

∫_0^1 (u_h′ φ′ + u_h φ) dx = ∫_0^1 f φ dx  for all φ ∈ S_h.

Expressing u_h = Σ_i c_i φ_i(x) and taking φ = φ_j, the discrete problem becomes equivalent to a banded linear system with matrix A = K + M, where K is the stiffness matrix and M the mass matrix. For the sake of simplicity here we consider h = 1 for any value of n, and we take an appropriate right-hand side b that leads to the exact solution x = (1, 1, . . ., 1)^T.
Specifically, we considered the circulant matrix defined by M = circulant(−5/6, 8/3, −5/6) and N = M − A, and we solved the corresponding linear system using the iterations (3) and (4). Table 1 presents a comparison between the classical Gauss-Seidel iterative method, GMRES and the new one. We observe that the new method requires the fewest iterations to converge while being very fast. Note that we report only the time used for the execution of the main loop of each method. Our implementations took into account the sparsity of the matrices A, M and N, and we used sparse matrix-vector multiplications [13]. It is easy to check (numerically) that the condition of Definition 1.1 is satisfied by the matrices M and N, where N = UV^T is derived using n × n matrices U and V whose only nonzero entries generate the two corner entries N_{1,n} = N_{n,1} = −5/6, and all the other entries are zero.
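For this example the whole solver fits in a few lines: the circulant part is inverted with the FFT, and the correction N enters only through its two corner entries. This is a sketch reproducing the setup above with h = 1, which also checks the spectral radius condition numerically:

```python
import numpy as np

n = 64
# A = K + M_mass with h = 1: tridiag(-5/6, 8/3, -5/6)
A = (8/3) * np.eye(n) - (5/6) * (np.eye(n, k=1) + np.eye(n, k=-1))
c = np.zeros(n)
c[0], c[1], c[-1] = 8/3, -5/6, -5/6     # first column of M = circulant(-5/6, 8/3, -5/6)
Mc = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
N = Mc - A                               # nonzero only at the two corners (-5/6)
d = np.fft.fft(c)                        # eigenvalues of M

# spectral radius of M^{-1} N (equals that of V^T M^{-1} U on nonzero eigenvalues)
rho = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(Mc, N))))

b = A @ np.ones(n)                       # right-hand side with exact solution (1,...,1)^T
x = np.zeros(n)
for k in range(1, 1001):
    x_new = np.real(np.fft.ifft(np.fft.fft(N @ x + b) / d))   # iteration (3)
    done = np.linalg.norm(x_new - x) < 1e-12
    x = x_new
    if done:
        break
```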
In a similar manner we study a more demanding situation. We consider the finite element space of smooth cubic splines S_h^3 = {v ∈ C^2([0, 1]) : v|_{[x_i, x_{i+1}]} ∈ P_3 for i = 0, 1, . . ., n − 1}, and we compute the corresponding mass matrix for the computation of the L^2-projection. The particular mass matrix is hepta-diagonal and is not diagonally dominant. Again, for simplicity we take h = 1 for all values of n, and we choose the right-hand side appropriately so that the exact solution is again the vector x = (1, 1, . . ., 1)^T. For the new method we consider the circulant matrix M obtained by extending the hepta-diagonal stencil of the mass matrix periodically, and N = M − A. The results of the solution of this linear system using the new SMW splittings are summarized in Table 2. In this case, the Jacobi method did not converge at all, while the Gauss-Seidel method converged to the exact solution. In this experiment the new iteration had essentially the same performance as our GMRES implementation.

As we mentioned before, the new methodology can be applied to more general matrices that are not necessarily symmetric or diagonally dominant. To demonstrate this, we consider a matrix which is nearly circulant with N the matrix with −1 on its four corners and zeros elsewhere. The SMW iteration (3) converges in 20 iterations. The extrapolated SMW iteration (4) with ω_opt ≈ 0.851 converges in only 9 iterations, while the Jacobi method does not converge, the Gauss-Seidel method requires 78 iterations and GMRES requires only 3 iterations. The parameter ω_opt was computed using the formula (5).
Remark 3.1. Note that the cases we present in this paper can be handled by other numerical methods and are commonly encountered in finite element and finite difference implementations. However, there are cases where the new method outperforms other methods. For example, consider a dense n × n circulant matrix M (for large n), and as N any sparse matrix that satisfies the requirement stated in Theorem 2.1. We tested such a case with M constructed from a random vector with entries following the uniform distribution over [1, 2). In such a case, the method presented here is the only trivial approach, even for n = 10^6. In this example the resulting matrix A is ill-conditioned, but the new method still converges almost instantaneously within a few iterations. On the other hand, all the other numerical methods we tested (including the GMRES implementation of the Python module scipy.sparse.linalg) failed to converge even when n = 100. The particular experiment can be found in [14].
Remark 3.2. We underline that the performance of all the methods we used can be improved using various techniques (parallelization, preconditioning, etc.) and even with better implementations. For this reason, these experiments serve only as an indication of the performance of the new iteration. On the other hand, the simplicity of the new method and its overall good performance make it an appealing alternative.

3.2. Extensions to saddle point problems. In this section we consider a linear system that can be derived when we use a mixed formulation of a finite element method for some nonlinear and dispersive wave equations with zero Dirichlet boundary conditions [15]. Consider the linear system

A_1 x + B_1 y = b_1,
B_2 x + A_2 y = b_2,

where A_1, A_2 are square n × n and m × m matrices, and B_1 and B_2 are n × m and m × n matrices respectively. The unknown vectors x and y are accordingly n- and m-dimensional vectors. In what follows we assume for simplicity that m = n and that the A_i and B_i are invertible n × n matrices. Taking into account the special structure of the matrices A_1 and A_2, which are assumed to be near M_1 and M_2 respectively, we propose the following iterative method:

(6)  x^{(k+1)} = M_1^{-1}(N_1 x^{(k)} − B_1 y^{(k)} + b_1),
     y^{(k+1)} = M_2^{-1}(N_2 y^{(k)} − B_2 x^{(k)} + b_2),

where A_i = M_i − N_i and (x^{(0)}, y^{(0)})^T is a given initial guess of the solution.
If the matrices M_i = F_i D_i F_i^{-1} are circulant with F_i the corresponding Fourier matrices, then the inversion of the block-diagonal matrix diag(M_1, M_2) is trivial using the FFT; in particular, the diagonal of D_i is the FFT of the vector comprising the first row of M_i. Denoting the vectors u^{(k)} = (x^{(k)}, y^{(k)})^T and b = (b_1, b_2)^T, and the matrices M = diag(M_1, M_2), N = diag(N_1, N_2) and B the block matrix with zero diagonal blocks and off-diagonal blocks B_1 (upper) and B_2 (lower), we express the iterative method in the form

(7)  u^{(k+1)} = F D^{-1} F^{-1}((N − B)u^{(k)} + b),

where F = diag(F_1, F_2) and the diagonal of D is the FFT of the vector comprising the first row of M_1 followed by the first row of M_2.
Another extension is the block-Gauss-Seidel-SMW variant of the new iterative method, which can be formulated as

x^{(k+1)} = M_1^{-1}(N_1 x^{(k)} − B_1 y^{(k)} + b_1),
y^{(k+1)} = M_2^{-1}(N_2 y^{(k)} − B_2 x^{(k+1)} + b_2),

with given (x^{(0)}, y^{(0)}). This is expected to converge in fewer iterations than the Jacobi-type iteration (6) since it utilises the update x^{(k+1)} in the computation of y^{(k+1)}. In the special case where the matrices M_1 and M_2 are circulant, the implementation is again straightforward. If the matrices B_i are also circulant, then the method can be accelerated further using the FFT of x^{(k)} in all the intermediate stages. In our implementation we assume that B_1 and B_2 are sparse and we use sparse matrix-vector multiplication as implemented in Python in the CSR storage format, cf. e.g. [13].
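A compact sketch of the block-Gauss-Seidel-SMW iteration follows. The 2 × 2 block layout [[A_1, B_1], [B_2, A_2]] and the abstract `solve_Mi` callables are assumptions of this sketch; for circulant M_i one would pass an FFT-based solve instead of a dense one:

```python
import numpy as np

def block_gs_smw(solve_M1, solve_M2, N1, N2, B1, B2, b1, b2,
                 tol=1e-12, maxit=10_000):
    """Block-Gauss-Seidel-SMW for [[A1, B1], [B2, A2]] [x; y] = [b1; b2],
    where A_i = M_i - N_i and solve_Mi(v) applies M_i^{-1} v:

        x^{k+1} = M1^{-1}(N1 x^k - B1 y^k     + b1)
        y^{k+1} = M2^{-1}(N2 y^k - B2 x^{k+1} + b2)
    """
    x, y = np.zeros(len(b1)), np.zeros(len(b2))
    for k in range(1, maxit + 1):
        x_new = solve_M1(N1 @ x - B1 @ y + b1)
        y_new = solve_M2(N2 @ y - B2 @ x_new + b2)   # uses the fresh x iterate
        err = np.linalg.norm(x_new - x) + np.linalg.norm(y_new - y)
        x, y = x_new, y_new
        if err < tol:
            break
    return x, y, k
```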
As an example we consider the case where the matrices A_i are assembled from the standard hat basis functions φ_i of the space S_h^1 of continuous and piecewise linear functions. The matrix B is tridiagonal, B = tridiag(−1/2, 0, 1/2), but not circulant. Such matrices occur in mixed formulations of finite element methods like those in [15] and [7]. In this particular case the method converges very fast. The performance of the new method in comparison with the sparse Gauss-Seidel method and GMRES is presented in Table 3. The methodology presented in this section can be extended to larger block matrices obtained in saddle point problems and also to two-dimensional problems with tensor products of splines in a very similar manner. A particular, classical example is presented in the next section.

3.3. A two-dimensional finite difference discretization. For simplicity we consider the uniform grid with ∆x = ∆y = 1/m. Applying the five-point stencil finite difference scheme to the Poisson equation, we obtain a block-tridiagonal system of equations Ax = b, where A is the block-tridiagonal matrix A = tridiag(I, D, I), D = tridiag(1, −4, 1) is an m × m matrix, and I is the m × m identity matrix. In order to devise the block-Gauss-Seidel-SMW iteration for this particular case we write D = M − N, where M is the circulant matrix M = circulant(1, −4, 1) and N is such that N_{i,j} = 0 for all i, j except for N_{1,m} = N_{m,1} = 1. Moreover, we write the k-th iteration and the right-hand side as block-column vectors x^{(k)} = (x_1^{(k)}, . . ., x_m^{(k)})^T and b = (b_1, . . ., b_m)^T.

Then the algorithm is as follows:
To test this algorithm we took m = 200 (i.e. A ∈ R^{20000×20000}). For simplicity, we again took b to be the vector that results in the solution x with x_i = 1 for all i. The method with TOL = 10^{-7} converged within 503 iterations and required approximately 2.98 seconds. A comparison with the classical stationary iterative methods is of little value here, but we report that the Python implementation of GMRES converged in 5211 iterations, required more than 30 seconds, and the actual difference between the exact and numerical solutions was of the order of 10^{-5} even though the residual was of the order of 10^{-9}. It is evident from the requirements of this experiment that the new method can be used in large-scale problems due to its simplicity and minimal storage requirements, as well as its performance.
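Algorithm 1 can be sketched as follows, with each block solve performed by the FFT. This is a sketch; the dense right-hand-side layout (an (m, m) array of blocks) is an assumption made for brevity:

```python
import numpy as np

def poisson_block_gs_smw(m, b, tol=1e-10, maxit=50_000):
    """Block-Gauss-Seidel-SMW for A = blocktridiag(I, D, I), where
    D = M - N with M = circulant(1, -4, 1) and N zero except
    N[0, m-1] = N[m-1, 0] = 1.  b is an (m, m) array whose i-th row
    is the block b_i; returns the (m, m) iterate and iteration count."""
    c = np.zeros(m)
    c[0], c[1], c[-1] = -4.0, 1.0, 1.0            # first column of M
    d = np.fft.fft(c)
    solve_M = lambda r: np.real(np.fft.ifft(np.fft.fft(r) / d))

    def N_mul(v):                                  # N v: only the wrap-around entries
        w = np.zeros(m)
        w[0], w[-1] = v[-1], v[0]
        return w

    x = np.zeros((m, m))
    for k in range(1, maxit + 1):
        x_old = x.copy()
        for i in range(m):
            lower = x[i - 1] if i > 0 else 0.0         # already updated block x_{i-1}^{k+1}
            upper = x_old[i + 1] if i < m - 1 else 0.0  # old block x_{i+1}^{k}
            x[i] = solve_M(N_mul(x_old[i]) - lower - upper + b[i])
        if np.linalg.norm(x - x_old) < tol:
            break
    return x, k
```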
The implementation of this algorithm, along with implementations of the previous variants of the SMW iteration and the setup of all the examples presented here, can be found in [14]. All the experiments were performed on an Apple computer with an M1 (2020) processor and 16GB RAM.

Conclusions
A new splitting based on the Sherman-Morrison-Woodbury formula was presented for the iterative solution of linear systems of a certain form. After introducing the notion of nearly M matrices, we proved that for such matrices the new method converges. The new method can be used with finite element and finite difference formulations to solve banded, non-symmetric and block systems (saddle point problems) seamlessly. The applications should not be limited to those we presented here. Among the several advantages of the new method is its simple implementation in the case of nearly circulant matrices. The speed of the new method can be increased using classical extrapolation techniques, and it can compete with standard iterative methods. The relative performance of the new iteration improves as the dimension of the matrix increases, and for large matrices it becomes faster even than the classical GMRES.

Algorithm 1
Block-Gauss-Seidel-SMW iteration for the Poisson equation
Set the initial guess x_i^{(0)} for i = 1, 2, . . ., m
Set the iteration k = 0, the tolerance TOL and the maximum number of iterations MAXIT
Initialize Error = TOL + 1
while Error > TOL and k < MAXIT do
    Solve M x_1^{(k+1)} = N x_1^{(k)} − x_2^{(k)} + b_1
    for i = 2, . . ., m − 1 do
        Solve M x_i^{(k+1)} = N x_i^{(k)} − x_{i−1}^{(k+1)} − x_{i+1}^{(k)} + b_i
    end for
    Solve M x_m^{(k+1)} = N x_m^{(k)} − x_{m−1}^{(k+1)} + b_m
    Set Error = ‖x^{(k)} − x^{(k+1)}‖ and k = k + 1
end while
Return the approximation x^{(k)}

Table 1. Number of iterations and CPU time required for the linear finite elements matrix. Taking advantage of the parallel features of the FFT one can achieve better results.

Table 2. Number of iterations and CPU time required for the cubic spline finite elements matrix.

Table 3. Number of iterations and CPU time for linear elements in a mixed formulation.