Condition Number Estimation of Preconditioned Matrices

The present paper introduces a condition number estimation method for preconditioned matrices. The newly developed method provides reasonable results, while the conventional method which is based on the Lanczos connection gives meaningless results. The Lanczos connection based method provides the condition numbers of coefficient matrices of systems of linear equations with information obtained through the preconditioned conjugate gradient method. Estimating the condition number of preconditioned matrices is sometimes important when describing the effectiveness of new preconditionerers or selecting adequate preconditioners. Operating a preconditioner on a coefficient matrix is the simplest method of estimation. However, this is not possible for large-scale computing, especially if computation is performed on distributed memory parallel computers. This is because, the preconditioned matrices become dense, even if the original matrices are sparse. Although the Lanczos connection method can be used to calculate the condition number of preconditioned matrices, it is not considered to be applicable to large-scale problems because of its weakness with respect to numerical errors. Therefore, we have developed a robust and parallelizable method based on Hager’s method. The feasibility studies are curried out for the diagonal scaling preconditioner and the SSOR preconditioner with a diagonal matrix, a tri-daigonal matrix and Pei’s matrix. As a result, the Lanczos connection method contains around 10% error in the results even with a simple problem. On the other hand, the new method contains negligible errors. In addition, the newly developed method returns reasonable solutions when the Lanczos connection method fails with Pei’s matrix, and matrices generated with the finite element method.


Introduction
Solving the linear equation is considered to be the most time consuming component of simulation computation. As a result, a number of linear equation solvers have been developed in order to realize more efficient simulation. Linear equation solvers can be roughly categorized into two types: direct solvers and iterative solvers. Direct solvers have been used for ill conditioned problems, because of the robustness of the solution. However, the iterative method has become mainstream in recent years for the following reasons: • The iterative method requires less memory than the direct method, if the coefficient matrix of the equation is sparse. Most simulation methods, as is the case with the finite element method and the finite difference method, use sparse matrices.
• The iterative method is more suitable for distributed memory type parallel computers than the direct method. Recently, the trend in super computers has been toward distributed memory computers.
As a result, iterative solvers have generated a great deal of interest. Iterative methods are of two types: the stationary method and the Krylov sub-space type method. Generally, the Krylov subspace type method is more commonly used than the stationary method because the stationary method, is in most cases, slower and more suited to parallel computers. The conjugate gradient method is representative of the Krylov sub-space type method, and the Gauss-Seidel method is a typical representative example of the stationary method. The Krylov sub-space type method is faster than the stationary method, but is sometimes unstable. Furthermore, rapid convergence is always desired. In order to address such considerations, preconditioning is applied to the Krylov sub-space type method. In particular, the conjugate gradient method with preconditioning, sometimes called the preconditioned conjugate gradient method (PCG), is one of the most well known iterative solvers [1]. Evaluating the convergence rate of an iterative solver is sometimes important in order to demonstrate the effectiveness of new methods or to select an adequate method. The convergence rate of the conjugate gradient method (CG) and other Krylov sub-space type solvers strongly depends on the eigenvalue distribution of coefficient matrix, that is to say, the convergence rate is better if the eigenvalues are concentrated. Complete knowledge of the eigenvalue distribution enables the complete prediction of the convergence behavior of the PCG. However, obtaining all of the eigenvalues is difficult, and so another simple indicator, called condition number, is used. Generally, the condition number is easy to calculate when the coefficient matrix is explicitly obtained. As described above, parallel computers, especially distributed memory type parallel computers, are becoming increasingly important for engineers or physicists. Obtaining preconditioned matrices using a parallel computer is extremely difficult and an evaluation method is required. One way to evaluate preconditioned matrices is by the Lanczos connection method [2], [3]. However, the Lanczos connection method is weak with respect to round-off errors and therefore is not suitable for large matrices. Moreover, the applicability of the Lanczos connection method has not yet been discussed.
In the present paper, we introduce a new evaluation method of preconditioned matrices based on Hager's method, and verify the robustness of the new method by comparing their precisions.
In particular, the following property is required to maintain the symmetry of the preconditioned matrix: The coefficient matrix of a transformed system of linear equations (Equation 3) should be more favorable in nature than Equation 1, and the number of itrations to convergence of the CG on the transformed system should be less than that on the original system. One can show that the number of iterations to convergence of the CG is proportional to ffiffi k p À1 ffiffi k p þ1 , where κ is the 2-norm condition number of a coefficient matrix (defenition of the 2-norm condition number is given in the section "Condition number" in this article) [4]. Thus, the smaller κ becomes, the less iterations is required to convergence. Therefore, M should be selected to decrease κ of the transformed system. Such a transformation is not operated in practical programming, since the matrix-matrix product is needed, which requires much computational effort. The PCG algorithm is shown in Table 1. In the algorithm, Solve Mz = r is the preconditioning (line 5 in the algorithm), where r is the residual norm of the PCG, and z is the residual norm in the transformed space by M. Although this equation is usually difficult to solve, such difficulty can be avoided by selecting matrix M to be a diagonal matrix or products of triangular matrices. Such matrices are described in the following two sections.

Diagonal scaling
When the preconditioning matrix M is expressed as where the function diag(A) extract the diagonal component of a matrix A. This preconditioning is called the diagonal scaling [5].

SSOR
Assuming that the coefficient matrix is symmetric and is decomposed as the SSOR preconditioning matrix is defined as where, D, L, and ω are the diagonal component of A, the lower triangular component of A, and the relaxation parameter, respectively. Usually ω is set to 1.0 [6]. Note that the diagonal scaling and SSOR preconditioning matrices are symmetric positive definite (SPD), if A is SPD. This is because, those preconditioning matrices can be rewritten as, whereL is a solvable lower triangular matrix, and then, Equation 11 is the definition of the Cholesky decomposition. Thus M is SPD [7], [8].

Condition number
The convergence rate of the CG method depends on the distribution of eigenvalues of a coefficient matrix A or a preconditioned matrixÃ. Although the complete information of eigenvalues enables prediction of the exact convergence behavior, a simpler indicator, known as the condition number, exists. The condition number of a matrix is given by where kÁk denotes the norm of a matrix [8]. Since there are various definitions of the norm, we can define various condition numbers. The condition numbers given by the 1-norm or 17: end for 18: return x i last , where i last is the last iteration 2-norm, as defined below, are commonly used.
Furthermore, if the matrix A is SPD, then the condition number given by the 2-norm can be written as

Lanczos connection
The Lanczos connection method is a method of solving the eigensystem and is strongly related to the CG method [2]. Using such a relationship, the condition number of A can be calculated through the CG procedure. Now, we define the n × k matrix R k as where n is the matrix size of A, k is the iteration number of the PCG and r i , (i = 0, . . ., k − 1) are the residual vectors and can be found in the PCG algorithm (Table 1). Next, we define k × k matrix B k as follows: where β k are the scalars shown in the PCG algorithm. Using the matrix P k = [p 0 , p 1 , . . ., p k−1 ], where p i , (i = 0, . . ., k − 1) are the modification direction vectors in the PCG algorithm. Since vectors p 1 , p 2 , . . ., p k are A-orthogonal, where Λ k is a matrix having components that are given by : Here, we consider the n × k matrix Q k , where Δ k is a k × k matrix having the following components: where kr i k 2 , (i = 0, . . ., k) are the 2-norms of the residual vectors. The columns of Q k are the Lanczos vectors, for which the associated projections of A form a tridiagonal matrix, thus Since external eigenvalues of T k approximate those of matrix A, we can approximate the condition number of matrix A using this method. Furthermore, if the above parameters are obtained from the PCG, we can obtain the condition number of the preconditioned matrix [3]. The advantages of this method are ease of implementation and ease of evaluating the condition number of preconditioned matrices. However, this method is not applicable for large systems or illconditioned systems, due to the fact that this method requires the strict orthogonality of basis vector used in the CG aglgorithm. However, such orthogonality tends to be broken for the large system or ill conditioned system.

Hager's method
Hager's method gives the 1-norm of A −1 [9]. Originally, the condition number is defined as Equation 12. Therefore, we need the 1-norm of A. Fortunately, the 1-norm of A is easy to obtain if the components of A are given explicitly. The algorithm of Hager's method is described in Table 2. Although Hager's method requires the transposition of matrix A, we assume that A T = A, because we are using the PCG, therefore symmetric positive definite matrices can be solved. Hager's method requires the solution of a linear equation during iteration of the procedure for finding the 1-norm condition number. In the present study, they are solved using the PCG.

Hager's method for preconditioned matrices
Here, we develop a new condition number estimation method which overcomes the downside of the Lanczos connection method. That is to say, new method must have such features: • Applicable for large system. Especially, it must work on distributed memory computers.
• Robust for ill conditioned system.
In order to develop such new method, we employ Hager's method which gives the 1-norm of inverse matrices as a basic algorithm. In the rest of this sections, we describe the algorithm of our new method. As introduced in the previous section, we can obtain the 1-norm condition number using Hager's method. There is a simple method of extending Hager's method to preconditioned matrices without explicit matrix production. That is, in Hager's method, we simply replace matrix A with M À1 1 AM À1 2 . It can be easily achieved if the PCG is employed to solve equations that appear in Hager's method (lines 4 and 11 in Table 2). This is because, matrices are used as the form of p = Aq, where p and q are vectors. Therefore, we can solve the system of The algorithm is named "calc 1-norm of inverse preconditioned matrix" and is shown in Table 3. In addition, the PCG algorithm that solves M À1 Table 4. Since the Hager's method gives us only k M À1 1 AM À1 2 À Á À1 k 1 , a method by which to calculate kM À1 1 AM À1 2 k 1 must be constructed. It is not straightforward, because we have no explicit M À1 1 AM À1 2 . In the present study, we develop such a method using Hager's method by replacing the solution of the two linear equations in Table 2 (lines 4 and We thus obtain kM À1 1 AM À1 2 k 1 . The algorithm which calculates the 1-norm of preconditioned matrix is named "calc 1-norm of preconditioned matrix" and is shown in Table 3. The entire algorithm to calculate 1-norm condition number is referred to as the modified Hager's method in the present article.
These developed algorithms for preconditioned matrices always converge if coefficient matrices and preconditioning matrices are positive definite. This is because, Hager's method always converges for positive definite matrices [10]. In addition, one can show that the inverse matrices of positive definite matrices are also positive definite, and the product of two positive definite matrices are also positive definite.

Results
In this section, we carry out the feasibility test of our new method by comparing the performance of both our method and the Lanczos connection method. In the following sections, the

Computational environment
Here, we describe the computational environment in order to determine the precision of the numerical calculation. All of the code was written with Fortran90, and the eigenvalues of the tridiagonal matrix obtained from the Lanczos connection was calculated using the Intel Math Kernel library. The machine architecture and compiler environment are listed in Table 5. In addition, the condition number, which should be compared between the Lanczos connection method and our modified Hager's method, was calculated using Octave [11].  Let x 0 be the initial approximation 3: for i = 0, 1, 2, 3, . . ., until convergence do do 5: 10: Sample matrices Diagonal matrix. In order to verify the precision of both of these methods, we first employ a simple diagonal matrix for a test problem. The test matrix T diag is given by ; ð24Þ where n is the matrix size. The eigenvalues of this matrix are (1, 2, . . ., n), and the condition number of these eigenvalues should be n. Note that, the condition number of both diagonal scaled and SSOR preconditioned T diag are 1.0. Tridiagonal matrix. The n × n tridiagonal matrix we employed in the present studies is given as : Pei's matrix. We employ Pei's matrix [12] as a test matrix, because one can control the condition number of the matrix quite easily. Pei's matrix is defined as follows: where d is the parameter by which to determine the condition of matrix T Pei , which becomes increasingly ill conditioned as d approaches zero. If d is zero, matrix T Pei is singular.
Poission's equation with FEM. The above sample matrices were artificially created and have simple structures. In this section, we will introduce matrices that are created with the finite element method (FEM) in order to investigate the feasibility of both methods with realistic problems. The equation which employed was Poission's equation. The analysis domain was a cube with edge lengths of 1.0. In the present study, three types of finite element meshes were prepared and each of three types had a different degree of freedom (DOF) on x, y, and z axes: 20 × 20 × 20, 10 × 10 × 80, and 5 × 5 × 320. The total DOFs were the same. Dirichlet boundary conditions were applied to both faces at z = 0, and 1.0.

Estimated condition numbers
Diagonal matrix. First we discuss the feasibility of the Lanczos connection. The eigenvalues of T diag calculated with the Lanczos connection is listed in the first row in Table 6. Here we set n = 10. The largest eigenvalue is unreasoably large, and therefore, obtained condition number is unreasonable. We can remove this unresonably large eigenvalue, by ignoring the first step information of the PCG. This treatment leads one-rank smaller tri-diagonal matrix associated with the Lanczos connection. However, the resulted condition number is more reasonable. The eigenvalues without the first step information is list in the second row in Table 6. Simply, the largest eigenvalue is removed. This treatment is applied to the remaining of this article.
Next, we check the feasibility of both the Lanczos connection and Hager's method with T diag . The condition numbers calculated using both methids are listed in Table 7. The obtained condition numbers are reasonable.
Tridiagonal matrix. The condition numbers of T Tri are listed in Table 8. Here we employ 10 × 10, 100 × 100, 200 × 200 and 500 × 500 matrices. In addition, those of diagonally scaled matrices of the same size are listed in Table 9. Comparison of Table 8. Table 9 reveals that the condition number is not improved by diagonal scaling in this case. Moreover, the number of iterations to convergence was also the same. We first discuss Hager's method, which corresponds to Octave completely, for every size in both cases. Higham reported that Hager's method gives the correct value for positive definite matrices [10]. As we discussed, the diagonal scaled matrices are also SPD. Therefore, these results can be expected. On the other hand, although the Lanczos connection method provides less favorable results for smaller matrices, the results improved for larger problems. This behavior results from the modification described in the previous section, whereby we removed the information of the first conjugate gradient step. The effects of the modification became smaller for larger problems. Finally, it can be said that software developed in the present studies of both the Lanczos connection method and the modified Hager's method work well with preconditioned matrices.
Pei's matrix. In the present paper, we fixed the matrix size as 100 × 100 and set d as d = 0.5, 0.25 and 0.125. Calculated condition numbers and errors with respect to the condition number from Octave are listed in Table 10. An error is defined as   Table 11, and 12. The errors are also shown in the tables. In diagonal scaled cases, the Lanczos connection gives results with over 1,000% errors. Particularly, in the 20 × 20 × 20 case, it shows 3.03 × 10 27 % error. Reasonable results can be obtained by The condition numbers of T Tri using the Lanczos connection method, the modified Hager's method, and Octave are listed.
doi:10.1371/journal.pone.0122331.t008 The condition numbers of diagonal scaled T Tri using the Lanczos connection method, the modified Hager's method, and Octave are listed.
doi:10.1371/journal.pone.0122331.t009 removing the last CG step information from the Lanczos connection procedure. The condition number moves from 2.35 × 10 28 to 6.94 × 10 2 . However, such procedures can only be done when the true result is already known, and therefore, the Lanczos connection cannot be used with ease. On the other hand, using the modified Hager's method gives the results with an error rate under 6.46%.
In SSOR preconditioned cases, the Lanczos connection gives more accurate results than diagonal scaled cases, however, the results still include over 50% errors. In addition, when the condition number of the original matrix becomes large, the error becomes more extreme. Estimations may become unreasonable in cases where problems are ill-conditioned. The modified Hager's method gives results with under 3% errors. The results in this section show that the modified Hager's method can be used to estimate the condition numbers of practical problems. Finally, for reference, the condition numbers of each problem based on each norm without preconditioning are indicated below: 1-norm 20 × 20 × 20: 1.02 × 10 3 , 10 × 10 × 80: 3.73 × 10 3 , and 5 × 5 × 320: 7.22 × 10 4 , 2-norm 20 × 20 × 20: 7.75 × 10 2 , 10 × 10 × 80: 2.79 × 10 3 , and 5 × 5 × 320: 4.73 × 10 4 . The condition numbers without preconditioning were calculated with Octave. Sample programs of both the Lanczos connection and the modified Hager's method will be uploaded on the Zenodo site for readers' convenience [13].

Parallel implementation
In this section, the research examines how to implement the PCG, and the modified Hager's method. First, the algorithm of the PCG (Table 1) is considered. The PCG algorithm consists of just three components: (1) a matrix-vector product (lines 3, and 13), (2) a vector inner product (lines 6, and 14), and (3) a scalar-vector product (lines 11, 15, and 16). By taking into account the above point, the PCG on distributed memory parallel computers is usually implemented with row-wise matrix decompoistion manner [6], [14]. Here we assume that there are two processors, an 8 × 8 matrix, and an 8 elements vector. The first processor has rows from one to four of the matrix, and vector elements from one to four. The other processor has the remaining matrix rows and vector elements. This memory allocation enables a user to compute all the above three operations with minimal (or even zero) communication, which deteriorates parallel efficiency. Namely, (1) matrix-vector products can be performed if necessary vector elements are obtained, (2) vector inner products can be performed by exchanging the results of partial vector inner products (which are just scalars) on each processor, and (3) scalarvector products can be performed without any communication. One drawback of this memory allocation is that some preconditioning techniques, like SSOR, cannot be implemented on parallel computers as they are on sequential computers. One of the ways to accomplish such a preconditioning technique is localization, which uses only the information that is on each processor, and ignores the information on the other processors [6]. This ignoring of information degrades the effect of preconditioning; however the entire PCG algorithm still works successfully.
Hager's method, and therefore the modified Hager's method as well, can be implemented with the memory allocation techniques explained above. There are several additional operations to PCG (e.g. lines 10, 12, and 16 in Table 2), but they can be implemented without any difficulties. In addition, it should be noted that there is no extra memory space required for parallel computing, except the vector elements which must be obtained from other processors.
Finally, the parallel version of the modified Hager's method was implemented with the above technique. In addition, it was confirmed that the same results as the sequential version were obtained with the diagonal scaling preconditioning.

Computational effort and memory usage
Here, the research discusses the computational effort and the memory usage of the Lanczos connection, the modified Hager's method, and the naive implementation that computes preconditioned matrices explicitely. First, the computational effort, and the memory usage of the naive implementation are proportional to n 3 , and n 2 respectively, where n is the size of a matrix. Here, we assume that the Householder transformation is used to calculate the eigenvalues. Thus, both the computational effort and the memory usage are colossal when n is large and it is applicable only when n is sufficiently small.
Second, the computational effort and the memory usage of the Lanczos connection are proportional to m × n and n respectively, where m is the number of iterations required to achieve convergence of the PCG algorithm. Here, we assume that the matrix we consider has the same matrix structure as the matrices used in the section "Poisson's equation with FEM". In addition, we can assume that m < < n, in other words, the tridiagonal matrix associated with the Lanczos connection is small and the computational effort required to solve is negligible. The matrix has 10 matrix elements in each row, therefore, the entire size becomes 10 × n. The PCG algorithm requires six vectors. In the end, 16 × n elements must be stored. The number of floating point operations of an iteration of the PCG algorithm is approximately twice (add and multiply) the number of matrix and vector elements (excluding preconditioning). The diagonal scaling requries n floating point operations, and the SSOR preconditioning requries 20 × n floating point operations. As a result, roughly 50 × m × n floating point operations are required.
Finally, we consider the modified Hager's method. The memory space requirement of the modified Hager's method is almost the same as the PCG, but additional four vectors must be stored. Experiments in this study indicate that Hager's method converges within four iterations. In addition, Higham also reported that Hager's method converges within four iterations [10]. Using Hager's method, solving linear equations requires the most computational effort and must be done twice par iteration. Other operations, however, require negligible effort. Thus, the total number of floating point operations can be written as 4 iterations ×2 PCGs ×50 × m × n. The computational effort required of the modified Hager's method can be as much as ten times larger than effort required of the Lanczos connection. Nevertheless, the memory usage required for each method is almost identical, and the modified Hager's method is more reliable than the Lanczos connection.

Conclusions
In present paper, we developed a condition number estimation method for the preconditioned matrix and verified the accuracy of the newly developed method. Although the Lanczos connection method can be considered as a method by which to estimate the condition number of the preconditioned matrix, this method is weak with respect to numerical errors. Therefore, we compared the condition numbers estimated using both the Lanczos connection method and the newly developed method with Octave's result. The newly developed method provided better results, whereas the Lanczos connection method failed. In addition, the newly developed method can be applied without difficulty for parallel computing and large-scale problem because this new method does not require explicit matrix operation.