Multipreconditioned GMRES for simulating stochastic automata networks

Abstract: Stochastic Automata Networks (SANs) have a wide range of applications in modelling queueing systems and communication systems. To find the steady state probability distribution of a SAN, one often needs to solve linear systems involving its generator matrix. However, classical iterative methods such as Jacobi and Gauss-Seidel are inefficient because of the huge size of the generator matrices. In this paper, the multipreconditioned GMRES (MPGMRES) is considered, in which two or more preconditioners are applied simultaneously. A selective version of the MPGMRES is also presented to overcome the rapid growth of the storage requirements and make the method practical. Numerical results on two models of SANs are reported to illustrate the effectiveness of the proposed methods.


Introduction
The use of Stochastic Automata Networks (SANs) is of interest in a wide range of applications, including modelling queueing systems [1][2][3], communication systems [4,5], manufacturing systems and inventory control [6]. The generator matrices of these SANs are block tridiagonal matrices in which each diagonal block is a sum of tensor products of matrices. To analyze the performance of a SAN, it is required to find its steady state probability distribution, which can be obtained by solving a large linear system involving the generator matrix.
Iterative procedures are often employed to compute the steady state probability distribution of SANs, such as the classical Jacobi, Gauss-Seidel, SOR, SSOR and TSS iterative methods [3,[6][7][8]. These methods are simple and easy to implement, but they usually suffer from slow convergence and do not appear to be practical when the size of the generator matrix is huge [6,9,10]. Hence, Krylov subspace methods, which use matrix-vector multiplications as their main arithmetic operations, have been developed, for instance the popular Arnoldi/FOM, GMRES, CGS, BiCGSTAB and QMR methods [7,[11][12][13][14][15][16][17][18]. Based on a small-dimension Krylov subspace, such iterative methods are able to efficiently approximate the steady state probability distribution of SANs [7,10,19]. However, they may also suffer from slow convergence, even though many numerical experiments indicate that they generally outperform the classical iterative methods; refer, e.g., to [3,7] for details.
It is well known that preconditioning techniques are necessary to improve the convergence of iterative methods by modifying the eigenvalue distribution of the iteration matrix, while the solution remains unchanged. Preconditioners based on incomplete LU factorizations are possible choices and have been discussed in [3,7,20]. A SAN preconditioner based on the Neumann series was proposed by Stewart et al. [10]; numerical experiments show that this preconditioner is expensive and needs more execution time. To exploit the tensor structure of the generator matrix of a SAN, an approximate inverse preconditioner, called the Nearest Kronecker Product (NKP) preconditioner, was proposed by Langville and Stewart [19]. Numerical tests on some SANs have shown the effectiveness of the NKP preconditioner; however, it is not always appropriate for general Markov chains [21]. Taking circulant approximations of the tensor blocks of the generator matrix, circulant preconditioners were constructed for SANs [2,4]. These are simple to construct, since circulant matrices can be easily diagonalized by fast Fourier transforms [22]. However, the successful application of these circulant preconditioners depends on whether the tensor blocks of the generator matrix are near-Toeplitz matrices. In fact, different preconditioners have different efficiencies on different model problems. Which one is better, and how to develop more efficient preconditioners, especially for very large systems, remain open questions.
In this paper, motivated by the idea of [23], our main contribution is to find the steady state probability distribution of a SAN by employing a variant of GMRES in which two or more preconditioners are applied simultaneously, denoted the multipreconditioned GMRES (MPGMRES). Compared to the standard GMRES, one advantage of the MPGMRES is that it finds an approximate solution over a richer search space, enlarged by collecting all possible search directions generated by the different preconditioners at each iteration. Its disadvantage is that it requires more memory, since the search space grows exponentially with the iteration number. Fortunately, a selective version of the MPGMRES (sMPGMRES) can be obtained by discarding some search directions so that the search space grows only linearly at each iteration. In fact, the idea of using multiple preconditioners is not new; e.g., Bridson and Greif [24] proposed the multipreconditioned conjugate gradient method (MPCG) for symmetric positive definite linear systems. Even though the short-term recurrence of the standard CG does not hold in general, the MPCG is very useful for determining which preconditioner is the most effective by executing a few MPCG iterations. On the other hand, for linear systems with a nonsymmetric coefficient matrix, a combined preconditioning strategy has been analyzed in [25], showing that an optimal linear combination of two preconditioners can achieve good numerical performance.
The rest of this paper is organized as follows. In Section 2, a brief introduction to two models of SANs is given. The derivation of the MPGMRES and its selective version are discussed in Section 3. In Section 4, numerical experiments on two SANs are employed to illustrate the effectiveness of these methods. Finally, some conclusions are given in Section 5.

Two models of SANs
To introduce certain terminologies and notations of SANs, two models are described in this section: a two-queue overflow network and a telecommunication system.

2.1. A two-queue overflow network
The early research on the two-queue overflow network can be found in [2,26]. This network consists of two independent queues, with the number of servers and waiting spaces being s_i and l_i − s_i − 1 (i = 1, 2), respectively, so that l_i − 1 is the total number of customers that queue i can hold. For queue i, i = 1, 2, let λ_i be the arrival rate of customers, and let µ_i be the service rate of the servers. The state space of the queueing system can then be represented by pairs (i, j), where i represents the state that there are i customers in queue 1, and j represents the state that there are j customers in queue 2; thus it is a two-dimensional queueing system. Generally speaking, the queueing discipline is first-come-first-served. When a customer arrives and finds all the servers busy, the customer has two choices: one is to wait in the queue, provided that a waiting space is still available; the other is to leave the system. Here, however, we allow the overflow of customers from queue 2 to queue 1 whenever queue 2 is full and queue 1 is not, see Fig. 1. The generator matrix can be shown to be an l_1 l_2 × l_1 l_2 matrix A in tensor product form, where I_{l_i} is the identity matrix of size l_i, i = 1, 2, and e_l is the unit vector (0, . . ., 0, 1)^T, see [2,26] for instance.
Let n = l_1 l_2; then the steady state probability distribution vector p = (p_1, p_2, . . ., p_n)^T is the solution of the linear system Ap = 0, subject to the constraints that the entries of p are nonnegative and sum to one. According to the properties of the tensor product [9,10,27], the matrix R ⊗ e_l e_l^T, which describes the overflow of customers from queue 2 to queue 1, is lower triangular, and the matrix A in (1) is a singular, block tridiagonal matrix. In particular, the matrix A is irreducible, has zero column sums, positive diagonal elements and nonpositive off-diagonal elements. According to [28,29], A has a one-dimensional null space with a positive null vector. Hence, the existence of the steady state probability distribution vector p is guaranteed.
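As an illustration of the tensor (Kronecker) structure just described, the following Python sketch builds a small generator matrix with the stated sign pattern as a Kronecker sum of two birth-death blocks. The queue parameters are hypothetical, and the overflow coupling term is omitted for brevity, so this is not the exact matrix A of equation (1).

```python
import numpy as np

def birth_death_block(l, lam, mu, s):
    """l x l birth-death block of an M/M/s queue with states 0..l-1:
    nonnegative off-diagonal rates, columns summing to zero."""
    Q = np.zeros((l, l))
    for j in range(l - 1):
        Q[j + 1, j] = lam                 # arrival: state j -> j+1
    for j in range(1, l):
        Q[j - 1, j] = mu * min(j, s)      # service: state j -> j-1
    np.fill_diagonal(Q, -Q.sum(axis=0))   # zero column sums
    return Q

l1, l2 = 4, 5                             # hypothetical small queue sizes
Q1 = birth_death_block(l1, lam=1.0, mu=1.0, s=2)
Q2 = birth_death_block(l2, lam=1.5, mu=1.0, s=2)

# Kronecker-sum part of the generator; the sign convention matches the
# paper: positive diagonal, nonpositive off-diagonal, zero column sums.
A = -(np.kron(Q1, np.eye(l2)) + np.kron(np.eye(l1), Q2))
```

Note how the zero-column-sum property of each block is inherited by the Kronecker sum, which is the structural fact that guarantees singularity of the full generator matrix.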

2.2. A telecommunication system
Here an (MMPP/M/s/s+m) network arising in telecommunication systems is considered. It is known that the Markov Modulated Poisson Process (MMPP) is a generalization of the Poisson process and is often used as the input model of a telecommunication system [4,5]. To construct the (MMPP/M/s/s+m) network, first define the system parameters as follows: (a) λ, the mean arrival rate of the exogenously originating calls of the main queue; (b) µ, the mean service rate of each server of the main queue; (c) s, the number of servers in the main queue; (d) l − s − 1, the number of waiting spaces in the main queue; (e) q, the number of overflow streams; (f) (Q_j, Λ_j), 1 ≤ j ≤ q, the parameters of the MMPPs modeling the overflow parcels. Here σ_{j1}, σ_{j2} and λ_j, 1 ≤ j ≤ q, are positive MMPP parameters.
The input of the main queue comes from the superposition of several independent MMPPs, which is still an MMPP and is parameterized by two 2^q × 2^q matrices (Q, Ω), with Ω = Λ + λI_{2^q}, where I_2 and I_{2^q} denote the 2 × 2 and 2^q × 2^q identity matrices, respectively. Now the (MMPP/M/s/l) network can be regarded as a Markov process on a state space in which the number i corresponds to the number of calls at the destination, and the number j corresponds to the state of the Markov process with generator matrix Q. Hence, it is a two-dimensional queueing system, and the generator matrix can be shown to be an l2^q × l2^q block tridiagonal matrix, which can be rewritten in tensor product form, where the matrices B and R have the same form as Q_i and R in (2) and (3), respectively. Let n = l2^q; then the steady state probability distribution vector p = (p_1, p_2, . . ., p_n)^T is the solution of the linear system Ap = 0 with the same constraints as in (4).
Similarly, the matrix A in (5) is irreducible, has zero column sums, positive diagonal elements and nonpositive off-diagonal elements. According to the famous Perron-Frobenius theory, the matrix A has a one-dimensional null space with a positive null vector [28,29]. Hence, the steady state probability distribution vector p exists.
Remark 2.1. To solve the linear system Ap = 0 efficiently, it is necessary to make certain modifications to the matrix A since it is singular. One possible way to obtain the steady state probability distribution vector p is to normalize the solution x of the nonsingular linear system (A + εI)x = b, where b = (0, . . ., 0, 1)^T is an n × 1 vector, I is the n × n identity matrix, and ε > 0 can be chosen as small as desired. The matrix A + εI is nonsingular because it is irreducible and strictly diagonally dominant. The linear system (6) can be solved by GMRES. It is clear that solving (6) may induce a small error of O(ε), but fortunately Ching et al. have proved that the 2-norm of the error induced by the regularization tends to zero [2,4].
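A minimal numerical sketch of this regularization follows, with a hypothetical small generator matrix standing in for A, a direct solve standing in for the GMRES iteration of the paper, and an assumed value of ε:

```python
import numpy as np

# Hypothetical small generator A (positive diagonal, nonpositive
# off-diagonal, zero column sums), here for a single M/M/2 queue.
l, lam, mu, s = 8, 1.0, 1.2, 2
A = np.zeros((l, l))
for j in range(l - 1):
    A[j + 1, j] = -lam                    # arrivals
for j in range(1, l):
    A[j - 1, j] = -mu * min(j, s)         # services
np.fill_diagonal(A, -A.sum(axis=0))       # columns sum to zero

# Regularized system (A + eps*I) x = b with b = (0, ..., 0, 1)^T.
eps = 1e-6                                # assumed small parameter
b = np.zeros(l)
b[-1] = 1.0
x = np.linalg.solve(A + eps * np.eye(l), b)

# Normalizing x yields the steady state distribution up to O(eps).
p = x / x.sum()
```

Since the all-ones vector is a left null vector of A, one gets 1^T x = 1/ε exactly, so the normalized vector p is well defined and satisfies Ap = O(ε).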

Multipreconditioned GMRES
This section gives a brief description of the preconditioned GMRES and then discusses the GMRES with multiple preconditioners (MPGMRES). Finally, to decrease the computing cost of the MPGMRES, a selective version is considered.

3.1. Preconditioned GMRES
GMRES is often used to solve the nonsymmetric linear system (6). Given an initial vector x_0, the corresponding initial residual is r_0 = b − Ax_0, and the Krylov subspace is K_m(A, r_0) = span{r_0, Ar_0, . . ., A^{m−1}r_0}. An orthonormal basis of K_m(A, r_0) can be computed by the modified Gram-Schmidt orthogonalization procedure with the first vector being r_0/‖r_0‖. This generates the useful relation A V_m = V_{m+1} H̄_m, (7) where V_m ∈ R^{n×m} and V_{m+1} ∈ R^{n×(m+1)} have orthonormal columns, and H̄_m ∈ R^{(m+1)×m} is an upper Hessenberg matrix. Thus at the m-th step, an approximate solution of (6) is computed as x_m = x_0 + V_m y_m, where y_m = argmin_{y∈R^m} ‖βe_1 − H̄_m y‖ with β = ‖r_0‖ and e_1 = (1, 0, . . ., 0)^T.
Preconditioning is a key ingredient for the successful application of GMRES. Let M be a preconditioner; then, for the linear system (6), the right preconditioned GMRES is given in Algorithm 1, refer to [7].
In line 11 of Algorithm 1, the approximate solution x_m is expressed as a linear combination of the preconditioned vectors z_i = M^{-1} v_i, i = 1, 2, . . ., m, where the preconditioner M is fixed, i.e., it does not change from step to step. Now suppose that the preconditioner is allowed to change at every step, so that line 3 computes w = A M_j^{-1} v_j. Then the approximate solution is computed by x_m = x_0 + Z_m y_m, where Z_m = (z_1, . . ., z_m). This kind of GMRES is called Flexible GMRES (FGMRES), since different preconditioners are allowed at each iteration [30].

3.2. Multipreconditioned GMRES
Based on the idea of using different preconditioners in the FGMRES, the multipreconditioned GMRES, in which two or more preconditioners are applied simultaneously, was proposed by Greif et al. [23]. Assume there are k preconditioners M_1, . . ., M_k, k ≥ 2. At the beginning, for the initial residual r_0, we set v_1 = r_0/‖r_0‖ and Z_1 = (M_1^{-1} v_1, . . ., M_k^{-1} v_1), so that the first iterate is x_1 = x_0 + Z_1 y_1, where the vector y_1 ∈ R^k is chosen to minimize ‖b − Ax_1‖. From (9), it is easy to see that using all preconditioners simultaneously enlarges the space in which the solution is sought. Similar to Algorithm 1, the multipreconditioned GMRES (Algorithm 2) proceeds as follows: at step j, the new block of search directions Z_{j+1} = (M_1^{-1} V_{j+1}, . . ., M_k^{-1} V_{j+1}) is formed, the block A Z_{j+1} is orthogonalized against the previous basis blocks (a QR factorization producing the new block V_{j+1} and the Hessenberg entries), y_j = argmin_y ‖βe_1 − H̄_j y‖ is computed, and, if the stopping criterion is not satisfied after m steps, the method restarts from x_0 = x_m.
Compared with Algorithm 1, the search space increases at each iteration, and the relation (7) is replaced by A Z̃_m = Ṽ_{m+1} H̄_m, where Z̃_m = (Z_1, . . ., Z_m) and Ṽ_{m+1} = (V_1, . . ., V_{m+1}), in which the matrices V_{j+1} and H_{j+1,j} (1 ≤ j ≤ m) are computed by the QR factorization in line 8 of Algorithm 2. Note that the matrices Z_j and V_{j+1} have k^j columns, j = 1, . . ., m, so the matrix Ṽ_{m+1} has 1 + k + k^2 + · · · + k^m columns in total. Let P_{m−1} = P_{m−1}(X_1, . . ., X_k) be the space of all matrix polynomials in k variables of degree at most m − 1; then at the j-th step of the MPGMRES, the approximate solution can be represented in terms of polynomials ω_{ij} ∈ P_{m−1}(X_1, . . ., X_k) evaluated at the preconditioned operators, see [23] for details. Furthermore, from (12), the corresponding residual can be expressed in terms of polynomials β_{i,j+1} ∈ P_m(X_1, . . ., X_k) in which only the i-th variable may attain the highest degree j, while the degrees of all other variables are at most j − 1. From (13), the following result is established. Theorem 3.1. Let r_0 be the initial residual of the linear system (6), and let r_j be the residual at the j-th step of the MPGMRES with k preconditioners M_1, . . ., M_k. Then ‖r_j‖/‖r_0‖ is bounded by the minimum of the corresponding residual polynomials over this space; see [23]. Therefore, it is important to find an optimal combination of the different preconditioners when they are used simultaneously at each iteration [23][24][25]. This problem still requires further research.
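The growth of the multipreconditioned search space can be sketched in Python as follows. This is a deliberately simplified sketch, not the actual Algorithm 2: it applies every preconditioner to every current basis vector, but replaces the Hessenberg recurrence by a dense least-squares solve over all collected directions, and the preconditioners and test matrix are hypothetical.

```python
import numpy as np

def mpgmres_sketch(A, b, precs, m):
    """Sketch of the (complete) MPGMRES idea with x_0 = 0: at step j every
    preconditioner is applied to every current basis vector, so the search
    space gains k^j new directions. The residual is minimized by a dense
    least-squares solve over all collected search directions."""
    V = [b / np.linalg.norm(b)]
    Z = []
    for _ in range(m):
        Znew = [Minv(v) for Minv in precs for v in V]
        Z.extend(Znew)
        Zmat = np.column_stack(Z)
        y, *_ = np.linalg.lstsq(A @ Zmat, b, rcond=None)
        x = Zmat @ y
        # next basis block: orthonormal factor of A times the new directions
        Q, _ = np.linalg.qr(A @ np.column_stack(Znew))
        V = [Q[:, i] for i in range(Q.shape[1])]
    return x

rng = np.random.default_rng(1)
n = 20
A = rng.standard_normal((n, n)) + n * np.eye(n)     # hypothetical test matrix
b = rng.standard_normal(n)
d = np.diag(A)
precs = [lambda v: v / d,                           # Jacobi-type M_1
         lambda v: np.linalg.solve(np.tril(A), v)]  # triangular M_2
x1 = mpgmres_sketch(A, b, precs, 1)
x3 = mpgmres_sketch(A, b, precs, 3)
```

Because the collected direction set only grows, the minimized residual norm is nonincreasing from step to step, which mirrors the monotonicity of the true MPGMRES residual.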

3.3. Selective MPGMRES and computing cost
The idea of the MPGMRES is to enlarge the Krylov subspace over which the GMRES minimizes the residual norm by using two or more preconditioners simultaneously at each iteration. From (14), it can be seen that the search space is so rich because it contains not only higher-order polynomials of the variables X_i but also many mixed terms [23]. For instance, when k = 2, the entire space of matrix polynomials contains all products of powers of X_1 and X_2. It is therefore natural to think of reducing the dimension of this space. In particular, for the case A = M_1 + M_2, the following result has been proved [23].
Theorem 3.2. If the variables X_1, X_2 satisfy X_1 X_2 = X_2 X_1 = X_1 + X_2, then the mixed terms can be eliminated, which indicates that the dimension of the entire search space is reduced successfully. Hence, a possible choice is to construct an approximate search space by discarding some search directions at each iteration; the computing cost of the MPGMRES is then decreased. Several different strategies can be applied to obtain the approximate search space at each iteration, see [23,31]. In this paper, a simple method from [23] is adopted, in which line 11 of Algorithm 2 is replaced so that the preconditioner M_1 is applied only to the first column of V_{j+1}, M_2 only to the second column of V_{j+1}, and so on. This is called the selective MPGMRES (sMPGMRES). Correspondingly, the number of columns of the matrix Z_j, j = 1, . . ., m, remains k rather than k^j. For comparison, the computing costs of the MPGMRES and sMPGMRES at the j-th iteration are listed in Table 1 [23].
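The selective strategy can be sketched in the same simplified style as before (dense least-squares minimization in place of the Hessenberg recurrence, hypothetical preconditioners and test matrix); the point of the sketch is that each step now contributes exactly k directions.

```python
import numpy as np

def smpgmres_sketch(A, b, precs, m):
    """Sketch of the selective MPGMRES with x_0 = 0: preconditioner M_i is
    applied only to the i-th column of the current basis block, so every
    step contributes exactly k new search directions (linear growth)."""
    k = len(precs)
    v1 = b / np.linalg.norm(b)
    V = np.column_stack([v1] * k)          # step 1 uses v_1 in every slot
    blocks = []
    for _ in range(m):
        Znew = np.column_stack([precs[i](V[:, i]) for i in range(k)])
        blocks.append(Znew)
        Zmat = np.column_stack(blocks)
        y, *_ = np.linalg.lstsq(A @ Zmat, b, rcond=None)
        x = Zmat @ y
        V, _ = np.linalg.qr(A @ Znew)      # next k-column basis block
    return x, Zmat.shape[1]

rng = np.random.default_rng(2)
n = 20
A = rng.standard_normal((n, n)) + n * np.eye(n)     # hypothetical test matrix
b = rng.standard_normal(n)
d = np.diag(A)
precs = [lambda v: v / d,                           # Jacobi-type M_1
         lambda v: np.linalg.solve(np.tril(A), v)]  # triangular M_2
x, ncols = smpgmres_sketch(A, b, precs, 4)
```

After m steps the search space has k·m columns, in contrast to the k + k^2 + · · · + k^m columns of the complete variant.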
From Table 1, it can be seen that the computing cost of the MPGMRES has an exponential growth, while that of the sMPGMRES only grows linearly.

Numerical experiments
In this section, Algorithms 1-2 and the selective variant (sMPGMRES) introduced in this paper are tested on two models of SANs: one from queueing systems and the other from telecommunication systems. In particular, we compare the numerical performance of the MPGMRES, sMPGMRES, standard right preconditioned GMRES and unpreconditioned GMRES (with restart 10) in terms of the total computing time and the number of iteration steps. For convenience, GMRES(M) denotes the right preconditioned GMRES method with the preconditioner M in this section. Convergence histories are shown in figures with the number of iterations on the horizontal axis, and Relres (defined as the logarithm of the relative residual 2-norms, i.e., log_10(‖r_j‖/‖r_0‖)) on the vertical axis. All experiments are carried out with a serial MATLAB implementation of the code, in which parts of the MATLAB code are from https://github.com/tyronerees/krylov_solvers/tree/master/mpgmres.
Throughout the numerical experiments, the stopping criterion is ‖r_j‖/‖r_0‖ ≤ tol for a prescribed tolerance tol, where r_j is the current residual and r_0 is the initial residual. The initial vector for all methods is chosen to be the same constant vector x_0. The parameter ε in equation (6) is fixed to a small positive value; other values can also be used provided that they keep the coefficient matrix A + εI nonsingular.
Example 4.1. The two-queue overflow network [2,26]. This test problem was introduced in Section 2.1; the corresponding matrices can be found in equations (1), (2) and (3). The parameters λ_i, µ_i and s_i, i = 1, 2, are fixed, and the size of the matrix A is n = l_1 l_2. To find the steady state probability distribution, different preconditioners can be constructed for solving the linear system (6). Case one: Consider the incomplete LU factorization with "droptol = 0.001" for the coefficient matrix A, i.e., A = LU + E, where L is a unit lower triangular matrix and U is an upper triangular matrix. The matrices L and U are then used as two preconditioners for the GMRES. Numerical results for this case are shown in Table 2.
Table 2. The number of iterations and computing time in seconds (in brackets) for Example 4.1 when L and U are used as preconditioners. The last column Ratio is defined as (computing time of sMPGMRES)/(computing time of GMRES).
Table 2 shows that the iteration counts of the MPGMRES and sMPGMRES, with the two preconditioners L and U used simultaneously, are less than those of the unpreconditioned GMRES. In particular, the MPGMRES gives the fewest iteration steps, even though its computing time is the worst. On the other hand, the computing time of the sMPGMRES is superior to that of the unpreconditioned GMRES, and the last column of Table 2 further confirms the fast convergence of the sMPGMRES: for the largest tested values of l_1 and l_2, the computing time needed by the sMPGMRES is almost only half of that needed by the unpreconditioned GMRES. Moreover, Fig. 2 (left) plots the corresponding convergence histories. Case two: Let D and T be the diagonal and tridiagonal parts of the matrix A, respectively. Then the matrices J (equal to D) and T are considered as the Jacobi preconditioner and the tridiagonal preconditioner for the GMRES. Numerical results for this case are listed in Table 3.
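The single-preconditioner comparison in case two can be mimicked in a few lines of Python. Below, a plain right-preconditioned GMRES (without restart, unlike the restarted MATLAB code used in the experiments) is run with a Jacobi preconditioner J and a tridiagonal preconditioner T on a hypothetical nearly tridiagonal test matrix; the setup is illustrative only, not the SAN matrix of Example 4.1.

```python
import numpy as np

def gmres_right(A, b, Minv, tol=1e-8, maxit=50):
    """Right-preconditioned GMRES without restart: Arnoldi applied to
    A M^{-1}, solution recovered as x_m = M^{-1} V_m y_m."""
    n = len(b)
    beta = np.linalg.norm(b)
    V = np.zeros((n, maxit + 1))
    H = np.zeros((maxit + 1, maxit))
    V[:, 0] = b / beta
    for j in range(maxit):
        w = A @ Minv(V[:, j])
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] > 1e-14:
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        if np.linalg.norm(e1 - H[:j + 2, :j + 1] @ y) <= tol * beta:
            return Minv(V[:, :j + 1] @ y), j + 1
    return Minv(V[:, :maxit] @ y), maxit

# Nearly tridiagonal, diagonally dominant test matrix (hypothetical).
n = 40
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, n - 1] = A[n - 1, 0] = -0.05           # weak off-tridiagonal coupling
b = np.ones(n)

J = np.diag(A).copy()                       # Jacobi preconditioner
T = np.triu(np.tril(A, 1), -1)              # tridiagonal preconditioner
xJ, itJ = gmres_right(A, b, lambda v: v / J)
xT, itT = gmres_right(A, b, lambda v: np.linalg.solve(T, v))
```

Since A differs from T only by a low-rank coupling, the T-preconditioned iteration converges in very few steps, while the Jacobi-preconditioned iteration needs noticeably more; this is the qualitative behavior the table compares.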
Table 3. The number of iterations and computing time in seconds (in brackets) for Example 4.1 when J and T are used as preconditioners. Ratio1 and Ratio2 are defined as (computing time of sMPGMRES)/(computing time of GMRES(T)) and (computing time of sMPGMRES)/(computing time of GMRES(J)), respectively.
From Table 3, one can see that the number of iterations required by the sMPGMRES is the smallest. In terms of computing time, the sMPGMRES is superior to the GMRES(T) and GMRES(J) when the size of the test problem becomes larger, while it needs more computing time than the GMRES(T) and GMRES(J) when the problem is small. The last two columns of Table 3 illustrate the performance of the sMPGMRES as the size of the test problem increases. Hence, by discarding some search directions at each iteration, using different preconditioners for the GMRES simultaneously can be more effective than using a single preconditioner at each step. Fig. 2 (right) shows the corresponding convergence curves.
Example 4.2. The telecommunication system [4,5]. This example was introduced in Section 2.2; the corresponding matrices can be found in equations (2), (3) and (5). The parameters λ, µ, s, q, σ_{j1}, σ_{j2} and λ_j, j = 1, . . ., q, are fixed, and the size of the coefficient matrix A is n = l2^q. To find the steady state probability distribution, preconditioners are constructed in ways similar to Example 4.1.
Case one: Using the same incomplete LU factorization as above for the coefficient matrix generated by (6), the matrices L and U are again used as two preconditioners for the GMRES. The corresponding numerical results are presented in Table 4. As seen from Table 4, in terms of iteration counts, the MPGMRES and sMPGMRES with two preconditioners applied simultaneously are superior to the GMRES with a single preconditioner. In terms of computing time, the MPGMRES and sMPGMRES cost more than the GMRES(L) and GMRES(U) when the size of this test problem is small; the reason may be that the dimension of the Krylov subspaces increases when two preconditioners are used simultaneously. However, when the size of the test problem increases, the sMPGMRES gives the best results; e.g., for the largest tested l, the computing times of the GMRES(L) and GMRES(U) are both reduced noticeably. Fig. 3 (left) shows the corresponding convergence histories. Note that the computing time of the sMPGMRES is always less than that of the MPGMRES, which indicates that appropriately discarding some search directions from the MPGMRES can lead to fast convergence, even though the iteration steps increase slightly. Therefore, the GMRES with multiple preconditioners is a competitive and robust choice for large test problems.
Case two: Two preconditioners G and J are constructed from the matrix A, and the corresponding numerical results are reported in Table 5. Again, from Table 5, the iteration counts of the MPGMRES and sMPGMRES are less than those of the GMRES(G) and GMRES(J). As the size of the test problem increases, the computing times of the MPGMRES and sMPGMRES become superior to that of the GMRES(J), although the computing time of the MPGMRES is inferior to that of the GMRES(G). In particular, the sMPGMRES shows the best results; e.g., for the largest tested l, the computing times of the GMRES(G) and GMRES(J) are both reduced noticeably. Furthermore, Fig. 3 (right) plots the corresponding convergence histories with G and J as the two preconditioners.
Case three: Three preconditioners are applied to the GMRES. Observing equation (5), the matrix A consists of three parts. According to the properties of the tensor product [9,10,27], the last part of the matrix A, i.e., R ⊗ Ω, is a lower triangular matrix. Since the last diagonal element of the matrix R is zero, as shown in (3), the tensor product R ⊗ Ω is singular. To remove this singularity, let the last diagonal element of R be equal to λ, and denote the new matrix by R̃; then M_1 = R̃ ⊗ Ω is nonsingular. On the other hand, using the tensor structure of the matrix A given in (5), we let M_2 and M_3 be the lower triangular parts of the tensor products I_l ⊗ Q and B ⊗ I_{2^q}, respectively. The three preconditioners M_1, M_2 and M_3 for the linear system (6) are then used with the GMRES. Numerical results for this case are provided in Table 6. As seen from Table 6, the MPGMRES shows the fewest iteration steps. In addition, the number of iterations of the sMPGMRES is also less than those of the GMRES(M_1), GMRES(M_2) and GMRES(M_3). From the last column of Table 6, it is also observed that the corresponding single preconditioner is not effective, since its iteration counts change considerably and its computing time increases dramatically as the size of the test problem grows. However, this preconditioner does not affect the stability of the MPGMRES and sMPGMRES, i.e., their iteration counts remain unchanged and their computing times are acceptable. In particular, for large test problems, the sMPGMRES gives the best computing time; e.g., for the largest tested l, the computing times of the GMRES(M_1), GMRES(M_2) and GMRES(M_3) are all reduced noticeably. Hence, the efficiency of the GMRES with multiple preconditioners is illustrated again.

Conclusions
In the current study, the main contribution is to consider the MPGMRES for computing the steady state probability distribution of SANs. The idea of the MPGMRES is to enlarge the Krylov subspace over which the GMRES minimizes the residual norm by using two or more preconditioners simultaneously at each iteration. Since the dimension of the search space grows exponentially with the iterations, a practical sMPGMRES is obtained by discarding some search directions at each iteration, so that the dimension of the search space grows only linearly with the problem size. Numerical experiments on two models of SANs have illustrated that the MPGMRES is more effective than the GMRES with a single preconditioner in reducing the number of iterations. Moreover, for large test problems, the computing time of the sMPGMRES is the best.
Here it is worth noting that the MPGMRES has only been tested on two models of SANs. In fact, there are several other models of SANs, e.g., manufacturing systems [6,32], node mobility in wireless networks [33] and FTS projects [34]. Hence, it would be interesting to extend the method to these models in the future.

Algorithm 1, lines 11-12: Compute y_m = argmin_{y∈R^m} ‖βe_1 − H̄_m y‖ and x_m = x_0 + M^{-1} V_m y_m. If satisfied stop; else set x_0 = x_m and go to step 1.

Table 1. Comparisons of computing cost.

Table 4. The number of iterations and computing time in seconds (in brackets) for Example 4.2, when L and U are used as preconditioners.

Table 5. The number of iterations and computing time in seconds (in brackets) for Example 4.2, when G and J are used as preconditioners.

Table 6. The number of iterations and computing time in seconds (in brackets) for Example 4.2, when M_1, M_2 and M_3 are used as preconditioners.