Conjugate gradient algorithm for consistent generalized Sylvester-transpose matrix equations

We develop an effective algorithm to find a well-approximated solution of a generalized Sylvester-transpose matrix equation in which all coefficient matrices and the unknown matrix are rectangular. The algorithm constructs a finite sequence of approximate solutions from any given initial matrix. It turns out that the associated residual matrices are orthogonal, and thus the desired solution comes out in the final step with a satisfactory error. We provide numerical experiments to show the capability and performance of the algorithm.


Introduction
Sylvester-type matrix equations show up naturally in several branches of mathematics and engineering. Indeed, many problems in vibration and structural analysis, robotics control and spacecraft control can be represented by the following general dynamical linear model:

A_k x^(k)(t) + · · · + A_1 ẋ(t) + A_0 x(t) + B_l u^(l)(t) + · · · + B_0 u(t) = 0, (1.1)

where x ∈ R^{m×1} and u ∈ R^{n×1} are the state vector and the input vector, respectively, and A_i ∈ R^{m×m} and B_j ∈ R^{m×n} are the system coefficient matrices; see e.g., [1,2]. The dynamical linear system (1.1) includes

A_1 ẋ + A_0 x + B_0 u = 0, the descriptor linear system, (1.2)
A_2 ẍ + A_1 ẋ + A_0 x + B_0 u = 0, the second-order linear system, (1.3)
A_k x^(k) + A_{k−1} x^(k−1) + · · · + A_0 x = Bu, the high-order dynamical linear system, (1.4)

as special cases. Certain problems in control engineering, such as pole/eigenstructure assignment and observer design for the system (1.1), are closely related to the Lyapunov matrix equation AX + XA^T = B, the Sylvester matrix equation AX + XB = C, and other related equations; see e.g., [2–7]. In particular, the Sylvester matrix equation also plays a fundamental role in signal processing and model reduction; see e.g., [8–10]. These equations are special cases of a generalized Sylvester-transpose matrix equation:

Σ_{i=1}^{r} A_i X B_i + Σ_{j=1}^{s} C_j X^T D_j = E. (1.5)

A traditional way to solve Eq (1.5) for an exact solution is to transform it into an equivalent linear system via the Kronecker linearization; see Section 3 for details. However, this approach is only suitable when the dimensions of the coefficient matrices are small. In practice, for the large-dimension case, it suffices to find a well-approximated solution via an iterative procedure, which does not require memory as massive as the traditional methods do. Several articles consider iterative methods that approximate solutions of generalized Sylvester-type matrix equations by constructing a finite sequence of approximate solutions from any given initial matrix. In the last five years, many researchers have developed iterative methods for solving Sylvester-type matrix equations related to Eq (1.5). A group of Hermitian and skew-Hermitian splitting (HSS) methods aims to decompose a square matrix as the sum of its Hermitian part and skew-Hermitian part. There are several variations of HSS, namely, GMHSS [11], preconditioned HSS [12], FPPSS [13], and ADSS [14]. A group of gradient-based iterative (GI) algorithms aims to construct a sequence of approximate solutions converging to the exact solution, based on the gradients of quadratic norm-error functions. The original GI method for a generalized Sylvester matrix equation was developed by Ding et al. [15]. Then Niu et al. [16] adjusted the GI method by introducing a weighted factor. After that, a half-step-update variant of the GI method, called the MGI method, was introduced by Wang et al. [17]. The idea of the GI algorithm can be used in conjunction with matrix diagonal-extraction to get the AJGI [18] and MJGI [19] algorithms. See [20–22] for more GI algorithms for generalized Sylvester matrix equations. For the generalized Sylvester-transpose matrix equation AXB + CX^T D = E, there are a GI algorithm [23] and an accelerated gradient-based iterative (AGBI) algorithm [24] that construct approximate solutions. There are also GI techniques based on optimization, e.g., [25–27]. See the survey [28] for more computational methods for linear matrix equations. Iterative procedures can also be used to find solutions of certain nonlinear matrix equations; see e.g., [29–31]. There are also applications of such techniques to parameter estimation in dynamical systems; see e.g., [32–34].
In this paper, we propose a conjugate gradient (CG) algorithm to solve the generalized Sylvester-transpose matrix Eq (1.5) in the consistent case, where all given coefficient matrices and the unknown matrix are rectangular (see Section 4). The algorithm constructs a sequence of approximate solutions of (1.5) from any given initial value. It turns out that a desired solution comes out in the final step of the iteration with a satisfactory error (see Section 4). To validate the theory, we provide numerical experiments showing the applicability and the performance of the algorithm (see Section 5). In particular, the performance of the algorithm is significantly better than that of the direct Kronecker linearization and recent gradient-based iterative algorithms.

Preliminaries
In this section, we recall useful tools and facts from matrix analysis that are used in later discussions. Throughout, we denote the set of all m-by-n real matrices by R m×n .
Recall that the Kronecker product of A = [a_ij] ∈ R^{m×n} and B ∈ R^{p×q} is defined by

A ⊗ B = [a_ij B] ∈ R^{mp×nq}.

Lemma 1 (see, e.g., [41]). The following properties hold for any compatible matrices A, B, C: (A ⊗ B)^T = A^T ⊗ B^T, (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C), and A ⊗ (B + C) = A ⊗ B + A ⊗ C.

The vector operator Vec(·) assigns to each matrix A = [a_ij] ∈ R^{m×n} the column vector

Vec A = [a_11 · · · a_m1  a_12 · · · a_m2  · · ·  a_1n · · · a_mn]^T ∈ R^{mn}.
This operator is bijective, linear, and compatible with the usual matrix multiplication in the following sense.
Lemma 2 (see, e.g., [41]). For any A ∈ R^{m×n}, B ∈ R^{p×q} and X ∈ R^{n×p}, we have Vec(AXB) = (B^T ⊗ A) Vec X.

Recall that the commutation matrix P(m, n) is a permutation matrix defined by

P(m, n) = Σ_{i=1}^{m} Σ_{j=1}^{n} E_ij ⊗ E_ij^T,

where each E_ij ∈ R^{m×n} has entry 1 in the (i, j)-th position and all other entries 0.
Lemma 3 (see, e.g., [41]). For any A ∈ R^{m×n} and B ∈ R^{p×q}, we have P(p, m)(A ⊗ B) = (B ⊗ A) P(q, n).

Lemma 4 (see, e.g., [41]). For any matrix X ∈ R^{m×n}, we have Vec(X^T) = P(m, n) Vec X.

Lemma 5 (see, e.g., [41]). For any matrices A, B, X, Y of compatible dimensions, we have tr((AXB)^T Y) = tr(X^T A^T Y B^T).

Recall that the Frobenius norm of A ∈ R^{m×n} is defined by ‖A‖ = √(tr(A^T A)) = (Σ_{i,j} a_ij^2)^{1/2}, the norm induced by the inner product ⟨A, B⟩ = tr(A^T B).
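To make these identities concrete, the following MATLAB snippet (a minimal sketch; the dimensions and random test matrices are our own illustrative choices, not taken from the paper) numerically checks Lemma 2 and Lemma 4.

```matlab
% Numerical check of Lemmas 2 and 4 (illustrative only).
m = 3; n = 4; p = 2; q = 5;
A = randn(m, n); B = randn(p, q); X = randn(n, p);

% Lemma 2: Vec(AXB) = (B^T kron A) Vec(X).
lhs = reshape(A*X*B, [], 1);            % Vec(AXB)
rhs = kron(B', A) * reshape(X, [], 1);  % (B^T kron A) Vec(X)
fprintf('Lemma 2 residual: %.2e\n', norm(lhs - rhs));

% Lemma 4: Vec(X^T) = P(n,p) Vec(X), with P(n,p) built column by column.
P = zeros(n*p);
for j = 1:n*p
    Ej = zeros(n, p); Ej(j) = 1;        % j-th standard basis matrix of R^{n x p}
    P(:, j) = reshape(Ej', [], 1);      % column j of P is Vec(Ej^T)
end
fprintf('Lemma 4 residual: %.2e\n', ...
    norm(reshape(X', [], 1) - P*reshape(X, [], 1)));
```

Both residuals should be on the order of machine precision.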

The direct Kronecker linearization for a generalized Sylvester-transpose matrix equation
From now on, let m, n, p, q, r, s ∈ N be such that mq = np. Consider the generalized Sylvester-transpose matrix Eq (1.5), where A_i ∈ R^{m×n}, B_i ∈ R^{p×q} (i = 1, …, r), C_j ∈ R^{m×p}, D_j ∈ R^{n×q} (j = 1, …, s), and E ∈ R^{m×q} are given matrices, and X ∈ R^{n×p} is unknown. Eq (1.5) includes the Lyapunov equation, the Sylvester equation, the equation AXB + CXD = E, and the equation AXB + CX^T D = E as special cases.
A direct method to solve Eq (1.5) is to transform it into an equivalent linear system. For convenience, denote P = P(n, p). By applying the vector operator to (1.5) and utilizing Lemmas 2 and 4, we get

Vec E = Σ_{i=1}^{r} (B_i^T ⊗ A_i) Vec X + Σ_{j=1}^{s} (D_j^T ⊗ C_j) Vec(X^T) = [ Σ_{i=1}^{r} B_i^T ⊗ A_i + ( Σ_{j=1}^{s} D_j^T ⊗ C_j ) P ] Vec X. (3.1)

Thus, Eq (3.1) is equivalent to a linear algebraic system Kx = b, where

K = Σ_{i=1}^{r} B_i^T ⊗ A_i + ( Σ_{j=1}^{s} D_j^T ⊗ C_j ) P ∈ R^{mq×np}, x = Vec X, b = Vec E. (3.2)

Hence, Eq (3.1) is consistent if and only if the associated linear system is consistent (i.e., rank [K b] = rank K). Once we solve for x, we can recover the unknown matrix X using the injectivity of the vector operator. However, if the coefficient matrices A_i, B_i, C_j, D_j are of large sizes, then the size of K can be very large due to the Kronecker multiplication, so traditional methods such as Gaussian elimination and LU factorization require a large amount of memory to solve the linear system for an exact solution. Thus, the direct method is suitable only for matrices of small sizes. For matrices of moderate/large sizes, it suffices to find a well-approximated solution of Eq (3.1) via an iterative procedure.
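As an illustration, a minimal MATLAB sketch of the direct linearization (3.1)–(3.2) might look as follows; the function names solve_by_linearization and commutation are ours, and the coefficient matrices are passed as cell arrays As, Bs, Cs, Ds.

```matlab
% A minimal sketch of the direct Kronecker linearization (3.1)-(3.2).
function X = solve_by_linearization(As, Bs, Cs, Ds, E, n, p)
    P = commutation(n, p);                  % P = P(n, p)
    K = zeros(numel(E), n*p);               % K is mq-by-np
    for i = 1:numel(As)
        K = K + kron(Bs{i}', As{i});        % sum_i B_i^T (x) A_i
    end
    for j = 1:numel(Cs)
        K = K + kron(Ds{j}', Cs{j}) * P;    % (sum_j D_j^T (x) C_j) P
    end
    x = K \ reshape(E, [], 1);              % solve K x = b with b = Vec(E)
    X = reshape(x, n, p);                   % recover X from x = Vec(X)
end

function P = commutation(n, p)
    % Commutation matrix P(n, p): P * Vec(X) = Vec(X^T) for X in R^{n x p}.
    P = zeros(n*p);
    for j = 1:n*p
        Ej = zeros(n, p); Ej(j) = 1;        % j-th standard basis matrix
        P(:, j) = reshape(Ej', [], 1);
    end
end
```

Note that K is formed explicitly here, which is exactly the memory bottleneck described above: already for n = p = 100, the matrix K has 10^8 entries.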

A conjugate gradient algorithm for consistent generalized Sylvester-transpose matrix equations
The main task is to find a well-approximated solution of the matrix Eq (1.5):

Problem 4.1. Let m, n, p, q, r, s ∈ N be such that mq = np. Consider the generalized Sylvester-transpose matrix Eq (1.5), where A_i ∈ R^{m×n}, B_i ∈ R^{p×q}, C_j ∈ R^{m×p}, D_j ∈ R^{n×q}, and E ∈ R^{m×q} are given matrices, and X ∈ R^{n×p} is unknown. Suppose that Eq (1.5) has a solution. Given an error ε > 0, find X̂ ∈ R^{n×p} such that

‖ E − Σ_{i=1}^{r} A_i X̂ B_i − Σ_{j=1}^{s} C_j X̂^T D_j ‖ ≤ ε.

We will solve Problem 4.1 under the additional assumption that the matrix K in (3.2) is symmetric. We propose the following algorithm:

Algorithm 1: A CG algorithm for a generalized Sylvester-transpose matrix equation
We call X_k the approximate solution at the k-th step. The main computation step says that the next approximation X_{k+1} is the sum of the current one X_k and the search direction U_{k+1} scaled by the step size ‖R_k‖^2 / α_{k+1}.
Remark 4.2. The stopping rule of Algorithm 1 is based on the size of the residual matrix R_k. One can impose another stopping criterion besides ‖R_k‖, e.g., that the norm of the difference X_{k+1} − X_k between successive iterates is small enough.

Remark 4.3. Let us discuss the complexity of Algorithm 1. For convenience, suppose that all matrices in Eq (1.5) are of size n × n. Each step of the algorithm requires matrix additions (O(n^2)), matrix multiplications (O(n^3)), matrix transpositions (O(n^2)), the Frobenius norm (O(n^2)), and the matrix trace (O(n)). In summary, the complexity of each step is O(n^3), and thus each iteration of the algorithm runs in cubic time.
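The following MATLAB sketch illustrates one way to realize the iteration just described. For clarity it works on the vectorized system Kx = b of Section 3 with the names r, u, v, alpha mirroring the quantities R_k, U_{k+1}, V_{k+1}, α_{k+1} of the text; it should be read as an illustration of the CG iteration consistent with Algorithm 1, not as a verbatim copy of the paper's listing.

```matlab
% An illustrative CG sketch on the linearized system K x = b from (3.2),
% assuming K is symmetric as required by Algorithm 1.
function X = cg_sylvester(K, E, X0, tol, maxit)
    b = reshape(E, [], 1);
    x = reshape(X0, [], 1);
    r = b - K*x;                                % residual R_0
    u = r;                                      % initial search direction
    for k = 1:maxit
        v = K*u;                                % V_{k+1}: image of U_{k+1} under K
        alpha = u' * v;                         % alpha_{k+1} = <U_{k+1}, V_{k+1}>
        step = norm(r)^2 / alpha;               % step size ||R_k||^2 / alpha_{k+1}
        x = x + step * u;                       % X_{k+1} = X_k + step * U_{k+1}
        rnew = r - step * v;                    % updated residual R_{k+1}
        if norm(rnew) < tol, break; end         % stopping rule of Remark 4.2
        u = rnew + (norm(rnew)^2/norm(r)^2)*u;  % next conjugate direction
        r = rnew;
    end
    X = reshape(x, size(X0));                   % recover the matrix iterate
end
```

In an actual implementation one would apply the operator X ↦ Σ A_i X B_i + Σ C_j X^T D_j directly to the matrix iterates instead of forming K; this is what keeps the memory footprint of Algorithm 1 low compared to the direct method.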
Next, we will show that, for any given initial matrix X_0, Algorithm 1 produces approximate solutions such that the residual matrices form an orthogonal set; thus, we obtain the desired solution in finitely many steps. We divide the proof into several lemmas.

Lemma 6. Consider Problem 4.1. Suppose that the sequence {R_i} is generated by Algorithm 1. We have

Proof. From Algorithm 1, we have that for any k,

Proof. Applying Lemmas 1–5 and the symmetry of K, we have for any m, n.

Proof. To prove this conclusion, we use the induction principle. In order to compute the related terms, we use Lemmas 6 and 7. For m = 1, we get

These imply that (4.3) holds for m = 1.
In the inductive step, for m = k we assume that tr(R_k^T R_{k−1}) = 0 and tr(U_{k+1}^T V_k) = 0. Then

Hence, Eq (4.3) holds for any m.

Proof. From Lemma 8 with m = 1, we get tr(R_1^T R_0) = 0 and tr(U_2^T V_1) = 0. Now, suppose that Eq (4.4) is true for all m = 1, …, k. From Lemmas 6 and 7, for m = k + 1 we write

Hence, Eq (4.4) holds for any m.

Proof. By Lemma 7 and the fact that tr(R_{m−1}^T R_{n−1}) = tr(R_{n−1}^T R_{m−1}) for any m, n, it suffices to prove (4.5) for any m, n such that m > n. By Lemma 8, Eq (4.5) holds for m = n + 1. For m = n + 2, we can write tr(R_{n+2}^T R_n) and tr(U_{n+2}^T V_n) in terms of tr(R_{n+1}^T R_{n−1}) and tr(U_{n+1}^T V_{n−1}), respectively. Repeating this process until the terms tr(R_2^T R_0) and tr(U_3^T V_1) show up, we get, by Lemma 9, tr(R_{n+2}^T R_n) = 0 and tr(U_{n+2}^T V_n) = 0. Next, for m = n + 3, we can write tr(R_{n+3}^T R_n) and tr(U_{n+3}^T V_n) in terms of tr(R_{n+2}^T R_{n−1}) and tr(U_{n+2}^T V_{n−1}), respectively. Repeating this process until the terms tr(R_3^T R_0) and tr(U_4^T V_1) show up, we get, by Lemma 9, tr(R_{n+3}^T R_n) = 0 and tr(U_{n+3}^T V_n) = 0. Hence, tr(R_{m−1}^T R_{n−1}) = 0 and tr(U_m^T V_n) = 0 for any m, n such that m ≠ n.

Theorem 4.5. Consider Problem 4.1 under the assumption that the matrix K is symmetric. Suppose that the sequence {X_i} is generated by Algorithm 1. Then for any given initial matrix X_0 ∈ R^{n×p}, an exact solution X can be obtained in at most np iteration steps.
Proof. Suppose that R_i ≠ 0 for i = 0, 1, …, np − 1. Then we compute X_np according to Algorithm 1. Assume that R_np ≠ 0. By Theorem 4.4, the set {R_0, R_1, …, R_np} is orthogonal in R^{n×p}, and hence linearly independent. Since the dimension of R^{n×p} is np, any linearly independent subset of R^{n×p} has at most np elements. This contradicts the fact that the set {R_0, R_1, …, R_np} has np + 1 elements. Thus, R_np = 0, and hence X_np is a solution of the equation.

Numerical experiments with discussions
In this section, we report numerical results to illustrate the applicability and the effectiveness of Algorithm 1. All iterations have been carried out in MATLAB R2021a on macOS (M1 chip, 8-core CPU, 8-core GPU, 8 GB RAM, 512 GB storage). We perform the experiments for several generalized Sylvester-transpose matrix equations, and an interesting special case, namely, the Sylvester equation. We vary the given coefficient matrices so that they are square/non-square, sparse/dense matrices of moderate/large sizes. The dense matrices considered here involve the matrix all of whose entries are 1, denoted by ones. The identity matrix of size n × n is denoted by I_n. For each experiment, we set the stopping rule to be ‖R_k‖ < ε, where ε = 10^{−3}. We discuss the performance of the algorithm through the norm of the residual matrices, the iteration number, and the computational time (CT). The CT (in seconds) is measured by the tic/toc functions in MATLAB.
In the following three examples, we consider the applicability of Algorithm 1 as well as its performance compared to the direct Kronecker linearization mentioned in Section 3.

Example 1. Consider a moderate-scaled generalized Sylvester-transpose equation
where all matrices are 50 × 50 tridiagonal matrices. We run Algorithm 1 with the initial matrix X_0 = 0.25 × ones ∈ R^{50×50}. According to Theorem 4.5, Algorithm 1 will produce a solution of the equation within 10^4 iterations. The resulting simulation, illustrated in Figure 1, shows the norms of the residual matrices R_k at each iteration. Although the errors ‖R_k‖ go up and down during the iterations, they generally decrease to zero. The algorithm takes 138 iterations to get a desired solution (so that ‖R_k‖ < 10^{−3}), which is significantly fewer than the theoretical bound (10^4 iterations). For the computational time, Algorithm 1 spends in total 0.131079 seconds to get a desired solution, while the direct Kronecker linearization consumes 1.581769 seconds to obtain the exact solution. Thus, the performance of Algorithm 1 is significantly better than that of the direct method. Moreover, for sparse coefficient matrices, Algorithm 1 can produce a desired solution in far fewer iterations (namely, 138) than the theoretical bound (namely, 10^4 in this case).
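A hypothetical setup in the spirit of Example 1 might look as follows, reusing the commutation and cg_sylvester sketches above; the 2/−1 tridiagonal stencil and the constructed right-hand side are placeholders, since the example's actual matrix entries are not reproduced here.

```matlab
% Hypothetical data in the spirit of Example 1 (placeholder stencil, not the
% paper's actual matrices): one A X B term plus one C X^T D term, 50 x 50.
n  = 50;
T  = full(spdiags(repmat([-1 2 -1], n, 1), -1:1, n, n)); % tridiagonal placeholder
E  = T*ones(n)*T + T*ones(n)'*T;      % right-hand side so that X = ones(n) solves
X0 = 0.25 * ones(n);                  % initial matrix as in Example 1

K  = kron(T', T) + kron(T', T) * commutation(n, n);      % matrix K of Eq (3.2)
tic;                                  % CT measured by tic/toc as in the text
X  = cg_sylvester(K, E, X0, 1e-3, n^2);                  % run the CG sketch
toc;
fprintf('final residual: %.2e\n', norm(E - T*X*T - T*X'*T, 'fro'));
```

Since T is symmetric here, K is symmetric and the assumption of Algorithm 1 is met; the constructed E makes the system consistent by design.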

Example 2. Consider a generalized Sylvester-transpose matrix equation
with rectangular coefficient matrices of moderate scale as follows. Taking an initial matrix X_0 ∈ R^{40×50}, we get an approximate solution X_k ∈ R^{40×50} with a satisfactory error ‖R_k‖ < 10^{−3} in 164 steps, using 0.196250 seconds. We see in Figure 2 that, although the errors ‖R_k‖ go up and down during the iterations, they generally decrease to zero. On the other hand, the direct Kronecker linearization consumes 0.811170 seconds to get an exact solution. Thus, Algorithm 1 is applicable and effective.

Example 3. Consider a large-scale generalized Sylvester-transpose equation where all matrices are 100 × 100 tridiagonal matrices. The resulting simulation of Algorithm 1 with the initial matrix X_0 = 0.5 × ones ∈ R^{100×100} is shown in Figure 3.

Next, we investigate the effect of changing the initial point. We run experiments for the initial matrices X_0 = 5 × ones, X_0 = 0, and X_0 = −5 × ones. Table 1 shows that, no matter the initial point, we get a desired solution in around 2 seconds. On the other hand, the direct method consumes around 70 seconds to get an exact solution. Thus, Algorithm 1 significantly outperforms the direct method.

In the remaining numerical examples, we compare the performance of Algorithm 1 to the direct method as well as to the recent gradient-based iterative algorithms mentioned in the Introduction. In fact, this equation has a unique solution. Besides the direct method, we compare the performance of Algorithm 1 to the GI [23] and AGBI [24] algorithms. All iterative algorithms are implemented with the same initial matrix X_0 = −0.001 × I_100 ∈ R^{100×100}. According to [23], the GI algorithm is applicable as long as the convergent factor µ satisfies a certain bound, where λ_max(AA^T) denotes the largest eigenvalue of AA^T. We run the GI algorithm with 3 different convergent factors, namely, m1 = 6.1728 × 10^{−12}, m2 = 8.8183 × 10^{−12} and m3 = 3.0864 × 10^{−11}. We implement the AGBI algorithm with a convergent factor 0.000988 and a weighted factor 10^{−8}. Figure 4 shows that the CG algorithm (Algorithm 1) converges faster than the GI (with m1, m2, or m3) and AGBI algorithms. Table 2 shows that, within 30 iterations, GI, AGBI, and the direct method consume a large amount of time to get the exact solution, while Algorithm 1 produces a small-error solution in a short time (0.073613 seconds).
In fact, this matrix equation has a solution, which is not unique. We seek a solution of the equation using Algorithm 1 and the GI and AGBI algorithms with the same initial matrix X_0 = −0.4 × ones ∈ R^{100×100}. We carry out the GI algorithm with three different convergent factors, namely, m1 = 1.7132 × 10^{−8}, m2 = 3.0837 × 10^{−8} and m3 = 1.5418 × 10^{−7}. We implement the AGBI algorithm with the convergent factor 0.000112 and the weighted factor 0.00005. Table 3 and Figure 5 report the computational time and the errors for 200 iterations of the simulations. We see that the computational time of the CG algorithm is slightly less than those of the GI (with parameters m1, m2, m3) and AGBI algorithms. However, the resulting error produced by the CG algorithm is significantly less than those of the other algorithms.

We also compare the performance of the CG algorithm (Algorithm 1) to the GI [23], RGI [16], MGI [17] and AGBI [24] algorithms with parameters as shown in Table 4. We implement the algorithms with the same initial matrix X_0 = −5 × ones ∈ R^{100×100}. Table 4 shows that the computational times for 30 iterations of the CG and the other algorithms are close to one another. However, the relative errors in Figure 6 and Table 4 show that the CG algorithm produces a sequence of well-approximated solutions in a few iterations with the lowest error compared to the other GI algorithms.

Conclusions
We propose an iterative procedure (Algorithm 1) to construct a sequence of approximate solutions of the generalized Sylvester-transpose matrix Eq (1.5) with rectangular coefficient matrices. The algorithm is applicable whenever the matrix K, defined by Eq (3.2), is symmetric. In fact, the residual matrices R_k produced by the algorithm form an orthogonal set with respect to the usual inner product for matrices. Thus, we obtain the desired solution within finitely many steps, namely, at most np steps. Numerical simulations have verified the applicability of the algorithm for square/non-square, sparse/dense matrices of moderate/large sizes. The algorithm is applicable no matter how we choose the initial matrix. Moreover, for sparse coefficient matrices of large size, the number of iterations needed to get a desired solution can be dramatically less than np. The performance of the algorithm is significantly better than that of the direct Kronecker linearization and recent gradient-based iterative algorithms when the coefficient matrices are of moderate/large sizes.