Compress-and-restart block Krylov subspace methods for Sylvester matrix equations

Block Krylov subspace methods (KSMs) comprise building blocks in many state-of-the-art solvers for large-scale matrix equations as they arise, e.g., from the discretization of partial differential equations. While extended and rational block KSMs provide a major reduction in iteration counts over polynomial block KSMs, they also require reliable solvers for the coefficient matrices, and these solvers are often iterative methods themselves. It is not hard to devise scenarios in which the available memory, and consequently the dimension of the Krylov subspace, is limited. For linear systems and eigenvalue problems, restarting is a well-explored technique for mitigating such memory constraints. In this work, such restarting techniques are applied to polynomial KSMs for matrix equations, together with a compression step to control the growing rank of the residual. An error analysis is also performed, leading to heuristics for dynamically adjusting the basis size in each restart cycle. A panel of numerical experiments demonstrates the effectiveness of the new method compared to extended block KSMs.

1. Introduction. This work is concerned with numerical methods for solving large-scale Sylvester matrix equations of the form

(1.1)  $AX + XB + CD^* = 0$,

with coefficient matrices $A, B \in \mathbb{C}^{n \times n}$ and $C, D \in \mathbb{C}^{n \times s}$. While both $A$ and $B$ must be square, they need not be of the same size; the assumption that both are $n \times n$ merely simplifies the exposition, and our developments easily extend to square coefficients $A, B$ of different sizes. Throughout this paper, we will assume $s \ll n$ and view $C, D$ as block vectors. This not only implies that the right-hand side $CD^*$ has low rank, but also that, under suitable additional conditions on the coefficients, such as the separability of the spectra of $A$ and $-B$, the desired solution $X$ admits accurate low-rank approximations; see, e.g., [3,23,43].
The Sylvester equation (1.1) arises in a variety of applications. In particular, model reduction [2] and robust/optimal control [53] for control problems governed by discretized partial differential equations (PDEs) give rise to Sylvester equations featuring very large and sparse coefficients $A, B$. Also, discretized PDEs themselves can sometimes be cast in the form (1.1) under suitable separability and smoothness assumptions [40]. Other sources of applications are linearizations of nonlinear problems, such as the algebraic Riccati equation [11], and dynamic stochastic general equilibrium models. Large-scale methods for Sylvester equations that require the solution of shifted linear systems, and could thus be combined with iterative solvers in an inner-outer fashion, include the approximate power iteration from [29] and the restarted parameterized Arnoldi method in [1].
Besides the inner-outer solvers discussed above, only a few attempts have been made to design mechanisms that limit the memory requirements of Krylov subspace methods for Sylvester equations. In [52], a connection to the implicitly restarted Arnoldi method for eigenvalue problems is made. For symmetric (positive definite) $A, B$, a two-pass standard Krylov subspace method (SKSM), such as the one discussed in [34], could be used. During the first pass, only the projected equation is constructed and solved; in the second pass, the method computes the product of the Krylov subspace bases with the low-rank factors of the projected solution.
The rest of this paper is structured as follows. In Section 2, we describe the compress-and-restart method proposed in this work for Sylvester equations as well as its adaptation to the special case of Lyapunov matrix equations. Section 3 provides an error analysis for the approximate solution obtained with our method. Finally, in Section 4, the performance of our method is demonstrated and compared to combinations of EKSM with iterative linear solvers.
2. Restarted SKSM with compression for linear matrix equations. We first recall the general framework of projection methods for Sylvester equations and then explain our new restarted variant.

2.1. Projection methods and SKSM.
Projection methods for solving the Sylvester equation (1.1) seek an approximate solution of the form

(2.1)  $\widetilde{X} = U_m Y V_m^*$,

where the columns of $U_m$ and $V_m$ form orthonormal bases of suitably chosen (low-dimensional) subspaces. The small core factor $Y$ is determined by, e.g., imposing a Galerkin condition on the residual matrix $R = A\widetilde{X} + \widetilde{X}B + CD^*$; see, e.g., [48]. In SKSM, the subspaces determining (2.1) are block Krylov subspaces. More specifically, $U_m = [U_1 | \cdots | U_m]$, $V_m = [V_1 | \cdots | V_m] \in \mathbb{C}^{n \times ms}$, with $U_i, V_i \in \mathbb{C}^{n \times s}$ for $i = 1, \ldots, m$, are such that

range$(U_m) = \mathcal{K}_m(A, C) = \mathrm{colspan}\{C, AC, A^2 C, \ldots, A^{m-1} C\} \subset \mathbb{C}^n$

and, analogously, range$(V_m) = \mathcal{K}_m(B^*, D)$. (Note that in the block Krylov subspace literature, a distinction is often made between the "true" block Krylov subspace with elements in $\mathbb{C}^{n \times s}$ and the span of all the columns of the basis vectors [22,24]. For the sake of simplifying notation, $\mathcal{K}_m(A, C)$ and $\mathcal{K}_m(B^*, D)$ will always denote the latter here, meaning they are subspaces of $\mathbb{C}^n$.) If the block Arnoldi process is used for obtaining these bases, then the following block Arnoldi relations hold:

(2.2)  $A U_m = U_m H_m + U_{m+1} H_{m+1,m} E_m^*$,  $B^* V_m = V_m G_m + V_{m+1} G_{m+1,m} E_m^*$,

with $ms \times ms$ block Hessenberg matrices $H_m$ and $G_m$; $s \times s$ matrices $H_{m+1,m}$ and $G_{m+1,m}$; and $E_m^* = [0_s | \cdots | 0_s | I_s] = e_m^* \otimes I_s$, where $0_s$ and $I_s$ denote the $s \times s$ zero and identity matrices, respectively. We refer to, e.g., [22,24] for more details on block Krylov subspaces. The matrix $Y$ is obtained by solving the projected equation

(2.3)  $H_m Y + Y G_m^* + (U_m^* C)(V_m^* D)^* = 0$.

This is again a Sylvester equation of the form (1.1) and, since $ms$ should be much smaller than $n$, it is of moderate size. Therefore a direct solver, such as the Bartels-Stewart method [5], can be employed for its solution.
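To make the projection step concrete, the following MATLAB sketch solves (2.3) with the built-in Bartels-Stewart solver `sylvester` and forms the low-rank factors of (2.1). It is a minimal sketch, not the reference implementation; the variable names (`Um`, `Vm`, `Hm`, `Gm`) are our own labels for the block Arnoldi quantities.

```matlab
% A minimal sketch of the Galerkin projection step:
% Um (n x m*s) and Hm (m*s x m*s) from block Arnoldi on K_m(A, C),
% Vm (n x m*s) and Gm (m*s x m*s) from block Arnoldi on K_m(B', D).
Ct = Um' * C;                        % U_m^* C
Dt = Vm' * D;                        % V_m^* D
Y  = sylvester(Hm, Gm', -Ct * Dt');  % Bartels-Stewart: Hm*Y + Y*Gm' = -Ct*Dt'
% Factored form of the approximate solution Xt = Um * Y * Vm':
[Uy, Sy, Vy] = svd(Y);               % truncate small singular values in practice
XL = Um * (Uy * sqrt(Sy));           % Xt = XL * XR'
XR = Vm * (Vy * sqrt(Sy));
```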
Throughout the above discussion, we assumed that none of the block basis vectors $U_i$ or $V_i$, $i = 1, \ldots, m$, is rank-deficient. Numerical rank-deficiency is rare in practice, but so-called "inexact" rank-deficiency may benefit from a deflation or column-replacement procedure; see, e.g., [13,51].
Having the solution $Y$ of (2.3) at hand, we can compute the Frobenius norm of the residual matrix at low cost, as explained in [33,48]:

(2.4)  $\|R\|_F = \big( \|H_{m+1,m} E_m^* Y\|_F^2 + \|Y E_m G_{m+1,m}^*\|_F^2 \big)^{1/2}$.

Similarly, for the spectral norm we have

(2.5)  $\|R\|_2 \le \|H_{m+1,m} E_m^* Y\|_2 + \|Y E_m G_{m+1,m}^*\|_2$.

Once the residual norm is sufficiently small, the approximation $\widetilde{X}$ from (2.1) is returned. Note that $\widetilde{X}$ is a large, dense matrix, which is therefore always kept in factored form. Specifically, given a factorization $Y = Y_L Y_R^*$, which can be computed, e.g., via a truncated singular value decomposition (SVD), we store the low-rank factors

$X_L = U_m Y_L$ and $X_R = V_m Y_R$, so that $\widetilde{X} = X_L X_R^*$,

where $L$ and $R$ here simply denote "left" and "right," respectively.
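As an illustration, the residual norm (2.4) can be evaluated from the small projected matrices alone. In the following sketch, `Hnext` and `Gnext` are our own labels for $H_{m+1,m}$ and $G_{m+1,m}$, and `m`, `s`, `Y` are assumed from the sketch above.

```matlab
% Cheap residual norm via (2.4): no n-dimensional quantities are touched.
Em   = [zeros((m-1)*s, s); eye(s)];                 % E_m = e_m (x) I_s
resF = sqrt(norm(Hnext * (Em' * Y), 'fro')^2 ...
          + norm((Y * Em) * Gnext', 'fro')^2);      % ||R||_F
```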

2.2. Restarts and compression.
We now present the new restarted procedure. Suppose that $m_0$ iterations of SKSM have been performed, leading to an approximate, possibly quite inaccurate solution $X^{(0)} = X_L^{(0)} (X_R^{(0)})^*$. For a linear system, a correction to a given approximate solution is obtained by solving the linear system (approximately) with the right-hand side replaced by the residual; see, e.g., [28]. Applying this principle to (1.1), one first solves the correction equation

(2.6)  $A Z^{(1)} + Z^{(1)} B + R^{(0)} = 0$, with $R^{(0)} = A X^{(0)} + X^{(0)} B + C D^*$,

and then adds the correction $Z^{(1)}$ to $X^{(0)}$ in order to obtain $X^{(1)}$. The Arnoldi relations (2.2) imply that the residual matrix $R^{(0)}$ admits the low-rank representation $R^{(0)} = C^{(1)} (D^{(1)})^*$ with

$C^{(1)} = [\,U_{m_0+1} H_{m_0+1,m_0} \,|\, U_{m_0} Y E_{m_0}\,]$,  $D^{(1)} = [\,V_{m_0} Y^* E_{m_0} \,|\, V_{m_0+1} G_{m_0+1,m_0}\,]$.

Because $C^{(1)}, D^{(1)}$ have $2s$ columns, this shows that $R^{(0)}$ has rank at most $2s$ and, in turn, we can again employ a block Krylov subspace method for solving (2.6). Note, however, that the Krylov subspaces are different from the ones used for $X^{(0)}$. In order to distinguish them more clearly, we will from now on add the superscript $(0)$ to quantities generated by SKSM for constructing $X^{(0)}$; that is, we will write $U_{m_0}^{(0)}, V_{m_0}^{(0)}$ instead of $U_{m_0}, V_{m_0}$. SKSM applied to the correction equation (2.6) constructs orthonormal bases $U_{m_1}^{(1)}$ and $V_{m_1}^{(1)}$ for $\mathcal{K}_{m_1}(A, C^{(1)})$ and $\mathcal{K}_{m_1}(B^*, D^{(1)})$, respectively. The correction is again returned in factorized form, $Z^{(1)} = Z_L^{(1)} (Z_R^{(1)})^*$. The procedure can be iterated; i.e., we can perform multiple restarts. After $k$ restarts, the approximate solution is given by

(2.7)  $X^{(k)} = \sum_{j=0}^{k} Z^{(j)}$, with $Z^{(0)} := X^{(0)}$.

The residual for $X^{(k)}$ is equal to the one provided by the last term $Z^{(k)}$ in the summation. Indeed,

$A X^{(k)} + X^{(k)} B + C D^* = A Z^{(k)} + Z^{(k)} B + R^{(k-1)} = R^{(k)}$.

This relationship can be exploited to design efficient convergence checks within the restarted procedure; see also the discussion in Section 3. See [7] for a similar procedure within the squared Smith method for certain large-scale matrix equations.
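The overall flow of the restarted procedure can be summarized in the following MATLAB-style skeleton. It is a sketch of the logic rather than the actual implementation: `sksm_cycle` and `compress` are hypothetical helpers, standing for one SKSM cycle and for the truncation routine described next (Algorithm 2.1), respectively.

```matlab
% Skeleton of the compress-and-restart loop (a sketch, not Algorithm 2.2 verbatim).
Ck = C;  Dk = D;  XL = [];  XR = [];
mk = floor(mem_max / (2 * size(Ck, 2)));
for k = 0:kmax
    % one SKSM cycle: correction factors, new residual factors, residual norm
    [ZL, ZR, Ck, Dk, resnorm] = sksm_cycle(A, B, Ck, Dk, mk, tol);  % hypothetical
    XL = [XL, ZL];  XR = [XR, ZR];           % X^{(k)} = sum of corrections, cf. (2.7)
    [XL, XR] = compress(XL, XR, delta);      % keep the solution rank moderate
    if resnorm < tol, break; end
    [Ck, Dk] = compress(Ck, Dk, delta);      % compress residual factors
    mk = floor(mem_max / (2 * size(Ck, 2))); % adjust basis size for next cycle
end
```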
An evident shortcoming of the described procedure is that every time we restart, the rank of the residual may double, thus leading to increasingly large Krylov bases that will inevitably exceed memory capacity. This issue can be mitigated by compressing the residual factors before constructing the Krylov subspace in each cycle. For this task, a well-known QR-SVD-based technique can be employed; see, e.g., [35,Section 2.2.1]. We report such a scheme in Algorithm 2.1 for completeness. Note that there is some flexibility in the choice of truncation criterion in line 5.
In this work, we consider two possibilities. Truncating all singular values below a chosen tolerance $\delta > 0$ implies that the spectral norm of the truncation error is bounded by $\delta$. Alternatively, the Frobenius norm of the error is bounded by $\delta$ if we make sure that the Euclidean norm of the vector of truncated singular values remains below $\delta$.
The described compression is applied not only to the residual but also each time the approximate solution (2.7) is updated. The impact of these compressions on the quality of the computed solution is discussed in Section 3.
Another measure to make sure that memory consumption stays moderate is to impose a maximum number of iterations $m_k$ in each cycle of SKSM. As the memory requirements are primarily dictated by the number of basis vectors that need to be stored in $U_{m_k}^{(k)}$ and $V_{m_k}^{(k)}$, we set

$m_k = \lfloor \mathrm{mem}_{\max} / (2 s_k) \rfloor$,

where $\mathrm{mem}_{\max}$ is the user-defined maximum number of basis vectors that can be allocated at once, and $s_k$ is the number of columns of the residual factors $C^{(k)}, D^{(k)}$ after truncation.

Algorithm 2.1 Compress
1: procedure Compress($C$, $D$, $\delta$)
2:   Compute economy-size QR decomposition $C = Q_C R_C$
3:   Compute economy-size QR decomposition $D = Q_D R_D$
4:   Compute SVD $R_C R_D^* = U \Sigma V^*$
5:   Truncate $U \Sigma V^* \approx \widetilde{U} \widetilde{\Sigma} \widetilde{V}^*$ up to $\delta$
6:   return $C := Q_C \widetilde{U} \widetilde{\Sigma}^{1/2}$ and $D := Q_D \widetilde{V} \widetilde{\Sigma}^{1/2}$
7: end procedure

The whole procedure, which combines restarting with compression, is reported in Algorithm 2.2. Note that our pseudocode is not written to optimize memory allocation overall. For best performance, one should estimate a priori the storage needed for the final solution components $X_L$ and $X_R$ and take this into account along with $\mathrm{mem}_{\max}$.

Algorithm 2.2 Restarted Sylv: restarted SKSM with compression for (1.1)
1: procedure Restarted Sylv($A$, $B$, $C$, $D$, $\varepsilon$, $\delta$, $\mathrm{mem}_{\max}$, $k_{\max}$)
2:   $X_L \leftarrow [\,]$, $X_R \leftarrow [\,]$, $C^{(0)} \leftarrow C$, $D^{(0)} \leftarrow D$, $s_0 \leftarrow s$
3:   $m_0 \leftarrow \lfloor \mathrm{mem}_{\max} / (2 s_0) \rfloor$, flag conv $\leftarrow$ false
4:   for $k = 0, 1, \ldots, k_{\max}$ do
5:     for $j = 1, \ldots, m_k$ do
6:       Compute (incrementally) the Arnoldi relation (2.2) for $\mathcal{K}_j(A, C^{(k)})$ and store $U_j^{(k)}$, $H_j^{(k)}$
7:       Compute (incrementally) the Arnoldi relation (2.2) for $\mathcal{K}_j(B^*, D^{(k)})$ and store $V_j^{(k)}$, $G_j^{(k)}$
8:       Compute $Y^{(k)}$ as the solution of the projected equation (2.3)
9:       Compute the residual norm $\|R^{(k)}\|$ via (2.4)
10:      if $\|R^{(k)}\| < \varepsilon$ then
11:        flag conv $\leftarrow$ true; break
12:      end if
13:    end for
14:    Factor $Y^{(k)} \approx Y_L Y_R^*$ via a truncated SVD
15:    Update $X_L \leftarrow [X_L \,|\, U_{m_k}^{(k)} Y_L]$
16:    Update $X_R \leftarrow [X_R \,|\, V_{m_k}^{(k)} Y_R]$
17:    Update $[X_L, X_R] \leftarrow$ Compress($X_L$, $X_R$, $\delta$)
18:    if flag conv then
19:      return $X_L$ and $X_R$
20:    end if
21:    $C^{(k+1)} \leftarrow [\,U_{m_k+1}^{(k)} H_{m_k+1,m_k}^{(k)} \,|\, U_{m_k}^{(k)} Y^{(k)} E_{m_k}\,]$
22:    $D^{(k+1)} \leftarrow [\,V_{m_k}^{(k)} (Y^{(k)})^* E_{m_k} \,|\, V_{m_k+1}^{(k)} G_{m_k+1,m_k}^{(k)}\,]$
23:    Update $[C^{(k+1)}, D^{(k+1)}] \leftarrow$ Compress($C^{(k+1)}$, $D^{(k+1)}$, $\delta$)
24:    Set $s_{k+1}$ to the number of columns of $C^{(k+1)}$ and $m_{k+1} \leftarrow \lfloor \mathrm{mem}_{\max} / (2 s_{k+1}) \rfloor$
25:  end for
26: end procedure
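The compression step of Algorithm 2.1 can be realized in a few lines of MATLAB. The following is a minimal sketch assuming the spectral-norm truncation criterion (all singular values below the absolute tolerance $\delta$ are discarded).

```matlab
function [C, D] = compress(C, D, delta)
% QR-SVD truncation of the low-rank product C*D' (cf. Algorithm 2.1).
[QC, RC] = qr(C, 0);                 % economy-size QR
[QD, RD] = qr(D, 0);
[U, S, V] = svd(RC * RD');
r  = max(sum(diag(S) > delta), 1);   % spectral-norm criterion (line 5)
sr = sqrt(diag(S(1:r, 1:r)))';
C  = (QC * U(:, 1:r)) .* sr;         % distribute Sigma^{1/2} to both factors
D  = (QD * V(:, 1:r)) .* sr;
end
```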

2.3. The Lyapunov equation. The Lyapunov equation

(2.8)  $A X + X A^* + C C^* = 0$, with $A \in \mathbb{C}^{n \times n}$ and $C \in \mathbb{C}^{n \times s}$,

is an important special case of the more general Sylvester equation (1.1). Indeed, in control and system theory [2], it is more common to encounter (2.8) than the general case. In principle, Algorithm 2.2 could be directly applied to solve (2.8). However, as we will see in this section, it is possible to enhance the algorithm by taking the specific structure of (2.8) into account.
Thanks to the symmetry of (2.8), general projection techniques for Lyapunov equations generate Hermitian approximations $\widetilde{X} = U_m Y U_m^*$, with range$(U_m) = \mathcal{K}_m(A, C)$, where the Hermitian matrix $Y$ is computed by solving the projected Lyapunov equation

(2.9)  $H_m Y + Y H_m^* + (U_m^* C)(U_m^* C)^* = 0$.

Employing only one approximation space is quite appealing; it potentially permits skipping the construction of the second Arnoldi relation at line 7 of Algorithm 2.2. However, some additional considerations are needed after the first cycle because the residual matrix $R^{(k)}$ becomes, in general, indefinite for $k \ge 1$. To address this, the residual is expressed in LDL$^T$ form. Following the discussion in the previous section, at the $k$th restart the residual matrix $R^{(k)} = A Z^{(k)} + Z^{(k)} A^* + R^{(k-1)}$ can be written as

(2.10)  $R^{(k)} = C^{(k+1)} J \, (C^{(k+1)})^*$, with $J = \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix}$ and $C^{(k+1)} = [\,U_{m_k+1}^{(k)} H_{m_k+1,m_k}^{(k)} \,|\, U_{m_k}^{(k)} Y^{(k)} E_{m_k}\,]$.

In turn, the subspace $\mathcal{K}_m(A, C^{(k+1)})$ can be used as an approximation space for the subsequent restart. The presence of the matrix $J$ only affects the definition of the projected equation solved at line 8, which now takes the form

(2.11)  $H_m Y + Y H_m^* + (U_m^* C^{(k)}) \, J \, (U_m^* C^{(k)})^* = 0$.

This equation can again be solved with the Bartels-Stewart method or with Hammarling's method [26] after splitting the right-hand side as discussed, e.g., in [6, Sec. 2.3].
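For illustration, the projected equation (2.11) can again be solved with MATLAB's `sylvester`. This is a sketch with our own labels: `Um` and `Hm` are the current cycle's Arnoldi quantities, and `Ck` is the residual factor with $2s$ columns.

```matlab
% Sketch: projected Lyapunov equation with the indefinite core J = [0 I; I 0].
s2 = size(Ck, 2) / 2;
J  = kron([0 1; 1 0], eye(s2));         % the 2s x 2s matrix [0 I; I 0]
W  = Um' * Ck;                          % U_m^* C^{(k)}
Y  = sylvester(Hm, Hm', -W * J * W');   % Hm*Y + Y*Hm' + W*J*W' = 0
```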
In both cases, the Hermitian structure of Y is preserved and can be exploited.
To avoid the occurrence of complex arithmetic, an LDL$^T$ approach must also be employed within the truncation strategy, and we suggest calling Algorithm 2.3 in place of Algorithm 2.1 at lines 17 and 23 of Algorithm 2.2. See, e.g., [37] for similar considerations in the context of differential Riccati equations. Applying the described modifications to Algorithm 2.2 returns an approximate solution of the form $X^{(k)} = X_L S X_L^*$, with $X_L$ having orthonormal columns and a small Hermitian core $S$.

Algorithm 2.3 Compress Sym
1: procedure Compress Sym($C$, $S$, $\delta$)
2:   Compute economy-size QR decomposition $C = Q_C R_C$
3:   Compute eigendecomposition $R_C S R_C^* = W \Sigma W^*$
4:   Truncate $W \Sigma W^* \approx \widetilde{W} \widetilde{\Sigma} \widetilde{W}^*$ up to $\delta$
5:   return $C := Q_C \widetilde{W}$ and $S := \widetilde{\Sigma}$
6: end procedure

The final Lyapunov algorithm is stated as Algorithm 2.4.
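A minimal MATLAB sketch of Algorithm 2.3, again assuming an absolute truncation tolerance $\delta$ on the discarded eigenvalues:

```matlab
function [C, S] = compress_sym(C, S, delta)
% LDL^T-type truncation of the Hermitian product C*S*C' (cf. Algorithm 2.3);
% stays in real arithmetic for real data.
[QC, RC] = qr(C, 0);
M = RC * S * RC';
[W, Lam] = eig((M + M') / 2);          % symmetrize to guard against round-off
lam  = diag(Lam);
keep = abs(lam) > delta;               % truncate eigenvalues below delta
C = QC * W(:, keep);
S = diag(lam(keep));
end
```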
Algorithm 2.4 Restarted Lyap: restarted SKSM with compression for (2.8)
1: procedure Restarted Lyap($A$, $C$, $\varepsilon$, $\delta$, $\mathrm{mem}_{\max}$, $k_{\max}$)
2:   $X_L \leftarrow [\,]$, $S \leftarrow [\,]$, $C^{(0)} \leftarrow C$, $S^{(0)} \leftarrow I_s$, $m_0 \leftarrow \lfloor \mathrm{mem}_{\max}/s \rfloor$, flag conv $\leftarrow$ false
3:   for $k = 0, 1, \ldots, k_{\max}$ do
4:     for $j = 1, \ldots, m_k$ do
5:       Compute (incrementally) the Arnoldi relation (2.2) for $\mathcal{K}_j(A, C^{(k)})$ and store $U_j^{(k)}$, $H_j^{(k)}$
6:       Compute $Y^{(k)}$ as the solution of the projected equation (2.11), with the current core $S^{(k)}$ in place of $J$
7:       Compute the residual norm $\|R^{(k)}\|$
8:       if $\|R^{(k)}\| < \varepsilon$ then flag conv $\leftarrow$ true; break
9:     end for
10:    Factor $Y^{(k)} \approx Y_L \widetilde{\Sigma} Y_L^*$ via a truncated eigendecomposition
11:    Update $X_L \leftarrow [X_L \,|\, U_{m_k}^{(k)} Y_L]$
12:    Update $[X_L, S] \leftarrow$ Compress Sym($X_L$, diag($S$, $\widetilde{\Sigma}$), $\delta$)
13:    if flag conv then
14:      return $X_L$ and $S$
15:    end if
16:    Form $C^{(k+1)}$ as in (2.10) and set $S^{(k+1)} \leftarrow J$
17:    Update $[C^{(k+1)}, S^{(k+1)}] \leftarrow$ Compress Sym($C^{(k+1)}$, $S^{(k+1)}$, $\delta$)
18:    Set $s_{k+1}$ to the number of columns of $C^{(k+1)}$ and $m_{k+1} \leftarrow \lfloor \mathrm{mem}_{\max}/s_{k+1} \rfloor$
19:  end for
20: end procedure

The exact solution $X$ of the Lyapunov equation (2.8) is Hermitian positive semidefinite (SPSD) [50]. It is desirable to retain this property in an approximate solution. In the context of projection methods for Lyapunov equations, this property is ensured when $A$ is negative definite; i.e., not only $A$ but also its Hermitian part $(A + A^*)/2$ is stable. In particular, in SKSM it would follow that the matrix $H_m = U_m^* A U_m$ is stable for every $m$ and, therefore, that the solution $Y$ of the projected equation $H_m Y + Y H_m^* + (U_m^* C)(U_m^* C)^* = 0$, as well as the corresponding approximation $\widetilde{X} = U_m Y U_m^*$, are SPSD. In our framework, the same arguments show that $X^{(0)}$ is SPSD if $A$ is negative definite. However, the subsequent $Z^{(k)}$, $0 < k \le k_{\max}$, are in general indefinite (although still symmetric), due to the indefiniteness of the residual matrices $R^{(k)}$. Nevertheless, it is reasonable to expect the approximate solution $X^{(k)} = \sum_{j=0}^{k} Z^{(j)}$, $0 < k \le k_{\max}$, to be close to a positive semidefinite matrix. In particular, if

(2.12)  $X^{(k)} = [U_+ \; U_-] \, \mathrm{diag}(\Lambda_+, \Lambda_-) \, [U_+ \; U_-]^*$

is the eigendecomposition of $X^{(k)}$, partitioned according to the sign of the eigenvalues, then one can consider $X^{(k)}_+ := U_+ \Lambda_+ U_+^*$ as an SPSD approximation to the solution. In practice, $X^{(k)}_+$ is obtained by applying a slight modification of Algorithm 2.3, which neglects the part of the eigendecomposition corresponding to the negative eigenvalues, to the matrix $X^{(k)}$ returned by Algorithm 2.2. Such a step might deteriorate the accuracy of the computed solution. However, the next result shows that the error and the residual norm associated with $X^{(k)}_+$ are close to the ones associated with $X^{(k)}$.

Proposition 2.1. Let $X$ be the solution of (2.8) and let $X^{(k)}_+$ be defined as above. Then

(2.13)  $\|X - X^{(k)}_+\| \le 2 \, \|X - X^{(k)}\|$ and $\|A X^{(k)}_+ + X^{(k)}_+ A^* + C C^*\| \le \|R^{(k)}\| + 2 \, \|A\| \, \|X - X^{(k)}\|$,

where $\|\cdot\|$ corresponds to the Frobenius or the spectral norm.
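In practice, the SPSD projection can reuse the machinery of Algorithm 2.3: compute the eigendecomposition of the small core and discard its negative part. A sketch in MATLAB, with `XL` and `S` denoting the factors returned by the restarted routine:

```matlab
% Sketch: SPSD projection X_+ = XLp * XLp' of X = XL * S * XL'.
[QX, RX] = qr(XL, 0);
M = RX * S * RX';
[W, Lam] = eig((M + M') / 2);
lam = diag(Lam);
pos = lam > 0;                               % keep only positive eigenvalues
XLp = (QX * W(:, pos)) .* sqrt(lam(pos))';   % X_+ = XLp * XLp'
```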
Proof. The matrix $X^{(k)}_+$ is a best SPSD approximation of $X^{(k)}$, in the sense that

(2.14)  $\|X^{(k)} - X^{(k)}_+\| \le \|X^{(k)} - P\|$ for every SPSD matrix $P$,

both for the Frobenius and the spectral norm [25,27]. Therefore,

$\|X - X^{(k)}_+\| \le \|X - X^{(k)}\| + \|X^{(k)} - X^{(k)}_+\| \le 2 \, \|X - X^{(k)}\|$,

where the last inequality follows from (2.14) by taking into account that $X$ is SPSD. For the second inequality, applying again (2.14) yields

$\|A X^{(k)}_+ + X^{(k)}_+ A^* + C C^*\| \le \|R^{(k)}\| + 2 \, \|A\| \, \|X^{(k)} - X^{(k)}_+\| \le \|R^{(k)}\| + 2 \, \|A\| \, \|X - X^{(k)}\|$. ∎

3. Residual and error analysis of Algorithm 2.2. The compression steps performed on the intermediate residuals and solutions introduce some inexactness. In the spirit of the analysis of inexact Krylov methods (as in, e.g., [36,49]), we study how these compression steps affect the residual and error norms associated with the approximation returned by Algorithm 2.2. The relations derived in this section hold for both the Frobenius norm and the spectral norm.
Let us suppose that Algorithm 2.2 terminates after $\hat{k} < k_{\max}$ restarts, so that $\|R^{(\hat{k})}\| < \varepsilon$. Notice that the returned approximation $X^{(\hat{k})}$ satisfies

$X^{(\hat{k})} = \sum_{j=0}^{\hat{k}} Z^{(j)} - \sum_{j=0}^{\hat{k}} \Delta Z^{(j)}$,

where the matrices $\Delta Z^{(j)}$ represent the components removed by the compression step at line 17. In particular, it holds that $\|\Delta Z^{(j)}\| \le \delta$ for $j = 0, \ldots, \hat{k}$.
Similarly, the sequence of residuals satisfies

(3.1)  $R^{(j)} = A Z^{(j)} + Z^{(j)} B + R^{(j-1)} - \Delta R^{(j)}$, with $R^{(-1)} := C D^*$,

where $\Delta R^{(j)}$ takes into account the effect of the compression step at line 23; i.e., $\|\Delta R^{(j)}\| \le \delta$, $j = 0, \ldots, \hat{k}$. Summing up (3.1) for $j = 0, \ldots, \hat{k}$, we retrieve

$R^{(\hat{k})} = A \Big( X^{(\hat{k})} + \textstyle\sum_{j=0}^{\hat{k}} \Delta Z^{(j)} \Big) + \Big( X^{(\hat{k})} + \textstyle\sum_{j=0}^{\hat{k}} \Delta Z^{(j)} \Big) B + C D^* - \textstyle\sum_{j=0}^{\hat{k}} \Delta R^{(j)}$.

Therefore, the residual norm associated with $X^{(\hat{k})}$ is bounded by

(3.2)  $\| A X^{(\hat{k})} + X^{(\hat{k})} B + C D^* \| \le \varepsilon + (\hat{k} + 1)(\|A\| + \|B\| + 1)\,\delta$.

Equation (3.2) shows how the truncation tolerance $\delta$ is connected with the final attainable accuracy of our restarted routine. Indeed, a reasonable way to choose $\delta$ in Algorithm 2.2 is such that $(k_{\max} + 1)(\|A\| + \|B\| + 1)\,\delta \le \varepsilon$. We conclude by estimating the distance from the true solution $X$. To this end, we remark that the difference $X^{(\hat{k})} + \sum_{j=0}^{\hat{k}} \Delta Z^{(j)} - X$ exactly solves the Sylvester equation

(3.3)  $A W + W B = R^{(\hat{k})} + \textstyle\sum_{j=0}^{\hat{k}} \Delta R^{(j)}$.

The impact of perturbing the right-hand side on the solution can be estimated via the norm of the inverse of the solution operator. This is particularly simple when $A, B$ are normal and the spectra of $A$ and $-B$ are separated by a vertical line in the complex plane.

Proposition 3.1. Suppose that $A$ and $B$ are normal and that the spectra of $A$ and $-B$ are separated by a vertical line, with distance at least $\alpha > 0$ between them. Then, in the setting described above,

$\|X - X^{(\hat{k})}\| \le \dfrac{\varepsilon + (\hat{k}+1)\delta}{\alpha} + (\hat{k}+1)\,\delta.$

Proof. By applying Theorem 1.1 in [30] to equation (3.3), we get

$\Big\| X^{(\hat{k})} + \textstyle\sum_{j=0}^{\hat{k}} \Delta Z^{(j)} - X \Big\| \le \dfrac{1}{\alpha} \Big\| R^{(\hat{k})} + \textstyle\sum_{j=0}^{\hat{k}} \Delta R^{(j)} \Big\| \le \dfrac{\varepsilon + (\hat{k}+1)\delta}{\alpha}.$
The claim follows by using the triangle inequality and the bound $\big\| \sum_{j=0}^{\hat{k}} \Delta Z^{(j)} \big\| \le (\hat{k}+1)\,\delta$. ∎
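In practice, the choice of $\delta$ suggested by (3.2) amounts to one line of MATLAB. In the following sketch, `eps_res` and `kmax` are our names for the outer tolerance $\varepsilon$ and the maximum number of restarts, and `normest` provides cheap 2-norm estimates for sparse $A$ and $B$.

```matlab
% Choosing the truncation tolerance from the bound (3.2):
% (kmax + 1) * (||A|| + ||B|| + 1) * delta <= eps_res.
delta = eps_res / ((kmax + 1) * (normest(A) + normest(B) + 1));
```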

4. Numerical experiments.
To demonstrate the efficiency of our proposed methods, we examine their behavior with respect to other viable methods on two standard problems where the coefficient matrices stem from the discretization of certain differential operators. These other methods include the Extended Krylov Subspace Method (EKSM) (detailed, e.g., in [14,47] but implemented like a Rational KSM with poles at zero and infinity) and the Standard Krylov Subspace Method (SKSM) of [41], which is tailored to equations with symmetric coefficient matrices.
EKSM can be implemented "exactly," in the sense that linear systems with $A$ are solved to very high accuracy via, e.g., a direct sparse solver like the Matlab backslash operator, or "inexactly," in which case applications of $A^{-1}$ are computed approximately via a block Krylov subspace method. We test both of these variants of EKSM. Instead of invoking backslash explicitly for each application of $A^{-1}$, we precompute and store the Cholesky or LU factorization. For iterative solves with $A$, we use a retooled (and possibly ILU-preconditioned) block conjugate gradient (BCG) method from [19] when $A$ is Hermitian positive definite, and (possibly ILU-preconditioned) block GMRES otherwise. We employ a rather small tolerance on the relative residual norm when BCG and block GMRES are applied to solve the linear systems with $A$; in particular, we set this tolerance to $10^{-8}$, namely two orders of magnitude smaller than the outer tolerance on the relative residual norm. The novel results on inexact procedures in the basis construction of extended (and rational) Krylov subspaces presented in [36] may, however, be adopted to further reduce the computational cost of the basis construction. We also remind the reader that at each EKSM iteration $2s$ new basis vectors are added to the current basis, so that, at the $m$th EKSM iteration, the computed extended Krylov subspace has dimension $2(m+1)s$.
We would like to underscore that the comparisons between our compress-and-restart scheme, the inexact variants of EKSM, and SKSM are the fairest ones: in these families of methods, only the action of $A$ on (block) vectors is required, making all of these solvers potentially matrix-free. This is not the case for EKSM with a direct inner solver.
All methods make use of block Krylov subspace techniques and, consequently, sparse-matrix-matrix multiplication (SpMM) between $A$ and block vectors $V$ that may vary in size. Since the performance of such block operations depends on machine architecture (in particular, the level-3 cache [20]), memory hierarchies, and the choice of libraries, we measure the performance of each method via the number of calls to $A$ ("A-calls") and the total number of columns or column vectors to which $A$ is applied, which we refer to here as matvecs. The number of A-calls is a crude measure of memory operations, whereas the number of matvecs is directly related to the amount of floating-point operations (FLOPs). The ratio between FLOPs and memory operations is known as computational intensity, and algorithms with high computational intensity, i.e., many FLOPs per memory operation, are preferable for high-performance architectures; see, e.g., [4]. We approximate computational intensity by the ratio of matvecs to A-calls, which we here refer to as "efficiency." For a more in-depth analysis of the performance of block operations and potential gains over column-by-column applications of $A$, see, e.g., the thesis by Birk [13].
Unless otherwise noted, all reported residual norms are relative and measured in the Frobenius norm. For all experiments, the residual tolerance is set to 10 −6 .
All results were obtained by running Matlab R2017b [39] on a standard node of the Linux cluster Mechthild, hosted at the Max Planck Institute for Dynamics of Complex Technical Systems in Magdeburg, Germany. The full code to reproduce our numerical results can be found at https://gitlab.com/katlund/compress-and-restart-KSM. For the original version of SKSM, which we have adapted, see https://zenodo.org/record/3252320#.XjM3UN-YVuR.

4.1. 2D Laplacian.
In this example, $A$ is the second-order, centered, finite-difference discretization of the two-dimensional Laplacian operator $-(\partial_{xx} + \partial_{yy})$ on the unit square $\Omega = (0, 1)^2$. We take $n = 100$ grid points in each direction, resulting in a matrix of size $10{,}000 \times 10{,}000$. The matrix $A$ is Hermitian positive definite. We solve $AX + XA + CC^* = 0$, where $C \in \mathbb{C}^{n^2 \times 3}$ is drawn from a normal random distribution and scaled such that $\|CC^*\|_F = 1$. In this example we call Restarted Lyap (Algorithm 2.4), equipped with the enhancements described in section 2.3.
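The test matrix and right-hand side can be reproduced along the following lines; this is our sketch of the setup, not necessarily the authors' exact script.

```matlab
% 5-point FD Laplacian on (0,1)^2 with n = 100 interior points per direction.
n = 100;  h = 1 / (n + 1);
T = spdiags(ones(n, 1) * [-1 2 -1], -1:1, n, n) / h^2;  % 1D -u''
I = speye(n);
A = kron(I, T) + kron(T, I);              % SPD, 10000 x 10000
C = randn(n^2, 3);
C = C / sqrt(norm(C' * C, 'fro'));        % so that ||C*C'||_F = 1
```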
Both the exact and inexact variants of EKSM need 15 iterations to meet the prescribed accuracy. As a result, an extended Krylov subspace of dimension 96 is constructed. We mimic this by setting the memory buffer of the compress-and-restart procedure, i.e., $\mathrm{mem}_{\max}$, equal to 96. We report the results in Table 4.1. Our routine Restarted Lyap needs 20 restarts to converge, for a total of 158 iterations. The compress-and-restart scheme lets us maintain a low storage demand: at each restart, a polynomial Krylov subspace of dimension (at most) 96 is constructed, as in the case of the comparable EKSM. In SKSM with two-pass Lanczos, thanks to the symmetry of $A$, we can use short-term recurrences so that the whole basis is never stored, only three basis vectors at a time. The low-rank factors of the solution are recovered by means of a two-pass strategy; see [41] for more details. Despite the very low storage demands of SKSM, its number of A-calls exceeds that of the compress-and-restart scheme. Both the number of A-calls and the number of matvecs of Restarted Lyap are much lower than those accrued by the inexact procedures EKSM (BCG) and EKSM (BCG+ILU). Moreover, our procedure is more efficient for block operations while maintaining a computational time comparable to that of the other SpMM-dominant routines, thus reinforcing its potential for further speed-ups in communication-dominant high-performance environments. Timings for EKSM (exact) benefit largely from having precomputed and stored the Cholesky factors of $A$.
All the routines we tested return a low-rank numerical solution and, for this example, the approximation computed by Restarted Lyap has the lowest rank. In Figure 4.1 (left) we report the ranks of both the residual and the approximate solution computed at the end of each restart, together with the rank of the final solution. These results illustrate that the truncation strategy Compress Sym (Algorithm 2.3) is able to maintain a moderate rank in the residual $R^{(k)}$ for all $k = 0, \ldots, 20$. This is crucial for making the construction of the subsequent restart space feasible, when necessary. In Figure 4.1 (right) we also plot, in logarithmic scale, the relative residual norm history through all 158 iterations performed by Restarted Lyap. We can see that the relative residual norm does not behave smoothly. This is due to the Galerkin condition we impose on the residual. Even though this phenomenon has not yet been extensively analyzed in the matrix equation literature, it is quite well understood in the linear system setting; see, e.g., [16,15]. Imposing a minimal residual condition in place of a Galerkin condition might be beneficial, although such a strategy has some peculiar shortcomings in the matrix equation framework; see, e.g., [38,31,42].
We now make some observations about the possible computation of an indefinite approximate solution; see also the end of section 2.3. In Figure 4.2 (left), the blue circles denote the minimum nonzero eigenvalue of $X^{(k)}$ for $k = 0, \ldots, 20$, and of the final solution; the black dashed line marks the x-axis. These eigenvalues are all positive, so that $X^{(k)}$ is positive semidefinite for all $k$.
We now consider the same Lyapunov equation as before, but we do not normalize the random matrix $CC^*$. This is now a harder problem for the polynomial Krylov subspace method and, with $\mathrm{mem}_{\max} = 96$, Restarted Lyap needs 60 restarts to converge. The large number of restarts is due to the large rank of $R^{(k)}$, especially for large $k$, which limits us to a couple of iterations per restart in order to stay within the memory buffer prescribed by $\mathrm{mem}_{\max}$. In Figure 4.2 (right) we report the minimum nonzero eigenvalue of $X^{(k)}$ for $k = 0, \ldots, 60$. The matrix $X^{(k)}$ stops being positive semidefinite for $k \ge 39$, and we thus apply the strategy presented at the end of section 2.3 to compute $X^{(k)}_+$, $k = 60$, in place of $X^{(k)}$. For this example, such a strategy has multiple advantages. Indeed, in addition to providing a positive semidefinite approximate solution, it reduces the rank of the computed solution while maintaining the prescribed accuracy. In particular, $\mathrm{rank}(X^{(k)}) = 81$ while $\mathrm{rank}(X^{(k)}_+) = 77$, and the residual norm associated with $X^{(k)}_+$ is on the order of $10^{-7}$, below the prescribed tolerance.

We conclude this example by showing one of the most remarkable features of our novel compress-and-restart strategy. The total memory demand of a Krylov subspace method is in general difficult to predict a priori, as it requires knowing the dimension of the subspace in which a satisfactory approximation can be found. If a situation calls for stringent memory management, then methods that increase the basis size every iteration, like EKSM, may not be able to reach the desired accuracy before exhausting memory resources. Table 4.2 demonstrates the superiority of Restarted Lyap in precisely such a scenario, where $\mathrm{mem}_{\max} = 250$ and the right-hand side $CC^*$, $C \in \mathbb{R}^{n^2 \times s}$, $s = 25$, is a low-rank approximation of the matrix $\widehat{C} \in \mathbb{R}^{n^2 \times n^2}$ representing the discretization of $\exp\big((x_1^p + x_2^p + x_3^p + x_4^p)^{1/p}\big)$, $p = 2$, on the hypercube $[-1, 1]^4$. All the EKSM variants are forced to stop as soon as a space of dimension 250 is constructed; in tests not reported here, we found that $\mathrm{mem}_{\max} = 400$ allows EKSM to reach the desired residual tolerance. SKSM with two-pass Lanczos seems to suffer from the large rank of $C$. Indeed, the computed residual norm differs from the actual one by some orders of magnitude, likely due to the loss of orthogonality in the computed basis. A full, or perhaps even partial, re-orthogonalization of the basis may fix this issue, but doing so in a memory-sensitive manner remains an open problem. On the other hand, Restarted Lyap successfully reaches the desired residual tolerance, thus demonstrating its potential not only in memory-limited situations but also for matrix equations whose right-hand side has high rank.

4.2. Convection-diffusion equation.
We turn our attention to the main problem, a Sylvester equation of the form (1.1), where the coefficient matrices $A$ and $B$ stem from the second-order, centered, finite-difference discretization of the 3D convection-diffusion operators

(4.1)  $\mathcal{L}_A u = -\epsilon \Delta u + w_A \cdot \nabla u$,  $\mathcal{L}_B u = -\epsilon \Delta u + w_B \cdot \nabla u$,

on the unit cube $\Omega = (0, 1)^3$, respectively. The viscosity parameter is $\epsilon = 0.01$, while the convection vectors $w_A$, $w_B$ are defined as $w_A = (x \sin(x),\, y \cos(y),\, e^{z^2 - 1})$ and $w_B = (y z (1 - x^2),\, 0,\, e^z)$.
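As a rough guide, the discretization of $\mathcal{L}_A$ for $n = 25$ can be assembled as follows. This is a sketch under our own assumptions (centered differences for the convection term and a particular Kronecker ordering of the unknowns), so the paper's code may differ in details.

```matlab
% Sketch: FD discretization of -eps*Lap(u) + w_A . grad(u) on (0,1)^3.
n = 25;  h = 1 / (n + 1);  x = (1:n)' * h;  ev = ones(n, 1);
L1 = spdiags(ev * [-1 2 -1], -1:1, n, n) / h^2;    % 1D -u''
D1 = spdiags(ev * [-1 0 1], -1:1, n, n) / (2*h);   % 1D centered u'
I  = speye(n);  epsv = 0.01;
Lap = kron(kron(I, I), L1) + kron(kron(I, L1), I) + kron(kron(L1, I), I);
% variable convection coefficients of w_A on the grid (one per direction)
C1 = spdiags(x .* sin(x),   0, n, n);   % x*sin(x)   acts in the x-direction
C2 = spdiags(x .* cos(x),   0, n, n);   % y*cos(y)   acts in the y-direction
C3 = spdiags(exp(x.^2 - 1), 0, n, n);   % e^{z^2-1}  acts in the z-direction
Conv = kron(kron(I, I), C1 * D1) + kron(kron(I, C2 * D1), I) ...
     + kron(kron(C3 * D1, I), I);
A = epsv * Lap + Conv;                  % nonsymmetric, n^3 x n^3
```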
If the operators in (4.1) are discretized with $n$ equidistant nodes in each direction, then the nonsymmetric matrices $A$ and $B$ are each of size $n^3 \times n^3$. We consider two different problem sizes for (1.1), $n = 25$ and $n = 80$, resulting in problems of dimension 15625 and 512000, respectively. The low-rank matrices $C, D \in \mathbb{C}^{n^3 \times 3}$ are once again drawn from a normal random distribution and scaled such that $\|CD^*\|_F = 1$. For $n = 25$, all the EKSM variants we tested, namely EKSM (exact), EKSM (BGMRES), and EKSM (BGMRES+ILU), need 21 iterations to converge, in which case two extended Krylov subspaces of dimension 132 are constructed. As before, we set the memory buffer of our compress-and-restart routine equal to the memory consumption of EKSM; i.e., we set $\mathrm{mem}_{\max}$ in Algorithm 2.2 equal to 264. With this setting, Restarted Sylv needs 2 restarts, for a total of 85 iterations, to converge, and it computes an approximate solution of rank 57, equal to the rank of the solution returned by all the EKSM variants.
In Figure 4.3 (left) we depict the ranks of both the residual and the approximate solution computed at each restart of Restarted Sylv, together with the rank of the final solution, while in Figure 4.3 (right) the entire relative residual norm history is reported. We again see that the low-rank compression procedure (Algorithm 2.1) is able to maintain a moderate rank in both the residual and the approximate solution.
We remind the reader that, for Sylvester equations, two different subspaces have to be generated, one with $A$ and one with $B^*$. Moreover, the matrices $A$ and $B$ may have dissimilar spectral properties, which can be appreciated by looking at the results in Table 4.3. In this example, iteratively solving linear systems with $B$ requires more iterations than with $A$. Therefore, a larger number of B-calls is required in EKSM (BGMRES), even though the extended Krylov subspaces generated by $A$ and $B$ have the same dimensions. This phenomenon is reversed in EKSM (BGMRES+ILU), indicating that the ILU preconditioner for $B$ performs better than the one for $A$. Our compress-and-restart procedure is not influenced by such issues, though, since by design the spaces for $A$ and $B$ are computed to the same dimension (assuming no breakdowns). It is indeed possible, and in some cases desirable, to allow for different basis sizes for $A$ and $B$, especially if the operators differ significantly in size or complexity. However, this flexibility is not trivial to implement, especially with the compression at line 23 of Algorithm 2.2, which could cause $C^{(k+1)}$ and $D^{(k+1)}$ to have different numbers of columns. For simplicity of presentation and implementation, we do not explore this option further in the present work.
The extra memory allocation required by the iterative solution of linear systems with $A$ and $B$ during the basis construction must be taken into account for precisely identifying the memory demands of EKSM (BGMRES) and EKSM (BGMRES+ILU). To the best of our knowledge, this issue has not yet been rigorously explored in the literature, and it should not be underestimated. Increasing the density of the discretization grid to $n = 80$ (for a problem size of 512000) leads to 26 iterations for EKSM (exact) to converge and the construction of two extended Krylov subspaces of dimension 162 each. With $\mathrm{mem}_{\max} = 324$, EKSM (BGMRES) and EKSM (BGMRES+ILU) are not able to achieve the prescribed accuracy, because at some point the number of (outer) extended Krylov basis vectors already computed, plus the number of vectors needed by the (inner) block polynomial Krylov subspace method to accurately compute the next (outer) basis vector, exceeds $\mathrm{mem}_{\max}$.
We conclude this example by pointing out that Restarted Sylv is competitive also in terms of computational time, for both $n = 25$ and $n = 80$; see Table 4.4. A preconditioner tailored to the problem may benefit EKSM with inexact solves, but the design of an effective preconditioner is a difficult task and largely problem-dependent. Our compress-and-restart procedure achieves excellent performance without the need for preconditioning, while still managing severe memory limitations.

5. Conclusions.
Modern computing architectures pose many challenges demanding the optimization of not only operation counts (FLOPs) but also memory allocation and movement. Much work has been devoted to adapting iterative methods for large and sparse linear systems to these architectures, but straightforward extensions of successful strategies for linear systems to matrix equations are not always feasible. We have demonstrated how to apply a common and effective Krylov subspace technique for linear systems, namely restarts, by introducing a compression step to mitigate the growing rank of the residual. The resulting compress-and-restart method is viable for both Sylvester and Lyapunov matrix equations, and given a fixed memory requirement, it tunes the computable basis size automatically at each restart. Compared to extended Krylov subspace methods (EKSM), which require an inner solver for applications of A −1 and therefore either well-designed preconditioners or fast sparse Cholesky and LU factorizations, our compress-and-restart polynomial Krylov methods require very little set-up and converge with competitive timings in situations where (unrestarted) EKSM run out of memory.
Our compress-and-restart method is a success not only for matrix equations but potentially also for other classes of higher-order problems, where memory resources are even more limited due to the curse of dimensionality. Moreover, our compress-and-restart paradigm is not restricted to the use of block polynomial Krylov subspaces; different approximation spaces can be used as well.