Deflated GMRES with Multigrid for Lattice QCD

Lattice QCD solvers encounter critical slowing down at fine lattice spacings and small quark masses. Traditional matrix eigenvalue deflation is one approach to mitigating this problem. To improve scaling, however, we study the effects of deflating on the coarse grid in a hierarchy of three grids for adaptive multigrid applications of the two-dimensional Schwinger model. We compare deflation at the fine and coarse levels with other non-deflated methods. We find that the inclusion of a partial solve on the intermediate grid allows for a low-tolerance deflated solve on the coarse grid. We find very good scaling in lattice size near critical mass when we deflate at the coarse level using the GMRES-DR and GMRES-Proj algorithms.

and fermionic forces in Hybrid Monte Carlo. Moreover, as the fermion mass approaches physically relevant values, the Dirac operator becomes extremely ill-conditioned. This ill-conditioning leads to exceptional eigenvalues, which drastically slow the convergence of linear solvers. Adaptive multigrid (MG) [1] is one method that deals with both the strong scaling and critical slowing down at the same time, and it has been used successfully for the Wilson, overlap and staggered fermion discretizations [2,3,4]. Adaptive MG creates a hierarchy of coarsened operators from the original fine Dirac operator by exploiting its near-null kernel. This shifts critical slowing down to the coarsest level, where the components of the error attributed to the exceptional eigenvalues can be dealt with more easily. However, the cost of the coarse grid solve can be very large when expressed in terms of equivalent fine grid operations.
Deflation has long been used as a method of dealing with exceptional eigenvalues in many fields, but it is not yet heavily used in modern LQCD simulations, partly because of eigenvector storage costs for large systems. Adaptive MG allows deflation to be employed on the coarsest level, where its storage requirements are much smaller [5,6,7]. The preferred way to use MG in LQCD is as a preconditioner for an outer Krylov solver [8]. Because every iteration of the outer Krylov solver presents a new right-hand side to the MG preconditioner, deflation with projection methods [9,10] can be efficiently employed on the coarsest level. We demonstrate the effect of deflation on the coarsest level by comparing to MG without coarse grid deflation, and we examine the effect of this deflation for multiple right-hand sides. We observe that a partial solve on the intermediate grid, in conjunction with deflation and projection methods on the coarse grid, allows for a partial coarse grid solve. This partial solve on the intermediate grid reduces the number of outer iterations needed for convergence, and we observe no sign of critical slowing down resurfacing on the higher grid levels when a deflated partial coarse grid solve is used.

Methods
We work with the Wilson-Dirac operator in the two-dimensional lattice Schwinger model [11], which shares many physical characteristics with 4D LQCD and as such is a good algorithmic testing ground. We created 10 gauge configurations within QCDLAB 1.0 [12] for lattices of size $64^2$, $128^2$ and $256^2$ at $\beta = 6.0$. All values are averaged over separate solves for each configuration. The method of coarsening follows that of reference [4]. A hierarchy of three grids was created by solving the residual system $D D^\dagger e = -D D^\dagger x$, where $x$ is a random vector, for 12 near-null vectors on the fine grids. This system was solved to a tolerance of $10^{-4}$, and the near-null vectors were constructed using $\psi = x + e$.
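The near-null vector setup described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the configuration generation or solver used for the results: the helper names (`wilson_dirac_2d`, `cg`, `near_null_vectors`) are hypothetical, the gauge links are uniformly random U(1) phases rather than QCDLAB configurations at $\beta = 6.0$, the lattice is only $8^2$, and plain conjugate gradient is assumed as the solver for the Hermitian system $D D^\dagger e = -D D^\dagger x$.

```python
import numpy as np

def wilson_dirac_2d(L, mass, rng):
    """Dense 2D Wilson-Dirac operator on a periodic L x L lattice.
    ASSUMPTION: random U(1) links stand in for a real gauge configuration."""
    sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
             np.array([[0, -1j], [1j, 0]], dtype=complex)]
    eye2 = np.eye(2, dtype=complex)
    links = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(2, L, L)))
    D = (mass + 2.0) * np.eye(2 * L * L, dtype=complex)

    def site(x, y):
        # Spin is the fastest index; each site owns two consecutive rows.
        return 2 * ((x % L) * L + (y % L))

    for x in range(L):
        for y in range(L):
            i = site(x, y)
            for mu, (dx, dy) in enumerate([(1, 0), (0, 1)]):
                j = site(x + dx, y + dy)
                u = links[mu, x, y]
                # Forward hop from site i to its neighbour j.
                D[i:i + 2, j:j + 2] -= 0.5 * (eye2 - sigma[mu]) * u
                # Backward hop seen from the neighbour j.
                D[j:j + 2, i:i + 2] -= 0.5 * (eye2 + sigma[mu]) * np.conj(u)
    return D

def cg(A, b, tol, max_iter=5000):
    """Conjugate gradient for Hermitian positive definite A,
    stopped at a relative residual norm of tol."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = np.vdot(r, r).real
    b_norm = np.sqrt(rr)
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            break
        Ap = A @ p
        alpha = rr / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rr_new = np.vdot(r, r).real
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def near_null_vectors(D, n_vec, tol, rng):
    """Partially solve D D^dag e = -D D^dag x for random x; return psi = x + e."""
    A = D @ D.conj().T
    n = D.shape[0]
    vecs = []
    for _ in range(n_vec):
        x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        e = cg(A, -A @ x, tol)
        vecs.append(x + e)
    return np.column_stack(vecs)

rng = np.random.default_rng(0)
D = wilson_dirac_2d(L=8, mass=0.05, rng=rng)
psi = near_null_vectors(D, n_vec=12, tol=1e-4, rng=rng)
```

The exact solution of the residual system is $e = -x$, which would give $\psi = 0$. Stopping at a loose tolerance leaves $\psi$ dominated by the slowly converging low modes of $D D^\dagger$, which is what makes these vectors useful for building the coarse space.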
The near-null vectors are globally orthonormalized, then chirally doubled using the projectors $\frac{1}{2}(1 \pm \sigma_3)$. They are then locally blocked and locally orthonormalized using a $4^2$ grid within the lattice to form the columns of the prolongator matrix, $P$. The intermediate grid operator, $\hat{D}$, is then formed via $\hat{D} = P^\dagger D P$. This process is repeated to form the coarse grid operator.
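The prolongator construction can be illustrated as follows. This is a sketch under assumptions, not the authors' code: `build_prolongator` is a hypothetical helper, the spin index is assumed to be the fastest-running index with $\sigma_3 = \mathrm{diag}(+1, -1)$, and random matrices stand in for both the near-null vectors and the fine operator so that the block stays self-contained.

```python
import numpy as np

def build_prolongator(near_null, L, block=4):
    """Build P from near-null vectors on an L x L lattice with 2 spin components.
    ASSUMED layout: row index = 2 * (x * L + y) + spin."""
    n_vec = near_null.shape[1]
    # Global orthonormalization of the near-null vectors.
    Q, _ = np.linalg.qr(near_null)
    nb = L // block
    P = np.zeros((2 * L * L, nb * nb * 2 * n_vec), dtype=complex)
    col = 0
    for bx in range(nb):
        for by in range(nb):
            sites = [(bx * block + i) * L + (by * block + j)
                     for i in range(block) for j in range(block)]
            for spin in (0, 1):
                # Keeping one spin component applies (1/2)(1 +/- sigma_3),
                # so every near-null vector yields two chiral copies.
                rows = [2 * s + spin for s in sites]
                # Local orthonormalization inside this block and chirality.
                local, _ = np.linalg.qr(Q[rows, :])
                P[rows, col:col + n_vec] = local
                col += n_vec
    return P

rng = np.random.default_rng(0)
L, n_vec = 8, 12
n = 2 * L * L
# Stand-ins (ASSUMPTION): random data instead of real near-null vectors and D.
near_null = rng.standard_normal((n, n_vec)) + 1j * rng.standard_normal((n, n_vec))
D = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

P = build_prolongator(near_null, L)
D_hat = P.conj().T @ D @ P   # Galerkin coarse operator, D_hat = P^dag D P
```

Because columns belonging to different blocks or chiralities have disjoint support, local orthonormalization is enough to make $P^\dagger P$ the identity. In this toy setup each $4^2$ block contributes $2 \times 12 = 24$ coarse degrees of freedom.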
As an outer solver, we use FGMRES(8) [13], with two iterations of GMRES [14] as a pre- and post-smoother on the fine and intermediate levels. For our deflated solve on the coarse grid, we solve to a tolerance of $10^{-8}$ for the first outer iteration, followed by a projected solve to a tolerance of $10^{-2}$ for subsequent outer iterations. We observed that relaxing the tolerance for the non-deflated solve on the coarse grid in the same fashion resulted in a large increase in outer iterations of FGMRES, so the non-deflated coarse solve was always performed to a tolerance of $10^{-8}$. We remark that the inclusion of a W-cycle, where the coarse grid is visited twice for every outer iteration, as is performed in reference [2], may have ameliorated this problem, albeit at the price of increased coarse and intermediate matrix-vector products. Since we aim to reduce the overall cost of the full solve, this method was avoided.
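The coarse grid strategy (a tight first solve, then loose projected solves) can be sketched as below. The tolerance schedule follows the values quoted in the text. The projection step is an assumption: it shows a generic minimum-residual projection over a set of stored vectors, which is one common way to reuse eigenvector information for a new right-hand side, and it is not claimed to reproduce the exact GMRES-Proj variant of [9,10]. The names `coarse_tolerance` and `minres_projection` are hypothetical, the matrix is a small synthetic stand-in for the coarse operator, and exact eigenvectors from `np.linalg.eig` replace the approximate eigenvectors a GMRES-DR solve would supply.

```python
import numpy as np

def coarse_tolerance(outer_iteration, first_tol=1e-8, later_tol=1e-2):
    """Tight coarse solve on the first outer iteration, loose afterwards."""
    return first_tol if outer_iteration == 0 else later_tol

def minres_projection(A, V, b, x0):
    """Improve x0 over span(V) by minimizing ||b - A (x0 + V d)||_2 over d."""
    r0 = b - A @ x0
    AV = A @ V
    d, *_ = np.linalg.lstsq(AV, r0, rcond=None)
    return x0 + V @ d

rng = np.random.default_rng(0)
n, k = 60, 4
# Synthetic stand-in (ASSUMPTION): a few small eigenvalues plus a mild
# non-normal perturbation, mimicking an operator with exceptional modes.
diag = np.concatenate([[1e-3, 2e-3, 5e-3, 1e-2], np.linspace(1.0, 2.0, n - k)])
A = np.diag(diag).astype(complex)
A += 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# Eigenvectors of the k smallest-magnitude eigenvalues (stand-in for the
# vectors that a deflated first solve would leave behind).
evals, evecs = np.linalg.eig(A)
V = evecs[:, np.argsort(np.abs(evals))[:k]]

# A new right-hand side, as produced by a later outer iteration.
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x0 = np.zeros(n, dtype=complex)
x1 = minres_projection(A, V, b, x0)

r0_norm = np.linalg.norm(b - A @ x0)
r1_norm = np.linalg.norm(b - A @ x1)
```

After the projection the residual is orthogonal to the columns of $AV$, so the iterations that follow no longer have to resolve those directions. The solve would then continue from `x1` until the relative residual drops below `coarse_tolerance(outer_iteration)`.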

Results
An indication that critical slowing down has been relayed to the coarsest level is a constant number of fine operator applications for increasing lattice volume as the mass gap approaches zero.
where $n_{\mathrm{fine}}$, $n_{\mathrm{int}}$, and $n_{\mathrm{coarse}}$ are the sizes of the Dirac operator for the fine, intermediate and coarse grids, respectively. Figure 2 shows a comparison of CGNE and GMRES-DR to non-deflated and deflated MG-preconditioned FGMRES.
When evaluated in terms of fine-equivalent matrix-vector products (Mvps), non-deflated MG is nearly as expensive as CG on the normal equations. However, performing a deflated solve on the coarse grid drastically reduces the number of fine-equivalent Mvps.
MG with coarse grid deflation outperforms MG without deflation, and is more effective than pure deflation on the finest grid.
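To make the notion of fine-equivalent Mvps concrete, the sketch below converts per-level matrix-vector product counts into a single fine grid cost. The weighting is an assumption: each level's count is scaled by the ratio of that level's operator size to the fine operator size, which is one natural reading of the sizes $n_{\mathrm{fine}}$, $n_{\mathrm{int}}$ and $n_{\mathrm{coarse}}$. The function name, the operator sizes (which assume 12 near-null vectors and $4^2$ blocking on both coarsening steps of a $64^2$ lattice) and the Mvp counts are all illustrative, not measured results.

```python
def fine_equivalent_mvps(mvps, sizes):
    """Weight each level's Mvp count by (level size / fine size).
    ASSUMPTION: cost per Mvp scales linearly with operator size."""
    n_fine = sizes["fine"]
    return sum(mvps[level] * sizes[level] / n_fine for level in mvps)

# Illustrative operator sizes for a 64^2 lattice (ASSUMED):
#   fine:   2 spins * 64^2 sites
#   int:    24 dof  * 16^2 sites
#   coarse: 24 dof  * 4^2 sites
sizes = {"fine": 2 * 64**2, "int": 24 * 16**2, "coarse": 24 * 4**2}

# Hypothetical Mvp counts per level for one full solve.
mvps = {"fine": 10, "int": 20, "coarse": 100}

total = fine_equivalent_mvps(mvps, sizes)
```

With these made-up counts, 100 coarse Mvps cost less than 5 fine Mvps, which shows why a coarse solve only becomes expensive in fine-equivalent terms when it needs very many iterations.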
We also observe that coarse grid deflation is more effective than traditional deflation on the fine grid for multiple right hand sides. Figure

Conclusions
Multigrid is an extremely effective algorithm for transferring critical slowing down to coarser operators, where it can be dealt with more efficiently. We have shown that the cost of a full solve on the coarse grid can be very large, but can be significantly reduced by a deflated and projected low-tolerance solve.
This method of deflation is more effective than deflation on the fine grid alone, without the increased storage costs associated with fine grid deflation. We also observe a characteristic synergy between MG and coarse grid deflation over multiple right-hand sides that is not achieved by fine grid deflation or MG alone. Our method of deflation with partial solves shows a very mild dependence on lattice size, and is a significant step towards solving the strong scaling problem.