Accelerated Solution of the Frequency-domain Maxwell's Equations by Engineering the Eigenvalue Distribution of the Operator References and Links at the Final Stage of Our Work, We Were Made Aware of a Related Work

We introduce a simple method to accelerate the convergence of iterative solvers of the frequency-domain Maxwell's equations for deep-subwavelength structures. Using the continuity equation, the method eliminates the high multiplicity of near-zero eigenvalues of the operator while leaving the operator nearly positive-definite. The impact of the modified eigenvalue distribution on the accelerated convergence is explained by visualizing residual vectors and residual polynomials. A 3D finite-difference frequency-domain code for electromagnetic induction tomography, " J. Elements for plasmonic nanocircuits with three-dimensional slot waveguides, " Adv. Fast simulation of 3D electromagnetic problems using potentials, " J. " Finite-difference simulation of borehole EM measurements in 3D anisotropic media using coupled scalar-vector potentials, " Geophys. 71, G225–G233 (2006). A robust Maxwell formulation for all frequencies, " IEEE Trans. Improved finite-difference formulation in frequency domain for three-dimensional scattering problems, " IEEE Trans. Three-dimensional finite-difference method for the analysis of microwave-device embedding, " IEEE Trans.namaker, " Schelkunoff potential for electromagnetic field: proof of existence and uniqueness " (to be published), where an equation similar to our Eq. (7) with s = −1 was developed. 19. A. Jennings, " Influence of the eigenvalue spectrum on the convergence rate of the conjugate gradient method, " IMA J. On the sharpness of an asymptotic error estimate for conjugate gradients, " BIT Numer. The finite-element method for finding modes of dielectric-loaded cavities, " IEEE Trans. Mixed and penalty formulations for finite element analysis of an eigenvalue problem in electromag-netism, " Comput. 2 min , take the first equation of Eq. (4.29) in [32] and then multiply the extra factor µ = µ 0 to account for the difference between Eq. (4.17a) in [32] and Eq. (1) in the present paper. 32. W. Shin and S. Fan, " Choice of the perfectly matched layer boundary condition for frequency-domain Maxwell's equations solvers, " J. A generalized minimal residual algorithm for solving nonsymmetric linear systems, " SIAM J. QMR: a quasi-minimal residual method for non-hermitian linear systems, " Numer.


Introduction
To understand electromagnetic (EM) and optical phenomena, it is essential to solve Maxwell's equations efficiently.In the frequency domain, assuming a time dependence e +iωt and nonmagnetic materials, Maxwell's equations reduce to where ε is the electric permittivity (which can be complex); µ 0 is the magnetic permeability of vacuum; ω is the angular frequency; E and J are the electric field and electric current source density, respectively.To solve Eq. (1) numerically, one can use a method such as the finite-difference frequencydomain (FDFD) method [1][2][3] or the finite element method (FEM) [4] to construct a large system of linear equations Ax = b, (2) where A is a matrix representing the operator [∇ × (∇× ) − ω 2 µ 0 ε]; x is an unknown column vector representing E; b is a column vector representing −iω µ 0 J.The matrix A thus constructed is sparse (with only 13 nonzero elements per row when generated by the FDFD method) and typically very large (often with more than 10 million rows and columns for three-dimensional (3D) problems).To solve a system with such a large and sparse matrix, iterative methods are usually preferred to direct methods [5].However, in the "low-frequency regime" where the wavelength is much longer than the grid cell size ∆, it is well-known that convergence is quite slow when the iterative methods are directly applied to solve Eq. (1).The low-frequency regime arises, for example, in nanophotonics [6][7][8] and geophysics [9][10][11] where structures have feature sizes that are at deepsubwavelength scale, and it will be defined more rigorously in Sec. 2. The huge null space of the operator ∇ × (∇× ) was shown to be the origin of the slow convergence [10,11], and several techniques to improve the convergence speed have been developed.
The first class of techniques is based on the Helmholtz decomposition, which decomposes the E-field as E = Ψ + ∇ϕ, where Ψ is a divergence-free vector field and ϕ is a scalar field [9][10][11][12][13][14][15].Because ∇ • Ψ = 0, Eq. ( 1) is written as where the operator ∇ × (∇× ), which has a huge null space, is replaced with the negative Laplacian −∇ 2 , which is positive-definite for appropriate boundary conditions and thus has the smallest possible null space.However, these techniques either solve an extra equation for the extra unknown ϕ at every iteration step [9][10][11][12], which can be time-consuming, or increase the number of the rows and columns of the matrix by about 33% [13][14][15], which requires more memory.
The second class of techniques utilizes the charge-free condition The condition (4) holds at every source-free (i.e., J = 0) position, where Eq. ( 1) can be modified to for an arbitrary constant s; note that the right-hand side is 0 because J = 0.In this class of techniques, Eqs. ( 1) and ( 5) are solved at positions with and without sources, respectively.Reference [16] applied the above technique with s = +1 to boundary value problems described in [17] and achieved accelerated convergence.Such boundary value problems satisfied J = 0 everywhere, so Eq. ( 5) was solved throughout the entire simulation domain.
However, Ref. [16] did not conduct a detailed comparison of convergence speed between different values of s.It also did not report whether its technique leads to accelerated convergence for problems with sources, even though many problems have nonzero electric current sources J inside the simulation domain.Reference [1] applied the technique with s = +1 to problems with sources, but only in order to suppress spurious modes rather than to accelerate convergence.
In this paper, we develop a modification of Eq. ( 1) that improves convergence speed even if electric current sources J exist inside the simulation domain [18].Unlike the previous technique that made the modification only at source-free positions, our technique modifies Eq. ( 1) everywhere including positions with sources.For the modification, we utilize the continuity equation which can be derived by taking the divergence of Eq. ( 1).When Eq. ( 6) is manipulated appropriately and then added to Eq. ( 1), we obtain for a constant s.The modified equation (7) is the equation to solve in this paper.
The solution E-field of Eq. ( 7) is the same as the solution of the original equation ( 1) regardless of the value of s, because the solution of Eq. ( 1) always satisfies Eq. ( 6).However, the choice of s affects the convergence speed of iterative methods significantly.In this paper, we demonstrate that s = −1 induces faster convergence speed than other values of s by comparing the convergence behavior of iterative methods for s = −1, 0, +1; the latter two values of s are of particular interest, because s = 0 reduces Eq. ( 7) to the original equation ( 1) and s = +1 is the value that Ref. [16] used in Eq. ( 5), which is similar to Eq. (7).
We also show that the difference in convergence behavior results from the different eigenvalue distributions of the operators for different s.There are many general mathematical studies about the dependence of the convergence behavior on the eigenvalue distribution [19][20][21][22][23][24][25][26].Our aim here is instead to provide an intuitive understanding of the convergence behavior specifically for the operator of Eq. (7).For this purpose, at each iteration step we visualize the residual vector and residual polynomial, which are widely used concepts to explain the convergence behavior of iterative methods [5] and also defined briefly in Sec. 3. As a result, we find that convergence speed deteriorates substantially for s = 0 because the operator has eigenvalues clustered near zero, and for s = +1 because the operator is strongly indefinite.
The rest of this paper is organized as follows.In Sec. 2 we investigate the eigenvalue distribution of the operator in Eq. ( 7) for s = 0, −1, +1 for a simple homogeneous system.We also define the low-frequency regime rigorously in the section.In Sec. 3, we relate the eigenvalue distribution with the convergence behavior of an iterative method.In Sec. 4, we solve Eq. ( 7) for a wide range of realistic 3D problems to compare the convergence behavior of an iterative method for the three values of s, and we conclude in Sec. 5.

Eigenvalue distribution of the operator for a homogeneous system
In this section, we consider the operator in Eq. ( 7) for a homogeneous system and show that the properties of the eigenvalue distribution of the operator strongly depend on the value of s.The impact of s on the eigenvalue distribution has been studied in detail in the literature of the deflation method (also known as the penalty method) [27][28][29].Here we only highlight those aspects that are important for the present study.For a homogeneous system where ε is constant, Eq. ( 7) is simplified to where the operator is Hermitian for real ε.Because ε is constant in this section, the eigenvalue distribution of T is shifted from the eigenvalue distribution of a Hermitian operator by a constant −ω 2 µ 0 ε.In the low-frequency regime such shift is negligible, and thus the eigenvalue distribution of T 0 approximates that of T very well.Hence, we examine the eigenvalue distribution of T 0 below to investigate the eigenvalue distribution of T .
Table 1.Properties of the eigenvalue distributions of T 0 for different s.Depending on the sign of s, T 0 has very different eigenvalue distributions in terms of the multiplicity of the eigenvalue 0 and the definiteness of T 0 .s = 0 s < 0 s > 0 multiplicity of λ = 0 very high low definiteness of T 0 positive-semidefinite indefinite In Appendix A, we show that F k e −ik•r with are the three eigenfunctions of both ∇ × (∇× ) and ∇(∇• ) for each wavevector k.We also show in the same appendix that the corresponding three eigenvalues are for ∇ × (∇× ), and for ∇(∇• ).Therefore, T 0 has as three eigenvalues for each wavevector k.Equation (14) indicates that the eigenvalue distribution of T 0 is greatly affected by the value of s.Specifically, the multiplicity of the eigenvalue 0 depends critically on whether s is 0 or not: for s = 0 T 0 has a very high multiplicity of the eigenvalue 0 because Eq. ( 14) has 0 as an eigenvalue for every k, whereas for s = 0 T 0 does not have such a high multiplicity of the eigenvalue 0. The definiteness of T 0 also depends on the value of s: for s ≤ 0 T 0 is positivesemidefinite because the three eigenvalues in Eq. ( 14) are always nonnegative, whereas for s > 0 T 0 is indefinite because Eq. ( 14) has both positive and negative numbers as eigenvalues.The different properties of the eigenvalue distributions of T 0 for s = 0, s < 0, and s > 0 are summarized in Table 1.
The above description of the eigenvalue distributions of T 0 should approximately hold for the eigenvalue distributions of T as well in the low-frequency regime, as mentioned in the discussion of Eqs. ( 9) and (10).Moreover, even though T is a differential operator defined in an infinite space, it turns out that the description also applies to the matrix A discretized from T that is defined in a spatially bounded simulation domain.
To demonstrate, we numerically calculate the eigenvalues of A for a two-dimensional (2D) system shown in Fig. 1, a square domain filled with vacuum.The domain is discretized on a finite-difference grid with N x × N y = 50 × 50 cells and cell size ∆ = 2 nm.Therefore, the matrix A for each s has 3N x N y = 7500 rows and columns, where the extra factor 3 accounts for the three Cartesian components of the E-field.We choose ω corresponding to the vacuum wavelength λ 0 = 1550nm, which puts the system in the low-frequency regime as will be seen at the end of this section.The matrices A are constructed for three values of s: 0, −1, and +1, each of which represents each category of s in Table 1.
The distributions of the numerically calculated eigenvalues of A for s = 0, −1, +1 are shown as three plots in Fig. 2. In each plot, the horizontal axis represents eigenvalues, and it is divided  1.A 2D square domain filled with vacuum (ε = ε 0 ) for which the eigenvalue distribution of T is calculated numerically for s = 0, −1, +1.The domain is homogeneous in the z-direction, whereas its x-and y-boundaries are subject to periodic boundary conditions.The square domain is discretized on a finite-difference grid with cell size ∆ = 2 nm.The domain is composed of 50 × 50 grid cells, which lead to 7500 eigenvalues in total.A vacuum wavelength λ 0 = 1550 nm, which puts the system in the low-frequency regime, is assumed for the electric current source to be used in Sec. 3. into 41 intervals t −20 , . . .,t 0 , . . .,t 20 where t 0 ∋ 0. The height of the column on each interval corresponds to the number of the eigenvalues in the interval.The eigenvalue distributions of A shown in Fig. 2 agree well with the description of the eigenvalues of T 0 in Table 1: the very tall column on t 0 in Fig. 2(a) indicates the very high multiplicity of λ ≃ 0 for s = 0, and the eigenvalues distributed over t j<0 and t j>0 in Fig. 2(c) indicate a strongly indefinite operator for s = +1.In addition, the height of the column on t 0 in Fig. 2(a) is about 2500, or one third of the total number of eigenvalues, which agrees with Eq. ( 14) for s = 0 where one of the three eigenvalues is 0 for each k; the columns on t j>0 are about 1.5 times taller in Fig. 2(b) than in Fig. 2(a), which also agrees with Eq. ( 14) where the number of |k| 2 increases from two for s = 0 to three for s = −1.
We end the section by providing a quantitative definition of the low-frequency regime.Suppose that A 0 is the matrix discretized from T 0 of Eq. (10).For s = 0, the eigenvalues of A 0 range from 0 to 8/∆ 2 min , where ∆ min is the minimum grid cell size [31]; note that the range agrees with Fig. 2(a).The eigenvalue distribution of A is the shifted eigenvalue distribution of A 0 by −ω 2 µ 0 ε.The low-frequency regime is where the magnitude of the shift is so small that A has an almost identical eigenvalue distribution as A 0 .Therefore, the condition for the low-frequency regime is Equation ( 15) is consistent with the condition introduced in [10], but here we provide a condition that is based on a more accurate estimate of the maximum eigenvalue of A 0 .We can rewrite Eq. ( 15) in terms of the vacuum wavelength λ 0 as where ε r = ε/ε 0 is the relative electric permittivity.The system described in Fig. 1 satisfies Eq. ( 16), so it is in the low-frequency regime.

Impact of the eigenvalue distribution on the convergence behavior of GMRES
In this section, we explain how the different eigenvalue distributions for different values of s examined in Sec. 2 influence the convergence behavior of an iterative method to solve Eq. ( 8).For each of s = −1, 0, +1, we discretize Eq. ( 8) using the FDFD method for the system illustrated in Fig. 1 with an x-polarized electric dipole current source placed at the center of the simulation domain.We then solve the discretized equation by an iterative method to observe the convergence behavior.The iterative method to use in this section is the general minimal residual (GMRES) method [33], which is one of the Krylov subspace methods [5].We use GMRES without restart because the system is sufficiently small.
Like other iterative methods, GMRES generates an approximate solution x m of Ax = b at the mth iteration step.As m increases, x m eventually converges to the exact solution.We assume that convergence is achieved when the residual vector satisfies r m / b < τ, where • is the 2-norm and τ is a user-defined small positive number.
In practice, τ = 10 −6 is sufficiently small for accurate solutions.We use x 0 = 0 as an initial guess solution throughout the paper.Figure 3 shows r m / b versus the number m of iteration steps for the three values of s.As can be seen in the figure, the convergence behavior of GMRES is quite different for different s, with s = −1 far more superior than the other two choices of s.
The overall trend of the convergence behavior shown in Fig. 3 is consistent with the mathematical theories of iterative methods.For example, the convergence stagnates initially for s = 0, and according to [21] this is typical behavior of GMRES for a matrix with many eigenvalues close to 0 such as our A for s = 0 (see Fig. 2(a)).Also, the convergence is very slow for s = +1, and Ref. [23] argues that in general the Krylov subspace methods converge much more slowly for indefinite matrices such as our A for s = +1 (see Fig. 2(c)) than for definite matrices.In this section we provide a more intuitive explanation for the convergence behavior by using the residual polynomial.
We first review the residual polynomial of GMRES briefly.Suppose that P m is the set of all polynomials pm of degree at most m such that pm (0) = 1. ( For each pm ∈ P m , we can define a column vector rm ≡ pm (A)r 0 .At the mth iteration step of GMRES, the residual vector r m of Eq. ( 17) is the rm with the smallest 2-norm [5].We refer to the pm for rm = r m as the residual polynomial p m .Therefore, from Eq. ( 19) we have Below, we show how the eigenvalue distribution of A influences p m at each iteration step and hence influences the convergence behavior of GMRES.The matrix A ∈ C n×n for our homogeneous system described in Fig. 1 is Hermitian because it is discretized from the Hermitian operator T of Eq. ( 8).Hence, the eigendecomposition of A is where with real eigenvalues λ i and the corresponding normalized eigenvectors v i , and V † is the conjugate transpose of V ; note that V is unitary, i.e., V † V = I.Substituting Eq. ( 21) in Eq. ( 19), we We then define a column vector whose ith element, which is referred to as zmi below, is the projection of rm / b onto the direction of the ith eigenvector v i .Similarly, we also define From Eq. ( 25) for m = 0 and Eqs. ( 23) and ( 24), we obtain which can be written element-by-element as Because zm = rm / b , GMRES minimizes zm to z m when it minimizes rm to r m at the mth iteration step.According to Eq. ( 27), |z mi | is minimized to 0 when pm has λ i as a root.Thus, the most ideal pm has all the n eigenvalues of A as its roots, because it reduces zm to 0. However, pm has at most m roots, and m, which is the number of iteration steps, is typically far less than n.Therefore, pm needs to have its roots optimally placed near the eigenvalues to minimize zm .Hence, the eigenvalue distribution of A greatly influences the convergence behavior of GMRES.
We now seek to understand the convergence behavior of GMRES for the different choices of s.We begin with s = −1.In Fig. 4 we plot r m / b for s = −1 as bar graphs at the first few iteration steps.The horizontal axis in each plot represents eigenvalues.We divide the range of eigenvalues into the same 41 intervals t −20 , . . .,t 0 , . . .,t 20 used in Fig. 2; note that t 0 ∋ 0. The height of the column on each interval is the norm of the projection of r m / b onto the space spanned by the eigenvectors whose corresponding eigenvalues are contained in the interval.More specifically, the height of the column on t j after m iteration steps is Note that [∑ j h 2 m j ] 1/2 = r m / b , and thus the sum of the squares of the column heights is a direct measure of convergence.
A few properties of r m / b for s = −1 shown in Fig. 4 are readily predicted from the corresponding eigenvalue distribution of the matrix A presented in Fig. 2(b).For instance, A has no eigenvalues in t j<0 , and therefore r m / b has components only in t j≥0 throughout the iteration process as demonstrated in Fig. 4. Also, A has very few eigenvalues in t 0 , and thus r 0 / b has a very weak component in t 0 as can be seen in the m = 0 plot in Fig. 4. Now, we relate r m / b with the residual polynomial to explain the convergence behavior of GMRES for s = −1.The residual polynomial p m (λ ), which is obtained by solving a least squares problem, is also plotted in Fig. 4 at each iteration step.As the iteration proceeds, the residual polynomial in Fig. 4 has more and more roots, but only in t j≥0 , because the eigenvalues exist only in t j≥0 and the roots of residual polynomials should stay close to the eigenvalues as mentioned in the discussion following Eq.( 27).Also, as Eq. ( 27) predicts, the columns in each plot of Fig. 4 almost vanish at the roots of the residual polynomial.Therefore, all the columns quickly shrink as the number of the roots of the residual polynomial increases in the iteration process of GMRES.The fast reduction of the column heights provides visualization of the fast convergence of GMRES for s = −1 shown in Fig. 3. Next, we examine the convergence behavior for s = 0. Figure 5 shows r m / b for s = 0 at the first few iteration steps.Note that r 0 / b has a tall column on t 0 because A has many eigenvalues in t 0 as shown in Fig. 2(a).Also, the tall column on t 0 persists during the initial period of the iteration process.
To explain the above convergence behavior for s = 0, we show that for a nearly positivedefinite matrix the column on t 0 is persistent during the initial period of the iteration process of GMRES in general.For that purpose, we compare the three polynomials pm ∈ P m shown in Fig. 6.The three pm are chosen as candidates for the residual polynomial p m for a nearly positive-definite matrix, and therefore the roots of the polynomials are placed in t j≥0 according to the discussion following Eq.(27).The three pm have the same roots except for their smallest roots: pm in Fig. 6(a) does not have its smallest root in t 0 , whereas pm in Figs.6(b) and 6(c) do.Note that the latter two pm can shrink the column on t 0 more effectively than the first pm according to Eq. ( 27).
However, the slopes at the roots of the latter two pm are steeper than the slopes at the corresponding roots of the first pm as shown in Fig. 6.In Appendix B, we prove rigorously that the slopes of pm at all roots indeed increase as the smallest root decreases in magnitude.In general, pm with steeper slopes at the roots oscillates with larger amplitudes around the horizontal axis because it varies faster around the axis; compare the amplitudes of oscillation in Fig. 6(a) with those in Figs.6(b) and 6(c).The increased amplitudes of oscillation amplify |z mi | overall according to Eq. ( 27), and thus zm as well.
In other words, shrinking the column on t 0 (by placing the smallest root of pm in t 0 ) is achieved only at the penalty of amplifying the columns on t j>0 .This penalty is too heavy when the columns on t j>0 constitute a considerable portion of zm .Therefore, roots of residual polynomials are not placed in t 0 until the columns on t j>0 become quite small, which results in the persistence of the column on t 0 during the initial period of the iteration process.
Because the height of the column on t 0 remains almost the same at the initial iteration steps of GMRES, h 00 of Eq. ( 28), which is the initial height of this column, provides an approximate lower bound of z m = r m / b during the initial period of the iteration process.A more accurate lower bound is calculated as the norm of r 0 / b projected onto the eigenspace of the eigenvalue closest to 0. For our example system, for s = 0 the calculated lower bound is 0.707.Note that r m / b for s = 0 indeed stagnates initially at this value in Fig. 3.For s = −1 the calculated lower bound is 4.16 × 10 −7 , at which r m / b also stagnates as shown in Fig. 3.However, this value is much smaller than the lower bound for s = 0, because for s = −1 the initial height of the column on t 0 is almost negligible as shown in the m = 0 plot in Fig. 4. In fact, the value is smaller than the conventional tolerance τ = 10 −6 mentioned below Eq. ( 17), so the stagnation does not deteriorate the convergence speed for s = −1.
Lastly, we examine the convergence behavior for s = +1.Figure 7 shows r m / b for s = +1 at some first (m = 0, 4, 7, 11) and later (m = 120, 140) iteration steps.Because the matrix A for s = +1 has both positive and negative eigenvalues as indicated in Fig. 2(c), r m / b has components in both t j>0 and t j<0 , but in the present example the components of r m / b are concentrated in t j<0 initially (m = 0 plot in Fig. 7).Thus GMRES begins with the roots of residual polynomials placed in t j<0 (m = 4 plot in Fig. 7).However, such residual polynomials have large values in t j>0 , so they amplify the initially very small components of r m / b in t j>0 according to Eq. ( 27), and eventually we reach a point where the components of r m / b in t j>0 and t j<0 become comparable (m = 7 plot in Fig. 7).Afterwards, GMRES places the roots of residual polynomials in both t j>0 and t j<0 so that the components of r m / b in both regions are reduced.
We note that the convergence behavior for s = +1 is initially quite similar to that for s = −1 because r 0 / b for s = +1 has components concentrated in t j<0 and only a very weak component in t 0 .Therefore, r m / b reduces quickly for s = +1 without stagnation during the initial period of the iteration process as shown in Fig. 3.
During the later period of the iteration process, however, the reduction of r m / b for s = +1 slows down significantly, and eventually s = +1 produces the slowest convergence among the three values of s as shown in Fig. 3.The slow reduction of r m / b is due to the very persistent column on t 0 : comparing the plots for m = 120 and m = 140 in Fig. 7 shows that the column barely reduces for 20 iteration steps.
We have shown earlier that the column on t 0 is quite persistent for a nearly positive-definite matrix.The argument relied on the properties proved in Appendix B about a polynomial pm ∈ P m with only positive roots.We can easily extend the proof in the appendix to pm with both positive and negative roots, and then show that the column on t 0 is persistent also for a strongly indefinite matrix, which explains the slow convergence for s = +1 described above.However, the explanation is insufficient to explain why the convergence is much slower for s = +1 than for s = 0 as indicated in Fig. 3.
Here, we show that the column on t 0 is in fact even more persistent for a strongly indefinite matrix than for a nearly positive-definite matrix.For that purpose, we compare the two polynomials pm ∈ P m shown in Fig. 8.As can be seen from the locations of their roots, they are candidates for the residual polynomials for different matrices: pm shown in Fig. 8(a) is appropriate for a nearly positive-definite matrix (referred to as A def below), and pm shown in Fig. 8(b) is appropriate for a strongly indefinite matrix (referred to as A ind below).Moreover, we choose these two pm to have the same smallest-magnitude root ζ 0 in t 0 .Being elements of P m , both pm satisfy Eq. ( 18).Hence, we have | p′ m (ζ 0 )| ≃ 1/|ζ 0 | for both pm , where p′ m is the first derivative of pm .Now, we note that | p′ m | evaluated at a root of pm tends to decrease as the root gets closer to the median of the roots; see Appendix C for a more rigorous explanation.Hence, | p′ m | ≤ 1/|ζ 0 | tends to hold at most roots of pm for A def , because ζ 0 is one of the farthest roots from the median of the roots.On the other hand, | p′ m | ≥ 1/|ζ 0 | tends to hold at most roots of pm for A ind , because ζ 0 is one of the closest roots to the median of the roots.Therefore, pm for A ind has much steeper slopes at most roots than pm for A def in general, and thus has larger amplitudes of oscillation around the horizontal axis, as demonstrated in Fig. 8.
Combined with Eq. ( 27), the above argument shows that shrinking the column on t 0 (by placing the smallest-magnitude root of pm in t 0 ) increases z m much more for a strongly in- definite matrix than for a nearly positive-definite matrix.Therefore, the column on t 0 should be much more persistent for a strongly indefinite matrix than for a nearly positive-definite matrix in general, which explains the much slower convergence for s = +1 than for s = 0 in Fig. 3.In summary of this section, we have shown that s = −1 produces the most superior convergence behavior; s = 0 induces stagnation during the initial period of the iteration process due to the high multiplicity of eigenvalues near zero; s = +1 leads to the slowest convergence overall due to the strongly indefinite matrix.We have provided a graphical explanation of the difference in the convergence behavior of GMRES, for which a strong theoretical basis exists, using a simple system of a homogeneous dielectric medium.
The arguments provided in this section can be easily extended to show that s = −1 is indeed optimal among all values in general.Compared with the case of s = −1, according to Eq. ( 14), for s > 0 A is always more strongly indefinite and therefore the convergence should be slower; for −1 < s < 0 A has more eigenvalues clustered near zero and thus the initial stagnation period should be longer; for s < −1 A has a wider eigenvalue value range, so the condition number of A should be larger and the convergence should be slower [23].In other words, s = −1 is the value that leaves A nearly positive-definite while removing the eigenvalues clustered near zero sufficiently without increasing the condition number.With separate numerical experiments we have verified that s = −1 is indeed superior to values other than s = 0 and s = +1 as well.
In the next section we will see that the difference in the convergence behavior for different choices of s is in fact quite general in practical situations.

Convergence behavior of QMR for 3D inhomogeneous systems
In this section, we solve Eq. ( 7) for 3D inhomogeneous systems of practical interest by an iterative method, and demonstrate that s = −1 still induces faster convergence than s = 0 and s = +1.We note that the systems examined in this section are inhomogeneous and have complex ε in general.The analyses in Secs. 2 and 3, therefore, do not hold strictly here.Nevertheless, we will see that the analyses for the homogeneous system in the previous sections provide insight  [34], ε silica r = 2.085 [35], ε silicon r = 12.09 [35], and ε gold r = −10.78− i0.79 [36], respectively.Table 2. Specification of the finite-difference grids used for the three systems in Fig. 9. Slot uses a nonuniform grid with smoothly varying grid cell size.The matrix A has 3N x N y N z rows and columns, where the extra factor 3 accounts for the three Cartesian components of the E-field.

Slot
Diel Array N x , N y , N z 192, 192, 240 220, 220, 320 220, 220, 130 ∆ x , ∆ y , ∆ z 2 ∼ 20 nm 10 nm 5, 5, 20 nm in understanding the convergence behavior for more general systems examined in this section.The three 3D inhomogeneous systems we consider are illustrated in the first row of Fig. 9.To prevent reflection of EM waves from boundaries, we enclose each system by the stretchedcoordinate perfectly matched layer (SC-PML), because SC-PML produces a much betterconditioned matrix than the more commonly used uniaxial PML (UPML) [32].For each system, we construct three systems of linear equations Ax = b corresponding to s = −1, 0, +1 by the FDFD method.The number of the grid cells N x , N y , and N z and the grid cell sizes ∆ x , ∆ y , and ∆ z of the finite-difference grid used to discretize each system are shown in Table 2. Considering the parameters summarized in Table 3 and the condition ( 16), all the three systems are in the low-frequency regime.
The constructed systems of linear equations are solved by the quasi-minimal residual (QMR) method [37], which is another Krylov subspace method.GMRES that was used in Sec. 3 to solve a 2D problem is not suitable for 3D problems here because it requires too much memory [5].
The second row of Fig. 9 shows the convergence behavior of QMR for the three systems.
Table 3. Parameters used in Eq. ( 16) for the three systems in Fig. 9.When substituted in Eq. ( 16), these parameters prove that all the three systems are in the low-frequency regime.Note that for all three systems the choice of s = −1 results in the fastest convergence, and the choice of s = +1 barely leads to convergence.The three systems shown in Fig. 9 are chosen deliberately to include different materials such as dielectrics and metals and geometries with different degrees of complexity.Therefore, Fig. 9 suggests that the superiority of s = −1 over both s = 0 and s = +1 is quite general.Even though both the iterative method and the systems in this section are significantly different from those in the previous section, the convergence behaviors are very similar.We explain the resemblance as follows.
The matrix for an inhomogeneous system is actually not much different from that for a homogenous system in many cases.Indeed, most inhomogeneous systems consist of several homogeneous subdomains.Inside each homogeneous subdomain of such an inhomogeneous system, the differential operator in Eq. ( 7) for the inhomogeneous system is the same as the differential operator (9) for a homogeneous system, whereas at the interfaces between the subdomains it is not.Nevertheless, the number of finite-difference grid points assigned at the interfaces is usually much smaller than that of the grid points assigned inside the homogeneous subdomains.Therefore most rows of the matrix for the inhomogeneous system should be the same as those for a homogeneous system discretized on the same grid.
In addition, the differential operator (9) for a homogeneous system is nearly Hermitian in the low-frequency regime even if ε is complex, because it is approximated well by the Hermitian operator (10).
Hence, the matrix for an inhomogeneous system is actually quite similar to the nearly Hermitian matrix for a homogeneous system.Because QMR reduces to GMRES for Hermitian matrices in exact arithmetic [38], it is reasonable that the convergence behavior of QMR for an inhomogeneous system is similar to that of GMRES for a homogeneous system.
The matrix for an inhomogeneous system deviates more from that for a homogeneous system as the number of homogeneous subdomains increases, because then the number of grid points assigned at the interfaces between homogeneous subdomains increases.Therefore, we can expect that the convergence behavior for an inhomogeneous system would deviate from that for a homogeneous system as the number of homogeneous subdomains increases.Such deviation is demonstrated in Fig. 9(c), where the system has many metallic pillars; note that the convergence behavior for s = −1 is no longer very different from that for s = 0 in this case.

Conclusion and final remarks
We have introduced a new method to accelerate the convergence of iterative solvers of the frequency-domain Maxwell's equations in the low-frequency regime.The method solves a new equation that is modified from the original Maxwell's equations using the continuity equation.
The operator of the newly formulated equation does not have the high multiplicity of nearzero eigenvalues that makes the convergence stagnate for the original operator.At the same time, the new operator is nearly positive-definite, so it avoids the long-term slow convergence that indefinite operators suffer from.
In this paper, we have considered only nonmagnetic materials (µ = µ 0 ).For magnetic mate-rials (µ = µ 0 ), we note that a similar equation which can also be formulated from Maxwell's equations and the continuity equation, can be used instead of Eq. ( 7) to accelerate the convergence of iterative methods.We leave the discussion on the optimal value of s in this equation for future work.
Because our method achieves accelerated convergence by formulating a new equation, it can be easily combined with other acceleration techniques such as preconditioning and better iterative methods.
Fig.1.A 2D square domain filled with vacuum (ε = ε 0 ) for which the eigenvalue distribution of T is calculated numerically for s = 0, −1, +1.The domain is homogeneous in the z-direction, whereas its x-and y-boundaries are subject to periodic boundary conditions.The square domain is discretized on a finite-difference grid with cell size ∆ = 2 nm.The domain is composed of 50 × 50 grid cells, which lead to 7500 eigenvalues in total.A vacuum wavelength λ 0 = 1550 nm, which puts the system in the low-frequency regime, is assumed for the electric current source to be used in Sec. 3.

Fig. 2 .
Fig.2.The eigenvalue distribution of A discretized from T for (a) s = 0, (b) s = −1, and (c) s = +1 for the vacuum-filled domain illustrated in Fig.1.All 7500 eigenvalues λ of A are calculated for each s and categorized into 41 intervals in the horizontal axis that represents the range of the eigenvalues; the unit of the horizontal axis is nm −2 .The height of the column on each interval represents the number of the eigenvalues in the interval.In (b) and (c), the black dots indicate the eigenvalue distribution for s = 0 shown in (a).The vertical axes are broken due to the extremely tall column at λ ≃ 0 in (a).The local maxima at λ = ±1 nm −2 are the Van Hove singularities[30] arising from the lattice structure imposed by the finite-difference grid.

Fig. 3 .
Fig.3.Convergence behavior of GMRES for the vacuum-filled domain illustrated in Fig.1.Three systems of linear equations discretized from Eq. (8) for s = 0, −1, +1 are solved by GMRES.In the iteration process of GMRES for each s, we plot the relative residual norm r m / b at each iteration step m.Notice that for s = 0 the relative residual norm stagnates initially; for s = −1 it stagnates around m = 100; for s = +1 it does not stagnate, but decreases very slowly.The upper and lower "X" marks on the vertical axis indicate the values around which our theory expects r m / b to stagnate for s = 0 and s = −1, respectively.

4 Fig. 4 .
Fig. 4. Initial evolution of r m / b for s = −1.Relative residual vectors r m / b are visualized at three iteration steps m = 0, 2, 4.In each plot, the column on each interval represents the norm of r m / b projected onto the sum of the eigenspaces of the eigenvalues contained in the interval.Notice that all the columns almost vanish only after four iteration steps.In the plots for m = 2 and m = 4, the residual polynomials p m are also plotted as solid curves; note that they always satisfy the condition(18).

4 Fig. 5 .
Fig. 5. Initial evolution of r m / b for s = 0. Relative residual vectors r m / b are visualized at three iteration steps m = 0, 2, 4.In each plot, the column on each interval represents the norm of r m / b projected onto the sum of the eigenspaces of the eigenvalues contained in the interval.Notice that most columns almost vanish only after four iteration steps, except for the very persistent column at λ ≃ 0. In the plots for m = 2 and m = 4, the residual polynomials p m are also plotted as solid curves; note that they always satisfy the condition(18).

Fig. 6 .
Fig. 6.Impact of the magnitude of the smallest root of a polynomial pm ∈ P m on the oscillation amplitudes of pm .Three pm of degree 6 are shown.In each figure, a solid line represents a polynomial; an open dot on the horizontal axis indicates the smallest root; solid dots indicate the other roots; dashed lines show the slopes of the polynomial at the roots.The three polynomials have the same roots except for their smallest roots: the smallest root in (a) becomes smaller positive and negative roots in (b) and (c), respectively.Notice that the slopes at all roots in (a) become steeper in (b) and (c) as the smallest root decreases in magnitude, and as a result the amplitudes of oscillation of pm around the horizontal axis increase.

Fig. 7 .
Fig. 7. Evolution of r m / b for s = +1.Relative residual vectors r m / b are visualized at iteration steps m = 0, 4, 7 in the first row and at m = 11, 120, 140 in the second row.In each plot, the column on each interval represents the norm of r m / b projected onto the sum of the eigenspaces of the eigenvalues contained in the interval.The vertical scale of the plot is magnified as the iteration proceeds.Notice that the column at λ ≃ 0 is very persistent during the later period of the iteration process (m = 120, 140).In the plots for m = 4, 7, 11, the residual polynomials p m are also plotted as solid curves; the residual polynomials are not plotted for m = 100 and m = 120 because they have too many roots.

Fig. 8 .
Fig. 8. Candidates for the residual polynomials for (a) a nearly positive-definite matrix and (b) strongly indefinite matrix.In each figure, a solid line represents a polynomial pm ∈ P m ; an open dot on the horizontal axis indicates the smallest-magnitude root; solid dots indicate the other roots; dashed lines show the slopes of the polynomial at the roots.The two polynomials have the same smallest-magnitude root ζ 0 , and thus have approximately the same slope −1/ζ 0 at their smallest-magnitude roots.Note that for both pm the slopes get steeper at the roots further away from the median of the roots.Hence, the slopes of most dashed lines are gentler than 1/|ζ 0 | in (a) and steeper than 1/|ζ 0 | in (b).As a result, pm in (b) has larger amplitudes of oscillation around the horizontal axis than pm in (a).

Fig. 9 .
Fig. 9. Three inhomogeneous systems for which Eq. (7) is solved for s = −1, 0, +1 by QMR: (a) a slot waveguide bend formed in a thin silver film (Slot), (b) a straight silicon waveguide (Diel), and (c) an array of gold pillars (Array).The figures in the first row describe the three systems.The directions of wave propagation are shown by red arrows, beside which the vacuum wavelengths used are indicated.For all three systems, the waves are excited by electric current sources J strictly inside the simulation domain.The plots in the second row show the convergence behavior of QMR.Note that for all three systems QMR converges fastest for s = −1, whereas it barely converges for s = +1.The relative electric permittivities of the materials used in these systems are ε silver r = −129 − i3.28 [34], ε silica and the eigenvalue equation for ∇(∇• ) is − .(31) and (32) for a given k, we obtain λ = 0, |k| 2 , |k| 2 ,