Generalized rational Krylov decompositions with an application to rational approximation

Generalized rational Krylov decompositions are matrix relations which, under certain conditions, are associated with rational Krylov spaces. We study the algebraic properties of such decompositions and present an implicit Q theorem for rational Krylov spaces. Transformations on rational Krylov decompositions allow for changing the poles of a rational Krylov space without recomputation, and two algorithms are presented for this task. Using such transformations we develop a rational Krylov method for rational least squares fitting. Numerical experiments indicate that the proposed method converges fast and robustly. A MATLAB toolbox with implementations of the presented algorithms and experiments is provided.

1. Introduction. Numerical methods based on rational Krylov spaces have become an indispensable tool of scientific computing. Rational Krylov spaces were initially proposed by Ruhe in the 1980s for the purpose of solving large sparse eigenvalue problems [29, 31, 32]. Since then many more applications have been found in model order reduction [18, 14], large-scale matrix functions and matrix equations [12, 1, 21, 22], and nonlinear eigenvalue problems [33, 24, 39, 23], to name just a few.
In this paper we study various algebraic properties of rational Krylov spaces, using as starting point a generalized rational Krylov decomposition

A V m+1 K m = V m+1 H m , (1.1)

where A ∈ C N×N is a given matrix, and the matrices V m+1 ∈ C N×(m+1) and {K m , H m } ⊂ C (m+1)×m are of maximal rank. Throughout this paper the underlined matrices have one more row than they have columns. Our matrix decomposition approach is inspired by the work of Stewart [35, 37], who studied transformations of a (polynomial) Krylov decomposition, which is a special case of (1.1) with K m = I m , the m × m identity matrix with an appended row of zeros. Indeed, all results in this paper apply to polynomial Krylov spaces as well. The outline of this work is as follows: in section 2 we study algebraic properties of rational Arnoldi decompositions (a special case of (1.1) where (H m , K m ) is an unreduced upper-Hessenberg pencil) and relate these decompositions to the poles and the starting vector of a rational Krylov space. Section 3 provides a rational implicit Q theorem about the uniqueness of rational Arnoldi decompositions. We also show how the rational functions associated with the rational Krylov space can be evaluated at any point z ∈ C by computing a full QR factorization of zK m − H m . In section 4 we show that when the lower m × m part of the pencil (H m , K m ) is regular, then the range of V m+1 in (1.1) is a rational Krylov space. Via transformations on such decompositions we are able to move the poles of a rational Krylov space to arbitrary positions (even to the eigenvalues of A), and we give two algorithms for this task. Finally, in section 5 we incorporate one of these algorithms into an iterative method, called RKFIT, for finding a rational function R m (z) of type (m, m) such that ‖F v − R m (A)v ‖ 2 is minimal, where {A, F } ⊂ C N×N are given matrices and v ∈ C N .
All algorithms and numerical experiments presented in this paper are contained in a MATLAB toolbox [2] available for download.

Notation. Matrices are labeled with uppercase Latin letters (e.g., A or H) and their elements with the corresponding lowercase Latin letters and indices (e.g., a ij or h ij ). (Underlined) lowercase Latin letters in bold with an index k (e.g., q k , h k , h k ) denote the kth column of the corresponding matrix, or just its leading (k + 1 or) k components. Vectors are also labeled by (underlined) lowercase Latin letters in bold (e.g., b or v k ). (Underlined) uppercase Latin letters (Q k , H k , H k ) with an index k denote the corresponding submatrix made of the leading k columns, or their leading (k + 1 or) k rows, as appropriate. By RV we denote the range of a matrix V .
2. Rational Arnoldi decompositions. Given a matrix A ∈ C N×N , a starting vector v ∈ C N , and an integer m < N , the associated polynomial Krylov space of order m + 1 is defined as K m+1 = K m+1 (A, v ) = span{v , Av , . . ., A m v }. There exists an integer M ≤ N , called the invariance index for (A, v ), such that dim K m+1 = m + 1 whenever m + 1 ≤ M , while K M = K M+1 = · · · is A-invariant. Throughout this work we assume that 0 < m < M , in which case the space K m+1 is isomorphic to P m , the linear space of polynomials of degree at most m; i.e., any w ∈ K m+1 corresponds to a polynomial p ∈ P m satisfying w = p(A)v , and vice versa.
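The growth of K m+1 up to the invariance index can be observed numerically. The following NumPy sketch (ours, illustrative only, not part of the paper's MATLAB toolbox [2]) uses a diagonal A with distinct eigenvalues and a starting vector with no zero eigencomponents, so that M = N:

```python
import numpy as np

# Toy sketch: dim K_{m+1}(A, v) grows by one per step until the
# invariance index M is reached (here M = N by construction).
N = 6
A = np.diag(np.arange(1.0, N + 1))   # distinct eigenvalues 1, ..., 6
v = np.ones(N)                        # nonzero in every eigencomponent, so M = N

def krylov_dim(A, v, m):
    """dim K_{m+1}(A, v) = rank of [v, Av, ..., A^m v]."""
    cols = [v]
    for _ in range(m):
        cols.append(A @ cols[-1])
    return np.linalg.matrix_rank(np.column_stack(cols))

dims = [krylov_dim(A, v, m) for m in range(N + 1)]
print(dims)  # [1, 2, 3, 4, 5, 6, 6]: the space is A-invariant once m + 1 = M = N
```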
Given a nonzero polynomial q m ∈ P m with roots disjoint from the spectrum Λ(A), we define the associated rational Krylov space as

Q m+1 = Q m+1 (A, v , q m ) := q m (A) −1 K m+1 (A, v ). (2.1)

Note that q m (A) is nonsingular since no root of q m is an eigenvalue of A, and therefore Q m+1 (A, v , q m ) is well defined. Clearly, we have v ∈ Q m+1 (A, v , q m ). Moreover, dim(Q m+1 ) = dim(K m+1 ), so that Q m+1 is A-variant (i.e., AQ m+1 ⊄ Q m+1 ) if and only if m + 1 < M . The roots of q m are called poles of the rational Krylov space and denoted by ξ 1 , ξ 2 , . . ., ξ m . Infinity is an allowed root, and it signifies that q m is not of exact degree m, i.e., deg(q m ) < m. We now show that the poles of a rational Krylov space are uniquely determined by the starting vector and vice versa.
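Definition (2.1) can be illustrated directly by applying q m (A) −1 to a polynomial Krylov basis. A small NumPy sketch (illustrative, not from the toolbox), with the assumed poles 0.5 and −2 lying outside Λ(A):

```python
import numpy as np

# Sketch of definition (2.1): a basis of Q_3(A, v, q_2) is obtained by
# applying q_2(A)^{-1} to a basis of K_3(A, v).
N = 6
A = np.diag(np.arange(1.0, N + 1))          # Lambda(A) = {1, ..., 6}
v = np.ones(N)
xi = [0.5, -2.0]                             # poles: roots of q_2, outside Lambda(A)
q2A = (A - xi[0] * np.eye(N)) @ (A - xi[1] * np.eye(N))

K3 = np.column_stack([v, A @ v, A @ A @ v])  # basis of K_3(A, v)
Q3 = np.linalg.solve(q2A, K3)                # basis of Q_3 = q_2(A)^{-1} K_3(A, v)

assert np.linalg.matrix_rank(Q3) == 3        # dim Q_3 = dim K_3 = 3
# v itself lies in Q_3, since q_2(A)v lies in K_3(A, v):
c, *_ = np.linalg.lstsq(Q3, v, rcond=None)
assert np.linalg.norm(Q3 @ c - v) < 1e-8
```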
Lemma 2.1. Let Q m+1 (A, v , q m ) be a given A-variant rational Krylov space, i.e., m + 1 < M . Then the poles of Q m+1 (A, v , q m ) are uniquely determined by the starting vector v , or equivalently, the starting vector of Q m+1 (A, v , q m ) is uniquely, up to scaling, determined by the roots of q m .

Proof. We first show that given an A-variant polynomial Krylov space K m+1 (A, q ), all vectors w ∈ K m+1 (A, q ) that satisfy K m+1 (A, q ) = K m+1 (A, w ) are of the form w = αq , α ≠ 0. Assume to the contrary that there exists a polynomial p j with 1 ≤ deg(p j ) = j ≤ m such that w = p j (A)q . Then A m+1−j w ∈ K m+1 (A, w ), but for the same vector we have A m+1−j w = A m+1−j p j (A)q ∉ K m+1 (A, q ). This is a contradiction to K m+1 (A, q ) = K m+1 (A, w ).
To show that the poles are uniquely determined by the starting vector v , assume that Q m+1 (A, v , q m ) = Q m+1 (A, v , q̃ m ). Using the definition of a rational Krylov space (2.1), this is equivalent to K m+1 (A, q̃ m (A)v ) = K m+1 (A, q m (A)v ). This space is A-variant, hence by the above argument we know that q̃ m (A)v = αq m (A)v , α ≠ 0. This vector is an element of K m+1 (A, v ), which is isomorphic to P m . Therefore q̃ m = αq m , and hence q m and q̃ m have identical roots. Similarly one shows that if Q m+1 (A, v , q m ) = Q m+1 (A, ṽ , q m ), then ṽ = αv with α ≠ 0.

In the following we aim to establish a one-to-one correspondence between rational Krylov spaces and a particular type of matrix decompositions. As a consequence we are able to study the algebraic properties of rational Krylov spaces using these decompositions. Recall that a matrix H m ∈ C (m+1)×m is called upper-Hessenberg if all the elements below the first subdiagonal are zero, i.e., if i > j + 1 implies h ij = 0. Further, we say that H m is unreduced if none of the elements on the first subdiagonal are zero, i.e., h i+1,i ≠ 0. For convenience, we now generalize this terminology from matrices to pencils (H m , K m ).

Definition 2.2. Let {K m , H m } ⊂ C (m+1)×m be upper-Hessenberg matrices. We say that (H m , K m ) is an unreduced upper-Hessenberg pencil if |h j+1,j | + |k j+1,j | ≠ 0 for all j = 1, . . ., m.
We are now ready to introduce the notion of a rational Arnoldi decomposition.

Definition 2.3. Let A ∈ C N×N be a given matrix. A relation of the form

A V m+1 K m = V m+1 H m (2.2)

is called a rational Arnoldi decomposition (RAD) if V m+1 ∈ C N×(m+1) is of full column rank, (H m , K m ) is an unreduced upper-Hessenberg pencil of size (m + 1) × m, and the quotients ξ j := h j+1,j /k j+1,j , called poles of the decomposition, are outside the spectrum Λ(A) for j = 1, . . ., m.
The columns of V m+1 are called the basis of the decomposition and they span the space of the decomposition.If V m+1 is orthonormal, we say that (2.2) is an orthonormal RAD.
It is noteworthy that both H m and K m in the RAD (2.2) are of full rank. To see this, take any ξ ∈ C and subtract ξV m+1 K m from both sides of (2.2). This leads to

(A − ξI)V m+1 K m = V m+1 (H m − ξK m ). (2.3)

Since (H m , K m ) is unreduced there are at most m numbers ξ such that H m − ξK m is not unreduced. For any other ξ the right-hand side in (2.3) is of full rank, and so must be the left-hand side. Therefore K m is of full rank. If A is nonsingular, comparing the ranks of the left- and right-hand sides in (2.2) we now see that H m is of full rank as well. If A is singular, then zero is not an allowed pole, and therefore H m is unreduced and hence of full rank. Furthermore, any RAD (2.2) can be transformed into an orthonormal RAD using the thin QR decomposition V m+1 = QR. Setting Ṽ m+1 = Q, K̃ m = RK m , and H̃ m = RH m , we obtain the decomposition A Ṽ m+1 K̃ m = Ṽ m+1 H̃ m , spanning RV m+1 (i.e., RṼ m+1 = RV m+1 ) and satisfying h j+1,j /k j+1,j = h̃ j+1,j /k̃ j+1,j for all j = 1, . . ., m. We call these two RADs equivalent.
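The passage from a general RAD to an equivalent orthonormal one can be checked numerically. In the NumPy sketch below (ours, illustrative) we build a polynomial Arnoldi decomposition (an RAD with K m equal to the identity with an appended zero row, i.e., all poles infinite), deliberately destroy the orthonormality of its basis by a nonsingular upper-triangular transformation, and recover an equivalent orthonormal RAD via thin QR:

```python
import numpy as np

# Sketch: any RAD can be turned into an equivalent orthonormal RAD
# via a thin QR factorization of its basis V_{m+1}.
rng = np.random.default_rng(1)
N, m = 8, 4
A = rng.standard_normal((N, N))

# Polynomial Arnoldi relation A W_m = W_{m+1} H_m, i.e. an RAD with
# K_m = [I_m; 0] (all poles at infinity).
W = np.zeros((N, m + 1)); H = np.zeros((m + 1, m))
W[:, 0] = rng.standard_normal(N); W[:, 0] /= np.linalg.norm(W[:, 0])
for j in range(m):
    w = A @ W[:, j]
    for i in range(j + 1):                     # modified Gram-Schmidt
        H[i, j] = W[:, i] @ w; w -= H[i, j] * W[:, i]
    H[j + 1, j] = np.linalg.norm(w); W[:, j + 1] = w / H[j + 1, j]
K = np.vstack([np.eye(m), np.zeros((1, m))])

# Destroy orthonormality of the basis without changing span or poles:
T = np.triu(rng.standard_normal((m + 1, m + 1))) + 5 * np.eye(m + 1)
V, Hm, Km = W @ T, np.linalg.solve(T, H), np.linalg.solve(T, K)
assert np.linalg.norm(A @ V @ Km - V @ Hm) < 1e-8     # still a valid RAD

# Orthonormalize: V = QR gives the equivalent orthonormal RAD
Q, R = np.linalg.qr(V)
Kh, Hh = R @ Km, R @ Hm
assert np.linalg.norm(A @ Q @ Kh - Q @ Hh) < 1e-8
# Poles h_{j+1,j}/k_{j+1,j} are unchanged (here all infinite: k_{j+1,j} = 0):
assert np.allclose(np.diag(Kh, -1), 0.0)
```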
Definition 2.4.Two RADs with the same matrix A ∈ C N ×N are equivalent if they span the same space and have the same poles.
From now on we assume all RADs to be orthonormal. In Theorem 2.5 we show that for every rational Krylov space Q m+1 (A, v , q m ) there exists an RAD (2.2) spanning Q m+1 (A, v , q m ), and conversely, if such a decomposition exists, it spans a rational Krylov space. To proceed it is convenient to write the polynomial q m in factored form, labelling all the leading factors as

q j (z) = ∏ i=1,...,j (h i+1,i − k i+1,i z), j = 0, 1, . . ., m, (2.4)

with the empty product q 0 ≡ 1. Since Q m+1 (A, v , q m ) is independent of the scaling of q m , any choice of the scalars h i+1,i and k i+1,i is valid as long as their ratio is ξ i . When we make use of (2.4) without specifying the order of appearance of the poles, we mean any order. The fact that q j | q j+1 gives rise to a sequence of nested rational Krylov spaces Q 1 ⊆ Q 2 ⊆ · · · ⊆ Q m+1 , where Q j+1 = Q j+1 (A, v , q j ) for j = 0, 1, . . ., m.
Theorem 2.5. Let A ∈ C N×N . A relation of the form (2.2) is an orthonormal RAD if and only if the columns of V m+1 form an orthonormal basis of a rational Krylov space Q m+1 (A, v , q m ) with v = v 1 and q m given by (2.4).

Proof. Let (2.2) hold and define the polynomials {q j } m j=0 as in (2.4). Note that these are nonzero polynomials since the pencil (H m , K m ) is unreduced. We show by induction that

span{v 1 , v 2 , . . ., v j+1 } = q j (A) −1 K j+1 (A, v ) (2.5)

for j = 1, . . ., m, and with v = v 1 . In particular, for j = m we obtain RV m+1 = q m (A) −1 K m+1 (A, v ). Consider j = 1. Reading (2.2) column-wise, first column only, and rearranging the terms yields

v 2 = q 1 (A) −1 (k 11 A − h 11 I)v 1 ∈ q 1 (A) −1 K 2 (A, v ), (2.6)

which together with the fact v 1 ∈ q 1 (A) −1 K 2 (A, v ) proves (2.5) for j = 1. Let us assume that (2.5) holds for j = 1, . . ., n − 1 < m. We now consider the case j = n. Comparing the nth column on the left- and the right-hand side in (2.2) and rearranging the terms gives

(h n+1,n I − k n+1,n A)v n+1 = ∑ i=1,...,n (k in A − h in I)v i , (2.7)

and hence, the operator on the left being invertible since ξ n ∉ Λ(A),

v n+1 = (h n+1,n I − k n+1,n A) −1 ∑ i=1,...,n (k in A − h in I)v i . (2.8)

By the induction hypothesis, each v i with i ≤ n lies in q n−1 (A) −1 K n (A, v ), so that

∑ i=1,...,n (k in A − h in I)v i ∈ q n−1 (A) −1 K n+1 (A, v ). (2.9)

It follows from (2.8) and (2.9) that v n+1 ∈ q n (A) −1 K n+1 (A, v ). The induction hypothesis asserts {v 1 , v 2 , . . ., v n } ⊆ q n (A) −1 K n+1 (A, v ), and since the n + 1 linearly independent vectors v 1 , . . ., v n+1 lie in a space of dimension n + 1, equality (2.5) holds for j = n, which concludes this direction.
Conversely, let Q m+1 = q m (A) −1 K m+1 (A, v ) be a rational Krylov space with an orthonormal basis {v 1 , . . ., v m+1 } satisfying (2.5). Thus for n = 1, . . ., m there holds

(h n+1,n I − k n+1,n A)v n+1 ∈ q n−1 (A) −1 K n+1 (A, v ) ⊆ span{v 1 , . . ., v n , Av 1 , . . ., Av n }.

Consequently, there exist numbers {h in , k in } n i=1 ⊂ C such that (2.7) holds. These relations can be merged into matrix form to get (2.2), with the pencil (H m , K m ) being unreduced as a consequence of q m being a nonzero polynomial.
By an inductive argument, using the decomposition (2.2), we arrive at the explicit formula [29]

v j+1 = γ j p j (A)q j (A) −1 v , γ j ≠ 0, j = 0, 1, . . ., m, (2.10)

where p 0 (z) := 1 and, for j > 0, p j (z) = det (zK j − H j ). The polynomials q j are given by (2.4). Note that p j (z) is the determinant of the upper j × j submatrix of zK j − H j , whilst (−1) j q j (z) is the determinant of its lower j × j submatrix. We only give a brief outline of a proof of (2.10). From (2.6) the relation (2.10) follows for j = 1. Assume (2.10) has been established for j = 1, . . ., n < m and insert it into (2.8). Then (2.10) follows for j = n + 1 by noticing that the right-hand side of (2.8) represents the Laplace expansion of det (zK n − H n ) along the last column.
3. A rational implicit Q theorem. A special case of an orthonormal rational Arnoldi decomposition (2.2) is the polynomial Arnoldi decomposition (1.2). The corresponding polynomial q m is constant and RV m+1 = K m+1 (A, v 1 ). The implicit Q theorem, see [36, Theorem 3.3], states that once the first column of V m+1 is fixed, so is, up to column scaling, the whole matrix V m+1 , and with it the upper-Hessenberg matrix H m . There is no essential difference between V m+1 and H m on one side and their rescaled counterparts Ṽ m+1 and H̃ m on the other. In this sense we say that both V m+1 and H m are essentially uniquely determined by the first column of V m+1 . With Theorem 3.1 we now generalize this result to RADs. Our proof uses the proof of [36, Theorem 3.3] as a template.
Apart from the column scaling of V m+1 , in the rational case the decomposition (2.2) is also invariant (in the sense that it spans the same space, the poles remain unchanged, and the upper-Hessenberg structure is preserved) under right-multiplication of the pencil by upper-triangular nonsingular matrices R m . Therefore V m+1 and (H m , K m ) are essentially the same as V m+1 and (H m R m , K m R m ).

Theorem 3.1. Let (2.2) be an orthonormal RAD with poles ξ j = h j+1,j /k j+1,j . For every j = 1, . . ., m the orthonormal matrix V j+1 and the pencil (H j , K j ) are essentially uniquely determined by the first column of V m+1 and the poles ξ 1 , . . ., ξ j .
Proof. The proof goes by induction on j. We assume without loss of generality that h j+1,j ≠ 0 for all j = 1, . . ., m. Otherwise, if h j+1,j = 0 for some j, then 0 = ξ j ∉ Λ(A), and we can consider interchanging the roles of H m and K m , using A −1 V m+1 H m = V m+1 K m instead of (2.2). For ξ ∉ Λ(A) ∪ {0} define A (ξ) := (I − A/ξ) −1 A and L (ξ) m := K m − H m /ξ, with the convention A (∞) := A and L (∞) m := K m . Then (2.2) is equivalent to

A (ξ) V m+1 L (ξ) m = V m+1 H m . (3.1)

We make frequent use of this relation, reading it column-wise. It is worth noticing that the jth column of L (ξ j ) m has all but the leading j components equal to zero, and that L (ξ) j is of full rank for all j and ξ. We are now ready to prove the statement.
The jth column in (3.1) for ξ = ξ j gives A (ξ j ) V j q j = V j+1 h j , where q j ∈ C j collects the leading j components of the jth column of L (ξ j ) m . Since v 1 , . . ., v j+1 are orthonormal we have V j * A (ξ j ) V j q j = [h 1j , . . ., h jj ] T . Rearranging the two equations above we deduce

v j+1 h j+1,j = (I − V j V j * ) A (ξ j ) V j q j . (3.2)

To obtain the last equality we have used A (ξ j ) V j L (ξ j ) j−1 = V j H j−1 , which are the first j − 1 columns in (3.1) with ξ = ξ j . Now, to see that (3.2) defines v j+1 uniquely up to scaling, we note that since h j+1,j ≠ 0 the vector q j is also nonzero. Further, q j is essentially unique since L (ξ j ) j−1 ∈ C j×(j−1) is of full column rank and essentially unique by hypothesis. We can set h j+1,j = ‖(I − V j V j * ) A (ξ j ) V j q j ‖ 2 . It remains to prove the essential uniqueness of (H j , K j ). Let us normalize q j = ρ j q̃ j so that ‖q̃ j ‖ 2 = 1. Recalling that l (ξ j ) j = k j − h j /ξ j , we have k ij = l ij + h ij /ξ j for i = 1, 2, . . ., j. Defining h̃ j := V j * A (ξ j ) V j q̃ j and h̃ j+1,j := ρ j −1 h j+1,j , the vectors h̃ j and k̃ j := q̂ j + h̃ j /ξ j , where q̂ j = [q̃ j T , 0] T , are essentially unique. Collecting the scaling factors in columns z j , with Z j = [Z j−1 , z j ] upper-triangular and nonsingular, we conclude that (H j , K j ) = (H̃ j Z j , K̃ j Z j ) is essentially unique as (H̃ j , K̃ j ) is.

A further comment for the case m = N − 1 is required. For the polynomial case, the vector AV N e N is uniquely defined by the starting vector and A, and AV N = V N H N holds. This last decomposition is usually stated as the (polynomial) implicit Q theorem, and essential uniqueness of H N is rightfully claimed. Let us consider a more general relation AV N K N = V N H N with square K N and H N . Right-multiplying with e N gives V N * AV N k N = h N , so that for any k N we can find the corresponding h N such that AV N K N = V N H N holds. Therefore we cannot say that (H N , K N ) is essentially unique. In fact, for the polynomial case k N = e N is tacitly fixed.
As already mentioned, a polynomial Krylov space K m+1 (A, v ) with orthonormal basis V m+1 is related to a decomposition of the form

AV m = V m+1 H m , (3.3)

where H m is upper-Hessenberg. For a rational Krylov space we have an RAD (2.2) with an upper-Hessenberg pencil (H m , K m ) rather than a single upper-Hessenberg matrix H m . It has been shown in [13, 38, 40, 27] that decompositions of the form (3.3) with H m being semiseparable plus diagonal are related to rational Krylov spaces in the same way as RADs are. Corresponding implicit Q theorems have been developed.
Algorithm 3.1 Rational Krylov method.
1. Set v 1 := v /‖v ‖ 2 .
2. for j = 1, . . ., m do
3.   Compute w := A (ξ j ) V j q j for the continuation combination q j ∈ C j .
4.   Orthogonalize w against v 1 , . . ., v j and normalize, obtaining v j+1 and the coefficients h j .
5.   if ξ j ≠ 0 then Set k j := q j + h j /ξ j .
6.   else Set k j := h j , and replace h j := q j . end if
7.   For j < m set q j+1 := q (ξ j+1 ) j+1 , with q (ξ j+1 ) j+1 as in Theorem 3.2.
8. end for
We prefer to work with the pencil (H m , K m ) instead of the semiseparable plus diagonal representation, since the former is widely used in practice. In fact, this pencil is a by-product of the rational Krylov method used to construct rational Krylov bases, which is stated in Algorithm 3.1.
The vector q j in Alg. 3.1 is often called the continuation combination, as it specifies the linear combination of the previously computed basis vectors v 1 , . . ., v j to which the operator A (ξ) is applied; cf. line 3 of Alg. 3.1. The choice made in line 7 is due to Ruhe [32], and it guarantees that the new vector w is linearly independent of the previous vectors and hence expands the space, provided that we have not reached the invariance index. This can be seen from the proof of Theorem 3.1, or equivalently from (3.1) with m replaced by j.
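A minimal NumPy sketch of a rational Krylov iteration follows (a simplified variant of Alg. 3.1, ours: finite poles only, continuation combination q j = e j , and the shifted inverse (A − ξ j I) −1 in place of A (ξ j ) ; the toolbox [2] is the reference implementation):

```python
import numpy as np

def rational_arnoldi(A, v, poles):
    """Build an orthonormal RAD  A V_{m+1} K_m = V_{m+1} H_m
    with prescribed finite poles (continuation combination e_j)."""
    N, m = len(v), len(poles)
    V = np.zeros((N, m + 1)); H = np.zeros((m + 1, m)); K = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j, xi in enumerate(poles):
        q = np.zeros(j + 1); q[j] = 1.0           # continuation combination e_j
        w = np.linalg.solve(A - xi * np.eye(N), V[:, :j+1] @ q)
        c = np.zeros(j + 2)
        for i in range(j + 1):                     # modified Gram-Schmidt
            c[i] = V[:, i] @ w; w -= c[i] * V[:, i]
        c[j + 1] = np.linalg.norm(w); V[:, j + 1] = w / c[j + 1]
        # A V_{j+1} c = V_j q + xi V_{j+1} c  gives the pencil columns:
        K[:j+2, j] = c
        H[:j+1, j] = q + xi * c[:j+1]; H[j+1, j] = xi * c[j+1]
    return V, K, H

A = np.diag(np.arange(1.0, 9.0)); v = np.ones(8)
poles = [0.5, -1.0, 3.5]                           # outside Lambda(A) = {1,...,8}
V, K, H = rational_arnoldi(A, v, poles)
assert np.linalg.norm(A @ V @ K - V @ H) < 1e-10           # RAD holds
assert np.allclose(np.diag(H, -1) / np.diag(K, -1), poles)  # poles recovered
assert np.allclose(V.T @ V, np.eye(V.shape[1]), atol=1e-10)
```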
In (2.10) we have given explicit formulas for the orthonormal vectors v j+1 , implicitly defined by the starting vector v 1 and the poles ξ 1 , . . ., ξ j . Note that the determinants appearing in the formulas do not change if the pencil (H j , K j ) is right-multiplied by an upper-triangular nonsingular matrix. We now give a formula for (a multiple of) the continuation vector q j+1 = q (ξ j+1 ) j+1 .

Theorem 3.2. Let (2.2) be an RAD associated with an A-variant space RV m+1 , and let the polynomials p j , q j ∈ P j be as in (2.10). Define the polynomials p [m] j (z) := q m (z)q j (z) −1 p j (z) ∈ P m , j = 0, 1, . . ., m.

Equivalently, p [m] j (z) is, up to sign, the determinant of the m × m matrix obtained from zK m − H m by removing its (j + 1)st row. Then for any ξ ∈ C there holds q (ξ) m+1 ≠ 0 and (q (ξ) m+1 ) * (ξK m − H m ) = 0 T , where q (ξ) m+1 denotes the vector whose (j + 1)st component is the complex conjugate of p [m] j (ξ).

Proof. Let ξ ∈ C be an arbitrary scalar. Label the roots of p j (z) as ϑ [j] 1 , . . ., ϑ [j] j , for j = 1, . . ., m, and the roots of q m as ξ 1 , . . ., ξ m , eventual roots at infinity included. It follows from (2.10) that for all j = 1, . . ., m and i = 1, . . ., j we have ξ j ≠ ϑ [j] i . Otherwise we would have v j+1 ∈ Q j (A, v 1 , q j−1 ), and V j+1 would not be of full column rank. We are ready to prove that q (ξ) m+1 ≠ 0 . Assume to the contrary that q (ξ) m+1 = 0 . In particular, the first component of q (ξ) m+1 is zero, and thus so is p [m] 0 (ξ) = q m (ξ), so that ξ ∈ {ξ 1 , . . ., ξ m }. Since p [m] 1 (ξ) = 0 as well, and ξ 1 is not a root of p 1 , we may further restrict ξ ∈ {ξ 2 , . . ., ξ m }. Looking at p [m] j (ξ) = 0 for the remaining j = 2, . . ., m, we exclude one by one all the ξ j and arrive at ξ ∈ ∅. Hence there is no ξ such that q (ξ) m+1 = 0 . For the second assertion, note that by (2.10) the vectors p [m] j (A)v 1 are scalar multiples of q m (A)v j+1 and span the A-variant space K m+1 (A, v 1 ). The natural isomorphism between K m+1 (A, v 1 ) and P m allows us to write the decomposition in scalar form as

[p [m] 0 (z), p [m] 1 (z), . . ., p [m] m (z)] (zK m − H m ) = 0 T for all z ∈ C,

and the claim follows by setting z = ξ.

Remark 3.3 (evaluating rational functions). The introduction of the polynomials p [m] j is necessary only for the case when q m (ξ) = 0, since then q m (ξ) −1 is not defined.

4. Moving the poles.
Let us give a brief résumé. For a fixed rational Krylov space Q m+1 = Q m+1 (A, v , q m ) the poles are uniquely determined by the starting vector v and, up to scaling of v , the reverse is true. Further, by Theorem 2.5, there exists an orthonormal RAD (2.2) spanning Q m+1 with starting vector v and poles given by the roots of q m . Upon fixing the order of appearance of the poles, Theorem 3.1 guarantees the RAD to be essentially unique.
Observe that Q m+1 can be interpreted as a rational Krylov space with starting vector being almost any vector from Q m+1 . Indeed, let q̃ m ∈ P m be a nonzero polynomial with roots disjoint from Λ(A); then

Q m+1 (A, v , q m ) = Q m+1 (A, q̃ m (A)q m (A) −1 v , q̃ m ). (4.1)

We are now interested in transforming an RAD (2.2) for Q m+1 (A, v , q m ) into one for Q m+1 (A, q̃ m (A)q m (A) −1 v , q̃ m ). For the moment this is only of theoretical importance; however, in subsection 4.3 we show a connection with implicit filtering and provide references to the literature. Further, an application of the ideas developed here to rational approximation is given in section 5.
To get an RAD for Q m+1 (A, q̃ m (A)q m (A) −1 v , q̃ m ) one can focus either on the "new starting vector" or on the "new poles". The result is essentially the same, and in both cases the starting vector and the poles change. We first look at the case when the new starting vector is given as ṽ = V m+1 c, for a nonzero c ∈ C m+1 , and later we focus on the case when the new poles, i.e., the roots of q̃ m , are prescribed.
4.1. Moving the poles implicitly. Let ṽ = V m+1 c ∈ Q m+1 (A, v , q m ) be a nonzero vector and take any nonsingular matrix P of size m + 1 such that P e 1 = c. Then

A Ṽ m+1 K̃ m = Ṽ m+1 H̃ m , (4.2)

where Ṽ m+1 = V m+1 P , H̃ m = P −1 H m , and K̃ m = P −1 K m . This construction guarantees the first column ṽ 1 of Ṽ m+1 to be ṽ ; however, the pencil (H̃ m , K̃ m ) may lose the upper-Hessenberg structure. In the following we aim at recovering this structure in (4.2) without affecting ṽ 1 . For that purpose we generalize the notion of RADs, first giving a technical definition. For a matrix X m ∈ C (m+1)×m the notation X −m is used to denote its lower m × m submatrix.
Definition 4.1. Let {K̃ m , H̃ m } ⊂ C (m+1)×m be matrices. We say that the pencil (H̃ m , K̃ m ) is regular if the lower m × m subpencil (H̃ −m , K̃ −m ) is regular, i.e., if q̃ m (z) = det (z K̃ −m − H̃ −m ) is not identically equal to zero.
Note that an upper-Hessenberg pencil of size (m + 1) × m is unreduced if and only if it is regular. We are now ready to introduce decompositions of the form (4.2).

Definition 4.2. Let A ∈ C N×N be a given matrix. A relation of the form (4.2), where Ṽ m+1 ∈ C N×(m+1) is of full column rank and (H̃ m , K̃ m ) is a regular pencil, is called a generalized rational Krylov decomposition (RKD).

The notions of (orthonormal) basis, space, and equivalent decompositions are the same as for RADs. We call a generalized RKD with an upper-Hessenberg pencil a generalized RAD. The two definitions above suggest that the unique poles associated with ṽ are the eigenvalues of the pencil (H̃ −m , K̃ −m ). The justification follows from Theorem 2.5 (or Theorem 3.1) and the following result.
Theorem 4.3.Any generalized RKD is equivalent to a generalized RAD with the same starting vector.
Proof. Let (4.2) be a generalized RKD. We need to bring both H̃ m and K̃ m into upper-Hessenberg form. To achieve this it suffices to bring (H̃ −m , K̃ −m ) into generalized Schur form: there exist unitary matrices Q, Z ∈ C m×m such that both Q * H̃ −m Z and Q * K̃ −m Z are upper-triangular. With the unitary matrix Q̂ := diag(1, Q), the transformed decomposition A (Ṽ m+1 Q̂)(Q̂ * K̃ m Z) = (Ṽ m+1 Q̂)(Q̂ * H̃ m Z) is a generalized RAD. Note that R(Ṽ m+1 Q̂) = RṼ m+1 , with the poles of the transformed pencil and of (H̃ m , K̃ m ) being identical. The first vector ṽ 1 is unaffected. This discussion is summarized in Algorithm 4.1, used to replace the starting vector v with ṽ = V m+1 c. Note that there is no guarantee that by transforming an RAD the resulting decomposition is also an RAD, i.e., some poles may be moved to eigenvalues of A. We prove later (cf. Theorem 4.4) that if ṽ = V m+1 c = p m (A)q m (A) −1 v , then the poles of the decomposition are always the roots of p m , even if they coincide with eigenvalues of A.

4.2. Moving the poles explicitly.
If the vector ṽ is not given as a linear combination ṽ = V m+1 c of the basis vectors V m+1 , but rather by specifying the new poles q̃ m , one can compute c = V * m+1 ṽ , where ṽ = q̃ m (A)q m (A) −1 v , and still use Alg. 4.1 to recover the new decomposition. The vector ṽ = q̃ m (A)q m (A) −1 v can be computed cheaply as a rational Arnoldi approximation ṽ = V m+1 q̃ m (A m+1 )q m (A m+1 ) −1 V * m+1 v , where A m+1 := V * m+1 AV m+1 ; see for instance [22]. In the following we present an approach that works directly with the pencil (H m , K m ), changing the poles iteratively one by one, and hence requires no information about the reduced matrix A m+1 .
Moving the first pole. The poles are the ratios of the subdiagonal elements of (H m , K m ). Applying a Givens rotation G acting on planes (1, 2) from the left of the pencil does not destroy the upper-Hessenberg structure and, as we show, can move the first pole anywhere. Write G = [c, s; −s̄, c], embedded into the (m + 1) × (m + 1) identity, with |c| 2 + |s| 2 = 1. The rotation G is chosen so that ξ̃ 1 = h̃ 21 /k̃ 21 , the ratio of the second-row entries of the rotated pencil (GH m , GK m ). Using the notation t = s/c, this condition reads −t̄ (h 11 − ξ̃ 1 k 11 ) + (h 21 − ξ̃ 1 k 21 ) = 0, and using standard trigonometric relations we arrive at

c = 1/√(1 + |t| 2 ), s = tc,

if t ≠ ∞, and otherwise s = 1 and c = 0. Formula (4.1) asserts (with the roots of q̃ m being ξ̃ 1 , ξ 2 , . . ., ξ m ) that this process replaces the starting vector v 1 with a multiple of

v̂ 1 = (A − ξ̃ 1 I)V 2 k 1 . (4.5)

Note that (4.5) holds even if ξ̃ 1 ∈ Λ(A) and/or ξ 1 ∈ Λ(A), as long as the starting decomposition exists. If however ξ 1 ∉ Λ(A), we can further write

v̂ 1 = γ(A − ξ̃ 1 I)(A − ξ 1 I) −1 v 1 , γ ≠ 0. (4.6)

Moving all poles. Changing the other ratios with Givens rotations results in the loss of the upper-Hessenberg structure. However, the poles are the eigenvalues of the pencil (H −m , K −m ), which is (already) in generalized Schur form. After changing the first pole, using the Givens rotation approach just described, the poles can be reordered (see for instance [25, 26]) with the aim of bringing an unchanged pole to the front of the decomposition, so that it can be changed using a Givens rotation. This process is formalized in Algorithm 4.2, and an illustration is presented in Figure 4.1.
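The Givens-rotation pole move can be verified numerically. The NumPy sketch below (ours, illustrative; rational_arnoldi is a simplified builder using shifted inverses and continuation vector e j , not the toolbox routine) moves the first pole of a small RAD to ξ̃ 1 = 10 and checks the resulting change of the starting vector:

```python
import numpy as np

def rational_arnoldi(A, v, poles):
    """Orthonormal RAD A V K = V H with given finite poles (sketch)."""
    N, m = len(v), len(poles)
    V = np.zeros((N, m + 1)); H = np.zeros((m + 1, m)); K = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j, xi in enumerate(poles):
        w = np.linalg.solve(A - xi * np.eye(N), V[:, j])
        c = np.zeros(j + 2)
        for i in range(j + 1):
            c[i] = V[:, i] @ w; w -= c[i] * V[:, i]
        c[j + 1] = np.linalg.norm(w); V[:, j + 1] = w / c[j + 1]
        K[:j+2, j] = c; H[j, j] += 1.0; H[:j+2, j] += xi * c
    return V, K, H

A = np.diag(np.arange(1.0, 9.0)); v = np.ones(8)
xi1, xi2 = 0.5, -1.0
V, K, H = rational_arnoldi(A, v, [xi1, xi2])

# Rotation on rows (1, 2) so that the new ratio h21/k21 equals xi_new:
xi_new = 10.0
t = (H[1, 0] - xi_new * K[1, 0]) / (H[0, 0] - xi_new * K[0, 0])
c, s = 1.0 / np.hypot(1.0, t), t / np.hypot(1.0, t)
G = np.eye(3); G[:2, :2] = [[c, s], [-s, c]]
Vt, Kt, Ht = V @ G.T, G @ K, G @ H

assert abs(Ht[1, 0] / Kt[1, 0] - xi_new) < 1e-8        # first pole moved
assert np.linalg.norm(A @ Vt @ Kt - Vt @ Ht) < 1e-10   # decomposition intact
# New starting vector is collinear with (A - xi_new I)(A - xi1 I)^{-1} v1:
u = (A - xi_new * np.eye(8)) @ np.linalg.solve(A - xi1 * np.eye(8), V[:, 0])
u /= np.linalg.norm(u)
assert min(np.linalg.norm(Vt[:, 0] - u), np.linalg.norm(Vt[:, 0] + u)) < 1e-8
```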
Let us now consider Alg. 4.2 when k = m. For notational convenience only, we assume the poles to be finite. As we have shown in (4.5)–(4.6), after applying the first Givens rotation the starting vector v 1 gets replaced with v̂ 1 = γ 1 (A − ξ̃ 1 I)(A − ξ 1 I) −1 v 1 , where γ 1 ∈ C * is a scaling factor. By reordering the poles we do not affect the "new starting vector" v̂ 1 , and we bring ξ 2 to the leading position, i.e., second row, first column, where the next Givens rotation acts. Thus, for j = 2 the Givens rotation replaces v̂ 1 with v̂ 2 = γ 2 (A − ξ̃ 2 I)(A − ξ 2 I) −1 v̂ 1 , for some γ 2 ∈ C * . Using (4.6) we obtain v̂ 2 = γ 1 γ 2 (A − ξ̃ 2 I)(A − ξ̃ 1 I)(A − ξ 2 I) −1 (A − ξ 1 I) −1 v 1 .

Reasoning inductively we deduce

v̂ m = γ q̃ m (A) q m (A) −1 v 1 , γ ∈ C * , (4.7)

where q m is given by (2.4) and q̃ m is defined in an analogous manner from the new poles ξ̃ 1 , . . ., ξ̃ m . The above discussion is the gist of the following result.
Find a Givens rotation G to replace the pole ξ ℓ with ξ̃ j , where ℓ = k − j + 1.

7. Update Ṽ m+1 := Ṽ m+1 Q m+1 , H̃ m := Q * m+1 H̃ m Z m , and K̃ m := Q * m+1 K̃ m Z m .

Fig. 4.1. Illustration of Alg. 4.2: applying the first Givens rotation to replace the leading pole, and reordering the generalized Schur form of the lower 5 × 5 part of the pencil.

Theorem 4.4. Let (2.2) be an RAD spanning Q m+1 = Q m+1 (A, v , q m ), and let q̃ m be as in (4.7). A generalized RAD spans Q m+1 with poles given by the roots of q̃ m if and only if its starting vector satisfies v̂ 1 = γ q̃ m (A)q m (A) −1 v for some γ ∈ C * .

Proof. Given such a generalized RAD, apply Alg. 4.2 to move its poles back to the roots of q m . This yields an RAD A V̆ m+1 K̆ m = V̆ m+1 H̆ m , having poles given by q m and still spanning Q m+1 . According to Lemma 2.1, v̆ 1 is collinear with v . Therefore, it follows from (4.7) that v̂ 1 = γ q̃ m (A)q m (A) −1 v for some scalar γ ∈ C * . The other direction follows from Theorem 2.5 and (4.7) after using Alg. 4.2.

Theorem 4.4 shows that Alg. 4.1 and Alg. 4.2 are equivalent, provided that equivalent input data are given. It also shows, together with Theorem 2.5 and Theorem 4.3, that an (m + 1)-dimensional space V m+1 is a rational Krylov space if and only if there exists a generalized RKD spanning V m+1 .
Remark 4.5 (recovering the polynomial Krylov space). With q̃ m (z) = 1, there holds Q m+1 (A, v , q m ) = K m+1 (A, q m (A) −1 v ), and we can recover a polynomial Arnoldi decomposition for K m+1 (A, q m (A) −1 v ) from an RAD for Q m+1 (A, v , q m ) using Alg. 4.2 with all poles ξ̃ j set to infinity. In this particular case, a simpler method is to bring the pencil (H m , K m ) from upper-Hessenberg–upper-Hessenberg to upper-Hessenberg–upper-triangular form using just Givens rotations; see, e.g., [34, p. 495]. Bringing the pencil to upper-triangular–upper-Hessenberg form would instead move all the poles to zero.

4.3. Implicit filters in the rational Krylov method. Implicit filtering aims at compressing the space Q m+1 (A, v , q m ) to the smaller space Q m+1−k (A, p k (A)q k (A) −1 v , q̃ m−k ), where 1 ≤ k ≤ m, q m = q k q̃ m−k , and p k ∈ P k is a polynomial with roots (infinity allowed) in the region we want to filter out. In applications this technique is usually used to deal with large memory requirements or orthogonalization costs for V m+1 , or to purge unwanted or spurious eigenvalues (see, e.g., [4, 7, 8] and the references given therein). Implicit filtering for RADs was first introduced in [8] and further studied in [7]. Alg. 4.2 can easily be used for implicit filtering. In fact, applying Alg. 4.2 with the k poles ξ̃ j being the roots of p k implicitly applies the filter p k (A)q k (A) −1 to the RAD. The k "new" poles correspond to the rightmost k columns in Ṽ m+1 , K̃ m , and H̃ m ; cf. Figure 4.1. Hence, truncating the decomposition to the leading m + 1 − k columns completes the process. The derivation and algorithms in [7, 8] are different, and it would be interesting to compare them with ours; this is, however, not done here. Pertinent ideas for polynomial Krylov methods have recently appeared in [4], where the authors relate implicit filtering in the Krylov–Schur algorithm [35, 37] to partial eigenvalue assignment.
As an alternative to Alg. 3.1 for a Hermitian matrix A, it was proposed in [28] to use the spectral transformation Lanczos method with a change of the (repeated) pole. The approach for changing poles taken here is different and more general.
5. An application to rational least squares approximation. Given matrices {A, F } ⊂ C N×N and a unit 2-norm vector v ∈ C N , we consider in this section the following rational least squares problem: find a rational function R min m (z) of type (m, m), with m < M , such that

‖F v − R min m (A)v ‖ 2 = min over all R m of type (m, m) of ‖F v − R m (A)v ‖ 2 . (5.1)

This is a nonlinear approximation problem, as the denominator of R min m is unknown. Hence an iterative algorithm is required.
Let q m ∈ P m be a given polynomial and consider the linear space of rational functions of type (m, m) with denominator q m , denoted by P m /q m . Each element R m ∈ P m /q m is in a one-to-one correspondence with an element R m (A)v of the rational Krylov space Q m+1 (A, v , q m ). Instead of (5.1) we now consider a linear
approximation problem: find a unit 2-norm vector ṽ ∈ Q m+1 (A, v , q m ) for which

min y ∈ Q m+1 (A,v ,q m ) ‖F ṽ − y ‖ 2 (5.2)

is minimal. This means that F ṽ is best approximated by an element of Q m+1 (A, v , q m ). Problem (5.2) is easy to solve. Let V m+1 ∈ C N×(m+1) be an orthonormal basis of Q m+1 and write ṽ = V m+1 c with c ∈ C m+1 and ‖c ‖ 2 = 1. Then the inner minimum in (5.2) is a linear least squares problem, whose solution is the orthogonal projection of F ṽ onto Q m+1 , and a minimizer c can be obtained as a right singular vector of (I − V m+1 V * m+1 )F V m+1 corresponding to a smallest singular value σ min . We now exploit that by Theorem 4.4 we can associate with ṽ a "new" rational Krylov space Q m+1 (A, ṽ , q̃ m ) = Q m+1 (A, v , q m ), where the roots of q̃ m (the "new" poles) can be computed from c using Alg. 4.1. The new vector–pole pair (ṽ , q̃ m ) is optimal in the sense that ‖F ṽ − R̃ m (A)ṽ ‖ 2 is minimal (and equal to σ min ) among all R̃ m ∈ P m /q̃ m , and there is no better vector–pole pair associated with Q m+1 (A, v , q m ). Changing the starting vector from ṽ back to v , we hope that the new rational Krylov space Q m+1 (A, v , q̃ m ) contains a better approximation to F v than Q m+1 (A, v , q m ) did. In this case we have found an improved denominator q̃ m for the rational function R m in (5.1).
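The inner linear problem can be sketched in a few lines of NumPy (illustrative only; the choice f = exp and the poles below are our arbitrary assumptions, not from the paper):

```python
import numpy as np

# Sketch: with V an orthonormal basis of Q_{m+1}(A, v, q_m), a minimizer c of
# ||(I - V V*) F V c||_2 over unit vectors c is a right singular vector of
# S = (I - V V*) F V belonging to its smallest singular value sigma_min.
N = 12
A = np.diag(np.linspace(1.0, 10.0, N))
v = np.ones(N) / np.sqrt(N)
F = np.diag(np.exp(np.diag(A)))          # F = f(A) with f = exp (A diagonal)

def rk_basis(A, v, poles):
    """Orthonormal basis of Q_{m+1}(A, v, q_m); poles are the roots of q_m."""
    cols = [v]
    for xi in poles:
        cols.append(np.linalg.solve(A - xi * np.eye(len(v)), cols[-1]))
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

V = rk_basis(A, v, [-1.0, -2.0, -3.0])   # three (arbitrary) finite poles
S = (np.eye(N) - V @ V.T) @ (F @ V)
U, sv, Wh = np.linalg.svd(S)
c, sigma_min = Wh[-1], sv[-1]            # right singular vector, smallest sv
assert np.isclose(np.linalg.norm(S @ c), sigma_min)
# The optimal vector V c does at least as well as the current vector v:
assert sigma_min <= np.linalg.norm(S @ (V.T @ v)) + 1e-12
```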
The procedure described is iterated, computing a new rational Krylov space in each iteration and changing the poles by modifying the starting vector. The complete procedure is given in Algorithm 5.1 under the name RKFIT, which stands for Rational Krylov Fitting. A MATLAB implementation of RKFIT is available in [2].
5.1. Discussion. In this and the following subsections we briefly discuss Alg. 5.1 in a list of comments and some numerical experiments. A more detailed analysis will be given in a separate publication [3].
1. If A = diag(λ j ) and F = diag(ϕ j ) are diagonal matrices, and v = [v 1 , . . ., v N ] T , then (5.1) corresponds to a rational weighted least squares problem: minimize ∑ j=1,...,N |v j | 2 |ϕ j − R m (λ j )| 2 . General rational optimization problems of this type are nonconvex, and hence no iterative method for their solution comes with a guarantee to find a global minimum. One popular approach for solving such problems is known as vector fitting [20, 19], which, similarly to Alg. 5.1, is based on iterative relocation of the poles of rational least squares approximants in partial fraction form.
2. The use of partial fractions as basis functions in [20] may result in poorly conditioned linear algebra problems to be solved, and orthonormal vector fitting [10] tries to overcome this problem by using instead an expansion of R m into orthonormal rational functions; orthonormal with respect to a measure supported on the imaginary axis.The orthonormal rational functions used in [10] are computed by a Gram-Schmidt procedure applied with a quadrature approximation for the inner products.The implications of quadrature-based vector fitting on optimal H 2 approximation of dynamical systems are discussed in [11].
3. In the case that A is a normal matrix, RKFIT can be interpreted as an orthonormal vector fitting algorithm in which two rational functions r m and r̃ m with common denominator q m are orthonormal with respect to the discrete inner product ⟨r m , r̃ m ⟩ := (r m (A)v ) * (r̃ m (A)v ). (See [5, 9] for the theory of orthogonal rational functions and their relation to rational Krylov spaces.) RKFIT is different from orthonormal vector fitting in that it uses a discrete inner product defined by a measure not necessarily supported on the imaginary axis. The orthonormal rational functions are computed by the rational Krylov method without the need for explicit quadrature. In [6] it is advocated to use orthogonal rational basis functions with fixed poles for least squares fitting. This leads to a linear least squares problem but does not resolve the problem of pole relocation.

4. If A has Jordan blocks of size 2 or greater, then also derivatives of R m (z) at (some of) the eigenvalues of A are fitted. Consider, for example, a single Jordan block A with eigenvalue λ. Then each component of R m (A)v is a weighted sum of derivatives R (j) m (λ), and one can use v to choose the weights as required. This generalizes naturally to matrices A with more than one Jordan block.
5. If F = f(A), each iteration of Alg. 5.1 requires the computation of f(A)V_{m+1}. If one is interested in scalar rational approximation problems where A is a diagonal matrix (see point 1) or a Jordan block matrix (see point 4), then f(A) is easy to compute. Otherwise rational Krylov techniques can be used to approximate f(A)V_{m+1} directly (see, e.g., the review [22]).
6. If A, F, and v are real-valued and the initial poles {ξ_j}_{j=1}^m appear in complex conjugate pairs, it is natural to enforce real arithmetic in all operations of Alg. 5.1. This can be achieved by using the real form of the rational Krylov method [30] instead of Alg. 3.1, and a generalized real Schur form (see, e.g., [36, § 3.1]) in Step 2 of Alg. 4.1. Our RKFIT implementation [2] provides a 'real' option for this purpose.
7. The output vector c ∈ C^{m+1} returned by Alg. 5.1 collects the coefficients of the approximant R_m(A)v in the rational Krylov basis V_{m+1}, i.e., R_m(A)v = V_{m+1}c. Using Theorem 3.2 and Remark 3.3 we find that R_m(z) can be evaluated at any point z ∈ C by computing a full QR factorization of zK_m − H_m and forming an inner product of c with the last column q_{m+1}^{(z)} of the Q factor scaled by its first entry, i.e., R_m(z) = (q_{m+1}^{(z)}/[q_{m+1}^{(z)}]_1)^* c.

In the following we discuss three experiments with the aim of providing further insight into and showing the applicability of RKFIT. Accompanying MATLAB scripts to reproduce these experiments are available as part of [2]. All computations were performed with MATLAB version R2013a on an Intel Core i5-3570 processor running Scientific Linux, Release 6.4 (Carbon).
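The QR-based evaluation of R_m(z) in remark 7 can be sketched as follows. This is only an illustration of the recipe described above, not the toolbox implementation of [2]; in particular the conjugation convention and the function name `eval_rm` are assumptions of this sketch. It is checked here on the smallest polynomial case m = 1 (K_m = [1; 0]), where the basis functions are r_0(z) = 1 and r_1(z) = (z − a)/b.

```python
import numpy as np

def eval_rm(H, K, c, z):
    """Evaluate R_m(z) from an (m+1) x m pencil (H, K) and the coefficient
    vector c of R_m(A)v in the rational Krylov basis V_{m+1}.
    Sketch of the QR-based evaluation; the conjugation convention is assumed."""
    Q, _ = np.linalg.qr(z * K - H, mode='complete')   # full QR factorization
    q = Q[:, -1]                 # last column of the full Q factor
    q = q / q[0]                 # scale by its first entry
    return np.conj(q) @ c        # inner product with the coefficient vector

# Polynomial case m = 1: A v1 = a v1 + b v2 gives r_0(z) = 1, r_1(z) = (z-a)/b,
# so c = [c1, c2] represents R_1(z) = c1 + c2*(z - a)/b.
a, b = 2.0, 3.0
H = np.array([[a], [b]])
K = np.array([[1.0], [0.0]])
c = np.array([1.0, 2.0])
print(eval_rm(H, K, c, 5.0))     # expect 1 + 2*(5 - 2)/3 = 3
```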

Experiment 1: Fitting an artificial frequency response. We first consider a diagonal matrix A ∈ C^{N×N} with N = 200 logarithmically spaced eigenvalues in the interval [10^{−5}i, 10^{5}i]. The matrix F = F(A) is a rational matrix function of type (19, 18) given in partial fraction form in [20, subsection 4.1], and v = [1, 1, . . ., 1]^T. We compare RKFIT to the vector fitting code VFIT [20, 19], which is available online. We consider three different sets of starting poles, denoted Ξ_1, Ξ_2, and Ξ_3 (see Figure 5.1).

Experiment 2: Fitting the square root function. In our second experiment we approximate z^{1/2} on the spectrum of a matrix A, again using three sets of initial poles Ξ_1, Ξ_2, and Ξ_3. Note that the initial poles Ξ_1 are placed on the branch cut of z^{1/2}, which is a reasonable initial guess for the poles of R_m. Some of the poles Ξ_2 are located very close to the eigenvalues of A, whose spectral interval is approximately [0, 4]. The convergence of the relative error per iteration of RKFIT and VFIT is shown on the left of Figure 5.2. In order to use VFIT for this problem we have diagonalized A and provided the code with weights corresponding to the components of v in the eigenvector basis of A. All tests converge within at most 9 iterations, with the fastest convergence achieved by RKFIT with initial guess Ξ_1. On the right of Figure 5.2 we show the error |z^{1/2} − R_m(z)| over an interval containing the spectrum of A.

For Experiment 3 (subsection 5.3) we consider three further sets of initial poles:
• Ξ_1: 16 poles equal to 0;
• Ξ_2: 16 poles equal to −10;
• Ξ_3: 16 infinite poles.
Note that there A is not diagonalizable and therefore VFIT cannot be applied as in the previous two experiments. On the left of Figure 5.3 we observe excellent convergence of RKFIT within 2 iterations starting with the initial poles Ξ_1 and Ξ_3.
With the initial poles Ξ_2 the error stagnates at a higher level, possibly because the iteration is trapped near a non-global minimum. As is the case with any nonlinear iteration, RKFIT is not guaranteed to converge to a global minimum (if one even exists). We currently do not have a good explanation for why the initial guess Ξ_2 is bad, but we have verified that ξ = −10 lies in the 10^{−6}-pseudospectrum of A, and hence the initial rational Krylov space may have too large components in just a few eigendirections of A.
6. Summary and future work. We introduced the notion of generalized rational Krylov decompositions and studied their connections with rational Krylov spaces. We generalized the implicit Q theorem to the rational case and provided some insight into the continuation combination proposed by Ruhe [32] for building rational Krylov spaces. Algorithms for transforming generalized RKDs, and thereby changing the poles and starting vector of the associated spaces, were presented. These algorithms, in particular Alg. 4.2, can also be employed for implicit restarting of polynomial and rational Krylov decompositions and even for eigenvalue assignment, cf. [4]. A comparison with existing algorithms for the same purpose might be interesting.

Definition 4.2.
A relation of the form (4.2), where V_{m+1} is of full column rank and the lower m × m part of the pencil (H_m, K_m) is regular, is called a generalized rational Krylov decomposition. The generalized eigenvalues of this lower m × m subpencil are called the poles of the decomposition. If the poles of (4.2) are outside the spectrum Λ(A), then (4.2) is called a rational Krylov decomposition (RKD).
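The polynomial special case of such a decomposition, with K_m the identity matrix with an appended zero row, can be built and verified with a standard Arnoldi process. The following is a minimal sketch on random data; the function name `arnoldi` and the dimensions are illustrative choices:

```python
import numpy as np

def arnoldi(A, v, m):
    """Polynomial Arnoldi: returns V (N x (m+1)) with orthonormal columns and
    the (m+1) x m upper-Hessenberg H such that A V[:, :m] = V H."""
    N = A.shape[0]
    V = np.zeros((N, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 40))
V, H = arnoldi(A, rng.standard_normal(40), 5)

# With K = [I; 0] (identity with an appended zero row), A V K = V H is the
# polynomial special case of the decomposition in Definition 4.2.
K = np.vstack([np.eye(5), np.zeros((1, 5))])
print(np.linalg.norm(A @ V @ K - V @ H))      # decomposition residual, ~ eps
print(np.linalg.norm(V.T @ V - np.eye(6)))    # orthonormality of the basis
```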
are both upper triangular follows from [15, Theorem 7.7.1]. Multiplying A V_{m+1} K_m = V_{m+1} H_m from the right by Z_m and "inserting"

Fig. 4.1: The 6 × 5 upper-Hessenberg pencil while Algorithm 4.2 is applied to the corresponding RAD with k = 2. The original poles are the ratios of the corresponding pencil entries; the first two poles are replaced by the new ones. The transition from × to ⊗ symbolizes that the element potentially changes.

Fig. 5.1: Least squares approximation of a rational function F(z) of type (19, 18) using RKFIT and the vector fitting code VFIT. Left: Relative error ‖F(A)v − R_m(A)v‖_2/‖F(A)v‖_2 after each iteration of RKFIT (solid red) and VFIT (dashed blue). The convergence curves for each method correspond to the different choices of initial poles Ξ_1 (circles), Ξ_2 (squares), and Ξ_3 (triangles), respectively. Right: Plot of |F(z)| over an interval on the imaginary axis, overlaid with the approximants |R_m(z)| obtained after 10 iterations of RKFIT and VFIT with initial poles Ξ_1 (the curves are visually indistinguishable).

5.3. Experiment 3: Exponential of a nonnormal matrix. We consider the approximation of Fv with the matrix exponential F = exp(A) of a Grcar matrix A of size N = 100, generated in MATLAB via A = -5*gallery('grcar',N,3). The eigenvalues and 10^{−6}-pseudospectrum of A are shown on the right of Figure 5.3. The vector is v = [1, 1, . . ., 1]^T, and we consider the three sets of initial poles Ξ_1, Ξ_2, and Ξ_3 for RKFIT listed above.
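For readers without MATLAB, the setup of this experiment can be reproduced along the following lines. This is a sketch, not the scripts of [2]: the Grcar matrix is Toeplitz with −1 on the subdiagonal and 1 on the diagonal and the first 3 superdiagonals, and F v is formed here with SciPy's dense `expm` (feasible since N = 100).

```python
import numpy as np
from scipy.linalg import expm, toeplitz

# Python analogue of the MATLAB call A = -5*gallery('grcar',N,3).
N = 100
col = np.zeros(N); col[0] = 1.0; col[1] = -1.0   # diagonal 1, subdiagonal -1
row = np.zeros(N); row[:4] = 1.0                 # diagonal and 3 superdiagonals 1
A = -5.0 * toeplitz(col, row)

# F v with F = exp(A) and v = [1, ..., 1]^T, computed densely.
v = np.ones(N)
F_v = expm(A) @ v
print(A.shape)
```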

Fig. 5.3: Least squares approximation of the function F(z) = exp(z) using RKFIT. Left: Relative approximation error ‖F(A)v − R_m(A)v‖_2/‖F(A)v‖_2 after each iteration of RKFIT (solid red) for the different choices of initial poles Ξ_1, Ξ_2, and Ξ_3, respectively. Right: Plot of |F(z) − R_m(z)| over a region in the complex plane, together with the poles of R_m (green crosses), where R_m is the rational least squares approximant obtained after 10 iterations of RKFIT with initial poles Ξ_1. The eigenvalues of the Grcar matrix and its 10^{−6}-pseudospectrum are also shown.