Paraunitary Matrices, Entropy, Algebraic Condition Number and Fourier Computation

The Fourier Transform is one of the most important linear transformations used in science and engineering. Cooley and Tukey's Fast Fourier Transform (FFT) from 1964 is a method for computing this transformation in time $O(n\log n)$. From a lower bound perspective, relatively little is known. Ailon shows in 2013 an $\Omega(n\log n)$ bound for computing the normalized Fourier Transform assuming only unitary operations on two coordinates are allowed at each step, and no extra memory is allowed. In 2014, Ailon then improved the result to show that, in a $\kappa$-well conditioned computation, Fourier computation can be sped up by no more than $O(\kappa)$. The main conjecture is that Ailon's result can be exponentially improved, in the sense that $\kappa$-well condition cannot admit $\omega(\log \kappa)$ speedup. The main result here is that `algebraic' $\kappa$-well condition admits no more than $O(\sqrt \kappa)$ speedup. The definition of algebraic condition number is obtained by formally viewing multiplication by constants, as performed by the algorithm, as multiplication by indeterminates, giving rise to computation over polynomials. The algebraic condition number is related to the degree of these polynomials. Using the maximum modulus theorem from complex analysis, we show that algebraic condition number upper bounds standard condition number, and equals it in certain cases. Algebraic condition number is an interesting measure of numerical computation stability in its own right. Moreover, we believe that the approach of algebraic condition number has a good chance of establishing an algebraic version of the main conjecture.


Introduction
The (discrete) normalized Fourier transform is a complex linear mapping sending an input x ∈ C n to y = F x ∈ C n , where F is an n × n unitary matrix defined by F (k, ) = n −1/2 e −i2πk /n . (1) in C n , representing a linear transformation of the input. Each coordinate of the vector is a complex number stored in a memory location (or rather, two real numbers, one representing the real part and the other the imaginary part)t. The next state is obtained from the current one by a simple linear algebraic transformation. We will define this precisely in what follows.
As for lower bounds, it is trivial that computing the Fourier Transform requires a linear number of steps. Papadimitriou proves in [10] that the graph structure of Cooley and Tukey's FFT algorithm is optimal for a version of the transformation over finite fields. The result, however, assumes that the algorithm can only multiply by roots of unity of order n, and cannot 'take advantage' of any algebraic identity involving these roots. It is not clear how to apply these results to computation over the reals, or over the complex fields. Winograd [11] was able to reduce by 20% the number of multiplications in Cooley and Tukey's FFT, without changing the number of additions. In [12] the same author shows that at least linearly many multiplications are needed for performing Cooley and Tukey's algorithm. We note that the separation between 'additions' and 'multiplications' should be done with care in the real/complex setting, due to a question of scaling, as we now explain. Cooley and Tukey's original algorithm was defined for the unnormalized transform √ nF . A normalized version would need to scale down the final result by a factor of √ n. We should be, at the very least, conscious of the issues arising when outputting numbers of magnitude Ω( √ n) larger than the input numbers. At first sight this seems to be hardly an issue when computing with 64 bit words, but for various optimization purposes and applications we might actually want to squeeze multiple numbers in one computer word, in which case this may be an issue. One way to deal with this is to normalize each addition in the original FFT by dividing by √ 2. But this introduces a new multiplication together with each addition, so the separation between multiplications and additions is an approach that relies on scale. In 1973, Morgenstern proved that if the modulus of constants used in an unnormalized Fourier transform algorithm are bounded by a global constant, then the number of steps required is at least Ω(n log n). He used a potential function related to matrix determinant. However, this result is also scale dependent, because it does not provide lower bounds for the (normalized) F . Ailon [1] considered a computational model allowing only 2 × 2 rotation gates acting on a pair of coordinates. He showed an Ω(n log n) lower bound on the number of such gates required for the (normalized) Fourier transform. The proof was done by defining a potential function on the matrices M (t) defined by the transformation that takes the input to the machine state in step t. The potential function is simply the sum of Shannon entropy of the probability distributions defined by the squared modulus of elements in the matrix rows. (Due to orthogonality, each row, in fact, defines a probability distribution). That work raised the question of why shouldn't it be possible to gain speed by 'escaping' the unitary group? In [3] he added nonzero, constant multiplication gates acting on a single coordinate to the model. By extending the entropy function to a so-called quasi-entropy, so called because it is applied to possibly negative numbers, the approach allows proving that a speedup in computing any scaling of FFT requires either working with very large or very small numbers (asymptotically growing accuracy). The bounds are expressed using a quantitative relationship between time (number of gates) versus the notion of condition number of matrices. More precisely, it is shown that speedup of FFT by factor of b implies that at least one M (t) must be Ω(b)-ill-conditioned. In [2], Ailon further showed that speedup implies ill-condition at many points in the computation, affecting independent information in the input. The work herein provides another result in this thread.
Ailon's results [1,3,2] are not satisfactory in the sense that the tradeoff between speed and condition number is quite weak, and is believed to be suboptimal. For example, in the extreme case the results tells us that, if a linear time Fourier transform algorithm existed, then it would require a condition number of Ω(log n) for most M (t) 's. Such a restriction, though a step toward resolving the question of complexity of Fourier transform, is very mild, because (at least intuitively) such condition number can be accommodated using O(log log n)-bit words. The net result is total running time (counting bit operations) of O(n log log n). The main open question, informally stated, is as follows. We will restate it in a precise way in Section 6.

Our Contribution
This work defines a new algebraic notion of matrix condition number, which upper bounds the usual definition of condition number, and equals it in some cases. Using this new notion, we are able to show that κ-well conditioned computation allows only O( √ κ) speedup of FFT, in other words, a quadratic improvement on the previous result (albeit with a weaker notion of condition). More importantly, we believe that an algebraic version of Conjecture 1 is achievable with respect to algebraic condition, and justify this belief at the end, in Section 6.
We also note that in the quantum world (see eg [9] and references therein), an O(log 2 n)time algorithm is known using quantum gates, each unitary acting on only 4 entries. However, we do not deal with quantum computation here.

Fourier Transform Types
In theory and practice of engineering and computer science, there are two main types of Fourier transforms considered. The n-dimensional Discrete Fourier Transform (DFT), as defined in (1), is a complex unitary mapping defined by the characters of the Abelian group Z/nZ. The Walsh-Hadamard transform is a real orthogonal mapping defined by the characters of the n dimensional binary hypercube (Z/2Z) n . More precisely, for n an integer power of 2, the (i, j)'th matrix element is defined as 1 , where ·, · is dotproduct, and [p] denotes the bit representation of the integer p ∈ {0, . . . , n − 1} as a vector of log n bits. Similarly to FFT for DFT, there is an O(n log n) algorithm for computing the Walsh-Hadamard transform of an input x.
Throughout, we will assume n is an integer power of 2 and will use F to denote either the n−Walsh-Hadamard, which is a real orthogonal transformation, or the real representation of the (n/2)-DFT, which is a complex unitary transformation. By real representation, we mean the standard view of a complex number a + ιb as a column vector (a, b) T in two dimensional real space, and the multiplicative action a + ιb → (c + ιd)(a + ιb) given by left-multiplication by the matrix In that way, a vector in n dimensional complex space embeds to a 2n dimensional real space, and a n×m complex matrix embeds as a (2n)×(2m) real matrix. This allows us to avoid defining a computational model over the complex field, because such computation can be emulated by reals anyway.

Computational Model and Notation
We remind the reader of the computational model discussed in [1,3], which is a special case of the linear computational model. The machine state represents a vector in R for some ≥ n, where initially it equals the input x ∈ R n (with possible padding by zeroes, in case > n). Each step (gate) is either a rotation or a constant. A rotation applies a 2×2 orthogonal operator mapping a pair of machine state coordinates (rewriting the result of the mapping to the two coordinates). Note that a rotation includes the 2×2 mapping affecting a single coordinate, eg −1 0 0 1 . (Technically this is a reflection, but we refer to it as a rotation gate here.) A constant gate multiplies a single machine state coordinate (rewriting the result) by a positive real constant. In case = n we say that we are in the in-place model. Any nonsingular linear mapping over R n can be decomposed into a sequence of rotation and constant gates in the in-place model, and hence our model is universal. Normalized FFT works in the in-place model, using rotations only. A restricted method for dealing with > n was developed in [3]. We focus in this work on the in-place model only.
Since both rotations and constants apply a linear transformation on the machine state, their composition is a linear transformation. If A n is an in-place algorithm for computing a linear mapping over R n , it is convenient to write it as A n = (M (0) = Id, M (1) , . . . , M (m) ) where m is the number of steps, M (t) ∈ R n×n is the mapping that satisfies that for input x ∈ R n (the initial machine state), M (t) x is the machine state after t steps. (Id is the identity matrix). The matrix M (m) is the target transformation, which will typically be F in our setting. For t ∈ [m] := {1, 2, . . . , m}, if the t'th gate is a rotation, then M (t) defers from M (t−1) in at most two rows, and if the t'th gate is a constant, then M (t) defers from M (t−1) in at most one row. Throughout, for a matrix M , we let M i,: denote the i'th row of M . The symbol ι denotes √ −1. For an integer a, notation [a] is shorthand for the set {1, . . . , a}. All logarithms are assumed to be base 2, unless the base is explicit, as in log 5 6. For a matrix M over the field of real or complex numbers, M is the spectral norm.

Replacing constant gates with an indeterminate
Fix an in-place algorithm A = (M (0) = Id, M (1) , . . . , M (m) = F ) computing F . Let C A denote the set of values used in the constant gates. Let 0 < ∆ ≤ 11 be some real number. If each c ∈ C A is an integral power of ∆, then we say that A is integral with respect to ∆. We will assume that ∆ ≥ 2/3 (otherwise we could replace ∆ with ∆ 1/ for a suitable integer , with respect to which A is still integral). Assume that A is integral with respect to some 2/3 ≤ ∆ ≤ 1. (In what follows, we shall make this assumption, but remove it in the appendix using limit arguments.) We define a new algorithm A ∆ with corresponding matrices (M is now an n × n matrix over the ring of Laurent polynomials with complex coefficients in an indeterminate z. 1 (Throughout, by polynomials we refer to Laurent polynomials, and by proper polynomials we mean that only nonnegative power monomials are allowed). To get accustomed to our notation, note that for some matrix A = A[z] in this ring we can write e.g. A 1,3 [7], referring to the (complex) number in the first row, third column of the matrix obtained by assigning the value 7 to z in M ∆ depending on whether the t'th gate is a rotation, or a constant gate. In case of rotation, the matrix M In case of a constant gate with value 0 < c ∈ C A , we replace the value c with the By the integrality of A with respect to ∆, the number log ∆ c is indeed an integer. From the definition of M In words, this means that the t'th matrix in the original algorithm A is obtained by polynomial evaluation at ∆.
Throughout, for a polynomial p = p[z], we let coeff i (p) denote the coefficient (matrix, vector or scalar) of p corresponding to z i . In particular, p = i coeff i (p)z i . 2

Paraunitary Matrices and Algebraic Condition Number
Here and throughout, for a complex number u,ū is complex conjugation.
Paraunitary matrices are used in signal processing (see e.g. [13]), and have some useful properties for our purposes. First, they form a multiplicative group. This fact is useful for verifying the following: The following is easy to see from the definitions, using simple induction: is a (complex) unitary matrix for any ω a complex number on the complex unit circle.
This will be used below. We now remind the reader of the definition of a κ-well conditioned algorithm [2,3]. In this work, we will refer to this standard notion of well conditionedness geometric, because it is related to geometric stretching of vectors under linear operators.

Definition 5. The (geometric) condition number of a nonsingular complex matrix M is
We define a different, though not unrelated (as we shall see below) notion of condition number of an algorithm. We first define it for ∆-integral algorithms, and then extend to general algorithms.
Although the definition seems to depend on it, the parameter ∆ is immaterial for defining (and thinking about) algebraic condition number. It is trivial to see that if A is integral with respect to both ∆ and ∆ and is ∆-algebraically κ-well conditioned, then it is also ∆ -algebraically κ-well conditioned. In fact, even if an algorithm is integral with respect to no ∆, we can extend the definition by using limits. Being algebraically κ-well conditioned is henceforth a property of any algorithm A, regardless of integrality. The precise definitions are given in Appendix A, for completeness. We henceforth define: If we consider some intermediary matrix M (t) given rise to by our algorithm A, then algebraic well conditioning cannot be determined from M (t) 'in vitro', because it depends on how we reached M (t) from the initial state Id. This is in contrast with the notion of (geometric) condition, which can be determined from a single matrix. Nevertheless, the following lemma tells us that the two notions are quantitatively related.

Lemma 8. If an algorithm
A is algebraically κ-well conditioned, then it is geometrically κ-well conditioned. There exist algorithms for which the two notions of condition number coincide.
The proof of this connection requires some complex analysis, and in particular the maximum modulus theorem. We provide it here only for the case A is integral with respect to some ∆. The extension to general A is not difficult using limit arguments, and we present it in Appendix A.2.

XX:7
By Claim 3 and Definition 2, the inverse of the matrix M By construction, the polynomial matrix M d is a proper polynomial, and hence again by the (weak) maximum modulus theorem, ∆ , and this holds in particular for z = ∆. Hence the last inequality gives ∆ d (M (t) ) −1 ≤ 1 or: Combining (2) with (3), we get where the rightmost inequality is by the lemma's assumption. This concludes the first part of the result.
To show tightness, it is enough to work with 2 × 2 matrices. Consider an algorithm A performing two steps. The first multiplies the first coordinate by a positive number ∆, and the second multiplies the second coordinate by ∆ −1 . It is easy to see that both the algebraic and the standard notions of condition number coincide for this case.
We are now ready to state our main result.
The proof is presented in the remainder of the paper, for the case A is integral with respect to some 2 3 ≤ ∆ ≤ 1. The general case, using limit arguments, is deferred to Appendix A.3. We will use the notation ρ := log κ 2 log ∆ −1 , or ∆ −ρ = √ κ. We will also make the implicit assumption that κ = n o (1) , because otherwise the statement of the theorem is obvious from the trivial observation that computation of F requires linearly many steps.

Polynomial Matrices: Norms and Entropy
We will need to define a norm for polynomials, polynomial vectors and polynomial matrices, over the complex field. For a polynomial p = p[z] ∈ C[z], we define The following is well known from harmonic analysis: 3 The matrix M ∆ is real, and hence its transpose equals its conjugate-transpose. We prefer to keep the complex notation (·) * .)

For a polynomial vector
where we remind the reader that v i is the i'th coordinate of v. Finally, for a polynomial matrix A, we define A 2,∞ as shorthand for max i A i,: . Let M denote a real paraunitary matrix. Then we extend the definition of the quasientropy potential from [3,2] as follows: where h : C → C is defined by h(u) = −u log |u|. (Note for reviewers: No, this is not a mistake, and we do not want to define h as −|u| log |u|. We will be using h for possibly negative and even complex values of u, where this makes a difference. 4 ) For the Fourier transform F , Φ(F ) = Θ(n log n) .
We also define a preconditioned version of Φ for any paraunitary matrix M , parameterized by two polynomial matrices A, B: where the outer sum is over k for which the internal sum is not null, namely in the range Note that Φ Id,Id (M ) = Φ(M ). The general idea of a preconditioned quasi-entropy potential was developed in [2]. The idea there (and here) is to carefully choose fixed A and B, and to track the potential along the evolution of M under algorithm steps. The following key lemma bounds the absolute value of the change in entropy of a paraunitary matrix, incurred by multiplication of rows by monomials (corresponding to constant multiplication in the original algorithm) and by planar rotations (corresponding to rotations in the original algorithm).

Lemma 10. (1) Let M [z] be a paraunitary matrix, and let A[z], B[z] be some complex matrices. Let M [z] be a paraunitary matrix obtained from M [z] by multiplication by a diagonal matrix with monomials on the diagonal, that is
(2)If M is obtained from M by a planar rotation action, then Proof. Part (1) is trivial, from the definitions. For part (2), polynomial matrix potential function is, equivalently, a scalar matrix potential function (defined in [3]) if we replace each polynomial by a row-vector of its coefficients. Therefore, using the potential change bound theorem of [3], we have the statement of the theorem. For self containment, we also provide a full proof in Appendix C.

The preconditioners
We now define our two polynomial matrix preconditioners, A and B, that will be used to define a potential for proving the main theorem. For convenience, we will define µ = 1 − ∆, and let be the smallest integer such that ∆ ≤ 1/2. By our assumptions on ∆, this also implies ∆ ≥ 1/4, hence ∆ = Θ(1). Also note that The preconditioner A is defined as . Claim 11. The (matrix) coefficient of P corresponding to z −i is exactly F ∆ i for all i in the range R := {ρ, ρ + 1, . . . , ρ + }.  (8) so that the potential is high. At the same time, in the beginning of the computation, the potential is low because A has only very few nonzeroes (namely, on the diagonal). Creating this big potential gap, which we quantify in Lemma 13 below, is the one of the requirement for the proof to work. The other is upper bounding the potential change at each step, as we now do.

Lemma 12. For any paraunitary M ,
The first equality was by definition, the second by (5), the fourth by changing order of integration and summation, the fifth by the definition of Y , the sixth by our definition of polynomial conjugation, the seventh by the fact that polynomial evaluation commutes with polynomial multiplication, the eighth by definition of matrix multiplication, the ninth by properties of conjugation of products, the tenth by unitarity of X, the eleventh by commutativity of p (a polynomial over scalars) and P (a polynomial over matrices), the twelfth by paraunitarity of U , and the fourteenth again by (5). Now, by our construction, M A equals U [z]p[z]X ∈ X for U = M , X = Id, p[z] = 1 + ∆z −1 + ∆ 2 z −2 + · · · ∆ ρ+ z −ρ− . Using (11),

As for M B, notice that it equals
Therefore again we use (11) to obtain: This concludes the proof.
Let us now compute the preconditioned quasi-entropy of M

XX:11
Proof. For the left bound in the lemma statement: Recall that ∆ −ρ = κ 1/2 = n o (1) . Also recall that by definition of , ∆ = Θ(1). Therefore log ∆ −k is o(log n) for all k ∈ R, and where the second equality is from F 2 i,j = n (unitarity of F ) and definition of Φ, the third is from the fact that ∆ i = Θ(1) for all i = 0 . . . , and the fourth from (7). For the right bound in the lemma statement: Fixing i ∈ [n], we upper bound the absolute value of the last inner sum, which equals: To bound E 1 , we first Lemma 12, to argue that for all i ∈ [n], B i,: 2 = +1. (To be precise, it was only shown that B i,: 2 = Θ( ) but it is obvious from the proof of Lemma 12 that the exact value is + 1.) This in particular implies that This can be easily shown to be Θ(κ −1/2 ), using standard calculus, and using the fact that the coefficients ∆ −k are equal to each other (and to ∆ ρ = κ −1/2 ) up to a multiplicative global constant. We provide details in Appendix B. To bound E 2 , we use the fact that for all k ∈ R, ∆ −k = Θ(∆ ρ ) = Θ(κ −1/2 ) and log κ = o(log n). Therefore |E 2 | is But by the well known 1 / 2 bound, we have that Combining, we have that |Φ A,B (Id)| = o( κ −1/2 n log n).
Combining Lemma 13 and 10 with Claim 12, we get that the algorithm A increases the potential defined by the preconditioned quasi-entropy Φ A,B (·) from o n log n √ κ to Ω n log n √ κ , and at each step it can change the potential by no more than Θ( √ )Θ( √ ) = Θ( ). Hence, the number of steps is as stated in Theorem 9, concluding its proof.

Future Work
The main open question in this line of work is Conjecture 1. Since algebraic conditionedness is a weaker notion than typical (geometric) conditionedness, it should be expected that proving the conjecture with respect to the algebraic notion of condition number is easier: [Algebraic version] κ-Algebraic well conditioned computation allows only O(log(κ)) speedup of FFT.
We now explain why this conjecture is very reasonable, using insights from this work. Recall that we started with an algorithm A = (Id = M (0) , M (1) , . . . , M (m) = F ) working in R n , and mapped it to a pseudo-algorithm A ∆ working in n dimensions over the ring of polynomials, for the purpose of analysis. We then applied the resulting operator M (m) ∆ outputted by A ∆ to the preconditioner A by left-multiplication, yielding a polynomial matrix that is not far from being paraunitary, and with multiple copies of F scaled down by Θ( √ κ) as coefficients. This scaling down is the reason for the denominator we get in the bound of Theorem 9. Interestingly, very recently Ailon and Yehuda [5] were able to show that any algorithm computing a so-called "ε-Perturbation of Fourier", defined as Id +εF (for some small epsilon) in an almost perfectly conditioned algorithm must make at least Ω n log n log(1/ε) steps. 5 Moreover, the weaker bound Ω n log n √ 1/ε , explained as a 'warmup' in [5], used ideas that are somewhat similar to the ideas in this paper. This leads us to believe that ideas possibly similar to those achieving the logarithmic dependence in 1/ε in [5] might be used to get logarithmic dependence on κ as stated in our conjecture. So far we have failed in these attempts. It is interesting to study algebraic condition number in its own right as a measure of numerical stability. Consider the following related question: Lemma 8 tells us that if an algorithm is algebraically κ well conditioned then it is also (geometrically) κ-well conditioned. The converse is clearly not true, but perhaps some weak form of the converse should hold. In the extreme case, for example, note that an algorithm that is (geometrically) 1-well conditioned must also be algebraically 1-conditioned, because such an algorithm can only use constants 1 and −1. Could we generalize this observation to obtain some kind of converse for Lemma 8?
Some more noteworthy open problems: (1) Extending the results to the extra-memory (non in-place) model. (2) Obtaining computational lower bounds for computing a Johnson-Lindenstrauss transform. There has been work showing that a JL transform can be sped up using the Fourier transform [4], but computational lower bounds for Fourier do not imply computational lower bounds for JL. Note that there are tight lower bounds for dimensionality parameters of JL [8], but they are not computational.

A The General Case: Algorithms That are Not Integral
In this section we show how to extend the main result in this work to the case in which the algorithm A is integral with respect to no ∆.

A.1 Extension of 'Well Conditioned' Definition to the General Case
Fix an algorithm A = (Id = M (0 , . . . , M (m) ) in R n . Let be an infinite, increasing sequence of numbers in the open interval 2 3 , 1 , tending to 1 in the limit. For each index q we define another algorithm A q = (M The resulting algorithm A q is clearly integral with respect to ∆ q . Additionally, it is not hard to see by standard limit calculus and by induction, that tor each t = 0 . . . m, lim q→∞ M (t) q = M (t) . We are now ready to extend Definition 6 to the non integral case.
It should be noted that if A is algebraically κ-well conditioned, then any sequence ∆ 1 < ∆ 2 < · · · (and not just one) will serve as a witness for the definition. It should also be noted that Definition 15 coincides with Definition 6 for algorithms A that are ∆-integral for some ∆. This can be seen, for example, by choosing ∆ q = ∆ 1/q and κ q = κ for all q.

A.2 Proof of Lemma 8 for The General Case
Assume A = (M (0) , . . . , M (m) ) is algebraically κ-well conditioned (per the general Definition 15). Let ∆ 1 < ∆ 2 < · · · and κ 1 , κ 2 , . . . be as guaranteed to exist by the definition. Then by Lemma 8 (established for the integral case), the algorithm A q is geometrically κ qwell conditioned for all q. But the geometric condition number of a matrix is a continuous invariant in the domain of nonsingular matrices, because both spectral norm and matrix inverse are continuous functions over matrices. Therefore, for all t = 0 . . . m, M (t) (which is also the limit of M (t) q ) is geometrically κ-well conditioned.

A.3 Proof of Theorem 9 for the General Case
Assume A = (M (0) , . . . , M (m) ) is algebraically κ-well conditioned (per the general Definition 15). Let ∆ 1 < ∆ 2 < · · · and κ 1 , κ 2 , . . . be as guaranteed to exist by the definition. Let F q = M (m) q , namely, the resulting matrix computed by A q . First we note that lim Fq→F = F . For each q, we now want to use the statement of the theorem in the integral case, which we have proved. The problem is that the resulting matrix F q does not necessarily have the two key properties of Fourier transform used in the theorem: (1) unitarity and (2) high potential Φ(F q ).
In order to adjust the proof for the case of ∆ q -integral algorithm A computing F q (instead of F ), we will first assume that q is large enough so that This is achievable, because F i,j = 0 for all i, j, for both types of Fourier transform. 6 In particular, (13) implies: The trick is now to use preconditioners A q , B q as defined in Section 5. 6 We could also make due with zero valued matrix elements, but for simplicity we avoid this technicality here.

XX:15
where ρ q is log κq 2 log ∆ −1 q and q is the smallest integer such that ∆ k ≤ 1/2. Note that in the definition of B q we used F and not F q . This is important, because we need Claim 12 and Lemma 13 to work (they rely on B being a product of a real unitary matrix and another paraunitary matrix). Lemma 13 now reads as For the proof of Lemma 13, the only thing worth noting is that the expression in line (11) becomes −ρq that is, one copy of F i,j (coming from the preconditioner A q ) is replaced by (F q ) i,j , and the other remains untouched. The remainder of the proof of the lemma proceeds in an obvious way, taking advantage of (14). The resulting lower bound for the number m of steps that the algebraically κ q -well conditioned algorithm A q needs to compute F q is for sufficiently large q, for a global C. Taking q to infinity and recalling that κ q → κ we conclude Theorem 9's proof for the general case.