How exponentially ill-conditioned are contiguous submatrices of the Fourier matrix?

We show that the condition number of any cyclically contiguous $p\times q$ submatrix of the $N\times N$ discrete Fourier transform (DFT) matrix is at least $$ \exp \left( \frac{\pi}{2} \left[\min(p,q)- \frac{pq}{N}\right] \right)~, $$ up to algebraic prefactors. That is, fixing any shape parameters $(\alpha,\beta):=(p/N,q/N)\in(0,1)^2$, the growth is $e^{\rho N}$ as $N\to\infty$ with rate $\rho = \frac{\pi}{2}[\min(\alpha,\beta)- \alpha\beta]$. Such Vandermonde system matrices arise in many applications, such as Fourier continuation, super-resolution, and diffraction imaging. Our proof uses the Kaiser-Bessel transform pair, and estimates on sums over distorted sinc functions, to construct a localized trial vector whose DFT is also localized. We warm up with an elementary proof of the above but with half the rate, via a periodized Gaussian trial vector. Using low-rank approximation of the kernel $e^{ixt}$, we also prove another lower bound $(4/e\pi \alpha)^q$, up to algebraic prefactors, which is stronger than the above for small $\alpha, \beta$. When combined, the bounds are within a factor of two of the numerically-measured empirical asymptotic rate, uniformly over $(0,1)^2$, and they become sharp in certain regions. However, the results are not asymptotic: they apply to essentially all $N$, $p$, and $q$, and with all constants explicit.

1. Introduction and main results. The size-N discrete Fourier transform (DFT) matrix F has elements F jk = e 2πijk/N , j, k ∈ Z, −N/2 ≤ j, k < N/2 , where the row and column index sets should be taken as N -periodic; we center them on zero for convenience later. 1 Taking the DFT of a vector in C N , for instance via the fast Fourier transform (FFT) algorithm, is equivalent to multiplication by F . That F is full rank and its matrix condition number, cond(F ), is 1 follows from the unitarity of F/ √ N . However, it is well known that contiguous submatrices of F of size p× q are approximately low rank, with ǫ-rank of order pq/N [13,31,41]. On the other hand, their condition number must be finite, because any submatrix (or its adjoint) may be obtained by deleting columns of a Vandermonde, hence nonsingular, matrix (z k j for the nodes z j = e 2πij/N ). Yet, Vandermonde matrices are suspected to be exponentially ill-conditioned unless nodes are equispaced over the entire unit circle [33]. This leads one naturally to ask: how do the condition numbers of Fourier submatrices behave? Their growth has been established to be exponential [29,Thm. 3.1], and preliminary study exists for the case p = q = N/2 [33, Thm. 6.2 and Table 4]. Yet, what is the exponential rate, and how does it depend on p and q? Figure 1.1 illustrates this growth, and hints at a universal rate depending only on the scaled submatrix shape. There appear to have been few rigorous answers to these quite simple and fundamental questions.
The conditioning of Fourier submatrices has consequences in applications because it controls the numerical stability, or noise amplification, of various function and image reconstruction problems: 1. In Fourier extension (or continuation) methods [8,20,27], with applications including the numerical solution of PDEs [2], Fourier series coefficients are solved by collocation on a grid covering a fraction of the periodic interval. In Condition numbers of all possible submatrices A of the size-N Fourier matrix F , as a function of their size p × q. Panels (a-c) compare this for three increasing N values, which are still rather small. Each vertical (p) axis is oriented downwards to match the usual sense for matrices. The color scale is logarithmic, illustrating exponential growth both in N , and as (p, q) tends to the diagonal. See Figure 4.1 for a study of the N → ∞ exponential rates and comparison to lower bounds proven in this work. The diagonal symmetry p ↔ q is exact for each N , but the inversion symmetry (p, q) ↔ (N − p, N − q) is only true asymptotically as N → ∞ (see Section 5.2).
the common case of a uniform grid, the system matrix is a contiguous submatrix of F with shape parameter α := p/N equal to the covered fraction, and β := q/N equal to this fraction divided by the oversampling. 2 The apparent exponential ill-conditioning is well documented [8,9] [1, Fig. 13]. However Adcock et al. [1] concluded, "there is no existing analysis [of submatrices of F ] akin to that of Slepian's for the prolate matrix. . . ". Zhu et al. [41] have since supplied one such missing piece; this paper supplies another. 2. Super-resolution imaging has importance in microscopy, astronomy, radar, and medicine (see [12,11,29,26,5] and references within). Even when source locations are known, and other fascinating issues such as sparsity and clustering are put aside, a linear system must be solved for the amplitudes, given known Fourier series coefficients (in the discrete model) up to some bandlimit p. A pathological arrangement for q such sources is a regular grid ("clump") with spacing some factor (the so-called SRF [12,11,5]) times smaller than the Nyquist spacing. The system matrix becomes a p× q Fourier submatrix, with α = 1/SRF, and the exponential blow-up of its conditioning is a fundamental obstacle in the presence of noise. 3. In coherent X-ray diffraction imaging, the data are squared magnitudes of the Fourier transform of an unknown image [28,7]. This data often excludes a region around the k-space origin, due to excessive intensity. However, since such data are also the Fourier transform of the image autocorrelation, such missing data may be recovered by solving a linear system involving a Fourier submatrix [3]. In the one-dimensional (1D) model, q ≪ N is then the number of missing data, and α = 1 − 2/m where m > 2 is an oversampling factor. The recovered data can help with the subsequent phase retrieval problem; this motivated the present study. More widely, linear systems involving rectangular Vandermonde matrices arise in a variety of signal processing and parameter identification problems [6,26]. The square of the singular values of Fourier submatrices are the eigenvalues controlling the maximum space-and frequency-concentration of periodic discrete prolate spheroidal sequences (P-DPSS) [19,21,27,41], the fully discrete (C N → C N ) analogues of prolate spheroidal wavefunctions [36,32]. As Zhu et al. [41] state, "there exist comparatively few results concerning the P-DPSS eigenvalues." 1.1. Results. Each of our three results is a lower bound on the condition number of contiguous Fourier submatrices, that, barring algebraic prefactors, is exponential in the submatrix size. Each bound has all constants explicit. To provide insight, we also present these as N → ∞ asymptotic rates when the shape parameters (α, β) are held fixed (Table 1.1). Yet we emphasize that the main theorems are not asymptotic results-in particular, between them they cover all N and, using the p ↔ q symmetry of the condition number, all p and q.
We include our first result, even though it will be essentially superceded by Theorem 2, because its prefactor is slightly stronger, and moreover its elementary proof (Section 2) is instructive.
Theorem 1. Let A be a cyclically contiguous p × q submatrix of the N × N discrete Fourier matrix F given by (1.1), with 2 < q ≤ p < N − 2. Then the condition number of the matrix A obeys where q is the largest even integer smaller than q, and p is the smallest integer of the same parity as N larger than p. This applies to the square or "tall" case q ≤ p; if instead q > p (the submatrix is "fat"), one applies this theorem to its Hermitian adjoint A * , since cond(A * ) = cond(A). It is thus possible to rephrase the theorem (and the two below) in a symmetrized form that applies without such conditions on p and q. We will do this in terms of fixed fractional sizes or shape parameters α := p/N and β := q/N . Since q − q and p − p are at most 2, they may be absorbed into a prefactor. Thus, 3 where the rate ρ, and the implied constant, depend on α and β. Explicitly, ρ(α, β) = Combining this with the case β > α by swapping α and β, we get the rate ρ = π 4 [min(α, β) − αβ] applying for all (α, β) ∈ (0, 1) 2 . These two forms are summarized in the first row of Table 1.1. Our next result is similar but has a doubled rate, and a more involved proof (Section 3). Here I 0 will denote the modified Bessel function of order zero [30, (10.25.2)].
Theorem 2. Let A be a cyclically contiguous p × q submatrix of the N × N Fourier matrix, with 1 ≤ q ≤ p < N . Then Using the asymptotic I 0 (z) ∼ e z / √ 2πz given in [30, (10.3.4)], and keeping only dominant terms, we get, in terms of the shape parameters, Summary of exponential growth rates ρ = ρ(α, β) of the lower bound e ρN on the condition number of a p × q contiguous submatrix of the N × N Fourier matrix, as a function of fixed shape parameters α := p/N and β := q/N , asymptotically as N → ∞. We drop algebraic prefactors, and drop O(1) changes in p, q, since we assume p, q ≫ 1. Each row summarizes a different theorem proven in this paper. The second column presents the simpler "non-fat" q ≤ p submatrix case, and the third column the general case. The † is a reminder that Theorem 3 only applies for α, β < 4/eπ ≈ 0.468.
with an explicit rate ρ precisely double that from the first theorem, for each α and β. This is summarized in the abstract, and in the second row of Table 1.1.
Our final main result (proved in Section 4) improves upon the above in the case of small α and β, i.e., in the "corner" of (α, β) space.

Relation to prior work.
Here we compare our findings to the few existing lower bounds on the condition number. At the end of this section we discuss an asymptotic connection to eigenvalues of the prolate matrix. We also note that there have been a couple of small-scale numerical studies [1, Fig. 13] [33, Table 4] [26, Fig. 4(a)].
The foundational work of Edelman-McCorquodale-Toledo [13,Thm. 3] showed that square Fourier submatrices A with α = β = 1/n, n = 2, 3, . . . , have an ǫ-rank of α 2 N , asymptotically as αN → ∞. However, their analysis did not access cond(A). Recently Zhu et al. [41,Cor. 1] gave a more refined ǫ dependence of the asymptotic distribution of singular values of such an A. They show that σ j (A) ≤ ǫ for j ≥ j ǫ , with a formula for j ǫ that is at least 4 α 2 N + ( 4 π 2 log 8αN + 6) log(16/ǫ 2 ). To convert this to a lower bound on condition number one equates j ǫ to αN , the submatrix size, then solves for ǫ. Simplifying somewhat, the resulting lower bound cannot be stronger than which is super-algebraic, but falls short of e ρN for any positive ρ.
To our knowledge, the chief prior exponential lower bounds on cond(A) are the following three.
2) Moitra [29,Thm. 3.1] gives the only proof of which we are aware that the condition number of a submatrix of general shape grows at least exponentially, although no rate is given. His method is similar to that of our Theorem 2-although we found it independently-but with trial vector v chosen as a high power of the Fejér kernel. By tracking the rate in his proof 5 we get, in our notation, This has a similar form as our first two theorems (see Table 1.1), but Theorem 2 improves upon it by a factor of about 4.5, for all (α, β).
3) In the super-resolution literature, lower bounds on the condition number have been proven in the case of fixed q, and α ≪ 1. They take the general form cond(A) ≥ (cα) −q+1 ∝ e βN log(1/cα) , similar to the last row of our Table 1.1. The strongest such result that we know of is that of Li-Liao [26] (see also [24,Ex. 5.1]). In our notation, [26,Prop. 3] states where in the second form we bounded the central binomial coefficient. Thus their constant is c = π. Their proof exploits the qth-order finite difference trial vector v j = (−1) j (q−1)!/(q−j)!(j −1)!, j = 1, . . . , q, whose first q moments vanish, following Donoho [12, §7.4]. The result (1.5) has restrictive conditions: α ≤ 1/(C(q) √ p), where one may check that C(q) ∼ 4 q . Thus (1.5) does not apply for fixed q and α as N → ∞. In constrast, our Theorem 3 applies to all submatrix sizes up to 0.468N , and furthermore its rate constant c is 4/e ≈ 1.47 times stronger.
Remark 4 (The prolate matrix and sharpness). There is a limit in which the singular values of A are already well understood [13,1,41,5]. When N → ∞ with α and q held constant (so that the height of A is much larger than its width), a Riemann sum shows that A * A tends to a multiple of Slepian's prolate matrix [35,40]. The latter is P (q, α/2) in standard notation, with elements (P (q, α/2)) jk = α sinc(πα(j − k)), j, k = 1, . . . , q, recalling that sinc x := (sin x)/x for x = 0 and 1 otherwise. Thus, in this limit, the singular values of A are the square-roots of the eigenvalues of P (q, α/2). Slepian proved the q → ∞ asymptotic for the smallest such eigenvalue [35,Eqs. (13), (58)], Recalling that q = βN , the term θ −q predicts an exponential growth rate for cond(A) of which should be compared with our results in the second column of Table 1.1.
We emphasize that, in contrast to the above prolate asymptotics which fix q and α with N → ∞, our Theorems 1-3 are not asymptotic: they apply to arbitrary N and essentially arbitrary p and q.

Overview of proof methods.
We now outline the tools used to prove the three main theorems. The definition of matrix condition number is where σ 1 ≥ σ 2 ≥ · · · ≥ σ min(p,q) =: σ min are the singular values of A. All three proofs place exponentially small upper bounds on σ min (A), then combine this with the following simple bound. Proposition 5. The largest singular value of any p × q submatrix of the Fourier matrix (1.1) obeys The result follows since σ 1 (A) = A . Theorems 1 and 2 will use the variational bound on σ min . Namely, if q ≤ p, then If instead q > p (a "fat" submatrix), (1.9) no longer holds, explaining the hypothesis q ≤ p in Theorems 1 and 2.
Our trick to construct a trial vector v for which Av is nearly (or exactly) known is by embedding this matrix-vector product in the larger one, F f , for some f ∈ C N . Firstly we note that cyclic horizontal or vertical translations of the submatrix location 6   00  00  00 00  00  00  00  00 00  00  00  00   11  11  11 11  11  11  11  11 11  11  11  11   00  00  00 00  00  00  00  00 00  00  00  00   11  11  11 11  11  11  11  11 11  11  11  11 000000  000000  000000 000000  000000  000000  000000  000000 000000  000000  000000  000000  000000 000000  000000   111111  111111  111111 111111  111111  111111  111111  111111 111111  111111  111111  111111  111111 111111  111111   0000000  0000000  0000000 0000000  0000000  0000000  0000000  0000000 0000000  0000000  0000000  0000000  0000000   1111111  1111111  1111111 1111111  1111111  1111111  1111111  1111111 1111111  1111111  1111111  1111111  1111111 00  00  00 00  00  00  00  00 00  00  00  00  00 00   11  11  11 11  11  11  11  11 11  11  11  11  11 11   00  00  00 00  00  00  00  00 00  00  00  00  00 00   11  11  11 11  11  11  11  11 11  11  11  11  11 The main proof idea for Theorems 1 and 2. a) shows the submatrix A (pink) within the N × N Fourier matrix F . b) shows the cyclically shifted A (pink) again within F , and the formation of the DFT matrix-vector product F f . The vector f is exponentially small outside of the central index set Q, and has a known DFT which is exponentially small in the high frequency index set P . The length-q vector (shown in red) is v = f | Q , i.e., restricted to the "input" indices. within F do not affect its singular values. This can be seen by noting that a cyclic right-translation by m of a submatrix A is equivalent to left-multiplication of A by the diagonal unitary matrix with diagonal entries {e 2πijm } j∈J , where J is the set of row indices of A (this has been pointed out, e.g., in [41, Sec. III.C]). Thus we translate the index sets of A to be the p highest magnitude output frequencies (see P defined by (2.9)), and the q lowest-magnitude inputs Q := {j ∈ Z : −q/2 ≤ j < q/2}; see Figure 1.2. The trial vector v will then be f | Q , the restriction of f to the central index set Q. The desired Av is then (F f )| P , plus a correction of size at most the norm of f outside Q, which is arranged to be small or zero.
Thus we seek an f that is exponentially small outside of Q, whose DFT, F f , is known and exponentially small in P . We build this from a known (1.10) Setting ω = k/N makes the left-hand side equal to the kth entry of F f , for a vector f whose jth entry is f j = n∈Z f (j + nN ), a sample of the N -periodization of the original function f . In Theorem 1 we choose the Gaussian Fourier pair, which is of course well localized both in position and frequency space. It also has monotonic tails, allowing sums to be bounded by simple integrals. However, the problem of optimal (in the L 2 (R) sense) simultaneous position-and frequency-localization was solved by Slepian and co-workers in the form of the prolate spheroidal wavefunctions (PSWFs) of order zero, in particular the first such function ψ 0 [36,25,32]. Thus one might hope that by choosing f as a scaled truncated ψ 0 , makingf a scaled (but not truncated) ψ 0 , one could beat the Gaussian rate. Yet, we were unable to find estimates on the tails of ψ 0 that enable bounding the right-hand side sum in (1.10). Instead, Theorem 2 relies on the so-called Kaiser-Bessel (KB) Fourier transform pair [22,15] defined by (3.2), which has the same optimal exponential rate of localization as the PSWF (see, e.g., [4,Sec. 5]). The non-compactly-supported member of this pair involves only sinc and other elementary functions, allowing a bound on its tail sum. Since the tail is oscillatory rather than monotonic (see Figure 3.1), the bound is involved, although elementary. Here we draw on the thesis of Fourmont [14]; he was concerned with error of the nonuniform FFT [15,23,4] where the criteria for a good spreading kernel are similar to those for f . We suspect that this work is the first to exploit the KB pair as an analysis (as opposed to numerical) tool.
In contrast, to prove Theorem 3, σ min is bounded from above via the SVD rank approximation theorem 6 [17, Thm. 2.5.3]. Namely, σ q is bounded from above by the error of any rank-(q − 1) approximation of A, in the operator norm. A rapidlyconvergent approximation comes by sampling a carefully chosen continuous kernel expansion of the form on regular grids in x and t, so that its samples give the elements of A. This sampling idea has been used by O'Neil-Rokhlin to bound the numerical rank of A [31,Cor. 3.4].
The structure of the rest of this paper is as follows. The next three sections correspond to the three main theorems: Section 2 proves the elementary Gaussian rate, Section 3 proves the doubled rate based on the Kaiser-Bessel pair, then Section 4 proves the further improved rate in the corner region of the (α, β) plane. Finally, Section 5 has a detailed numerical comparison to the empirical growth rate, discusses symmetries (both exact and near) in the (α, β) plane, and draws some conclusions. A short Appendix proves the Kaiser-Bessel transform pair.
2. Proof of elementary Gaussian rate (Theorem 1). We start with a simple identity that the DFT of a periodized discrete Gaussian is another periodized discrete Gaussian. Proposition 6. Let N ∈ N, σ > 0, and define the N -periodic vector f ∈ C N by its entries Then the kth component of the DFT of f is Proof. With the ("number theorist") Fourier transform convention we have the usual continuous Fourier transform pair Inserting this f into the Poisson summation formula (1.10) and setting ω = k/N gives the discrete sums By grouping terms with the same index modulo N , and using (1.1) and (2.1), the left-hand side can be seen to be (F f ) k . Proposition 7. (2.1) and (2.2) may be bounded uniformly over their centered index ranges by (non-periodic) Gaussians, namely Proof. Assume j ≥ 0, then we may drop the cross-term in the square, then exploit monotonicity to bound a Gaussian sum by its integral, to get The negative terms are handled by shifting the sum which, since j ≤ N/2, is termwise no larger than the non-negative sum, giving an overall factor of two, hence (2.5). The j < 0 case is handled by noting f −j = f j . (2.6) follows by an identical method. Lemma 8. Let A be a cyclically contiguous p × q submatrix of the N × N discrete Fourier matrix F , with p ≥ q (i.e., the submatrix is either square or "tall"), q > 2, and p < N − 2. Then, in terms of p and q in Theorem 1, the smallest singular value of A obeys Proof. As discussed in Section 1.3, we translate A to be centered vertically about the highest frequency and horizontally about the lowest; see Figure 1.2(b). We then apply (1.9), with v chosen as follows. We select the central q elements of f , given by (2.1), with width parameter σ > 0 to be specified later, i.e., v := {f j } −q/2≤j<q/2 . (2.8) Let g ∈ C N be v embedded in the vector of length N , i.e. with elements g j = f j for −q/2 ≤ j < q/2, zero otherwise. The remainder we denote by h ∈ C N , that is, h j = 0 for all −q/2 ≤ j < q/2, and h j = f j otherwise. In summary, f = g + h, with g supported only in the central region Q while h is supported only in its complement; see and using the inequality (a − b) 2 < 2(a 2 + b 2 ), we bound (2.10) To bound the first sum in (2.10) we set K := (N − p)/2 as a lower bound on the half-width of P , the complement of P in the full set {−N/2 ≤ k < N/2}. Using this and (2.6), To bound the second term in (2.10), we use (2.5) and apply an identical method to get Substituting the last two results and F 2 = N into (2.10) and gathering common terms gives Choosing the width σ to balance the two exponential rates gives We can now bound the prefactor (1 + 1/ √ 8πσ) 2 ≤ (1 + 1 − p/N /2 √ q) 2 ≤ 3 using q ≥ 1 implied by the hypothesis q > 2. Also, by the arithmetic-geometric inequality, Finally, the crude lower bound v 2 2 ≥ 1 results by keeping only the term j = n = 0 in the definition (2.1) of f j . Combining this with the square-root of (2.13), and simplifying 24 √ 2 < 6, gives (2.7). Remark 9. A similar proof idea-an explicit trial function (in their case a truncated Gaussian) that is near a function whose Fourier transform nearly has compact support-was used by Landau and Pollak in [25,Lem. 4] to show that the prolate eigenvalue µ 0 is exponentially close to 1.
With σ min upper bounded by Lemma 8, and σ 1 ≥ √ p by Proposition 5, Theorem 1 follows from inserting these two bounds into the definition of cond(A), namely (1.7).
3. Proof of Kaiser-Bessel doubled rate (Theorem 2). Theorem 2 is an immediate consequence of the lower bound on σ 1 (A) (Proposition 5), and the following upper bound on σ min (A).

1)
where I 0 is the modified Bessel function of the first kind of order zero.
Before proving this, we introduce the main tool. The Kaiser-Bessel analytic Fourier transform pair is, given a parameter σ > 0, with the Fourier convention (2.3). Apparently due to B. F. Logan [16], it is stated without proof in [22, p. 232-233], and since has become popular for windowing and gridding in signal processing (see [15,23,4] and references within). Since it is not listed in standard tables, and we know of no published proof, we include one in the Appendix. The parameter σ may be interpreted as a "cut-off" frequency: for 2π|ω| < σ the sinc (and hence sin) has imaginary argument so is exponentially large (around e σ ), whereas for all 2π|ω| ≥ σ it is bounded by 2.
Our precise choice of Fourier pair will be informed by the need to bound an infinite algebraic sum (1.10) over its frequency argument. Since φ in (3.2) is discontinuous at ±1 (see Figure 3.1(a)), the tails ofφ have sinc-type oscillations whose amplitude decays only as |ω| −1 (see Figure 3.1(d)); thus such algebraic sums are not absolutely convergent, making the analysis tricky.
Remark 11. A very similar type of sum over the tails of the functionφ in (3.2) has already been bounded in a detailed analysis by Fourmont [14,15]. However, he relies on the fact that his sum is over the tail of an algebraic series with zero offset, allowing him to build on the uniformly bounded Fourier series k>0 (sin kx)/k = (π − x)/2 for 0 < x < 2π, zero for x = 2π. When series with arbitrary offset are instead allowed, as we will require, logarithmic divergence is possible, as shown by the support falls within a q-sized interval, the deplinthed KB pair becomes 3) which will play the role of (2.4) in the following proof of Theorem 10. This pair is shown by Figure 3.1(a,b,e).
Proof. We apply Poisson summation (1.10) to the pair (3.3), and set ω = k/N , to give an explicit formula for the action of the DFT on f , the vector with entries f j = f (j), −N/2 ≤ j < N/2, being the discrete samples of (3.3). This action is As in the proof of Lemma 1, the submatrix A is now arranged to sit within F so that its input (column) index set is −q/2 ≤ j < q/2, and the output (row) index set P is as in (2.9).
However, in contrast to the Gaussian case, σ may now be chosen up front, as follows. We need to ensure that for each index k ∈ P , all of the arguments in the terms in (3.4) fall at or beyond cutoff (see Figure 3.1(b)), so that their contribution is exponentially small relative to f . It is sufficient to check this for the term m = 0, because |k| ≤ N/2. This gives the criterion πq|k|/N ≥ σ for all k ∈ P . For all k ∈ P we have |k| ≥ (N − p)/2 = (1 − α)N/2, so the choice optimally satisfies the criterion, and we fix this from now on. In order to bound (3.4) we will break up the sum into two one-sided sums, each of which has the form where a = πq is the spacing and b the offset for the arithmetic progression of frequencies am + b. Using the even symmetry of the sinc function, we now split the sum (3.4) into three parts, m < 0, m > 0, and m = 0, to get, in terms of (3.6), the three terms Here in both instances of the one-sided sum S σ (πq, b), the offset b obeys b > σ, because |k| ≤ N/2 and σ < πq/2. Thus each sum satisfies the conditions of Lemma 13, stated and proved in the next subsection, which bounds it by uniformly over k ∈ P . From the above formula for (F f ) k , applying the triangle inequality, the fact that | sinc y| ≤ 1 for y ∈ R, then the bound (3.7), gives, for all Here σ = 25. The narrow plots on the far right are for higher x, and a zoomed vertical scale, showing that the difference (e) has the fastest asymptotic decay (x −2 as opposed to Defining v ∈ C q as the central q entries of f just as in (2.8), and noting that the compact support of f (t) makes all other entries f j = 0, we have Av = (F f ) k∈P and We combine this with the simple bound v ≥ f 0 = f (0) = I 0 (σ) − 1, and recall σ min (A) ≤ Av / v for any v = 0, to complete the proof.
3.1. Technical lemmas on one-sided warped sinc sums. The one-sided sum S σ (a, b) defined by (3.6) involves the sinc of each frequency in an algebraic progression {am + b} m≥0 , but also the sinc of each frequency in a "warped" sequence { (am + b) 2 − σ 2 } m≥0 . The proof of Theorem 10 relied on our ability to bound algebraic sums over the difference between warped and unwarped sinc functions, expressed by the main Lemma 13 below. This lemma is nontrivial, because the sum of a single sinc over a general algebraic progression is at best conditionally convergent 13 (because its tail decays with power −1), and may not even be finite (see Remark 11). One might hope that the difference between the sinc functions in (3.6) eventually has more rapid decay, because the difference in their two arguments tends to zero as m → ∞. This is true-illustrated by Figure 3.1(c-e)-and is the idea behind its proof, which occupies the rest of this section.
Remark 12. Our technique adapts several steps from the thesis of Fourmont [14,Sec. 2.5], who showed how phased sums over a sinc whose argument is a warped algebraic progression (with offset b = na, n ∈ N) can be bounded by the phased sum over the unwarped sinc. This requires decomposing the sinc function into numerator and denominator factors, and showing how the effect of warping each factor becomes small for large enough m. His task was to bound the aliasing error in the non-uniform FFT algorithm when using (3.2) as a spreading kernel [15]. Our task is in one way simpler because there is no phase, but we also need to handle general b.
Proof. From now on we use x = am + b to denote a frequency in the progression. Then, since x > 0, which is Fourmont's decomposition. By the triangle inequality, and using that | sin y| ≤ 1, We apply the "denominator warp" Proposition 14 stated below (using y = a/2 and the definition of α) to the first summand. We also apply the "numerator warp" Lemma 15 stated below, which applies since a ≥ π, to bound the second sum by 5. We then split off the first term of the remaining sum and bound the rest by an integral: where in the final two steps we used 1/b < 2/a and σ/b < 1.
The rest of this subsection is devoted to the required bounds on the effect of warping on the numerator and denominator of the sinc function. The first estimate shows that the effect of warping the denominator is O(x −3 ) as frequency x → ∞, with an explicit constant.
Proposition 14 (Denominator warp). Fix σ > 0 and y > σ, then Proof. Since x > σ, expanding the left side gives and applying x ≥ y to the first factor gives the result.
The following shows that the effect of warping the numerator on the sum is uniformly bounded over the allowed set of parameters σ, a, b. Its proof needs several results which complete the subsection.
Lemma 15 (Numerator warp). Let a ≥ π, σ ∈ (0, a/2), and b ≥ a/2. Then, using the abbreviation Proof. Fixing σ, will make frequent use of the frequency warping deviation function, Now, noting sin √ x 2 − σ 2 = sin(x− R(x)), applying the addition formula, subtracting sin x from both sides and dividing by x gives Applying the triangle inequality termwise, and bounds on sin and cos, gives by Lemmas 17 and 16 respectively. Lemma 16. Let a ≥ π, σ ∈ (0, a/2), and b ≥ a/2. Then, with R(x) as in (3.13), and using the abbreviation (3.14) Proof. By Proposition 18 below, R(x) = O(x −1 ) so that the tail of the sum is O(x −2 ) and hence summable. However, this only becomes useful for sufficiently large m. Thus the idea will be to split the sum at index m 0 := ⌈σ 2 /a⌉ ≥ 1, chosen so that, given the hypotheses on σ, a, and b, This follows easily from the fact that x = am + b ≥ am 0 + b ≥ σ 2 + b, that b > σ, and from Proposition 18. For the first m 0 terms we use the crude bound | sin R(x)| ≤ 1, but use sin y ≤ y for 0 < y ≤ 1 and (3.15) in the tail sum, and get (3.16) After splitting off its first term, we bound the rest of the finite sum by an integral with upper limit a(m 0 − 1) where the replacement of a by its lower limit π is justified by checking that the function of a has negative derivative for all a > 0. The infinite sum in (3.16) is similarly bounded by an integral using σ ≤ a/2 and a ≥ π. Adding the two above bounds completes the proof. Lemma 17. Let a ≥ π, σ ∈ (0, a/2), and b ≥ a/2. Then, with R(x) as in (3.13), and using the abbreviation Proof. The proof is very similar to that of Lemma 16. We choose m 0 in the same way. Then for all m ≥ m 0 , since R(x) ≤ 1 then, by Taylor's theorem followed by Proposition 18, Again splitting the sum, and bounding the first m 0 terms via | cos R(x) − 1| ≤ 2, and the rest via the above formula, The first sum is no more than 2, using (3.17). The second sum we bound by its first term plus an integral to get using σ 2 < am 0 + b, σ ≤ a/2 and a ≥ π. Adding these two bounds finishes the proof.
Finally, the above two proofs relied on the following fact that the effect of warping the numerator is O(x −1 ).
Proof. Since σ < x, we expand 4. Proof of improved rate for small α and β (Theorem 3). As with the other two main theorems, Theorem 3 combines a lower bound on σ 1 (A) (Proposition 5) and an upper bound on σ min (A), in this case the following lemma.
Lemma 19. Let A be a cyclically contiguous p×q submatrix of the N ×N discrete Fourier matrix F , with 1 < q ≤ p < 4N/eπ + 1. Then Proof. Following [31, Lem. 3.2] we start with the Bessel-Chebyshev expansion arising by inserting t = cos θ into the Jacobi-Anger expansion e ix cos θ = n∈Z i n J n (x)e inθ . The submatrix elements can be generated by sampling this kernel on the product of the regular grids x j = (−1 + 2j/(p − 1))W , j = 1, . . . , p and t k := −1 + 2k/(q − 1), k = 1, . . . , q. Scaling the x-domain size by setting W = π(q − 1)(p − 1)/2N insures that the matrix with elements A jk = e ixj t k is, up to irrelevant left and right multiplication by diagonal unitary matrices, equal to any given p × q contiguous submatrix of the size-N Fourier matrix F . Defining for n = 0, 1, . . . the sequences of vectors u n ∈ C p and v n ∈ C q , with elements (u 0 ) j ≡ 1, (u n ) j := 2i n J n (x j ) for n = 1, 2, . . . , and (v n ) k := T n (t k ) for n = 0, 1, . . . , we rewrite the matrix of samples of (4.2) as the sum of rank-1 outer products Since q ≤ p, then σ min (A) = σ q (A). The rank approximation theorem (e.g., [17,Thm. 2

.5.3]) bounds
where in the penultimate step we used |T n (t)| ≤ 1, and in the last step Siegel's bound for J n (nz) [30, 10.14.5] where g(z) := ze , for z = x/n ≤ 1. Note that q − 1 > W is equivalent to p < 2N/π + 1 which holds by the hypotheses of the lemma. We now observe that g(z) ≤ ez/2 in 0 < z ≤ 1, bound g(W/n) by g(W/(q − 1)), and recall W/(q − 1) = π(p − 1)/2N < 1, to get a bounded geometric sum when (p − 1)/N < 4/eπ, a hypothesis of the lemma, giving (4.1). Remark 20. We initially tried the low-rank expansion of e ixt resulting from the Taylor series for e z , following [10, Lem. 1], but found that this gave a rate replacing the exponential term in (4.1) with [eπ(p − 1)/2N ] q−1 , which is exactly half the rate in (4.1), and even weaker than (1.5). It is therefore possible that yet other low-rank expansions, such as the double-Chebyshev [34, App. A], could further improve the rate in the corner region α, β ≪ 1.

Discussion.
5.1. Numerical study of sharpness of exponential rate bounds. Figure 1.1 showed cond(A) for various fixed submatrices A, at small N . However, Theorems 1-3 may be seen as lower bounds on the asymptotic exponential growth rate with respect to N , for fixed shape (α, β), as summarized in Table 1.1. Thus we numerically measured the empirical asymptotic growth rateρ(α, β) along the family of submatrices sharing a given (α, β), but growing N , to see how close our lower bounds were to this rate. The result is Figure 4.1. Comparing its panels (a) and (b) shows that (the symmetrized) Theorem 2 captures well the main features of the empirical rate. Indeed, the ratioρ/ρ, plotted in (c), is less than 1.2 in 54% of the area of the square (0, 1) 2 . This ratio approaches 1 in the neighborhoods of (1, 0) and (0, 1), as expected since Theorem 2 has a sharp rate there (Remark 4). However, in quite a tight band around the diagonal α = β the empirical rate exceeds this lower bound by a significant factor. This factor is about 1.45 at (1/2, 1/2), and along the diagonal grows slowly but without bound as the corners (0, 0) and (1, 1) are approached.
However, we know that the rate of Theorem 3 beats that of Theorem 2 inside the corner (0, α * ) 2 , where α * ≈ 0.117 (see Section 1.1). Hence, in Figure 4.1(d-f) we zoom in to the neighborhood of this region, comparing the empirical rate to the stronger of these two theorems (their cross-over at max(α, β) = α * is visible as kinks in the contour lines in (e) and (f)).
As panel (f) shows, the ratio now appears uniformly bounded, reaching its maximum at α = β = α * . An estimate of this maximum is given by the ratio at (0.12, 0.12), which is less than 1.91. As β → 0, the ratio peaks at a value about 1.7 at α = α * . As (α, β) → (0, 0), the ratio drops. Remark 4 on the prolate asymptotic suggests that the ratio tends to 1 (logarithmically slowly) if approached along the axes.
Remark 21 (measuring the empirical rateρ(α, β)). Generating Figure 4.1 needed an accurate measurement of the constantρ in the asymptotic model cond(A) ∼ ceρ N as N → ∞. Recall that here A takes a sequence of integer sizes p×q with p/N = α and q/N = β fixed (and, thus, rational). This places constraints on the allowable p and q, making the measurement subtle. As a example, to estimateρ(0.5, 0.49) the smallest possible case of A is a 50 × 49 submatrix with N = 100, and yet already cond(A) > 10 16 , exceeding the value reliably measurable in double-precision arithmetic.
There are two ways out of this quandary: either compute SVDs via a higher precision arithmetic library, or limit oneself to α and β with small rational denominators. (c) shows the ratio of (a) to (b), i.e., the sharpness of the theorem. The breakup into small dots along the diagonals of (a) and (c) are contouring artifacts. Panels (d,e,f ) show a zoom around the origin of panels (a,b,c) respectively, on a grid of spacing 1/50, but also applying Theorem 3 in the corner (0, α * ) 2 (shown dotted), where it is stronger than Theorem 2.
We opted for the latter. This led to the choices of 1/30 and 1/50 for the grid spacings in Figure 4.1, which were adequate. We wrote a simple code which, given such a pair (α, β), uses a bisection search along the family of A with this shape, identifying the largest A for which cond(A) does not exceed an upper limit of around 10 16 . This data point is combined with another taken from the A nearest to half this size (giving cond(A) ∼ 10 8 , bypassing any pre-asymptotic growth). Theρ returned is the slope between (the log of ) these two points. The accuracy is around ±0.02, estimated by comparing equivalent rational forms for (α, β). The code ran in MATLAB R2017a on a laptop with Intel i7 CPU, taking a few seconds to generate all figures in this paper. Certainly a more elaborate arbitrary-precision code could be built, but the above served our purposes well.
The code described above, plus those generating the other figures in this paper, can be found at https://github.com/ahbarnett/fourier-submat 5.2. Symmetry and near-symmetry in the (α, β) plane. The empirical rate plot Figure 4.1(a) appears to have two symmetries (i.e. the D 2 dihedral group).
The first is that cond(A) is invariant to the diagonal reflection (α, β) → (β, α), which follows immediately from the fact that a submatrix of swapped dimensions is the adjoint, and cond(A) * = cond(A).
The second symmetry in Figure 4.1(a) is that the rate also appears invariant under (α, β) → (1 − α, 1 − β), i.e., inversion about (1/2, 1/2), or complementing both row and column index sets. However, understanding this is more subtle. From the rate one might suspect that cond(A) itself is also invariant under changing the submatrix from size p × q to (N − p) × (N − q). But this cannot be true, as the case p = q = 1 shows: the condition number of any 1 × 1 matrix is unity, and letting A now be a (N − 1) × (N − 1) submatrix of F gives σ 1 (A) = √ N by the interlacing property (e.g. [39,Thm. 1]). Choosing the omitted row and column to be j = k = 0, it is easy to check that A acting on the vector of all ones is sent to the negative of this vector, showing σ min (A) ≤ 1 and cond(A) ≥ √ N (this turns out to be an equality), which is certainly not unity! The condition number of a submatrix of F is not invariant with respect to complementing its dimensions p × q to give (N − p) × (N − q), yet as N grows this becomes a near symmetry. Figure 1.1 shows how strikingly rapid this is: even at N = 8 the near-symmetry is clear, and by N = 16 the failure of symmetry is almost impossible to see. This seems quite mysterious until we spot the following identity.
Proposition 22. Let A be a p × q submatrix of F with p + q < N , and let D be the (N − p) × (N − q) submatrix of F defined by the complement of the row and column index sets of A. Then where C is the (N − p) × q submatrix of F with the same column indices as A, and the complement of the row indices.
Proof. If p < q then we apply the diagonal reflection symmetry, so that we can from now take p ≥ q. We apply a permutation moving A to the upper left position in F , which does not change cond(A) nor cond(D), then label the blocks as above, with the missing fourth block named B: The rest is simple singular value inequalities. If v is the normalized minimum right singular vector for A, then because A is square or "tall", u = Av = σ min is as small as possible over unit vectors v ∈ C q . However, by isometry of F , u 2 + w 2 = N , so that w = Cu is the largest possible, hence v is also the normalized maximum right singular vector for C. Thus (σ min (A)) 2 + (σ max (C)) 2 = N .
Swapping the words "minimum" and "maximum" in this argument requires additionally that C not be "fat", which is true by the assumption q < N − p. This gives (σ max (A)) 2 + (σ min (C)) 2 = N .
We now reason similarly for the blocks of F * = [A * , C * ; B * , D * ], multiplying it against [0; v] to sample its right two blocks. By the assumption, D * is not fat, so may play the role of A in (5.2), giving (σ min (D)) 2 + (σ max (C)) 2 = N .
However, C * is fat, so the minimum of C * v over unit-norm v ∈ C q is zero, leaving (σ max (D)) 2 = N . This proposition means that cond(A) for a submatrix shape (α, β) in the triangle α + β < 1 is strictly smaller than cond(A) for its inverted shape (1 − α, 1 − β). However, the relative difference between the two condition numbers vanishes as C, itself a submatrix of shape (1 − α, β), becomes highly ill-conditioned. Since as N grows this happens everywhere except α, β ≈ 0, the mystery of the near-symmetry of Figure 1.1 is explained. By the lower bound of Theorem 1 or 2 this occurs eventually for any (α, β) ∈ (0, 1) 2 , proving that the true asymptotic exponential growth rate is exactly inversion symmetric.
Intriguingly, the rates in Theorems 1 and 2 also obey this inversion symmetry; in their proofs this can be traced to the fixed "Heisenberg" product of the sizes of the sets P and Q (e.g. see Figure 1.2(b), and (3.5)). Yet Theorem 3 cannot obey it, because its maximum α allowed is less than 1/2. However, combining it with Proposition 22 allows it to be also applied near the (1, 1) corner, as follows.

5.3.
Conclusions and open problems. Fourier submatrices are quite elementary objects, arising in many applications, yet until now there have been very few rigorous bounds on their conditioning. Our new non-asymptotic lower bounds on the condition number of a p × q submatrix of the size-N Fourier matrix apply to all N and, when combined with the obvious p ↔ q symmetry, essentially all p and q. All constants are explicit. Interpreted as asymptotic results for p and q fixed fractions of N , as N → ∞, they give exponential lower bounds with rates listed in Table 1.1. Numerical study (Section 5.1) shows that their rates capture quite well the empirical growth, which is also exponential, uniformly over shape space. Section 5.2 explained a non-obvious symmetry effect in this space.
Our methods are elementary: Theorem 1 used a Gaussian trial vector, Theorem 2 a more sophisticated trial vector and quite detailed estimates, while Theorem 3, applying only in the "corner" region of small α and β, used the SVD rank approximation theorem. In Theorem 2, we achieved a rate that is sharp in the limit (α, β) → (1, 0), or (0, 1), due to using the (deplinthed) Kaiser-Bessel pair (3.3). This shares with the prolate spheroidal wavefunctions an optimal frequency localization rate in L 2 (R), but is more accessible. We suspect that this is its first use as a pure analysis tool.
Our rate is not sharp long the axis (α, 0), or (0, β), but possibly could be made so by using a discrete prolate spheroidal sequence (DPSS) [35] as the trial vector v. This would require estimating finite sums over the tails of H(ω) in [35], for which only asymptotics are known. It is also unclear whether this would simply duplicate (1.6). Another approach would be to extend the methods of [41]. Also see Remark 20. This paper studied lower bounds in depth; we did not address upper bounds at all, which seems to be quite an open area. On this topic, for now we direct the reader to [6], and rather special cases in recent super-resolution literature: [26, Thm. 2, case A = 1], [5,Thm. 3.2], and [24, §4]. These references also deal with the generalization to non-uniform Fourier matrices (also see [34]). In 2D and 3D applications, Kronecker products of Fourier submatrices arise; we explore one such direction in [3].
Our numerical study suggests fascinating open problems. There appears to be a universal exponential rate as a function of shape, as shown in Figure 4.1(a); what is it? (Answering this would be equivalent to writing an asymptotic form for the smallest P-DPSS eigenvalue [19,41].) In particular, what phenomenon explains the sudden spike in growth rate around the diagonal, where the submatrix tends to become square?
6. Acknowledgments. The author benefited from helpful discussions with Hannah Lawrence, Daan Huybrechs, Charlie Epstein, David Barmherzig, and Alex Townsend, and from corrections by Dominik Nagel. The Flatiron Institute is a division of the Simons Foundation.
Appendix A. Proof of the Kaiser-Bessel Fourier transform pair. Here we prove (3.2), in two steps. The first will be to establish a related Fourier transform, This is equivalent to a formula in Gradshteyn-Ryshik [18, 6.677 (6)], although neither of the books cited therein prove it or give a reference; the scent goes cold. We prove it simply by averaging a plane wave over the unit sphere. The second step will analytically continue this formula to imaginary b. Proof. We insert the integral representation [30, (10.9.1)] of the Bessel function J 0 (s) = (2π) −1 2π 0 e is sin φ dφ into the left-hand side of (A.1), then change variable z = cos θ, thus proving (A.1). The key step passing to the last line is invariance of the mean under rotation of the coordinate system, whose new z axis points in the direction (0, b, z). Although (A.1) assumed real b, our second step shows that, fixing k, both sides are in fact entire with respect to b. This holds for the right-hand side because the squareroot has singularities only at b = ±ik, but these are removed by the entire function sinc having only even powers at its origin. The left-hand side is entire because the integrand is entire for each z and continuous in (b, z), so one may apply [37,Thm. 5.4,Ch. 2]. By unique continuation the equality thus holds for all b ∈ C, in particular b = iσ, so, using I 0 (s) = J 0 (is) from [30, (10.27.6)], this gives 1 −1 I 0 (σ 1 − z 2 )e ikz dz = 2 sinc k 2 − σ 2 .