Quantitative invertibility of random matrices: a combinatorial perspective

We study the lower tail behavior of the least singular value of an $n\times n$ random matrix $M_n := M+N_n$, where $M$ is a fixed complex matrix with operator norm at most $\exp(n^{c})$ and $N_n$ is a random matrix, each of whose entries is an independent copy of a complex random variable $\xi$ with mean $0$ and variance $1$. Motivated by applications, our focus is on obtaining bounds which hold with extremely high probability, rather than on the least singular value of a typical such matrix. This setting has previously been considered in a series of influential works by Tao and Vu, most notably in connection with the strong circular law and the smoothed analysis of the condition number, and our results improve upon theirs in two ways: (i) We are able to handle $\|M\| = O(\exp(n^{c}))$, whereas the results of Tao and Vu are applicable only for $\|M\| = O(\mathrm{poly}(n))$. (ii) Even for $\|M\| = O(\mathrm{poly}(n))$, we are able to extract more refined information -- for instance, our results show that for such $M$, the probability that $M_n$ is singular is $O(\exp(-n^{c}))$, whereas even in the case when $\xi$ is a Bernoulli random variable, the results of Tao and Vu only give a bound of the form $O_{C}(n^{-C})$ for any constant $C>0$. As opposed to all previous works obtaining such bounds with error rate better than $n^{-1}$, our proof makes no use either of the inverse Littlewood--Offord theorems, or of any sophisticated net constructions. Instead, we show how to reduce the problem from the (complex) sphere to (Gaussian) integer vectors, where it is solved directly by utilizing and extending a combinatorial approach to the singularity problem for random discrete matrices, recently developed by Ferber, Luh, Samotij, and the author.

In particular, during the course of our proof, we extend the solution of the so-called 'counting problem in inverse Littlewood--Offord theory' from Rademacher variables (established in the aforementioned work of Ferber, Luh, Samotij, and the author) to general complex random variables.

Introduction
Let $M_n$ be an $n \times n$ complex matrix. Its singular values, denoted by $s_k(M_n)$ for $k \in [n]$, are the eigenvalues of $\sqrt{M_n^{\dagger}M_n}$ arranged in non-increasing order. Of particular interest are the largest and smallest singular values, which admit the following variational characterizations:
$$s_1(M_n) = \sup_{v \in S^{2n-1}} \|M_n v\|_2, \qquad s_n(M_n) = \inf_{v \in S^{2n-1}} \|M_n v\|_2,$$
where $\|\cdot\|_2$ denotes the usual Euclidean norm on $\mathbb{C}^n$, and $S^{2n-1}$ denotes the set of unit vectors in $\mathbb{C}^n$. In this paper, we will be concerned with the following problem: for an $n \times n$ random matrix $M_n$ and a non-negative real number $\eta$, bound the probability $\Pr(s_n(M_n) \le \eta)$ from above. This general problem captures, as special cases, many interesting and well-studied problems.
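As a quick sanity check of the variational characterization (an illustration of ours, not part of the paper; the $2\times 2$ real matrix below is an arbitrary choice), one can compare a brute-force scan of $\|M v\|_2$ over unit vectors against the exact singular values computed from the eigenvalues of $M^{\mathsf{T}}M$:

```python
import math

# Exact singular values of a 2x2 real matrix: square roots of the eigenvalues
# of M^T M, obtained from the quadratic formula via trace and determinant.
M = [[1.0, 2.0], [3.0, 4.0]]
t = sum(M[i][j] ** 2 for i in range(2) for j in range(2))   # trace(M^T M)
d = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) ** 2            # det(M^T M)
s1 = math.sqrt((t + math.sqrt(t * t - 4 * d)) / 2)
s2 = math.sqrt((t - math.sqrt(t * t - 4 * d)) / 2)

# Variational characterization: s1 = sup, s2 = inf of ||Mv||_2 over unit v.
norms = []
for k in range(100_000):
    theta = 2 * math.pi * k / 100_000
    v = (math.cos(theta), math.sin(theta))
    norms.append(math.hypot(M[0][0] * v[0] + M[0][1] * v[1],
                            M[1][0] * v[0] + M[1][1] * v[1]))
assert abs(max(norms) - s1) < 1e-3
assert abs(min(norms) - s2) < 1e-3
```

The fine grid over the unit circle recovers both extremal values, matching the sup/inf formulation above.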
At one extreme, when $\eta = 0$, the problem asks for an upper bound on the probability that $M_n$ is singular. Even in the case when the entries of $M_n$ are independent copies of a Rademacher random variable (i.e. a random variable which takes on the values $\pm 1$ with probability $1/2$ each), this is highly non-trivial. Considering the event that two rows or two columns of $M_n$ are equal (up to a sign) shows that $\Pr(s_n(M_n) = 0) \ge (1 + o_n(1))\, n^2 2^{1-n}$, and it has been conjectured since the 1950s that this lower bound is tight. Despite this, even showing that $\Pr(s_n(M_n) = 0) = o_n(1)$ was only accomplished in 1967 by Komlós [16], who used the Erdős--Littlewood--Offord anti-concentration inequality to show that $\Pr(s_n(M_n) = 0) \lesssim n^{-1/2}$.
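At a toy scale, the connection between singularity and rows equal up to sign can be checked exhaustively (a small illustration of ours, not from the paper): for $2\times 2$ sign matrices, the singular ones are exactly those with proportional, hence equal-up-to-sign, rows.

```python
from itertools import product

# Enumerate all 16 matrices ((a, b), (c, d)) with entries +-1 and collect the
# singular ones, i.e. those with determinant ad - bc equal to zero.
singular = [(a, b, c, d)
            for a, b, c, d in product((-1, 1), repeat=4)
            if a * d - b * c == 0]
print(len(singular))  # 8: exactly half of all 2x2 sign matrices are singular

# For n = 2, every singular sign matrix has its two rows equal up to a sign.
assert all((a, b) == (c, d) or (a, b) == (-c, -d) for a, b, c, d in singular)
```

Of course, the conjectural asymptotics $(1+o_n(1))\, n^2 2^{1-n}$ only kick in for large $n$; the point of the toy case is just that equal-up-to-sign rows account for the singular matrices here.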
A bound of the form $\Pr(s_n(M_n) = 0) \le c^n$ for some $c \in (0,1)$ was obtained much later, in 1995, by Kahn, Komlós, and Szemerédi [13], who proved such an estimate with $c = 0.999$. Subsequently, using deep ideas from additive combinatorics, Tao and Vu [34] obtained such an estimate with $c = 0.75$, and by refining their ideas, Bourgain, Vu, and Wood [1] were able to lower this constant to $c = 1/\sqrt{2}$. Recently, in a breakthrough work, Tikhomirov [41] (building on the geometric approach to non-asymptotic random matrix theory pioneered by Rudelson and Vershynin [26]) showed that $\Pr(s_n(M_n) = 0) \le (1/2 + o_n(1))^n$, thereby settling the singularity conjecture for random Rademacher matrices up to lower order terms.
At the other extreme, one may ask for the order of $s_n(M_n)$ for a 'typical' realization of $M_n$; in our setup, this corresponds to the largest value of $\eta$ for which one can obtain a bound of the form $\Pr(s_n(M_n) \le \eta) \le 0.01$ (say). For instance, confirming (in a very strong form) a conjecture of Smale, and a speculation of von Neumann and Goldstine, Edelman [2] showed that for $M_n$ whose entries are independent copies of the standard Gaussian, $\Pr(s_n(M_n) \le \eta n^{-1/2}) = O(\eta)$ for all $\eta \ge 0$; this implies, in particular, that for i.i.d. standard Gaussian random matrices, $s_n(M_n)$ is typically $\Omega(n^{-1/2})$.
Edelman's proof relied on special properties of the Gaussian distribution -for general distributions, especially those which are allowed to have atoms, this question is much more challenging.
In this case, building on intermediate work by Rudelson [25], and essentially confirming a conjecture of Spielman and Teng, Rudelson and Vershynin [26] showed in a landmark work that for a real random matrix $M_n$ with i.i.d. centered subgaussian entries of variance 1, $\Pr(s_n(M_n) \le \eta n^{-1/2}) \lesssim \eta + c^n$, which is optimal up to the constant $c \in (0,1)$ and the overall implicit constant. In recent years, much work has gone into establishing similar tail bounds under weaker assumptions: Rebrova and Tikhomirov [24] established the same estimate as Rudelson and Vershynin for i.i.d. centered random variables of variance 1 (in particular, not assuming the existence of any moments higher than the second moment), and very recently (in fact, after the first version of the current paper appeared on the arXiv), Livshyts, Tikhomirov, and Vershynin [19] obtained such an estimate for real random matrices $M_n$ whose entries are independent random variables satisfying a uniform anti-concentration estimate, and such that the expected sum of the squares of the entries is $O(n^2)$. Both of these works build upon the geometric framework of Rudelson and Vershynin.
For many applications, one would like to study random matrices whose entries have non-zero means. Whereas the results mentioned in the previous paragraph allow non-centered entries to some extent, they are unable to handle means larger than some threshold, due to their reliance on controlling various norms of the matrix. For instance, even the case when the mean of every entry is allowed to be in $[-n, n]$ has thus far remained out of reach of the geometric methods. Hence, the geometric methods fail to provide sufficiently powerful bounds in the important setting of smoothed analysis, which we now discuss.

Smoothed analysis of the least singular value
In their work on the smoothed analysis of algorithms [31, 30] in numerical linear algebra, Spielman and Teng considered random matrices of the form $M_n := M + N_n$, where $M$ is a fixed (possibly 'large') complex matrix, and $N_n$ is a complex random matrix with i.i.d. (centered) entries of variance 1. Their motivation for studying this distribution on matrices was based on the following insight: even if the desired input to an algorithmic problem is a fixed matrix $M$, it is likely that a computer will actually work with a perturbation $M + N_n$, where $N_n$ is a random matrix representing the effect of 'noise' in the system. Sankar, Spielman, and Teng [28] dealt with the case when the noise matrix $N_n$ has i.i.d. standard Gaussian entries, and found that such noise has a regularizing effect, i.e. with high probability, the least singular value of $M_n$ is sufficiently large, even if this is not the case for $M$ itself. More precisely, they showed that for an arbitrary $n \times n$ matrix $M$, $\Pr(s_n(M_n) \le \eta) \le 2.35\, \eta \sqrt{n}$ for all $\eta \ge 0$, which is optimal up to the constant 2.35. The proof of Sankar, Spielman, and Teng relied on special properties of the Gaussian distribution. Recently, using significantly different techniques, Tikhomirov [40] obtained such a result for all $N_n$ with independent rows satisfying a technical assumption (this assumption is general enough to include isotropic log-concave distributions). Motivated by more realistic noise models, especially those in which the noise distribution is allowed to have atoms (for instance, this is always the case with computers; see also the discussion in [38]), Tao and Vu [33, 38] investigated the lower tail behavior of $s_n(M_n)$ for very general noise matrices $N_n$. Using the so-called inverse Littlewood--Offord theory from additive combinatorics (see the discussion in Section 1.3), they showed that for any complex random variable $\xi$ with mean 0 and variance 1, and for any constants $A, C > 0$, there exists a constant $B > 0$ (depending on $A, C, \xi$; this is in general necessary, see [38, Theorem 3.1]) such that for any complex matrix $M$ with $\|M\| := s_1(M) \le n^{C}$, if $N_n$ is a complex random matrix whose entries are i.i.d. copies of $\xi$, then
$$\Pr(s_n(M_n) \le n^{-B}) \le n^{-A}. \qquad (1)$$
Explicit dependence of $B$ on $A, C, \xi$ was given in [35] and subsequently sharpened (but not optimally) in [38], although, for known applications of Equation (1) in the literature, the exact dependence of $B$ on $A, C, \xi$ is not important for the analysis to go through (see the discussion in [38]). However, in applications, it is crucial that one can allow $A$ to be any positive constant -- this allows one to obtain estimates on $s_n(M_n)$ which can survive even a polynomial-sized (in $n$) union bound. As an example, in Tao and Vu's celebrated proof of the strong circular law [35, 39], it is essential to have an estimate of the form Equation (1) for some $A > 1$. Proving estimates of the form Equation (1) with $A > 1$ is significantly more involved than proving such estimates for some $A > 0$, and involves a much deeper understanding of the anti-concentration properties of vectors -- in particular, a decomposition of the sphere into just 'compressible' and 'incompressible' vectors, as is done in [25, 9], is insufficient for this purpose.
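The regularizing effect of noise described above can be seen in a toy computation (an illustration of ours, not from [28] or [38]; the $2\times 2$ example and the noise scale are arbitrary): the all-ones matrix is singular, yet after adding a small continuous perturbation its least singular value is positive in every sampled trial.

```python
import math
import random

def singular_values_2x2(m):
    """Singular values (s1 >= s2) of a 2x2 real matrix via trace/det of M^T M."""
    t = sum(x * x for row in m for x in row)        # trace(M^T M) = s1^2 + s2^2
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]     # s1 * s2 = |det|
    s1 = math.sqrt((t + math.sqrt(max(t * t - 4 * det * det, 0.0))) / 2)
    return s1, (abs(det) / s1 if s1 > 0 else 0.0)

M = [[1.0, 1.0], [1.0, 1.0]]                        # a fixed singular matrix
assert singular_values_2x2(M)[1] == 0.0

random.seed(0)
sigma = 1e-3                                        # illustrative noise scale
smallest = []
for _ in range(1000):
    Mn = [[M[i][j] + sigma * random.uniform(-1, 1) for j in range(2)]
          for i in range(2)]
    smallest.append(singular_values_2x2(Mn)[1])
assert min(smallest) > 0.0                          # the noise regularizes M
```

Computing $s_2$ as $|\det|/s_1$ (rather than via the smaller eigenvalue directly) avoids catastrophic cancellation when the perturbed matrix is nearly singular.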
We also emphasize that the estimate in Equation (1) holds for any complex random variable with mean 0 and variance 1. Working with complex random variables of this generality poses significant additional challenges for the geometric methods, owing to the fact that the metric entropy of the unit sphere in $\mathbb{C}^n$ is twice that of the unit sphere in $\mathbb{R}^n$ (see the discussion in [27]). Consequently, works based on the geometric method have thus far imposed further conditions on the dependence between the real and imaginary parts of the complex random variable, most commonly requiring the real and imaginary parts to be independent (see, e.g. [27, 20]), in order to deduce bounds comparable to Equation (1).

Our results
We introduce a new framework for providing estimates on the lower tail of $s_n(M_n)$ in the general setting of smoothed analysis, with a particular focus on values of $\eta$ 'close' to 0 (as opposed to obtaining the correct order of magnitude for '99 percent' of such matrices). Our approach differs both from the geometric methods of Rudelson and Vershynin, and from the additive combinatorial methods of Tao and Vu. Before discussing this further, we record our main result.
Theorem 1.1. Let $\xi$ be an arbitrary complex random variable with mean 0 and variance 1. Let $M$ be an $n \times n$ complex matrix with $\|M\| \le 2^{n^{0.001}}$ and let $M_n = M + N_n$, where $N_n$ is a random matrix, each of whose entries is an independent copy of $\xi$.
Then, for all $\alpha \ge 2^{-n^{0.001}}$ and for all $\eta \le (C$ where $C_{1.1}$ is a constant depending only on $\xi$.

Remark 1.2. (1) When $\|M\| \ge \sqrt{n}$, the conclusion of Theorem 1.1 shows that for all $t \in (0, 1)$, where $c$ is an absolute constant and $C, c$ are constants possibly depending on $\xi$.
(2) The choice of the upper bound $2^{n^{0.001}}$ on $\|M\|$ and $\alpha^{-1}$ is arbitrary and can certainly be improved, although we have made no attempt to do so. (3) In particular, when $\|M\| \le n^{C}$ and $\alpha^{-1}$ is polynomial in $n$, Theorem 1.1 yields an estimate of the form Equation (1) for some $B$ depending on $A$ and $C$, thereby recovering the result of Tao and Vu (up to the specific dependence of $B$ on $A$ and $C$, which, as noted earlier, is typically not important for applications).
Discussion: The main advantage of Theorem 1.1 over Equation (1) is that it is valid for $\alpha^{-1}, \|M\| \le 2^{n^{0.001}}$, whereas Equation (1) (recast in the form of Theorem 1.1) would provide a similar conclusion only for $\alpha^{-1}, \|M\| \le O(\mathrm{poly}(n))$. In particular, even in the case when $\|M\|$ is polynomially bounded in $n$ and $\xi$ is a Rademacher random variable, Theorem 1.1 shows that $M_n$ is singular with probability at most $2^{-n^{0.001}}$, as compared to Equation (1), which only gives an inverse polynomial bound.
As mentioned earlier, our goal is to provide bounds in which one can take $\alpha$ to be very small (for instance, this is the case of interest in the singularity problem), and not so much to optimize the exact relationship between $\eta$ and $\alpha, \|M\|$. However, we note that the main source of degradation in the relationship between $\eta$ and $\alpha, \|M\|$ in Theorem 1.1 comes from a pigeonholing argument, introduced in [35]. In [38], a better relationship between $\eta$ and $\alpha, \|M\|$ is obtained using a more involved pigeonholing scheme. By using this more involved scheme, the relationship between $\eta$ and $\alpha, \|M\|$ in Theorem 1.1 can be made comparable to the current best known one in [38], although we have not attempted to do so in order to keep the exposition simple and transparent.
While Theorem 1.1 significantly increases the range of validity of estimates like Equation (1), we feel that what is of greater interest are the proof techniques. Unlike the geometric methods, we make no use of net arguments (except very superficially). We also do not make any use of the inverse Littlewood--Offord theory of Tao and Vu. Instead, we utilize and extend an elementary combinatorial approach to the so-called 'counting problem in inverse Littlewood--Offord theory' (see the next subsection), recently developed by Ferber, Luh, Samotij, and the author [8] -- this part of our paper may be of independent interest.
The benefit of this combinatorial approach to the counting problem is that it provides much better estimates than those that can be obtained from the inverse Littlewood--Offord theorems of Tao and Vu [37], and Nguyen and Vu [22] -- this is, in part, because our approach is not hampered by the black-box application of heavy machinery from additive combinatorics. However, in contrast to the 'continuous inverse Littlewood--Offord theorems' ([35, 22]), we do not have a genuinely 'continuous version' of our counting results. This necessitates additional arguments to reduce the quantitative invertibility problem to a situation where the 'discrete counting theorem' we do have may be applied directly. Such an argument first appears in [12], where the author was able to use certain 'rounding' arguments to avoid the need for a continuous version of the counting theorem; however, those arguments still relied on various norms of the random matrix not being too large, which is not true in the setting of smoothed analysis. Hence, the main technical challenge in the present work is to execute a version of these rounding arguments, even in the presence of large norms and heavy-tailed random variables.
At a high level, our work shows that for the purpose of controlling the smallest singular value of a random matrix, even in the general setting of smoothed analysis, a good solution to the discrete counting version of the inverse Littlewood--Offord problem (which, as we will see, is significantly easier to establish) is already sufficient. Note that a quantitatively weaker solution to this problem first appeared in the original breakthrough work of Tao and Vu on inverse Littlewood--Offord theory [36]. However, in that work, the authors made use not just of the counting estimate, but also of the additive combinatorial structural information coming from the inverse Littlewood--Offord theorems, in order to study the smallest singular value.

The counting problem in inverse Littlewood-Offord theory
In its simplest form, the so-called Littlewood--Offord problem, first raised by Littlewood and Offord in [17], asks the following question. Let $\boldsymbol{a} := (a_1, \ldots, a_n) \in (\mathbb{Z} \setminus \{0\})^n$ and let $\varepsilon_1, \ldots, \varepsilon_n$ be i.i.d. Rademacher random variables. Estimate the largest atom probability $\rho(\boldsymbol{a})$, which is defined by $\rho(\boldsymbol{a}) := \sup_{x \in \mathbb{Z}} \Pr(\varepsilon_1 a_1 + \cdots + \varepsilon_n a_n = x)$. Littlewood and Offord showed that $\rho(\boldsymbol{a}) = O(n^{-1/2}\log n)$. Soon after, Erdős [4] gave an elegant combinatorial proof of the refinement $\rho(\boldsymbol{a}) \le \binom{n}{\lfloor n/2 \rfloor}/2^n = O(n^{-1/2})$, which is tight, as is readily seen by taking $\boldsymbol{a}$ to be the all-ones vector. These classic results of Littlewood--Offord and Erdős generated a lot of activity around this problem in various directions: higher-dimensional generalizations, e.g. [14, 15]; better upper bounds on $\rho(\boldsymbol{a})$ given additional hypotheses on $\boldsymbol{a}$, e.g. [5, 10, 29]; and obtaining similar results with the Rademacher distribution replaced by more general distributions, e.g. [6, 10].
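The Erdős bound and its tightness for the all-ones vector can be verified by direct enumeration at a small scale (an illustrative sketch; the helper `atom_prob` is ours, not from the paper):

```python
from itertools import product
from math import comb
from collections import Counter

def atom_prob(a):
    """rho(a): the largest atom probability of sum(eps_i * a_i), eps Rademacher."""
    counts = Counter(sum(e * x for e, x in zip(eps, a))
                     for eps in product((-1, 1), repeat=len(a)))
    return max(counts.values()) / 2 ** len(a)

n = 6
# Erdos: rho(a) <= C(n, n//2) / 2^n for nonzero integer a; tight for all-ones.
assert atom_prob([1] * n) == comb(n, n // 2) / 2 ** n      # 20/64 = 0.3125
# Distinct coordinates spread the signed sums out, lowering the atom probability.
assert atom_prob([1, 2, 3, 4, 5, 6]) < atom_prob([1] * n)
```

The second assertion illustrates the theme of the refinements cited above: additional hypotheses on $\boldsymbol{a}$ (here, distinct coordinates) force $\rho(\boldsymbol{a})$ to be smaller.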
A new view was brought to the Littlewood--Offord problem by Tao and Vu [36, 35] who, guided by inverse theorems from additive combinatorics, tried to find the underlying reason why $\rho(\boldsymbol{a})$ could be large. They used deep Freiman-type results from additive combinatorics to show that, roughly speaking, the only reason for a vector $\boldsymbol{a}$ to have $\rho(\boldsymbol{a})$ only polynomially small is that most coordinates of $\boldsymbol{a}$ belong to a generalized arithmetic progression (GAP) of 'small rank' and 'small volume'. Their results were subsequently sharpened by Nguyen and Vu [22], who proved an 'optimal inverse Littlewood--Offord theorem'. We refer the reader to the survey [23] and the textbook [32] for complete definitions and statements, and much more on both forward and inverse Littlewood--Offord theory.
Recently, motivated by applications, especially those in random matrix theory such as the ones considered in the present work, the following counting variant of the inverse Littlewood--Offord problem was isolated in work [8] of Ferber, Luh, Samotij, and the author: for how many vectors $\boldsymbol{a}$ in a given collection $A \subseteq \mathbb{Z}^n$ is the largest atom probability $\rho(\boldsymbol{a})$ greater than some prescribed value? The utility of such results is that they enable various union bound arguments, as one can control the number of terms in the relevant union/sum. One of the main contributions of [8] was to show that one may obtain useful bounds for the counting variant of the inverse Littlewood--Offord problem directly, without providing a precise structural characterization like Tao and Vu. Not only does this approach make certain arguments considerably simpler, it also provides better quantitative bounds for the counting problem, since it is not hampered by losses coming from the black-box application of various theorems from additive combinatorics. In [8, 7, 12], this work was utilized to provide quantitative improvements for several problems in combinatorial random matrix theory.
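The counting phenomenon can be seen in miniature (illustrative parameters of ours, far smaller than anything used in the paper): among all vectors $\boldsymbol{a} \in \{1,2,3\}^5$, the Erdős maximum $\binom{5}{2}/2^5$ is attained, but only a small fraction of vectors come anywhere near it.

```python
from itertools import product
from math import comb
from collections import Counter

def atom_prob(a):
    """rho(a) by brute-force enumeration over all Rademacher sign patterns."""
    counts = Counter(sum(e * x for e, x in zip(eps, a))
                     for eps in product((-1, 1), repeat=len(a)))
    return max(counts.values()) / 2 ** len(a)

n, values = 5, (1, 2, 3)
rhos = [atom_prob(a) for a in product(values, repeat=n)]
assert max(rhos) == comb(n, n // 2) / 2 ** n   # 10/32, attained e.g. by constants
bad = sum(r >= 0.25 for r in rhos)             # 'structured' vectors with large rho
print(bad, len(rhos))                          # few of the 243 vectors qualify
assert bad < len(rhos) / 2
```

This is exactly the shape of statement used for union bounds: the sum over 'bad' vectors has few terms, so a crude bound per term suffices.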
A natural question left open by this line of work is whether one can adapt the strategy of [8] to study the counting problem in inverse Littlewood--Offord theory with respect to general random variables as well. We note that the inverse Littlewood--Offord theorems in [22, 35] are indeed applicable to these more general settings. However, since the proofs in [8] proceed by viewing (bounded) integer-valued random variables as random variables valued in $\mathbb{F}_p$ (for sufficiently large $p$), it is not clear whether the combinatorial techniques there can be extended. Here, we show (Theorem 1.3) that the combinatorial arguments of [8] can be used in combination with (the dual of) the Fourier-analytic arguments in [35, 22] to prove a counting result for very general distributions. The statement of the following theorem uses Definition 2.1 and Definition 2.6.
Theorem 1.3. There exists a constant $C_{1.3} \ge 1$, depending only on $C_\xi$, for which the following holds. Let $n, s, k$ be positive integers and let $p$ be an odd prime such that where $\varphi_p$ denotes the natural reduction map modulo $p$.

Remark 1.4. The inverse Littlewood--Offord theorems may be used to deduce similar statements, provided we further assume that $\rho \ge n^{-C}$ for some constant $C > 0$. The freedom of taking $\rho$ to be much smaller is the source of the quantitative improvements in Theorem 1.1.
Organization: The rest of this paper is organized as follows. In Section 2, we collect some preliminary results on anti-concentration; the main result of this section is Proposition 2.8. In Section 3, as a warm-up (included in lieu of an informal sketch of the proof), we provide a proof of Theorem 1.1 under the additional assumption that the random variable $\xi$ is subgaussian. In Section 4, we provide a proof of Theorem 1.1; this follows essentially the same outline as in the subgaussian case, with the main difference being Proposition 4.16 (and the supporting results required to prove it). Finally, in Section 5, we prove Theorem 1.3.
Notation: Throughout the paper, we will omit floors and ceilings when they make no essential difference.
For convenience, we will also say 'let $p = x$ be a prime' to mean that $p$ is a prime between $x$ and $2x$; again, this makes no difference to our arguments. We will use $S^{2n-1}$ to denote the set of unit vectors in $\mathbb{C}^n$, $B(x, r)$ to denote the ball of radius $r$ centered at $x$, and $\Re(\boldsymbol{v}), \Im(\boldsymbol{v})$ to denote the real and imaginary parts of a complex vector $\boldsymbol{v} \in \mathbb{C}^n$. As is standard, we will use $[n]$ to denote the discrete interval $\{1, \ldots, n\}$.

DISCRETE ANALYSIS, 2021:10, 40pp.
We will also use the asymptotic notation $\lesssim, \gtrsim, \ll, \gg$ to denote $O(\cdot), \Omega(\cdot), o(\cdot), \omega(\cdot)$ respectively. For a matrix $M$, we will use $\|M\|$ to denote its standard $\ell^2 \to \ell^2$ operator norm. All logarithms are natural unless noted otherwise.

Preliminaries
In this section, we collect some tools and auxiliary results that will be used throughout the rest of this paper.
Remark 2.2. (1) For lightness of notation, we have chosen to omit the ambient dimension $n$ from $\rho_{r,\boldsymbol{\xi}}(\boldsymbol{v})$. This should not create any confusion, since the dimension of $\boldsymbol{v}$ or $\boldsymbol{\xi}$ will always be clear from context.
(2) Note that when $n = 1$ and $\xi$ is a random variable taking values in $\mathbb{C}$, we have that $\rho_{r,\xi}(1) = \sup_{x \in \mathbb{C}} \Pr(\xi \in B(x, r))$. We will use this notation repeatedly.
(3) Moreover, when the components of $\boldsymbol{\xi}$ are i.i.d. copies of some random variable $\xi$, we will sometimes abuse notation by using $\rho_{r,\xi}(\boldsymbol{v})$ to denote $\rho_{r,\boldsymbol{\xi}}(\boldsymbol{v})$.
(4) If $\tilde{\boldsymbol{\xi}}$ is a random vector whose distribution coincides with that of a random vector $\boldsymbol{\xi}$ conditioned on some event $E$, then we will often denote $\rho_{r,\tilde{\boldsymbol{\xi}}}(\boldsymbol{v})$ by $\rho_{r,\boldsymbol{\xi}|E}(\boldsymbol{v})$.

The next lemma shows that weighted sums of random variables which are not close to being constant are also not close to being constant.

Lemma 2.3 (see, e.g., Lemma 6.3 in [38]). Let $\xi$ be a complex random variable with finite non-zero variance. Then, there exists a constant $c_{2.3} \in (0,1)$, depending only on $\xi$, such that

Combining this with the so-called tensorization lemma (see Lemma 2.2 in [26]), we get the following estimate for 'invertibility with respect to a single vector'.
Lemma 2.4. Let $\xi$ be a complex random variable with finite non-zero variance. Let $M$ be an arbitrary $n \times n$ matrix and let $N_n$ be a random matrix, each of whose entries is an independent copy of $\xi$. Then, where $c_{2.4} \in (0,1)$ is a constant depending only on $\xi$.
We will also need the following simple fact, which compares the Lévy concentration function of a random vector with that of a conditioned version of the random vector.

Lemma 2.5. Let $\boldsymbol{\xi} := (\xi_1, \ldots, \xi_n)$ be a complex random vector, let $G$ be an event depending on $\boldsymbol{\xi}$, and let $\tilde{\boldsymbol{\xi}}$ denote a random vector distributed as $\boldsymbol{\xi}$ conditioned on $G$. Then, for any $\boldsymbol{v} \in \mathbb{C}^n$ and for any $r \ge 0$,

Taking the supremum of the left hand side over the choice of $x \in \mathbb{C}$, and then taking the limit of the right hand side as $\varepsilon \to 0$, completes the proof.
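The displayed inequality of Lemma 2.5 is lost in this copy; a standard comparison of this kind bounds the conditioned concentration function by the unconditioned one divided by $\Pr(G)$, i.e. $\rho_{r,\boldsymbol{\xi}|G}(\boldsymbol{v}) \le \rho_{r,\boldsymbol{\xi}}(\boldsymbol{v})/\Pr(G)$. The following toy computation (ours, with $n = 1$ and a Rademacher $\xi$) illustrates that form:

```python
from fractions import Fraction

# Toy check of the conditioning bound  rho_{r, xi|G} <= rho_{r, xi} / Pr(G)
# for a single Rademacher variable xi (values +-1, probability 1/2 each),
# radius r < 1, and the event G = {xi = 1}.  All parameters are illustrative.
support = {1: Fraction(1, 2), -1: Fraction(1, 2)}
r = Fraction(1, 2)

def levy(dist, r):
    """sup over centers x of P(xi in B(x, r)); for atoms at distance > 2r,
    it suffices to consider centers located at the atoms themselves."""
    return max(sum(p for v, p in dist.items() if abs(v - x) <= r)
               for x in dist)

p_G = support[1]                       # Pr(G) = 1/2
conditioned = {1: Fraction(1, 1)}      # law of xi given G
assert levy(support, r) == Fraction(1, 2)
assert levy(conditioned, r) == 1
assert levy(conditioned, r) <= levy(support, r) / p_G   # 1 <= (1/2)/(1/2)
```

Here the bound is attained with equality: conditioning on a probability-1/2 event can at most double the concentration function.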
In order to state the main assertion of this subsection (Proposition 2.8), we need the following definition.
Definition 2.6. We say that a random variable $\xi$ is $C$-good if, where $\xi_1$ and $\xi_2$ denote independent copies of $\xi$. The smallest $C \ge 1$ with respect to which $\xi$ is $C$-good will be denoted by $C_\xi$.
The following lemma shows that the general random variables with which we are concerned in this paper (i.e. complex random variables with finite non-zero variance) are indeed $C$-good for some finite $C$, so that there is no loss of generality for us in imposing this additional restriction.

Lemma 2.7. Let $\xi$ be a complex random variable with variance 1. Then, $\xi$ is $C_\xi$-good for some $C_\xi \ge 1$.
We conclude this subsection with the following proposition, which roughly states that the Lévy concentration function of a vector, no suitable multiple of which is sufficiently close to a Gaussian integer vector, must be small. This will prove crucial in our replacement of applications of the continuous inverse Littlewood--Offord theorem by Theorem 1.3.

Proposition 2.8. Let $\xi_1, \ldots, \xi_n$ be independent copies of a $C_\xi$-good complex random variable $\xi$. Let $\boldsymbol{v} := (v_1, \ldots, v_n) \in \mathbb{C}^n \setminus \{\boldsymbol{0}\}$. Suppose the following holds: there exists some $f$ Then, for any $r \ge 0$, where $C_{2.8} \ge 1$ and $c_{2.8} > 0$ are constants depending only on $C_\xi$.
The proof of this proposition requires the following preliminary definition and short Fourier-analytic lemmas from [35], along with a 'doubling trick' appearing in [11].

Definition 2.9. Let $\xi$ be an arbitrary complex random variable. For any $w \in \mathbb{C}$, we define , where $\xi_1, \xi_2$ denote i.i.d. copies of $\xi$ and $\|\cdot\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance to the nearest integer.
Lemma 2.11 (Lemma 4.5(iii) in [35]). For $\boldsymbol{v}, \boldsymbol{w} \in \mathbb{C}^n$, let $\boldsymbol{v}\boldsymbol{w} \in \mathbb{C}^{2n}$ denote the vector whose first $n$ coordinates coincide with $\boldsymbol{v}$ and last $n$ coordinates coincide with $\boldsymbol{w}$. Then,

Proof of Proposition 2.8. Let $\boldsymbol{w} \in \mathbb{C}^{2n}$ denote the vector whose first $n$ components are $\boldsymbol{v}$ and last $n$ components are $i\boldsymbol{v}$. Then, we have where the first line uses $\rho_{r,\xi}(\boldsymbol{v}) = \rho_{r,\xi}(i\boldsymbol{v})$, the second line is due to Lemma 2.10, the third line follows from Lemma 2.11, and the last line is again due to Lemma 2.10. Next, note that where the final inequality follows from the $C_\xi$-goodness of $\xi$. Therefore, from Jensen's inequality, we get that Let Then, we can bound the integral on the right hand side in Equation (3) from above by Let us, in turn, bound each of these three terms separately.
• For the first term, we have the estimate
• For the second term, we begin by noting that since
• For the third term, we have the estimate Finally, summing the estimates in the previous three bullet points and taking the square root gives the desired conclusion.
3 Warm-up: proof of Theorem 1.1 in the subgaussian case

In this section, we will discuss the proof of Theorem 1.1 in the special case when the entries are further assumed to be i.i.d. subgaussian. This will allow the reader to see many of the key ideas and calculations in a simpler, less technical, setting. Our general reduction and outline follows Tao and Vu [35, 38]; as mentioned in the introduction, the main difference is the replacement of the crucial continuous inverse Littlewood--Offord theorem.
Definition 3.1. A complex random variable $\xi$ is said to be $C$-subgaussian if, for all $t > 0$,

For the remainder of this section, we fix a centered $\tilde{C}_\xi$-subgaussian complex random variable $\xi$ with variance 1. Our goal in this section is to prove the following subgaussian version of Theorem 1.1.

Theorem 3.2. Let $\xi$ be a centered $\tilde{C}_\xi$-subgaussian complex random variable with variance 1. Let $M$ be an $n \times n$ complex matrix with $\|M\| \le 2^{n^{0.001}}$ and let $M_n = M + N_n$, where $N_n$ is a random matrix, each of whose entries is an independent copy of $\xi$.
Then, for all $\alpha \ge 2^{-n^{0.001}}$ and for all $\eta \le (C$ where $C_{3.2} \ge 1$ is a constant depending only on $\xi$.

Properties of subgaussian random variables
A basic and important fact about subgaussian random variables is the so-called subgaussian concentration inequality.
Lemma 3.3 (see, e.g., Proposition 5.10 in [42]). Let $\xi_1, \ldots, \xi_n$ be independent centered $\tilde{C}_\xi$-subgaussian complex random variables. Then, for every $\boldsymbol{v} := (v_1, \ldots, v_n) \in \mathbb{C}^n$ and for every $t \ge 0$, we have

Proof. For $r_2 \ge 0$, let $E_{r_2}$ denote the event that Fix $\varepsilon > 0$, and let $x \in \mathbb{C}$ be such that Then, where the second line follows from the triangle inequality.
Taking the supremum of the left hand side over the choice of x ∈ C, and then taking the limit on the right hand side as ε → 0 gives the desired conclusion.
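For intuition, here is a small Monte-Carlo illustration (ours, for the special case of Rademacher entries rather than general subgaussian variables), where the subgaussian concentration inequality reduces to the classical Hoeffding bound $\Pr(|\sum_i \varepsilon_i| \ge t\sqrt{n}) \le 2e^{-t^2/2}$:

```python
import math
import random

# Empirical tail frequency of a Rademacher sum versus the Hoeffding bound.
random.seed(0)
n, t, trials = 400, 2.0, 2000
hits = 0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    if abs(s) >= t * math.sqrt(n):
        hits += 1
freq = hits / trials
# The empirical frequency should sit well below the theoretical bound 2e^{-2}.
assert freq <= 2 * math.exp(-t * t / 2) + 0.05
```

In fact, by the central limit theorem the true tail here is close to $\Pr(|N(0,1)| \ge 2) \approx 0.046$, noticeably smaller than the Hoeffding bound $2e^{-2} \approx 0.27$, which is what the simulation reflects.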
Remark 3.5.As will be seen later, the key technical challenge in extending the proof of Theorem 1.1 from the subgaussian case to the general case is the unavailability of Proposition 3.4.
Finally, we need the following well-known estimate on the operator norm of a random matrix with i.i.d. subgaussian entries, which may be proved by combining the subgaussian concentration inequality with a standard epsilon-net argument.

Lemma 3.6 (see, e.g., Lemma 2.4 in [26]). Let $N_n$ be an $n \times n$ random matrix whose entries are i.i.d. centered $\tilde{C}_\xi$-subgaussian complex random variables. Then, where $C_{3.6} \ge 1$ depends only on $\tilde{C}_\xi$.
We may assume without loss of generality that $\|M\| \ge 2C_{3.6}\sqrt{n}$, as otherwise an improved version of Theorem 1.1 already follows from the main result in [11]. We may also assume that $\eta \ge 2^{-n^{0.01}}$, since the statement of Theorem 3.2 for smaller values of $\eta$ follows from the result for $\eta = 2^{-n^{0.01}}$. Following Tao and Vu [35], we call a unit vector $\boldsymbol{v} \in \mathbb{C}^n$ poor if we have and rich otherwise. We use $\boldsymbol{P}(\beta)$ and $\boldsymbol{R}(\beta)$ to denote, respectively, the set of poor and rich vectors. Accordingly, we have Therefore, Theorem 3.2 is a consequence of the following two propositions and the union bound.

Proposition 3.7. $\Pr(\exists \boldsymbol{v} \in \boldsymbol{P}(\beta$

The proof of Proposition 3.7 is relatively simple, and follows from a conditioning argument developed in [18] (see, e.g., the proof of Lemma 11.3 in [35]). We omit the details here, since later, in Proposition 4.7, we will prove a similar (but more complicated, and with a slightly different conclusion) statement.
The proof of Proposition 3.8 will occupy the remainder of this section. We begin with some preliminary results about the structure of rich vectors.
The first result is a simple observation due to Tao and Vu [35], showing that for every rich vector there exists a sufficiently large interval such that the Lévy concentration function of the vector is 'approximately constant' at any radius in this interval.

Lemma 3.9. For any $\boldsymbol{v} \in \boldsymbol{R}(\beta)$, there exists some $j \in \{0, 1, \ldots, J(\beta, n)\}$ such that

Remark 3.10. Compared to the trivial covering bound, the above lemma represents a tremendous saving, which will be crucial for our arguments. The factor $n^{1/100}$ in the lemma can be replaced by $n^{1/2-\varepsilon}$ at the expense of choosing different parameters in the rest of this section.
Proof. For $j \in \{0, 1, \ldots, J(\beta, n)\}$, note that the quantities are increasing in $j$, and range between $\beta$ and 1. Therefore, the pigeonhole principle gives the required conclusion.
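The pigeonhole step here can be isolated as a self-contained numerical fact (an illustration with generic names, not the paper's exact quantities): for any nondecreasing sequence $\beta \le \rho_0 \le \cdots \le \rho_J \le 1$, the $J$ consecutive ratios multiply to at most $1/\beta$, so some consecutive ratio is at most $(1/\beta)^{1/J}$.

```python
import random

# Pigeonhole over scales: for a nondecreasing sequence rho_0 <= ... <= rho_J
# with rho_0 >= beta and rho_J <= 1, some consecutive ratio is at most
# (1/beta)**(1/J), since the J ratios multiply to rho_J/rho_0 <= 1/beta.
random.seed(1)
beta, J = 1e-6, 50
rho = sorted(random.uniform(beta, 1.0) for _ in range(J + 1))
bound = (1.0 / beta) ** (1.0 / J)
assert any(rho[j + 1] <= rho[j] * bound for j in range(J))
```

The assertion is guaranteed for every such sequence, not just the random one sampled here: if all $J$ ratios exceeded the bound, their product would exceed $1/\beta$, a contradiction.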
The next structural result, which is an immediate corollary of Proposition 2.8, shows that every rich vector has a scale at which it can be efficiently approximated by a Gaussian integer vector.

Lemma 3.13. Let $\boldsymbol{a} \in \boldsymbol{R}_{j,\ell}(\beta)$. Then, there exists some

Proof. Suppose for contradiction that the desired conclusion does not hold. Then, for all Hence, by Proposition 2.8,

The utility of the previous lemma is that it allows us to reduce Proposition 3.12 to a statement about Gaussian integer vectors, which we then prove via a union bound. Indeed, let $\mathcal{O}$ denote the event that the operator norm of $N_n$ is at most $C_{3.6}\sqrt{n}$. By Lemma 3.6, Suppose the event in the first term on the right occurs. Let $\boldsymbol{a} \in \boldsymbol{R}_{j,\ell}(\beta)$ be such that $\|M_n \boldsymbol{a}\|_2 \le \eta$, and let $\boldsymbol{v}'$ be such that the conclusion of Lemma 3.13 holds for $\boldsymbol{a}, D, \boldsymbol{v}'$. Let Then, by the triangle inequality, we have where the fourth line holds since $|D| n^{-1/2} \le n^{1/20} n^{-1/2} \le 1$, and the last line holds because of the assumption that $\|M\| \ge 2C_{3.6}\sqrt{n}$. Hence, letting $X_i$ denote the $i$-th row of $M_n$, it follows from Markov's inequality that there are at least $n' := n - n^{0.1}$ coordinates $i \in [n]$ for which To summarize, setting we have proved

Counting Gaussian integer vectors approximating scaled rich vectors
In this subsection, we will control the size of R_{j,ℓ}(β). This is essentially the only place in the argument where we use the subgaussianity of the random variable ξ (via the application of Proposition 3.4).

Proof. We will obtain a good lower bound on ρ_{1,ξ}(v) and then appeal to Theorem 1.3 for a suitable choice of parameters. For the lower bound, let

where

Then,

where the first inequality follows from Proposition 3.4, the third inequality follows since |D|^{−1}·n^{0.15} ≥ n^{−1/20}·n^{0.15} ≥ 1, and the last inequality follows from ρ_{2η√n, ξ}(a) ≥ β·exp(−n^{0.1}). Hence, by the pigeonhole principle, we must have

where the final inequality holds since a ∈ R_{j,ℓ}(β). To summarize, using notation as in Theorem 1.3, we have shown that

Applying Theorem 1.3 with the parameters s = n^{0.9}, k = n^{0.1}, and p = 2^{n^{0.04}}, we find that

It follows that

we see that the map φ_p is an injection on R_{j,ℓ}(β) ⊆ V_{2^{−ℓ}/64n^{0.30}}, which completes the proof.

Proof of Proposition 3.12
Since we already have control on the size of R_{j,ℓ}(β), in order to prove Proposition 3.12 via Proposition 3.14, it suffices to have good control over ρ_{3‖M‖, ξ}(v). This is provided by the following lemma.
Lemma 3.17. For any v ∈ R_{j,ℓ}(β),

Proof. It follows from Proposition 3.4 that (with notation as in the proof of Proposition 3.15), for all n sufficiently large,

where the third line follows from 8‖M‖·|D|^{−1} ≤ 2‖M‖·f(β)^{−1}. We also have

where the fourth line follows from Lemma 3.9, the fifth line follows since a ∈ R_{j,ℓ}(β), and the last line follows since 2^{−ℓ} ≥ β·exp(−n^{0.2}).
The proof of Proposition 3.12 is now immediate.
Proof of Proposition 3.12. We have

where the first line follows from Proposition 3.14, the second line follows from Lemma 3.17, the third line follows from Proposition 3.15, and the last line follows since 2^ℓ ≤ β^{−1}·2^{n^{0.02}}.
4 Proof of Theorem 1.1

Lévy concentration functions of ℓ^∞-close vectors
As mentioned earlier, the key technical difficulty in the proof of Theorem 1.1, compared to the proof of Theorem 3.2, is the unavailability of Proposition 3.4. Instead, we have the following substitute.
Proposition 4.1. Let ξ := (ξ_1, ..., ξ_n) ∈ C^n be a complex random vector whose entries are independent copies of a complex random variable ξ with mean 0 and variance 1.

In order to prove this proposition, we will need some facts about concentration on the symmetric group. The following appears as Lemma 3.9 in [24], and is a direct application of Theorem 7.8 in [21].

Lemma 4.2 (Lemma 3.9 in [24]). Let y := (y_1, ..., y_n) be a non-zero complex vector and let

Then, for all t > 0,

where the probability is with respect to the uniform measure on S_n.

Remark 4.3. In [24], the above lemma is stated for v ∈ {±1}^n, but exactly the same proof shows that the conclusion also holds for any v ∈ [−1, 1]^n. Also, it is stated and proved (with better constants) for real vectors y. However, the version above for complex vectors immediately follows from this by separately considering the real and imaginary parts of h and using the union bound.
We will use this lemma via the following immediate corollary.

Then, for all t > 0,

where g(π) := ∑_{i=1}^{n} v_{π(i)} w_i. Therefore, by Lemma 4.2, for all t > 0,

The desired statement now follows from the triangle inequality and the estimate on Eh.
We can now prove Proposition 4.1.

Proof of Proposition 4.1. Consider the random variable
Indeed, since the distribution of the random vector ξ, even after conditioning on the event G_ε, is invariant under permuting its coordinates, it suffices to show (by the law of total probability) that for any fixed vector w := (w_1, ..., w_n)

and for any

Next, fix δ > 0, and let x ∈ C be such that

Then, for any t ≥ n^{(1/2)+ε}, setting

Taking the supremum of the left-hand side over the choice of x ∈ C, and then taking the limit on the right-hand side as δ → 0, gives the desired conclusion.

Regularization of N_n
In order to make use of the results of the previous subsection, we need that, with high probability, almost all of the rows of N_n satisfy the event G_ε. This follows from a straightforward application of the standard Chernoff bound.
Lemma 4.5 (Lemma 5.3 in [11]). Let N_n := (a_{ij}) be an n × n complex random matrix with i.i.d. entries, each with mean 0 and variance 1. For ε ∈ (0, 1/2), let I ⊆ [n] denote the (random) subset of coordinates such that for each i ∈ I,

Let R_ε denote the event that |I^c| ≤ 2n^{1−ε}. Then,

We will also need the following (trivial) bound on the probability that the operator norm of N_n is too large.

Lemma 4.6. Let N_n := (a_{ij}) be an n × n complex random matrix with independent entries, each with mean 0 and variance 1. Then, for any L ≥ 1,

Henceforth, let O_β denote the event that ‖N_n‖ ≤ β^{−1/2}·n; by the above lemma, this occurs except with probability at most β. Moreover, let S(β)
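The Chernoff mechanism behind Lemma 4.5 can be sanity-checked numerically. The sketch below is a simplified stand-in, not the lemma itself: it models 'row i fails the good event' as an independent coin with probability n^{−ε} (here ε = 1/2, a hypothetical choice), and compares the exact binomial tail at twice the mean against the multiplicative Chernoff bound (e/4)^μ.

```python
from math import comb, exp

def binom_upper_tail(n, p, t):
    """Exact Pr[Bin(n, p) >= t]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

# illustration: each of n rows independently fails with probability
# p = n^(-eps), so mu = n^(1-eps) failures are expected
n, eps = 1000, 0.5
p = n ** (-eps)
mu = n * p
t = int(2 * mu) + 1            # "more than 2 n^(1-eps) bad rows"
tail = binom_upper_tail(n, p, t)
chernoff = (exp(1) / 4) ** mu  # Chernoff bound for Pr[X >= 2 mu]
print(tail <= chernoff)        # True
```

The bound (e/4)^μ = exp(−μ(ln 4 − 1)) decays exponentially in μ = n^{1−ε}, which is the shape of the error term O(exp(−n^{1−ε}/4)) used later in the proof of Proposition 4.7.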

Rich and poor vectors
For the remainder of this section, we fix an n × n complex matrix M and parameters α, η ∈ (0, 1) satisfying the restrictions of the statement of Theorem 1.1. Also, let

We may further assume that η ≥ 2^{−n^{0.01}}, since the statement of Theorem 1.1 for smaller values of η follows from the statement for η = 2^{−n^{0.01}}. We call a unit vector v ∈ C^n poor if

and rich otherwise. We use P(β) and R(β) to denote, respectively, the sets of poor and rich vectors. As before, Theorem 1.1 follows from the following two propositions and the union bound.

Eliminating poor vectors
Compared to Proposition 3.7, the proof of Proposition 4.7 requires more work, since we need to work

In order to do this, we start by first eliminating 'compressible' vectors.
Definition 4.9 (Definition 3.2 in [26]). Let

(2) A vector x ∈ S^{2n−1} is called compressible if x is within Euclidean distance δ_2 from the set of all sparse vectors.
Remark 4.11. We have used the terminology of 'compressible' and 'incompressible' vectors mostly for convenience, and our use of these notions is rather different from that in the work of Rudelson and Vershynin. In particular, the only property of incompressible vectors we use is captured in the above remark, which is much weaker than what is used by the geometric methods.
Lemma 4.12. Let C_{ε,β} denote the event that there exists some v ∈ Comp(2n^{1−ε}, S(β)^{−1})

where C_{4.12} ≥ 1 and c_{4.12} > 0 are constants depending only on ξ.
Proof. By losing an additive error term which is at most β, it suffices to bound

and v is supported on at most 2n^{1−ε} coordinates. Moreover, by the definition of N, there exists some

By the triangle inequality, we see that

On the other hand, by Lemma 2.4, we see that for any fixed

Therefore, taking the union bound over all

where the final inequality follows since S(β)

Proof of Proposition 4.7. By Lemma 4.5 and Lemma 4.12, after losing an additive error term of β + O(exp(−n^{1−ε}/4)), it suffices to bound the probability of the event

intersected with

where R_{ε,I} denotes the event that the rows of N_n satisfying Equation (4) are exactly those indexed by the subset I. It suffices (by the law of total probability) to show that for any

For the remainder of the proof, fix such an I. By reindexing the coordinates, we may further assume that

Since M_n^† and M_n have the same singular values, it follows that a necessary condition for a matrix M_n to satisfy the above event is that there exists a unit vector a = (a_1, ..., a_n) such that a ∈ Incomp(2n^{1−ε}, S(β)^{−1}) and ‖a^T M_n‖_2 ≤ η. To every matrix M_n, associate such a vector a arbitrarily (if one exists) and denote it by a_{M_n}; this leads to a partition of the space of all matrices with least singular value at most η. By Remark 4.10, since

To every a_{M_n}, associate such an index i ∈ I arbitrarily, and denote it by i(M_n). Then, by taking a union bound over the choice of i ∈ I, it suffices to show the following.
To this end, we expose the last n − 1 rows X_2, ..., X_n of M_n. Note that if there is some v ∈ P(β) satisfying ‖M_n v‖_2 ≤ η, then there must exist a vector y ∈ P(β), depending only on the last n − 1 rows X_2, ..., X_n, such that

In other words, once we expose the last n − 1 rows of the matrix, either the matrix cannot be extended to one satisfying the event in Equation (5), or there is some unit vector y ∈ P(β), which can be chosen after looking only at the last n − 1 rows, and which satisfies the equation above. For the rest of the proof, we condition on the last n − 1 rows X_2, ..., X_n (and hence, on a choice of y).
For any vector w ∈ S^{2n−1} with w_1 ≠ 0, we can write

where u := w^T M_n. Thus, restricted to the event

where the second line is due to the Cauchy–Schwarz inequality and the particular choice w = a_{M_n}. Since, conditioned on R_{ε,[|I|]}, the first row of N_n is distributed as ξ|G_ε, it follows that the probability in Equation (5) is bounded by ρ_{2ηS(β)√n, ξ|G_ε}(y) ≤ β, which completes the proof.
We begin with the following analogue of Lemma 3.13.

Lemma 4.14. Let a ∈ R_{j,ℓ}(β). Then, there exists some D ∈ C with |D| ∈ [f(β), n^{1/20}] and some v ∈

Proof. Let g(n) = n^{1/20} and w := (2ηS(β)√n)^{−1}·(2S(β)f(β)^{−1})^{−j}·a. Suppose for contradiction that the desired conclusion does not hold. Then, the same computation as in the proof of Lemma 3.13 shows that

Finally, since Pr(G_ε) > 1/2 by Markov's inequality, it follows from Lemma 2.5 that

Then, the same computation as in the subgaussian case shows that if the event in the statement of Proposition 4.13 occurs, then there must exist some v ∈ R_{j,ℓ}(β) for which

Hence, letting X_i denote the i-th row of M_n, it follows from Markov's inequality that, given any I ⊆ [n] with |I^c| ≤ 2n^{1−ε}, there are at least n − 3n^{1−ε} coordinates i ∈ I for which

Thus, we see that for any such I,

As in Lemma 3.17, we have the following.

Lemma 4.15. For any v ∈ R_{j,ℓ}(β),

Proof. It follows from Proposition 4.1 that (with notation as in the proof of Lemma 3.17), for all n sufficiently large,

where the second line follows from S(β)·|D|^{−1} ≤ S(β)·f(β)^{−1} and the third-to-last line follows from Lemma 2.5. We also have

≤ 4n^{1/100}·2^{−ℓ},

which completes the proof.
Given the previous lemma and Equation (6), the same calculation as in the proof of Proposition 3.8 shows that the following suffices to prove Proposition 4.13.

Let v″ be the vector which agrees with v on T and with v′ on T^c. Then,

where the first inequality follows from Proposition 4.1, the third inequality follows since |D|^{−1}·n^{0.15} ≥ n^{−1/20}·n^{0.15} ≥ 1, and the last inequality follows from ρ_{2ηS(β)√n, ξ}(a) ≥ β·exp(−n^{0.01}). Hence, by the pigeonhole principle and by Lemma 2.5, we must have

where the final inequality holds since a ∈ R_{j,ℓ}(β). Let v‴ denote the integer vector which agrees with v″ (and hence, with v) on T and is 0 on T^c. Then,

To summarize, using notation as in Theorem 1.3, we have shown that for every vector v ∈ R_{j,ℓ}(β), there exists some T ⊆ [n] with |T^c| ≤ n^{0.95} such that v agrees with some element of V_{2^{−ℓ}/128n^{0.30}} on T. Since each coordinate of v is an integer with absolute value at most

Finally, the calculation in the proof of Proposition 3.15 shows that

which, together with the previous equation, completes the proof.
5 Proof of Theorem 1.3

The proof of Theorem 1.3 consists of six steps. The first three steps are modelled after the proof of the optimal inverse Littlewood–Offord theorem of Nguyen and Vu [22], whereas the last three steps are modelled after Halász's proof of his anti-concentration inequality [10].
Step 1: Extracting a large sublevel set. For each integer 1 ≤ m ≤ M, where M := 2s/k, we define

It follows from Lemma 2.10 that

In particular, since it is assumed that ρ

∑_{m=1}^{M} e^{−m/4}.

Note that in the last line, we have used the fact that ∑_{m=1}^{∞} e^{−m/4} = O(1). Therefore, by averaging with respect to the probability measure {c_m}_{m=1}^{M}, it follows that there must exist some non-zero integer

Step 2: Eliminating the z-norm. From here on, all implicit constants will be allowed to depend on C_z. Since S_{m_0} ⊂ B(0, √m_0), it follows (by averaging) that there must exist some

we have that μ(T_{m_0}) ≳ ρ·exp(m_0/8).
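The O(1) bound on the exponential sum used in the last line is an elementary geometric-series computation, which the following snippet confirms:

```python
from math import exp

# sum_{m >= 1} e^{-m/4} is a geometric series with ratio q = e^{-1/4},
# so it equals q / (1 - q), a constant of roughly 3.52
q = exp(-0.25)
closed_form = q / (1 - q)
partial = sum(exp(-m / 4) for m in range(1, 200))
print(round(closed_form, 2))  # 3.52
```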
Next, let y := z_1 − z_2, where z_1, z_2 are i.i.d. copies of z. Since

it follows that there exists some y_0

where the final inequality follows from the C_z-goodness of z. Hence, by Markov's inequality,

Since T_{m_0} ⊂ B(0, 1/8C_z), this shows that

Finally, after replacing ξ by y_0·ξ, and noting that the change-of-measure factor lies in [C_z^{−1}, C_z], it follows that

Step 3: Discretization of ξ. For p a prime as in the statement of the theorem, let

and consider the random set x + B_0, where x ∈ [0, 1/p] + i[0, 1/p] is a uniformly distributed random point. Then, by linearity of expectation, we have

so there exists some x_0 ∈ [0, 1/p] + i[0, 1/p] for which

Let us now 'recenter' this shifted lattice. Note that for a fixed ξ_0 ∈ (x_0 + B_0) ∩ T_{m_0}, we have, for any

and, for all ξ ∈ P_{m_0},

Step 4. Note that since v_i ∈ Z + iZ, the map

is indeed well-defined as a map from F_p + iF_p to [0, 1]. Note also that, since P_{m_0} ⊂ B_1, the size of P_{m_0}(I) (as a subset of F_p + iF_p) is at least the size of P_{m_0} (as a subset of (1/p)·(Z + iZ)); i.e., the way we have defined the various objects ensures that there are no wrap-around issues. We claim that for all integers t ≥ 1, t·P_m(I) ⊆ P_{t²m}(I).
Indeed, for r_1, ..., r_t ∈ P_m(I) ⊆ F_p + iF_p, we have

which gives the desired inclusion. We now use the Cauchy–Davenport theorem for F_p + iF_p ≅ F_p² (see, e.g., [3]), which states that every pair of nonempty subsets A, B ⊆ F_p + iF_p satisfies |A + B| ≥ min{p², |A| + |B| − p}.
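The stated Cauchy–Davenport bound for F_p + iF_p ≅ F_p² can be brute-force checked on small primes. The sketch below verifies it on random subsets (an empirical check of the inequality, not a proof):

```python
from itertools import product
from random import Random

def sumset(A, B, p):
    # A + B computed coordinatewise modulo p in F_p x F_p
    return {((a0 + b0) % p, (a1 + b1) % p) for (a0, a1) in A for (b0, b1) in B}

def check_cauchy_davenport(p, trials=200, seed=0):
    rng = Random(seed)
    pts = list(product(range(p), repeat=2))
    for _ in range(trials):
        A = set(rng.sample(pts, rng.randint(1, p * p)))
        B = set(rng.sample(pts, rng.randint(1, p * p)))
        assert len(sumset(A, B, p)) >= min(p * p, len(A) + len(B) - p)
    return True

print(check_cauchy_davenport(5))  # True
```

Iterating the bound t − 1 times is what turns the inclusion t·P_m(I) ⊆ P_{t²m}(I) into lower bounds on the sizes of the sublevel sets.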

It follows that for all integers t ≥ 1,
Hence, by Equation (7), we have

We also claim that |P_m(I)| < p² as long as m ≤ |I|/500C_z. Indeed, since the map

where the second-to-last line follows again using the integrality of ℜ{v_1}, ℑ{v_1}, ..., ℜ{v_n}, ℑ{v_n}.
From here on, we will use a slight modification of the results of [8] to finish the proof. We begin with the following key definition.

Definition 5.2. Suppose that v ∈ (F_p + iF_p)^n for an integer n and a prime p, and let k ∈ N. For every α ∈ [−1, 1], we define R^α_k(v) to be the number of solutions to

The following elementary lemma from [8] shows that for 'small' positive α,

Lemma 5.3 ([8]). For all integers k, n with k ≤ n/2, any prime p, any vector v ∈ (F_p + iF_p)^n, and α ∈

Proof. The latter quantity is bounded from above by the number of sequences (i_1, ..., i_{2k}) ∈ [n]^{2k} with at most (1 + α)k distinct entries, times 2^{2k}, the number of choices for the ± signs. Thus,

where the final inequality follows from the well-known bound (a choose b) ≤ (ea/b)^b. Finally, noting that 4e^{1+α} ≤ 4e² ≤ 40 completes the proof.
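The 'well-known bound' invoked at the end of the proof, (a choose b) ≤ (ea/b)^b, is easy to verify numerically over a small range:

```python
from math import comb, e

# the standard binomial-coefficient estimate C(a, b) <= (e a / b)^b,
# checked exhaustively for all 1 <= b <= a <= 59
for a in range(1, 60):
    for b in range(1, a + 1):
        assert comb(a, b) <= (e * a / b) ** b
print("ok")
```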
Let v_I denote the |I|-dimensional vector obtained by restricting v to the coordinates corresponding to I. Recognizing the right-hand side of Equation (9) as

it follows from Equation (9) and the above lemma that for any k ≤ |I| and α ∈ [0, 1/8],

where the second line follows from the assumption that Mk ≤ 2s ≤ 2|I|, the third line follows from the assumption that k ≤ √s ≤ |I|, and the fifth line follows from the assumption that ρ > s^{−k/4} ≥ s^{−(k/2)+2αk} ≥ |I|^{−(k/2)+2αk}.
Step 6: Applying the counting lemma. Let us summarize where we stand. We have proved that for any complex random variable z satisfying Equation (2),

Hence, it follows that

We will bound the size of each of these pieces separately. For |X_s|, the following simple bound suffices:

On the other hand, the desired bound on |Y^α_{k,s,ρ}(m)| follows easily from a slight modification of the work in [8]. The proof of this theorem follows from a slight modification of the proof of Theorem 1.7 in [8]; for the reader's convenience, we provide complete details in Appendix A.

Claim A.1. The number of triples in Z is at most (s/n)^{2k−1} · 2^{n−s} · (n!/s!)^{2k}.
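The 'simple bound' on |X_s| rests on counting vectors of small support. Identifying F_p + iF_p with a set of p² values, the exact count (which the bound relaxes) is ∑_{j<s} C(n,j)·(p²−1)^j: choose the support, then a nonzero value for each of its coordinates. A brute-force confirmation on tiny parameters (illustrative only):

```python
from math import comb
from itertools import product

def count_small_support(n, p, s):
    """Exact number of vectors in (F_p + iF_p)^n with support of size < s:
    choose the support, then one of p^2 - 1 nonzero values per coordinate."""
    return sum(comb(n, j) * (p * p - 1) ** j for j in range(s))

# brute-force confirmation for tiny parameters, encoding each element of
# F_p + iF_p as one of p^2 symbols with 0 as the zero element
n, p, s = 4, 2, 3
brute = sum(1 for a in product(range(p * p), repeat=n)
            if sum(x != 0 for x in a) < s)
assert brute == count_small_support(n, p, s)
print(count_small_support(n, p, s))  # 67
```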
Proof.One can construct any such triple as follows.First, choose an s-element subset of [n] to serve as I.

|Z|
Claim A.2. Each triple from Z is compatible with at most p^{2s} sequences a ∈ (F_p + iF_p)^n.
Proof. Using (a), we may rewrite Equation (12) as

It follows from (b) that once a triple from Z is fixed, the right-hand side above depends only on those coordinates of the vector a that are indexed by i ∈ I ∪ {i_{s+1}, ..., i_{j−1}}. In particular, for each of the p^{2s} possible values of (a_i)_{i∈I}, there is exactly one way to extend it to a sequence a ∈ (F_p + iF_p)^n that satisfies Equation (12) for every j.

Proof of Claim A.3. Given any such a, we may construct a compatible triple from Z as follows. Considering all j ∈ {n, ..., s + 1} one by one in decreasing order, we do the following. First, we find an arbitrary solution to

±a_{ℓ_1} ± a_{ℓ_2} ± ⋯ ± a_{ℓ_{2k}} = 0    (13)

such that ℓ_1, ..., ℓ_{2k} ∈ [n] \ {i_n, ..., i_{j+1}} and such that ℓ_{2k} is a non-repeated index (i.e., such that ℓ_{2k} ≠ ℓ_i for all i ∈ [2k − 1]). Given any such solution, we let ℓ_{2k} serve as i_j, we let the sequence (ℓ_1, ..., ℓ_{2k}) serve as F_j, and we let ε_j be the corresponding sequence of signs (so that Equation (12) holds). The assumption that a ∈ B^α_{k,s,≥t}(n) guarantees that there are at least t·2^{2k}·(n − j + 1)^{2k}/p many solutions to Equation (13), each of which has at least 2αk non-repeated indices. Since the set of all such solutions is closed under every permutation of the ℓ_i's (and the respective signs), ℓ_{2k} is a non-repeated index in at least an α-proportion of them.

The subgaussian concentration inequality allows us to show that if a, b ∈ C^n are close in Euclidean distance, then the Lévy concentration functions of a and b are close in a suitable sense as well. More precisely:

Proposition 3.4. Let ξ := (ξ_1, ..., ξ_n) be a complex random vector whose entries are independent centered C_ξ-subgaussian complex random variables. Then, for every a := (a_1, ..., a_n), b := (b_1, ..., b_n) ∈ C^n, and for every r_1, r_2 ≥ 0, we have

ρ_{r_1 + r_2, ξ}(b) ≥ ρ_{r_1, ξ}(a) − 3·exp(−c_{3.3}·r_2² / ‖a − b‖_2²),

where c_{3.3} > 0 is a constant depending only on C_ξ.

≤ C_{3.8}·exp(−c_{3.8}·n), where C_{3.8} ≥ 1 and c_{3.8} > 0 are constants depending only on ξ.
where

X_s := {a ∈ (F_p + iF_p)^n : |supp(a)| < s},

and

Y^α_{k,s,ρ}(m) := {a ∈ (F_p + iF_p)^n : |supp(a)| = m and R^α_k(a_I) ≥ 2^{2k}·|I|^{2k}·ρ·√M·C for all I ⊆ supp(a) with |I| ≥ s}.

Claim A.3. Each sequence a ∈ B^α_{k,s,≥t}(n) is compatible with at least

Finally, we let I = [n] \ {i_n, ..., i_{s+1}}. Since different sequences of solutions lead to different triples, it follows that the number of compatible triples satisfies

Counting the number P of pairs of a sequence a ∈ B^α_{k,s,≥t}(n) and a compatible triple from Z, we have

|B^α_{k,s,≥t}(n)| · p^{2s},

which yields the desired upper bound on |B^α_{k,s,≥t}(n)|.