A REMARK ON THE SMALLEST SINGULAR VALUE OF POWERS OF GAUSSIAN MATRICES

Let n, k ≥ 1 and let G be the n×n random matrix with i.i.d. standard real Gaussian entries. We show that there are constants c_k, C_k > 0 depending only on k such that the smallest singular value of G^k satisfies

c_k t ≤ P{ s_min(G^k) ≤ t n^{−k/2} } ≤ C_k t,  t ∈ (0, 1],

and, furthermore,

c_k/t ≤ P{ ‖G^{−k}‖_HS ≥ t n^{k/2} } ≤ C_k/t,  t ∈ [1, ∞),

where ‖·‖_HS denotes the Hilbert–Schmidt norm.


Introduction
Everywhere in the paper, G denotes an n × n random matrix with i.i.d. real-valued standard Gaussian entries. The smallest singular value and the condition number of square Gaussian matrices (and of other random matrix models) are classical objects of interest in random matrix theory. The condition number κ(A) = s_max(A)/s_min(A) of a matrix A is of importance as a simple estimator of the relative error when solving the linear system Ax = b with the coefficient vector b known up to some additive error (see, for example, [7]).
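The role of the condition number as an error estimator can be illustrated numerically. The sketch below (numpy, with arbitrary illustrative sizes) checks the classical perturbation bound ‖δx‖/‖x‖ ≤ κ(A)·‖δb‖/‖b‖ for a Gaussian coefficient matrix:

```python
import numpy as np

# Why kappa(A) matters when solving A x = b with a noisy right-hand side:
# the classical bound is ||dx|| / ||x|| <= kappa(A) * ||db|| / ||b||.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
kappa = np.linalg.cond(A)                  # s_max(A) / s_min(A)

b = rng.standard_normal(n)
db = 1e-8 * rng.standard_normal(n)         # additive error in the data b
x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + db)

rel_err_x = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
rel_err_b = np.linalg.norm(db) / np.linalg.norm(b)
print(f"kappa(A) ~ {kappa:.1f},  error amplification ~ {rel_err_x / rel_err_b:.1f}")
```

The observed amplification factor is typically well below κ(A); the condition number is a worst-case bound over all perturbation directions.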
In the 1940s, von Neumann and Goldstine [4] conjectured that the "typical" value of s_min(G) is of order n^{−1/2}, while the condition number κ(G) = s_max(G)/s_min(G) is of order n. The conjecture was rigorously established by Edelman [2] and, independently, by Szarek [8]. The proofs in [2,8] use as the central element a formula for the joint distribution of the singular values of G. In particular, the following estimate for the smallest singular value of G was obtained in [2,8]:

(1)  P{ s_min(G) ≤ t n^{−1/2} } = Θ(t),  t ∈ (0, 1].

Here, we adopt the "big theta" notation: given two non-negative functions f(t) and g(t) defined on the same domain, we write f(t) = Θ(g(t)) if C^{−1} g(t) ≤ f(t) ≤ C g(t) for all t and some universal constant C ≥ 1. When the constant is allowed to depend on a parameter, we add the parameter as a subscript to Θ.

Numerous results dealing with invertibility of non-Gaussian random models have appeared in the literature. We prefer to avoid a discussion of that (very active) research direction in this note; let us refer to the surveys [6] and [5], which give a (partial) account of the subject.

Returning to linear systems with random coefficients, it seems natural to consider a linear system of the form G^k x = b, where k ≥ 1 is fixed, and to estimate the relative error of the obtained solution when b is known up to some additive error. In this case, we may ask what the typical value of the condition number of G^k is and, moreover, what the optimal large deviation estimates for κ(G^k) are. Since the largest singular value of G^k is of order Θ_k(n^{k/2}) with very large probability, the question essentially amounts to computing small ball probabilities for s_min(G^k). Obviously, the trivial relation s_min(G^k) ≥ (s_min(G))^k and the known estimates for s_min(G) immediately imply probabilistic estimates for s_min(G^k); these, however, turn out to be suboptimal.
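The linear small-ball behaviour in (1) is easy to observe empirically. A small Monte Carlo sketch (the dimension and sample count are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the Edelman-Szarek estimate (1):
# P{ s_min(G) <= t n^{-1/2} } should be roughly linear in t for small t.
rng = np.random.default_rng(1)
n, trials = 40, 2000
smins = np.array([
    np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False)[-1]
    for _ in range(trials)
])
probs = {t: float(np.mean(smins <= t / np.sqrt(n))) for t in (0.1, 0.3, 1.0)}
for t, p in probs.items():
    print(f"t = {t:3.1f}:  P(s_min <= t/sqrt(n)) ~ {p:.3f}")
```

For small t the empirical probability is close to t itself, consistent with (1).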
In this note, we are interested in non-asymptotic estimates which are sharp up to multiplicative constants. To the authors' best knowledge, no such results have been previously noted in the literature. The main statement of the note is

Theorem 1.1. Let n, k ≥ 1 and let G be the n × n matrix with i.i.d. standard Gaussian entries. Then

c_k t ≤ P{ s_min(G^k) ≤ t n^{−k/2} } ≤ C_k t,  t ∈ (0, 1],

and

c_k/t ≤ P{ ‖G^{−k}‖_HS ≥ t n^{k/2} } ≤ C_k/t,  t ∈ [1, ∞),

where c_k, C_k > 0 depend only on k. Here, ‖·‖_HS denotes the Hilbert–Schmidt norm of a matrix.
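A quick numerical sanity check of the scale n^{−k/2} and of the trivial bound s_min(G^k) ≥ (s_min(G))^k mentioned above (sizes are illustrative; the sketch makes no claim about the optimal constants):

```python
import numpy as np

# Sanity check for the normalization n^{-k/2} (here k = 2), together with the
# trivial deterministic bound s_min(G^k) >= s_min(G)^k, which holds since
# s_min(AB) >= s_min(A) s_min(B) for square A, B.
rng = np.random.default_rng(4)
n, k, trials = 30, 2, 300
ratios = []
for _ in range(trials):
    G = rng.standard_normal((n, n))
    smin_G = np.linalg.svd(G, compute_uv=False)[-1]
    smin_Gk = np.linalg.svd(np.linalg.matrix_power(G, k), compute_uv=False)[-1]
    # trivial bound, up to floating-point slack
    assert smin_Gk >= smin_G ** k * (1 - 1e-8)
    ratios.append(smin_Gk * n ** (k / 2))
ratios = np.array(ratios)
print("median of n^{k/2} s_min(G^k):", np.median(ratios))
```

The normalized quantity n^{k/2} s_min(G^k) concentrates at constant order, in line with the scale appearing in the theorem.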
Acknowledgement. The second named author is partially supported by the Sloan Research Fellowship. Both authors are grateful to Mark Rudelson for interesting discussions.

Proof
Our proof relies on the following simple observation. Let G = UΣV^⊤ be the singular value decomposition of G, so that Σ is the (random) diagonal matrix with the singular values of G arranged in non-increasing order on the main diagonal, and U, V are (random) orthogonal matrices. Further, let W be an n × n random orthogonal matrix uniformly distributed on O_n(R) (with respect to the Haar measure), which is independent from {U, Σ, V}. Then, in view of the invariance of the Gaussian distribution under orthogonal transformations, the matrix WG is equidistributed with G, whence

‖G^{−k}‖_HS is equidistributed with ‖((ΣQ)^{k−1}Σ)^{−1}‖_HS,

where the random orthogonal matrix Q := U^⊤W^⊤V is uniformly distributed on O_n(R) and is independent from Σ, G. Similarly, we have

‖G^{−k}‖ is equidistributed with ‖((ΣQ)^{k−1}Σ)^{−1}‖,

where ‖·‖ denotes the spectral norm. Thus, the problem of estimating the right tail of the distribution of ‖G^{−k}‖_HS (and of ‖G^{−k}‖) can be viewed as a particular case of the more general question of studying the distribution of the matrix product (TW)^{k−1}T, where T is a fixed diagonal matrix and W is uniformly distributed on O_n(R).
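The step behind this observation is a deterministic identity: for any orthogonal W, the singular values of (WG)^k coincide with those of (ΣQ)^{k−1}Σ with Q = U^⊤W^⊤V, since (WUΣV^⊤)^k = WUΣ(V^⊤WUΣ)^{k−1}V^⊤ and outer orthogonal factors (and transposition) preserve singular values. A numpy check with illustrative sizes:

```python
import numpy as np

# For ANY orthogonal W, the singular values of (W G)^k equal those of
# (Sigma Q)^{k-1} Sigma, where G = U Sigma V^T and Q = U^T W^T V.
rng = np.random.default_rng(2)
n, k = 8, 3
G = rng.standard_normal((n, n))
U, s, Vh = np.linalg.svd(G)                    # G = U diag(s) Vh, V = Vh.T
Sigma = np.diag(s)

Z = rng.standard_normal((n, n))                # Haar orthogonal W via QR
Wq, R = np.linalg.qr(Z)
W = Wq * np.sign(np.diag(R))                   # column sign fix

Q = U.T @ W.T @ Vh.T                           # Q = U^T W^T V, again orthogonal
M = np.linalg.matrix_power(Sigma @ Q, k - 1) @ Sigma

sv_left = np.linalg.svd(np.linalg.matrix_power(W @ G, k), compute_uv=False)
sv_right = np.linalg.svd(M, compute_uv=False)
print("max deviation of singular values:", np.max(np.abs(sv_left - sv_right)))
```

Since WG is equidistributed with G, this identity transfers all distributional questions about the singular values of G^k to the model (ΣQ)^{k−1}Σ.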
Proposition 2.1. Let T = diag(τ_1, . . . , τ_n) be an n × n fixed diagonal matrix with non-negative entries, and let W be a uniform random orthogonal matrix. Take any k ∈ N. Then
• For any even positive integer m and any i, j ∈ [n] we have

E[ (((TW)^k T)_{ij})^m ] ≤ C_{k,m} n^{−mk/2} τ_i^m τ_j^m ‖T‖_HS^{m(k−1)},

where C_{k,m} > 0 depends only on k and m.
• The expectation of the squared Hilbert–Schmidt norm of (TW)^k T satisfies

E ‖(TW)^k T‖_HS^2 ≤ C_k n^{−k} ‖T‖_HS^{2k+2},

where C_k > 0 only depends on k.
• For any i ≤ n, denoting by T(i, s) the diagonal matrix with the i-th diagonal entry equal to s and all other entries equal to the corresponding entries of T, we have

Leb{ s ∈ [τ_i/2, τ_i] : E[ (((T(i, s)W)^k T(i, s))_{ii})^2 ] ≥ c_k n^{−k} τ_i^{2k+2} } ≥ τ_i/4,

where c_k > 0 may only depend on k.
Let us postpone the proof of the proposition till the end of the section, and complete the proof of the main result of the paper.
It is clear that non-asymptotic estimates for the Hilbert–Schmidt norm of the inverse of the standard Gaussian matrix can be obtained by an analysis of the joint distribution of its singular values, similar to [2,8]. However, we were not able to locate a "ready-to-reference" result of this kind in the literature, and instead will use a more general statement about the Hilbert–Schmidt norm of the inverse of a random matrix with i.i.d. entries with a continuous distribution [9, Theorem 1.1], which implies, in particular, that

P{ ‖G^{−1}‖_HS ≥ t n^{1/2} } ≤ C_3/t,  t ≥ 1,

for a universal constant C_3 ≥ 1. Now, using (2) and (3), it is easy to obtain the required upper bound on the right tail of ‖G^{−k}‖_HS.
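The tail behaviour referenced here is easy to visualize by simulation: ‖G^{−1}‖_HS/√n is of constant order typically, but has a tail decaying only like 1/t, so its mean is barely finite. A small Monte Carlo sketch (sample sizes are arbitrary choices, and the exact statement of [9, Theorem 1.1] is not reproduced here):

```python
import numpy as np

# Typical size and heavy tail of ||G^{-1}||_HS / sqrt(n).
# np.linalg.norm of a matrix defaults to the Frobenius (= Hilbert-Schmidt) norm.
rng = np.random.default_rng(5)
n, trials = 40, 200
ratios = np.array([
    np.linalg.norm(np.linalg.inv(rng.standard_normal((n, n)))) / np.sqrt(n)
    for _ in range(trials)
])
print("median of ||G^{-1}||_HS / sqrt(n):", np.median(ratios))
print("fraction of samples above 10:   ", np.mean(ratios >= 10.0))
```

The median stays at constant order while a small but non-negligible fraction of samples is very large, consistent with a Θ(1/t) tail.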
Now, fix any a′ = (a_1, . . . , a_{n−1}) ∈ L′, and apply the third assertion of Proposition 2.1 with the matrix T(a′, s), i.e. the diagonal matrix with T(a′, s)_{jj} = a_j^{−1/2} (for j < n) and with the (n, n)-th entry equal to s. In view of (8) and the lower bound for P{(s_1^2(G), . . . , s_n^2(G)) ∈ L}, the resulting inequality implies a lower bound of c″ t, for a universal constant c″ > 0, on the probability in question. The first assertion of Proposition 2.1 with m = 4 and the last estimate then yield, via a Paley–Zygmund type argument, the matching lower bound for the norm itself. It remains to note that, together with the deterministic relation ‖G^{−k}‖_HS ≥ ‖G^{−k}‖, the above inequality and (6) imply both assertions of the theorem, and the theorem follows.

2.2. Proof of Proposition 2.1. Note that for any deterministic n × n matrix B = (b_{ij}) and any k ∈ N, the (i, j)-th entry of B^k can be expressed as

(10)  (B^k)_{ij} = Σ_{α ∈ [n]^{k−1}} b_{iα_1} b_{α_1α_2} ⋯ b_{α_{k−1}j}.

Here, for k = 1 we assume that [n]^0 consists of a single "empty" index vector α.
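The entry expansion of B^k over index vectors (paths) can be verified by brute force, with illustrative sizes:

```python
import numpy as np
from itertools import product

# Brute-force check of the path expansion
# (B^k)_{ij} = sum over alpha in [n]^{k-1} of b_{i a_1} b_{a_1 a_2} ... b_{a_{k-1} j}.
rng = np.random.default_rng(3)
n, k = 4, 3
B = rng.standard_normal((n, n))

def entry_via_index_vectors(B, k, i, j):
    n = B.shape[0]
    total = 0.0
    for alpha in product(range(n), repeat=k - 1):
        path = (i,) + alpha + (j,)
        term = 1.0
        for a, b in zip(path, path[1:]):
            term *= B[a, b]
        total += term
    return total

Bk = np.linalg.matrix_power(B, k)
max_dev = max(abs(Bk[i, j] - entry_via_index_vectors(B, k, i, j))
              for i in range(n) for j in range(n))
print("max deviation:", max_dev)
```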
Let P = diag(δ_1, . . . , δ_n) be a random matrix such that the δ_i are i.i.d. random signs ±1 jointly independent with W. Then PW and W (hence, (TW)^kT and (TPW)^kT) have the same distribution. To simplify the formulas, for any m ≥ 1 and any index vector α ∈ [n]^{m(k−1)}, viewed as the concatenation of m blocks α^{(1)}, . . . , α^{(m)} of length k − 1 each, we define

w_{i,j,α} := ∏_{q=1}^{m} w_{i α^{(q)}_1} w_{α^{(q)}_1 α^{(q)}_2} ⋯ w_{α^{(q)}_{k−1} j}.

Applying (10) to TPW in place of B and taking the m-th moment, we get for any even m and any i, j ∈ [n]:

(11)  E[ (((TW)^kT)_{ij})^m ] = τ_i^m τ_j^m Σ_{α ∈ [n]^{m(k−1)}} ( ∏_{ℓ=1}^{m(k−1)} τ_{α_ℓ} ) E_P[ ∏_{ℓ=1}^{m(k−1)} δ_{α_ℓ} ] E[ w_{i,j,α} ],

with the appropriate modification for the case k = 1. Note that for m ≥ 1 and any given index vector α ∈ [n]^{m(k−1)}, we have E_P ∏_{ℓ=1}^{m(k−1)} δ_{α_ℓ} = 0 if and only if there exists h ∈ [n] such that |{ℓ : α_ℓ = h}| is odd. Let

Ω_m := { α ∈ [n]^{m(k−1)} : ∀h ∈ [n], |{ℓ : α_ℓ = h}| is even }.
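The vanishing criterion for E_P ∏_ℓ δ_{α_ℓ} (zero exactly when some index appears an odd number of times) can be checked by exact enumeration of all sign assignments:

```python
from itertools import product

def expected_sign_product(alpha, n):
    # E over i.i.d. signs delta_1..delta_n of prod_l delta_{alpha_l},
    # computed exactly by enumerating all 2^n sign assignments.
    total = 0
    for signs in product((-1, 1), repeat=n):
        p = 1
        for a in alpha:
            p *= signs[a]
        total += p
    return total / 2 ** n

n = 3
print(expected_sign_product((0, 0, 2, 2), n))  # all multiplicities even
print(expected_sign_product((0, 1, 0), n))     # index 1 has odd multiplicity
```

The expectation equals 1 when every index has even multiplicity and 0 otherwise, which is exactly the reduction of the sum in (the moment expansion) to Ω_m.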
Next, observe that for any collection of q non-negative random variables X_1, . . . , X_q with identical distributions we have

E[ X_1 X_2 ⋯ X_q ] ≤ ∏_{i=1}^{q} (E X_i^q)^{1/q} = E[ X_1^q ],

where we applied the generalized Hölder inequality. Applying this relation to the mk entries of W appearing in w_{i,j,α} (each of which is equidistributed with a fixed coordinate of a random vector uniformly distributed on S^{n−1}), we get

|E[ w_{i,j,α} ]| ≤ E[ |w_{11}|^{mk} ] ≤ C_{k,m} n^{−mk/2}

for some C_{k,m} > 0 depending only on k and m, where the last inequality follows from standard moment estimates for one-dimensional projections of a vector uniformly distributed on S^{n−1}.
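The sphere-projection moment estimates invoked here can be sanity-checked by simulation; for a coordinate w of a uniform random point on S^{n−1}, the exact low moments are E w² = 1/n and E w⁴ = 3/(n(n+2)):

```python
import numpy as np

# Moments of one coordinate of a uniform random point on S^{n-1}, consistent
# with the general bound E|w|^q <= C_q n^{-q/2}.
rng = np.random.default_rng(6)
n, samples = 10, 20000
g = rng.standard_normal((samples, n))
w = g[:, 0] / np.linalg.norm(g, axis=1)   # first coordinate of g/|g|
m2, m4 = float(np.mean(w ** 2)), float(np.mean(w ** 4))
print(f"E w^2 ~ {m2:.4f}  (exact 1/n = {1 / n:.4f})")
print(f"E w^4 ~ {m4:.4f}  (exact 3/(n(n+2)) = {3 / (n * (n + 2)):.4f})")
```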
Combining the above estimates, we obtain

E[ (((TW)^kT)_{ij})^m ] ≤ C_{k,m} n^{−mk/2} τ_i^m τ_j^m Σ_{α ∈ Ω_m} ∏_{ℓ=1}^{m(k−1)} τ_{α_ℓ}.

Finally, for any even m we construct a mapping F_m from Ω_m to [n]^{m(k−1)/2} as follows. Take any α ∈ Ω_m and, at the zeroth step, set γ := α. At step 1, we set β_1 := γ_1 and update the vector γ by erasing both its first component and the component with the smallest index which is equal to β_1. Thus, after the first step the vector γ has length m(k − 1) − 2. At the second step, we set β_2 := γ_1 and update γ by erasing γ_1 and the first (other) component equal to β_2. Thus, the length of γ after the second step is m(k − 1) − 4. The validity of the procedure is guaranteed by the condition α ∈ Ω_m. After m(k − 1)/2 steps we obtain an m(k − 1)/2-dimensional vector β = (β_1, . . . , β_{m(k−1)/2}) =: F_m(α). It is not difficult to see that for every α ∈ Ω_m,

∏_{ℓ=1}^{m(k−1)} τ_{α_ℓ} = ∏_{i=1}^{m(k−1)/2} τ_{β_i}^2.

Therefore, since the number of preimages of any β under F_m is bounded in terms of k and m only, for some C′_{k,m} > 0 depending only on k and m we have

Σ_{α ∈ Ω_m} ∏_{ℓ=1}^{m(k−1)} τ_{α_ℓ} ≤ C′_{k,m} Σ_{β ∈ [n]^{m(k−1)/2}} ∏_{i=1}^{m(k−1)/2} τ_{β_i}^2 = C′_{k,m} ‖T‖_HS^{m(k−1)},

giving the first assertion of the proposition. Letting m = 2 and summing up over all i ∈ [n] and j ∈ [n], we obtain

E ‖(TW)^kT‖_HS^2 ≤ C″_k n^{−k} ‖T‖_HS^{2k+2}

for some C″_k > 0 depending only on k, which gives the second assertion. To prove the third assertion, we will use formula (11), which we rewrite for i = j, m = 2, and with the matrix T replaced by T(i, s). We get

E[ (((T(i, s)W)^k T(i, s))_{ii})^2 ] = s^4 Σ_{α ∈ Ω_2} ( ∏_{ℓ=1}^{2(k−1)} T(i, s)_{α_ℓ α_ℓ} ) E[ w_{i,i,α} ].

Note that the above expression, viewed as a function of s, is a polynomial of degree 2k + 2, with the leading coefficient equal to E[w_{ii}^{2k}] = Θ_k(n^{−k}). It follows immediately that on the interval s ∈ [τ_i/2, τ_i], the polynomial is at least of order c_k n^{−k} τ_i^{2k+2} on a set of Lebesgue measure τ_i/4 (of course, we could write (1 − δ)τ_i/2 for any constant δ > 0, at the expense of decreasing c_k). The result follows.
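The pairing procedure defining F_m can be implemented directly; the sketch below (illustrative inputs, hypothetical helper name `F`) checks both the length halving and the identity ∏_ℓ τ_{α_ℓ} = ∏_i τ_{β_i}²:

```python
def F(alpha):
    # The erasing procedure: take the first remaining component, record it,
    # and erase it together with the first later component equal to it.
    # Validity requires every index to appear an even number of times in alpha.
    gamma = list(alpha)
    beta = []
    while gamma:
        b = gamma.pop(0)
        beta.append(b)
        gamma.remove(b)   # erase the first other component equal to b
    return tuple(beta)

alpha = (2, 0, 2, 1, 0, 1)      # every index appears an even number of times
beta = F(alpha)

tau = [1.5, 2.0, 3.0]
prod_alpha = 1.0
for a in alpha:
    prod_alpha *= tau[a]
prod_beta = 1.0
for b in beta:
    prod_beta *= tau[b] ** 2
print(alpha, "->", beta, prod_alpha, prod_beta)
```

Since erasing a matched pair decreases the multiplicity of one index by two, the even-multiplicity condition is preserved throughout, which is exactly why the procedure never fails on Ω_m.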
2.3. Further remarks. The corresponding problem for non-Gaussian matrices seems to be much more complicated due to the lack of rotational invariance. It is natural to conjecture that for any n × n matrix A with i.i.d. entries equidistributed with a random variable ξ of zero mean and unit variance,

c_k t ≤ P{ s_min(A^k) ≤ t n^{−k/2} } ≤ C_k t,  2e^{−c_k n} ≤ t ≤ 1,

where c_k, C_k > 0 may only depend on k and the distribution of ξ (and not on n).