Approximation of the average of some random matrices

Rudelson's theorem states that if a set of unit vectors forms a John decomposition of the identity operator on ${\mathbb R}^d$, then a random sample of $Cd\log d$ of them yields a decomposition of a matrix close to the identity. First, we observe that the same proof yields a more general statement about the average of positive semidefinite matrices. Second, we show that the $\log d$ factor cannot be removed. Then we present a stability version of the statement which extends to non-symmetric matrices, with applications to the study of the Banach--Mazur distance of convex bodies. We also show that in some cases one needs to take a sample of the vectors of order $d^2$ to approximate the identity.


Introduction
For vectors $u, v \in \mathbb{R}^d$, their tensor product (or dyadic product) is the linear operator on $\mathbb{R}^d$ defined by $(u \otimes v)x = \langle u, x\rangle v$ for every $x \in \mathbb{R}^d$, where $\langle u, x\rangle$ denotes the standard inner product.
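In coordinates, with the convention above, $u \otimes v$ is the rank-one matrix $v u^{\mathsf T}$. A minimal numerical check (Python with numpy assumed; the helper name `tensor` is ours):

```python
import numpy as np

def tensor(u, v):
    """Dyadic product: (u ⊗ v) x = <u, x> v, i.e. the rank-one matrix v u^T."""
    return np.outer(v, u)

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 3.0])
x = np.array([2.0, -1.0, 1.0])

# Applying the operator agrees with the defining identity (u ⊗ v)x = <u, x> v.
assert np.allclose(tensor(u, v) @ x, np.dot(u, x) * v)
```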
A random vector $v$ in $\mathbb{R}^d$ is called isotropic if $\mathbb{E}\, v \otimes v = I$, where $\mathbb{E}$ denotes the expectation of a random variable, and $I$ is the identity operator on $\mathbb{R}^d$.
According to Rudelson's theorem [Rud99], if we take $k$ independent copies $y_1, \ldots, y_k$ of an isotropic random vector $y$ in $\mathbb{R}^d$ for which $|y|^2 \leq \gamma$ almost surely, then
\[
\mathbb{E} \left\| \frac{1}{k} \sum_{i=1}^k y_i \otimes y_i - I \right\| \leq \varepsilon,
\]
provided that $k \geq \frac{c \gamma \ln d}{\varepsilon^2}$ for an absolute constant $c$.

A sequence of unit vectors $u_1, \ldots, u_m$ in $\mathbb{R}^d$ is said to yield a John decomposition of $I$ if $\frac{1}{d} I \in \operatorname{conv}\{u_i \otimes u_i : i \in [m]\}$, that is, if there are scalars $\alpha_1, \ldots, \alpha_m \geq 0$ with $\sum_{i=1}^m \alpha_i = 1$ such that
\[
\frac{1}{d} I = \sum_{i=1}^m \alpha_i\, u_i \otimes u_i.
\]
Rudelson's result applies in this setting as well. The coefficients $\alpha_i$ define a probability distribution on $[m]$. Let $\sigma = \{i_1, \ldots, i_k\}$ be a multiset obtained by $k$ independent draws from $[m]$ according to this distribution, and consider the following average of matrices:
\[
\frac{d}{k} \sum_{i \in \sigma} u_i \otimes u_i.
\]
It follows that, in expectation, this average is not farther than $\varepsilon$ from $I$ in the operator norm, provided that $k$ is at least $\frac{c d \ln d}{\varepsilon^2}$, where $c$ is some constant. Recently, better bounds on the relationship of $k$ and $\varepsilon$ were achieved in the fundamental papers of Batson, Spielman and Srivastava [BSS14]. Friedland and Youssef [FY17], building on the works of Marcus, Spielman and Srivastava [MSS15], and Srivastava [Sri12], showed that if a sequence of unit vectors $u_1, \ldots, u_m$ in $\mathbb{R}^d$ yields a John decomposition of $I$, then there is a multi-subset $\sigma$ of $[m]$ of size $|\sigma| = k \leq \frac{c d}{\varepsilon^2}$ with
\[
\left\| \frac{d}{k} \sum_{i \in \sigma} u_i \otimes u_i - I \right\| \leq \varepsilon.
\]
In this note, we first show that Rudelson's result is still relevant, as his proof yields the following more general statement.
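As a toy illustration of this sampling scheme (not of the sharp bounds), note that the standard basis $e_1, \ldots, e_d$ with $\alpha_i = 1/d$ forms a John decomposition of $\frac{1}{d}I$; drawing $k$ indices according to this distribution and averaging $d\, u_i \otimes u_i$ approximates $I$, with the operator-norm error shrinking as $k$ grows. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

def sampled_error(k):
    # k independent draws from the uniform distribution alpha_i = 1/d on [d].
    idx = rng.integers(0, d, size=k)
    avg = np.zeros((d, d))
    for i in idx:
        e = np.zeros(d)
        e[i] = 1.0
        avg += d * np.outer(e, e)          # the rescaled dyad d * u_i ⊗ u_i
    avg /= k
    return np.linalg.norm(avg - np.eye(d), 2)  # operator (spectral) norm

small_k, large_k = sampled_error(50), sampled_error(50000)
# With many more samples the empirical average is much closer to I.
assert large_k < small_k
```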
Theorem 1.1. Suppose that $0 < \varepsilon < 1$ is a given number and $A \in \mathcal{P}_d$ is a symmetric positive semidefinite matrix with $a = \|A\|$. Let $Q_1, \ldots, Q_k$ be independent random matrices distributed according to (not necessarily identical) probability distributions $P_1, \ldots, P_k$ on the set $\mathcal{P}_d$ of $d \times d$ real positive semidefinite matrices such that $\mathbb{E} Q_i = A$ and $\|Q_i\| \leq \gamma$ almost surely for every $i \in [k]$, and
\[
k \geq \frac{c\, \gamma\, (a+1) \ln d}{\varepsilon^2},
\]
where $c$ is an absolute constant. Then
\[
\mathbb{E} \left\| \frac{1}{k} \sum_{i=1}^k Q_i - A \right\| \leq \varepsilon.
\]
Moreover, Theorem 1.1 is sharp in the sense that the $\ln d$ term cannot be removed.
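The averaging phenomenon of Theorem 1.1 can be sketched by Monte Carlo: draw PSD matrices uniformly from a finite pool whose mean is $A$, and observe that the empirical average is close to $A$ in operator norm. This illustrates only the statement, not its proof; numpy is assumed, and the pool construction is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 12, 4000

# A small pool of random PSD matrices; A is their average, so drawing
# uniformly from the pool gives i.i.d. PSD matrices with mean A.
pool = []
for _ in range(30):
    G = rng.normal(size=(d, 3))
    pool.append(G @ G.T)          # rank-3 PSD matrix
pool = np.array(pool)
A = pool.mean(axis=0)

idx = rng.integers(0, len(pool), size=k)
avg = pool[idx].mean(axis=0)

err = np.linalg.norm(avg - A, 2)      # operator-norm deviation from A
# Crude sanity check: the sampled average is much closer to A than the scale of A.
assert err < np.linalg.norm(A, 2)
```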
Theorem 1.2. For any integer $d \geq 8$, any $\gamma \geq 1$, and any $0 < \varepsilon < \frac{1}{16}$, there are positive semidefinite matrices $Q_1, \ldots, Q_n$ in $\mathcal{P}_d$ with $I \in \operatorname{conv}\{Q_i\}$ and $\|Q_i\| \leq 2\gamma$ for all $i \in [n]$ such that, for any non-empty multi-subset $\sigma$ of $[n]$ of size $|\sigma| \leq m$, where $m := \frac{\gamma \lfloor \log_2 d\rfloor}{96 \varepsilon}$, we have
\[
\left\| \frac{1}{|\sigma|} \sum_{i \in \sigma} Q_i - I \right\| \geq \varepsilon.
\]

Our second goal is to study extensions of Theorem 1.1 to the case of non-symmetric matrices. First, we outline the geometric motivation behind the study of these questions in linear algebra. F. John's theorem [Joh48], extended by K. Ball [Bal92] (see also [Bal97]), states that for every convex body $K$ in $\mathbb{R}^d$, there is a unique ellipsoid of maximal volume contained in $K$, and this ellipsoid is the $o$-centered Euclidean unit ball $B_2^d$ if, and only if, there are contact points $u_1, \ldots, u_m \in \operatorname{bd}(K) \cap \operatorname{bd} B_2^d$ such that for some scalars $\alpha_1, \ldots, \alpha_m > 0$, we have
\[
(1) \qquad I = \sum_{i=1}^m \alpha_i\, u_i \otimes u_i
\]
and $\sum_{i=1}^m \alpha_i u_i = 0$. Gordon, Litvak, Meyer and Pajor [GLMP04] proved (extending similar results from [BR02, GPT01, Lew79]; see also [TJ89, Theorem 14.5]) that the maximum volume affine image of any convex body $K$ contained in $L$ also yields a decomposition of the identity similar to John's. In order to state it, we recall some terminology.
The polar of a convex body $K$ in $\mathbb{R}^d$ is defined as $K^\circ = \{x \in \mathbb{R}^d : \langle x, y\rangle \leq 1 \text{ for every } y \in K\}$.
Definition 1.3. Let $K$ and $L$ be convex bodies in $\mathbb{R}^d$. We say that $K$ is in John's position in $L$ if $K \subseteq L$ and there are contact points $u_1, \ldots, u_m \in \operatorname{bd}(K) \cap \operatorname{bd}(L)$, vectors $v_1, \ldots, v_m \in \operatorname{bd}(K^\circ) \cap \operatorname{bd}(L^\circ)$ with $\langle u_i, v_i\rangle = 1$ for all $i \in [m]$, and scalars $\alpha_1, \ldots, \alpha_m > 0$ with $\sum_{i=1}^m \alpha_i = 1$ such that
\[
(2) \qquad \frac{1}{d}\, I = \sum_{i=1}^m \alpha_i\, u_i \otimes v_i
\]
and
\[
(3) \qquad \sum_{i=1}^m \alpha_i\, u_i = 0.
\]
Note that if $K$ and $L$ are origin-symmetric and (2) is satisfied for a set of vectors, then by including the opposites of the vectors too, (3) is also satisfied.
Theorem 1.4 ([GLMP04]). Let $K$ and $L$ be two convex bodies in $\mathbb{R}^d$ such that $K \subseteq L$, and among all affine images of $K$ contained in $L$, $K$ has maximum volume. Assume also that $0 \in \operatorname{int} L$. Then $K$ is in John's position in $L$.
Definition 1.5. Let $K$ be a convex body in $\mathbb{R}^d$. We denote the Banach--Mazur distance of $K$ to the Euclidean ball by
\[
(4) \qquad r(K) = \inf\{ r > 0 : \mathcal{E} \subseteq K \subseteq r\mathcal{E} \text{ for some ellipsoid } \mathcal{E} \text{ in } \mathbb{R}^d \},
\]
where $r\mathcal{E}$ denotes the dilate of $\mathcal{E}$ about its center. By John's theorem, $r(K) \leq d$ for any convex body $K$ in $\mathbb{R}^d$, and $r(K) \leq \sqrt{d}$ for all centrally symmetric convex bodies.
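For example, the containments $B_2^d \subseteq [-1,1]^d \subseteq \sqrt{d}\, B_2^d$ certify $r([-1,1]^d) \leq \sqrt{d}$, the extremal case for origin-symmetric bodies. A quick random check of the two inclusions (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10

# Points of the unit ball lie in the cube: |x|_2 <= 1 implies |x|_inf <= 1.
pts_ball = rng.normal(size=(1000, d))
pts_ball /= np.maximum(np.linalg.norm(pts_ball, axis=1, keepdims=True), 1.0)
ball_in_cube = bool(np.all(np.abs(pts_ball) <= 1.0 + 1e-12))

# Points of the cube lie in sqrt(d) * ball: |y|_inf <= 1 implies |y|_2 <= sqrt(d).
pts_cube = rng.uniform(-1.0, 1.0, size=(1000, d))
cube_in_scaled_ball = bool(
    np.all(np.linalg.norm(pts_cube, axis=1) <= np.sqrt(d) + 1e-12)
)

assert ball_in_cube and cube_in_scaled_ball
```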
Moreover, for the unit balls of $\ell_p$ spaces, we have $r(B_p^d) = d^{|1/2 - 1/p|}$.

Theorem 1.6. Let $K$ be a convex body in $\mathbb{R}^d$ with $r(K) \leq 2$ such that the ellipsoid $\mathcal{E}$ at which the infimum in (4) is attained is the Euclidean unit ball. Let $K$ be in John's position in $L$, and let the vectors $u_i$ and $v_i$ for $i \in [m]$ satisfy the conditions of Definition 1.3. Then for any $0 < \varepsilon < 1$ and $k \geq \frac{c d \ln d}{\varepsilon^2}$, there is a multiset $\sigma \subset [m]$ of size $k$ such that (5) and (6) hold.

As an immediate corollary, we obtain a stability version of Rudelson's result: when $K$ is very close to the Euclidean ball, we can approximate the identity with dyads coming from $O(d \ln d)$ contact pairs.
Corollary 1.7. Let $K$ be in John's position in $L$, and let the vectors $u_i$ and $v_i$ for $i \in [m]$ satisfy the conditions of Definition 1.3. Then for any $\varepsilon \in (0, 1)$ and for $k \geq \frac{c d \ln d}{\varepsilon^2}$, there is a multiset $\sigma \subset [m]$ of size $k$ such that (5) and (6) hold.
On the other hand, when K is not so close to the Euclidean ball, approximation of I using only a few vector-pairs cannot be guaranteed.
Theorem 1.8. For a positive integer $d$, $\varepsilon \in (0, 1/2)$ and $\delta \in (0, d/4)$, there is an origin-symmetric convex body $K$ contained in the cube $[-1, 1]^d$, the largest volume ellipsoid of which is $B_2^d$, such that there are contact points of $K$ and $[-1, 1]^d$ satisfying (2) with the following property. If $M$ is any subset of the dyads appearing in (2) such that some linear combination of elements of $M$ is at distance at most $\varepsilon$ from $I$ in the operator norm, then

Symmetric matrices
Let $\mathcal{P}_d$ denote the cone of positive semidefinite symmetric matrices in $\mathbb{R}^{d \times d}$. The Schatten $p$-norm of a matrix $A$ in $\mathcal{P}_d$ is defined as
\[
\|A\|_{C_p^d} = \left( \sum_{i=1}^d \lambda_i^p \right)^{1/p},
\]
where $(\lambda_1, \ldots, \lambda_d)$ is the sequence of eigenvalues of $A$. We recall that $\|A\| \leq \|A\|_{C_p^d}$ for all $p \geq 1$, and we also have
\[
(7) \qquad \|A\| \leq \|A\|_{C_p^d} \leq e\, \|A\| \quad \text{for } p = \ln d,
\]
where $\ln$ denotes the natural logarithm and $e$ denotes its base.
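The two-sided bound (7) is easy to test numerically: for $p = \ln d$ we have $d^{1/p} = e$, so the Schatten $p$-norm of a positive semidefinite matrix is sandwiched between its operator norm and $e$ times it. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 50
p = np.log(d)

# A random d x d positive semidefinite matrix.
G = rng.normal(size=(d, d))
A = G @ G.T

lam = np.linalg.eigvalsh(A)                 # eigenvalues of A (all >= 0)
schatten_p = np.sum(lam ** p) ** (1.0 / p)  # Schatten p-norm
op = np.max(lam)                            # operator norm = largest eigenvalue

# (7): ||A|| <= ||A||_{C_p} <= e ||A||  for p = ln d.
assert op <= schatten_p + 1e-9
assert schatten_p <= np.e * op + 1e-9
```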
We state the following inequality due to Lust-Piquard and Pisier [LP86, LPP91], essentially in the form in which it appears in the book [Pis98, Theorem 8.4.1].

Theorem 2.1 ([LP86, LPP91]). There is an absolute constant $c_0$ such that for any $p \geq 2$ and any $d \times d$ matrices $A_1, \ldots, A_k$,
\[
\mathbb{E}_r \left\| \sum_{i=1}^k r_i A_i \right\|_{C_p^d} \leq c_0 \sqrt{p}\, \max\left\{ \left\| \Big( \sum_{i=1}^k A_i^* A_i \Big)^{1/2} \right\|_{C_p^d}, \left\| \Big( \sum_{i=1}^k A_i A_i^* \Big)^{1/2} \right\|_{C_p^d} \right\},
\]
where $r = (r_1, \ldots, r_k)$ is a sequence of independent Rademacher variables.
Note that for any $d \times d$ matrix $Q$, the product $Q^* Q$ is positive semidefinite. Since, by Weyl's inequality, the Schatten $p$-norm is monotone on the cone of positive semidefinite matrices, we may deduce from the theorem of Lust-Piquard the following inequality for positive semidefinite matrices $Q_1, \ldots, Q_k$:
\[
(8) \qquad \mathbb{E}_r \left\| \sum_{i=1}^k r_i Q_i \right\|_{C_p^d} \leq c_0 \sqrt{p}\, \left\| \Big( \sum_{i=1}^k Q_i^2 \Big)^{1/2} \right\|_{C_p^d}.
\]

Lemma 2.2 (Symmetrization by Rademacher variables). Let $q_1, \ldots, q_k$ be independent random vectors distributed according to (not necessarily identical) probability distributions $P_1, \ldots, P_k$ on a normed space $X$ with $\mathbb{E} q_i = q$ for all $i \in [k]$. Then
\[
\mathbb{E} \left\| \sum_{i=1}^k (q_i - q) \right\| \leq 2\, \mathbb{E} \left\| \sum_{i=1}^k r_i q_i \right\|,
\]
where $r = (r_1, \ldots, r_k)$ are Rademacher variables, that is, random variables uniformly distributed on $\{1, -1\}$, independent of each other and of all other random variables.
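Lemma 2.2 can be illustrated by simulation in $X = \mathbb{R}^d$ with the Euclidean norm: the deviation of a sum from its mean is dominated, on average, by twice the norm of the randomly signed sum. A Monte Carlo sketch (numpy assumed; the dimensions and distributions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, trials = 5, 20, 2000

mean = np.ones(d)  # the common mean q = E q_i

lhs = rhs = 0.0
for _ in range(trials):
    q = mean + rng.normal(size=(k, d))        # q_i with E q_i = mean
    r = rng.choice([-1.0, 1.0], size=(k, 1))  # Rademacher signs
    lhs += np.linalg.norm(q.sum(axis=0) - k * mean)  # ||sum (q_i - q)||
    rhs += np.linalg.norm((r * q).sum(axis=0))       # ||sum r_i q_i||
lhs /= trials
rhs /= trials

# Symmetrization: E||sum (q_i - q)|| <= 2 E||sum r_i q_i||.
assert lhs <= 2 * rhs
```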
Proof of Theorem 1.1. Let $r = (r_1, \ldots, r_k)$ be a sequence of $k$ random variables uniformly distributed on $\{1, -1\}$, independent of each other and of all other random variables. Set $D = \frac{1}{k} \sum_{i \in [k]} Q_i - A$ and $p = \ln d$. Then
\[
\mathbb{E} \|D\| \leq \mathbb{E} \|D\|_{C_p^d}
\overset{(S)}{\leq} \frac{2}{k}\, \mathbb{E} \Big\| \sum_{i=1}^k r_i Q_i \Big\|_{C_p^d}
\overset{(L\text{-}P)}{\leq} \frac{c_0 \sqrt{p}}{k}\, \mathbb{E} \Big\| \Big( \sum_{i=1}^k Q_i^2 \Big)^{1/2} \Big\|_{C_p^d}
\overset{(PSD)}{\leq} \frac{c_0 \sqrt{\gamma p}}{k}\, \mathbb{E} \Big\| \sum_{i=1}^k Q_i \Big\|_{C_{p/2}^d}^{1/2}
\overset{(H)}{\leq} \frac{c_1 \sqrt{\gamma p}}{k} \Big( \mathbb{E} \Big\| \sum_{i=1}^k Q_i \Big\| \Big)^{1/2},
\]
where $c_0$ and $c_1$ are positive constants. Here, we use Lemma 2.2 in (S) and the inequality (8) in (L-P). The inequality (PSD) relies on the fact that the matrices $Q_i$ are positive semidefinite, so that $\sum_i Q_i^2 \preceq \gamma \sum_i Q_i$, and (H) follows from Hölder's inequality combined with (7). Thus, setting $\alpha = \frac{c_1^2 \gamma \ln d}{k}$, we obtain
\[
\mathbb{E} \|D\| \leq \sqrt{\alpha \left( \mathbb{E} \|D\| + a \right)}.
\]
Therefore, we get $\mathbb{E} \|D\| \leq \alpha + \sqrt{\alpha a}$, and thus the inequality $\mathbb{E} \|D\| \leq \varepsilon$ holds for sufficiently large $c$ (see the definition of $k$). Theorem 1.1 is proved.

Non-symmetric dyads - upper bound
We will show that Theorem 1.6 follows from the following more general result.
Theorem 3.1. Suppose that $0 < \varepsilon < 1$ is a given number, and $Q_1, \ldots, Q_m$ and $A$ are square matrices of size $d$ such that

Proof. Note that $\|U_i\| \leq \gamma$. Since the $U_i$ are positive semidefinite matrices, we can apply Theorem 1.1 and get that $\mathbb{E} \|B\| \leq \varepsilon$.
Setting $p = \ln d$, we obtain that
Here, (T) and (H) follow from the triangle inequality and Hölder's inequality, respectively. Denote by $D_\sigma$ the matrix $\frac{1}{k} \sum_{j \in \sigma} Q_j - A$.
Our aim is to show that $\mathbb{E}_\sigma \|D_\sigma\| \leq \varepsilon$. Let $r_1, \ldots, r_m$ be random variables uniformly distributed on $\{1, -1\}$, independent of each other and of all other random variables. Next, we have
where $c_1$ is some positive constant. Note that (S) and (L-P) follow from Lemma 2.2 and (8), respectively, and the last inequality holds for a sufficiently large constant $c$. This finishes the proof of Theorem 3.1.

3.2. Proof of Theorem 1.6. For each $i \in [m]$, set $Q_i = d\, u_i \otimes v_i$, and use the notation $(U_i, B, b, \gamma)$ of Theorem 3.1. As $B_2^d \subseteq K \subseteq r(K) B_2^d$, we have $1 \leq |u_i| \leq r(K)$ and $1/r(K) \leq |v_i| \leq 1$, and in particular, $\|Q_i\| = d\, |u_i|\, |v_i| \leq d\, r(K)$. Note that
Similarly, we have
On the other hand,
where the last summand is equal to the identity operator $I$. By (11) and (12), the formula above yields $b \leq 100 d\, [r(K) - 1]^{1/2} + 1$, which, combined with (10), yields (5).
To obtain the balancedness bound (6), we form new vectors $a_i$ and $b_i$ in $\mathbb{R}^{d+1}$ by concatenating $v_i$ and $u_i$, respectively, with $1/\sqrt{d}$. It is easy to see that the pairs $(a_i, b_i)$ again yield a decomposition as in Definition 1.3, and that the matrices appearing in (5) and (6) are submatrices of $\frac{d}{k} \sum_{i \in \sigma} a_i \otimes b_i$. Therefore, applying (5) in $\mathbb{R}^{d+1}$, we complete the proof of Theorem 1.6.

The log factor is needed for symmetric matrices
In this section, we prove Theorem 1.2. First, in Lemma 4.1, we show that in $\ell_1^t$, a point in the convex hull of other points may not be well approximated by averages of few of them, in terms of the dimension $t$. Then we use the fact that $\ell_1^t$ embeds isometrically into $\ell_\infty^d$ for $d = 2^t$, which in turn embeds isometrically into the space of matrices of size $d \times d$.
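The embedding of $\ell_1^t$ into $\ell_\infty^d$ used here sends $x$ to the vector of inner products with all $2^t$ sign vectors; the $\ell_\infty$-norm of the image is $\max_s |\langle x, s\rangle| = \|x\|_1$. A direct check (numpy assumed):

```python
import numpy as np
from itertools import product

t = 4
d = 2 ** t
# All sign vectors s_1, ..., s_d in {-1, 1}^t, as rows of a d x t matrix.
signs = np.array(list(product([-1.0, 1.0], repeat=t)))

def phi(x):
    """Isometric embedding of l_1^t into l_inf^d: x -> (<x, s_j>)_j."""
    return signs @ x

rng = np.random.default_rng(4)
# max_j |<x, s_j>| is attained at s_j = sign(x), giving ||x||_1.
isometric = all(
    np.isclose(np.max(np.abs(phi(x))), np.sum(np.abs(x)))
    for x in rng.normal(size=(100, t))
)
assert isometric
```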
Proof of Lemma 4.1. Since the $i$-th coordinate $b_i$ of $\frac{1}{s} \sum_{i \in \sigma_0} e_i/2$ is either equal to $0$ or at least $\frac{1}{2s} \geq \frac{1}{6k}$, we have $|b_i - \frac{1}{12k}| \geq \frac{1}{12k}$ for every $i \in [t]$, which finishes the proof of the lemma.
Without loss of generality, we may assume $d = 2^t$, where $t$ is a non-negative integer. Indeed, if we prove Theorem 1.2 for $d = 2^t$, that is, we find suitable matrices $Q_1, \ldots, Q_n$ of size $2^t \times 2^t$, then the matrices $Q'_1, \ldots, Q'_n$ of size $d \times d$ with the following properties satisfy the conditions of the theorem, provided that $t = \lfloor \log_2 d\rfloor$: the matrix $Q_i$ is a principal submatrix of $Q'_i$, the complementary principal submatrix of $Q'_i$ is the identity matrix, and the remaining entries of $Q'_i$ are zeros. Note that it is sufficient to consider multisets $\sigma$ such that
\[
(13) \qquad m/2 < |\sigma| \leq m.
\]
Indeed, if the theorem is proved for such multisets, then it holds for a multiset $\sigma$ with $|\sigma| \leq m/2$ as well: the multiset $\sigma'$ consisting of $2^l$ copies of $\sigma$, where $l = \lfloor \log_2 (m/|\sigma|)\rfloor$, satisfies (13), and thus the statement of the theorem holds for $\sigma'$. Since $\sigma'$ consists of several copies of $\sigma$, we conclude that Theorem 1.2 is true for $\sigma$.

We identify the space of $d \times d$ real diagonal matrices, equipped with the operator norm, with $\ell_\infty^d$. Enumerate all $\pm 1$ sequences of length $t$ as $s_1, \ldots, s_d$. Clearly, the linear map $\varphi \colon \ell_1^t \to \ell_\infty^d$ defined by $\varphi(x) = (\langle x, s_1\rangle, \ldots, \langle x, s_d\rangle)$ embeds $\ell_1^t$ isometrically into $\ell_\infty^d$, which we consider as a subspace of $\mathbb{R}^{d \times d}$.

Next, we construct the desired matrices $Q_i$. Let $k$ be an integer such that (14) holds, that is, $k \geq 1$. Using the notation introduced in Lemma 4.1, for every $i \in [t+1]$, put $Q_i = \gamma\, \psi(e_i/2)$, where $\psi(x) = \varphi(x - a) + I$. Note that $\psi$ is an affine isometry from $\ell_1^t$ into $\ell_\infty^d$. By (14), we have, for every $i \in [t+1]$,
\[
\|Q_i\| \leq \gamma \left( \|I\| + \|e_i/2 - a\|_1 \right) < 2\gamma,
\]
and the matrix $Q_i$ is positive definite. Setting $\lambda_i = \frac{1}{12k}$ for $i \in [t]$ and $\lambda_{t+1} = 1 - \frac{t}{12k}$, we have $a = \sum_{i=1}^{t+1} \lambda_i\, e_i/2$. Since $\sum_{i=1}^{t+1} \lambda_i = 1$ and $\lambda_i \geq 0$ for every $i \in [t+1]$, we obtain $a \in \operatorname{conv}\{e_1/2, \ldots, e_{t+1}/2\}$. Thus, denoting by $Q_{t+2}$ the zero matrix, we get
\[
\sum_{i=1}^{t+1} \frac{\lambda_i}{\gamma}\, Q_i + \Big( 1 - \frac{1}{\gamma} \Big) Q_{t+2} = \varphi\Big( \sum_{i=1}^{t+1} \lambda_i\, e_i/2 - a \Big) + I = I,
\]
that is, $I \in \operatorname{conv}\{Q_1, \ldots, Q_{t+2}\}$. To prove the theorem, assume that there is a multiset $\sigma$ of $[t+2]$ satisfying (13) such that
\[
(15) \qquad \Big\| \frac{1}{|\sigma|} \sum_{i \in \sigma} Q_i - I \Big\| < \varepsilon.
\]