Estimating the covariance of random matrices

We extend to the matrix setting a recent result of Srivastava-Vershynin about estimating the covariance matrix of a random vector. The result can be interpreted as a quantified version of the law of large numbers for positive semi-definite matrices satisfying a regularity assumption. Besides giving examples, we discuss the notion of log-concave matrices and give estimates on the smallest and largest eigenvalues of a sum of such matrices.


INTRODUCTION
In recent years, interest in matrix-valued random variables has gained momentum. Many results dealing with real random variables and random vectors were extended to cover random matrices. Concentration inequalities of Bernstein and Hoeffding type, among others, were obtained in the non-commutative setting ([5], [25], [17]). The methods used were mostly a combination of techniques from the real/vector case with matrix inequalities such as the Golden-Thompson inequality (see [8]).
Estimating the covariance matrix of a random vector has gained a lot of interest recently. Given a random vector $X$ in $\mathbb{R}^n$, the question is to estimate $\Sigma = EXX^t$. A natural way to do this is to take $X_1,\dots,X_N$ independent copies of $X$ and to approximate $\Sigma$ by the sample covariance matrix $\Sigma_N = \frac{1}{N}\sum_i X_iX_i^t$. The challenging problem is to find the minimal number of samples needed to estimate $\Sigma$. It is known, using a result of Rudelson (see [22]), that for general distributions supported on the sphere of radius $\sqrt{n}$ it suffices to take $cn\log(n)$ samples. But for many distributions, a number of samples proportional to $n$ is sufficient. Using standard arguments, one can verify this for Gaussian vectors. It was conjectured by Kannan-Lovász-Simonovits [14] that the same holds for log-concave distributions. This problem was solved by Adamczak et al. ([3], [4]). Recently, Srivastava-Vershynin proved in [24] a covariance estimate with a number of samples proportional to $n$ for a larger class of distributions covering the log-concave case. Their method was different from previous work in this field; the main idea was to randomize the sparsification theorem of Batson-Spielman-Srivastava [7].
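As a purely illustrative sketch (ours, not part of the paper's argument), the following Python snippet forms the sample covariance matrix $\Sigma_N = \frac1N\sum_i X_iX_i^t$ for a standard Gaussian vector, for which $\Sigma = I_n$, and reports the operator-norm error; the dimension and the sample sizes are arbitrary choices.

```python
import numpy as np

# Illustration only: sample covariance of a standard Gaussian vector in R^n.
# Here Sigma = I_n, and N proportional to n already gives a small operator-norm error.
rng = np.random.default_rng(0)
n = 100
for N in [2 * n, 10 * n]:
    X = rng.standard_normal((N, n))               # rows are N independent copies of X
    Sigma_N = X.T @ X / N                         # sample covariance (1/N) sum_i X_i X_i^t
    err = np.linalg.norm(Sigma_N - np.eye(n), 2)  # operator-norm deviation from Sigma
    print(N, err)
```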
Our aim in this paper is to adapt the work of Srivastava-Vershynin to the matrix setting, replacing the vector $X$ in the covariance problem by an $n\times m$ random matrix $A$ and estimating $EAA^t$ by the same techniques. This is possible since, in the deterministic setting, the sparsification theorem of Batson-Spielman-Srivastava [7] has been extended to a matrix setting by De Carli Silva-Harvey-Sato [10], who proved precisely the following:

Theorem 1.1. Let $B_1,\dots,B_m$ be positive semi-definite matrices of size $n\times n$ and arbitrary rank. Set $B := \sum_i B_i$. For any $\varepsilon\in(0,1)$, there is a deterministic algorithm to construct a vector $y\in\mathbb{R}^m$ with $O(n/\varepsilon^2)$ nonzero entries such that $y\ge 0$ and $B \preceq \sum_i y_iB_i \preceq (1+\varepsilon)\,B$.

For an $n\times n$ matrix $A$, denote by $\|A\|$ the operator norm of $A$ seen as an operator on $\ell_2^n$. The main idea is to randomize the previous result using the techniques of Srivastava-Vershynin [24]. Our problem can be formulated as follows: take $B$ a positive semi-definite random matrix of size $n\times n$. How many independent copies of $B$ are needed to approximate $EB$? That is, taking $B_1,\dots,B_N$ independent copies of $B$, what is the minimal number of samples needed to make $\|\frac1N\sum_i B_i - EB\|$ small? One can view this as a matrix analogue of the covariance estimation of a random vector, by taking for $B$ the matrix $AA^t$ where $A$ is an $n\times m$ random matrix. Moreover, this problem yields an averaging approximation of the covariance matrices of several random vectors. Indeed, let $X_1,\dots,X_m$ be random vectors in $\mathbb{R}^n$ and take $A'$ the $n\times m$ matrix which has $X_1,\dots,X_m$ as columns; then $A'A'^t = \sum_{j\le m} X_jX_j^t$ and $EA'A'^t = \sum_{j\le m} EX_jX_j^t$.
Therefore, when approximating $EB$ for $B = \frac1m A'A'^t$, we are approximating the average of the covariance matrices of the random vectors $(X_j)_{j\le m}$.
With some regularity, we will be able to take a number of independent copies proportional to the dimension $n$. However, in the general case this is no longer true. In fact, take $B$ uniformly distributed on $\{n\,e_ie_i^t\}_{i\le n}$, where $(e_i)_{i\le n}$ denotes the canonical basis of $\mathbb{R}^n$. It is easy to verify that $EB = I_n$, and when taking $B_1,\dots,B_N$ independent copies of $B$, the matrix $\frac1N\sum_i B_i$ is diagonal and its diagonal coefficients are distributed as $\frac{n}{N}(p_1,\dots,p_n)$, where $p_i$ denotes the number of times $e_ie_i^t$ is chosen (a coupon-collector phenomenon). This problem is well studied and it is known (see [15]) that we must take $N \ge cn\log(n)$. This example is essentially due to Aubrun [6]. More generally, if $B$ is a positive semi-definite random matrix such that $EB = I_n$ and $\mathrm{Tr}(B)\le n$ almost surely, then by Rudelson's inequality in the non-commutative setting (see [18]) it is sufficient to take $cn\log(n)$ samples.
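A minimal numerical sketch of this example (ours, for illustration only; the constants are arbitrary): with $N$ proportional to $n$ some coordinates are typically never chosen, so the deviation from $I_n$ in operator norm is at least $1$, while $N$ of order $n\log n$ is needed just to cover every coordinate.

```python
import numpy as np

# Illustration only: B uniform on {n e_i e_i^t}.  The matrix (1/N) sum_i B_i is diagonal
# with entries (n/N) p_i; if some coordinate is never chosen (coupon-collector effect),
# the operator-norm deviation from I_n is at least 1.
rng = np.random.default_rng(0)
n = 200
for N in [2 * n, 10 * n, int(3 * n * np.log(n))]:
    counts = np.bincount(rng.integers(0, n, size=N), minlength=n)  # p_1, ..., p_n
    missed = int(np.sum(counts == 0))           # coordinates never sampled
    err = np.max(np.abs(n * counts / N - 1.0))  # = ||(1/N) sum_i B_i - I_n||
    print(N, missed, round(err, 2))
```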
The method will work properly for a class of matrices satisfying a matrix strong regularity assumption, which we denote by (MSR) and which can be viewed as an analogue of the property (SR) defined in [24].

Definition 1.2. [Property (MSR)]
Let $B$ be an $n\times n$ positive semi-definite random matrix such that $EB = I_n$. We will say that $B$ satisfies (MSR) if for some $c,\eta>0$ we have:
$$P\big(\|PBP\|\ge t\big) \le \frac{c}{t^{1+\eta}} \qquad \forall t\ge c\cdot\mathrm{rank}(P) \text{ and } \forall P \text{ orthogonal projection of } \mathbb{R}^n.$$
In the rest of the paper, c will always denote the parameter appearing in this definition while C will be a universal constant which may change from line to line. Also, c(η) will denote a constant depending on c and η which may also change from line to line.
The main result of this paper is the following:

Theorem 1.3. Let $B$ be an $n\times n$ positive semi-definite random matrix verifying $EB = I_n$ and (MSR) for some $\eta>0$. Then for every $\varepsilon\in(0,1)$, taking $N \ge c(\varepsilon,\eta)\,n$ independent copies $B_1,\dots,B_N$ of $B$, we have $E\big\|\frac1N\sum_{i\le N}B_i - I_n\big\| \le \varepsilon$.

If $X$ is an isotropic random vector of $\mathbb{R}^n$ and we put $B = XX^t$, then $\|PBP\| = \|PX\|_2^2$. Therefore, if $X$ verifies the property (SR) appearing in [24], then $B$ verifies property (MSR). So applying Theorem 1.3 to $B = XX^t$, we recover the covariance estimation as stated in [24].
In order to apply our result, besides giving some examples, we investigate the notion of log-concave matrices in relation to the definition of log-concave vectors. Moreover, noting some strong concentration inequalities satisfied by these matrices, we are able, using the ideas developed in the proof of the main theorem, to obtain some results holding with high probability rather than only in expectation, as is the case in the main result. This will be discussed in the last section of the paper.
The paper is organized as follows: in section 2, we discuss Property (MSR) and give some examples, in section 3 we show how to prove Theorem 1.3 using two other results (Theorem 3.1, Theorem 3.3) which we prove respectively in sections 4 and 5 using again two other results (Theorem 4.1, Theorem 5.1) whose proofs are given respectively in sections 6 and 7. In section 8, we discuss the notion of log-concave matrices and prove some related results.

PROPERTY (MSR) AND EXAMPLES
A random vector $X$ in $\mathbb{R}^l$ is called isotropic if its covariance matrix is the identity, i.e. $EXX^t = \mathrm{Id}$. In [24], an isotropic random vector $X$ in $\mathbb{R}^l$ was said to satisfy (SR) if for some $c,\eta>0$,
$$P\big(\|PX\|_2^2\ge t\big) \le \frac{c}{t^{1+\eta}}, \qquad \forall t\ge c\cdot\mathrm{rank}(P) \text{ and } \forall P \text{ orthogonal projection of } \mathbb{R}^l.$$
Since $\|PXX^tP\| = \|PX\|_2^2$, clearly $B = XX^t$ satisfies (MSR) if and only if $X$ satisfies (SR). Therefore, if $X$ verifies property (SR), applying Theorem 1.3 to $B = XX^t$ we recover the covariance estimate as stated in [24].
Let us note that (MSR) implies moment assumptions on the quadratic forms $\langle Bx, x\rangle$. To see this, first note that if $x\in S^{n-1}$ then $\langle Bx, x\rangle = \|P_xBP_x\|$, where $P_x$ is the orthogonal projection on $\mathrm{span}(x)$. Now, by integration of tails, we have for every $p < 1+\eta$,
$$E\langle Bx, x\rangle^p \le c(\eta).$$

Moreover, property (MSR) implies a regularity assumption on the eigenvalues of the matrix $B$. Indeed, denoting by $\lambda_1(B)\ge\dots\ge\lambda_n(B)$ the eigenvalues of $B$, for any orthogonal projection $P$ of rank $k$ one can write
$$\|PBP\| \ge \min_{\mathrm{rank}(Q)=k}\|QBQ\| = \lambda_{n-k+1}(B),$$
where the last equality is given by the Courant-Fischer minimax formula (see [9]). Therefore, property (MSR) implies the following: for some $c,\eta>0$,
$$P\big(\lambda_{n-k+1}(B)\ge t\big) \le \frac{c}{t^{1+\eta}} \qquad \forall t\ge ck,\ \forall k\le n.$$

We may now discuss some examples of applications of the main result. Let us first replace (MSR) with a stronger, but easier to manipulate, property which we denote by (MSR*). If $B$ is an $n\times n$ positive semi-definite random matrix such that $EB = I_n$, we will say that $B$ satisfies (MSR*) if for some $c,\eta>0$:
$$P\big(\mathrm{Tr}(PB)\ge t\big) \le \frac{c}{t^{1+\eta}} \qquad \forall t\ge c\cdot\mathrm{rank}(P) \text{ and } \forall P \text{ orthogonal projection of } \mathbb{R}^n.$$
Note that since $\|PBP\| \le \mathrm{Tr}(PBP) = \mathrm{Tr}(PB)$, (MSR*) is clearly stronger than (MSR).
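For the reader's convenience, here is a worked version of the tail-integration step (our own computation, assuming only the rank-one case of (MSR), i.e. $P(\langle Bx,x\rangle\ge t)\le c\,t^{-1-\eta}$ for $t\ge c$, and $p<1+\eta$; the constant $c(\eta)$ is not optimized):
$$E\langle Bx,x\rangle^p = \int_0^\infty p\,t^{p-1}\,P\big(\langle Bx,x\rangle\ge t\big)\,dt \le \int_0^c p\,t^{p-1}\,dt + \int_c^\infty p\,t^{p-1}\,\frac{c}{t^{1+\eta}}\,dt = c^p + \frac{p\,c^{\,p-\eta}}{1+\eta-p} =: c(\eta).$$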

(2 + ε)-moments for the spectrum.
As we mentioned before, (MSR) implies regularity assumptions on the eigenvalues of $B$. Putting some independence in the spectral decomposition of $B$, we will only need to use the regularity of the eigenvalues. To be more precise, we have the following:

Proposition 2.1. Let $B = UDU^t$ be the spectral decomposition of an $n\times n$ symmetric positive semi-definite random matrix, where $U$ is an orthogonal matrix and $D$ is a diagonal matrix whose entries are denoted by $(\alpha_j)_{j\le n}$. Suppose that $U$ and $D$ are independent and that $(\alpha_j)_{j\le n}$ are independent and satisfy $E\alpha_j = 1$ and $E\alpha_j^p \le C_p$ for some $p>2$. Then $B$ satisfies (MSR*) with $\eta = \frac{p}{2}-1$.

Proof. First note that since $U$ and $D$ are independent and $E\alpha_i = 1$, we have $EB = I_n$. Now, (MSR*) is a rotationally invariant property; therefore we can assume without loss of generality that $U = I_n$ and thus that $B = D$. Let $k>0$, let $P$ be an orthogonal projection of rank $k$ on $\mathbb{R}^n$, and denote by $(p_{ij})_{i,j\le n}$ the entries of $P$. Note that $\mathrm{Tr}(PB) = \sum_{i\le n} p_{ii}\alpha_i$ and $\sum_i p_{ii} = k$. Using Markov's inequality, we have for $t \ge 2k$,
$$P\Big(\sum_i p_{ii}\alpha_i \ge t\Big) \le P\Big(\Big|\sum_i p_{ii}(\alpha_i-1)\Big| \ge \frac{t}{2}\Big) \le \frac{2^p}{t^p}\,E\Big|\sum_i p_{ii}(\alpha_i-1)\Big|^p.$$
Using Rosenthal's inequality (see [21]) we get
$$E\Big|\sum_i p_{ii}(\alpha_i-1)\Big|^p \le C(p)\Big[\Big(\sum_i p_{ii}^2\,E(\alpha_i-1)^2\Big)^{p/2} + \sum_i p_{ii}^p\,E|\alpha_i-1|^p\Big].$$
Taking into account that $p_{ii}\le 1$, which implies that $\sum_i p_{ii}^l \le k$ for any $l\ge 1$, we deduce that
$$P\big(\mathrm{Tr}(PB)\ge t\big) \le C(p)\,\frac{k^{p/2}+k}{t^p} \le \frac{c(p)}{t^{p/2}} \qquad \text{for } t\ge c(p)\,k.$$
Instead of Rosenthal's inequality, we could have used a symmetrization argument together with Khintchine's inequality to get the estimate above. One can easily conclude that $B$ satisfies (MSR*) with $\eta = \frac{p}{2}-1$.

Applying Theorem 1.3, we can deduce the following proposition:

Proposition 2.2. Let $B = UDU^t$ be as in Proposition 2.1, with $U$ and $D$ independent and $(\alpha_j)_{j\le n}$ independent satisfying $E\alpha_j = 1$ and $E\alpha_j^p \le C_p$ for some $p>2$. Then for every $\varepsilon\in(0,1)$, taking $N \ge c(\varepsilon,p)\,n$, we have $E\big\|\frac1N\sum_{i\le N}B_i - I_n\big\| \le \varepsilon$, where $B_1,\dots,B_N$ are independent copies of $B$.
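As a numerical illustration of Proposition 2.1 (ours, not from the paper), one can build $B = UDU^t$ with a Haar-like orthogonal $U$ and independent eigenvalues of mean $1$ with a finite $p$-th moment for some $p>2$, and watch the empirical average of $N\sim Cn$ copies approach $I_n$; the Pareto-type law and all constants below are arbitrary choices.

```python
import numpy as np

# Illustration only: B = U diag(alpha) U^t with U, D independent, alpha_j independent,
# E alpha_j = 1 and finite moments of every order < 3 (so some p > 2 works).
rng = np.random.default_rng(0)
n, N = 50, 20 * 50

def sample_B():
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # Haar-like orthogonal U
    alpha = (rng.pareto(3.0, size=n) + 1.0) / 1.5      # Lomax(3) + 1 has mean 1.5; rescale to mean 1
    return (Q * alpha) @ Q.T                           # U diag(alpha) U^t

avg = sum(sample_B() for _ in range(N)) / N
print(np.linalg.norm(avg - np.eye(n), 2))              # operator-norm deviation from I_n
```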

From (SR) to (MSR).
We will show how to pass from property (SR), dealing with vectors, to property (MSR*), dealing with matrices.

Proposition 2.3. Let $A$ be an $n\times m$ random matrix and denote by $(C_i)_{i\le m}$ its columns. Suppose that $A'$, defined by $A'^{\,t} = \sqrt{m}\,(C_1^t,\dots,C_m^t)$, is an isotropic random vector in $\mathbb{R}^{nm}$ which satisfies property (SR). Then $B = AA^t$ verifies $EB = I_n$ and property (MSR*).
Proof. For $l\le nm$, one can write $l = (j-1)n + i$ with $1\le i\le n$, $1\le j\le m$, so that the coordinates of $A'$ are given by $a'_l = \sqrt{m}\,a_{i,j}$, and since $A'$ is isotropic we get $E\,a_{i,s}a_{j,s} = \frac1m\,\delta_{i,j}$. We deduce that $Eb_{i,j} = \delta_{i,j}$ and therefore $EB = I_n$. Let $P$ be an orthogonal projection of $\mathbb{R}^n$ and put $P' = I_m\otimes P$, i.e. $P'$ is the $nm\times nm$ block-diagonal matrix with $m$ copies of $P$ on the diagonal; note that $\|P'A'\|_2^2 = m\,\mathrm{Tr}(PB)$ and $\mathrm{rank}(P') = m\cdot\mathrm{rank}(P)$. Let $t\ge c\cdot\mathrm{rank}(P)$; then $mt\ge c\cdot\mathrm{rank}(P')$ and by property (SR) we have $P\big(\|P'A'\|_2^2\ge mt\big)\le \frac{c}{(mt)^{1+\eta}}$. This means that $P\big(\mathrm{Tr}(PB)\ge t\big)\le \frac{c}{(mt)^{1+\eta}}$, and therefore $B$ satisfies (MSR*).
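Here is a small numerical check (ours, illustration only, on arbitrary test matrices) of the identity behind the proof, namely $\|P'A'\|_2^2 = m\,\mathrm{Tr}(PB)$ with $P' = I_m\otimes P$ and $B = AA^t$.

```python
import numpy as np

# Illustration only: check ||P' A'||_2^2 = m * Tr(P B) for P' = I_m kron P, B = A A^t.
rng = np.random.default_rng(0)
n, m, k = 5, 7, 2
A = rng.standard_normal((n, m))
B = A @ A.T

Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
P = Q @ Q.T                                   # orthogonal projection of rank k on R^n
P_prime = np.kron(np.eye(m), P)               # I_m kron P (block-diagonal, one P per column block)
A_prime = np.sqrt(m) * A.T.reshape(-1)        # stacked columns of A, scaled by sqrt(m)

lhs = np.linalg.norm(P_prime @ A_prime) ** 2
rhs = m * np.trace(P @ B)
print(np.isclose(lhs, rhs))                   # True
```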

PROOF OF THEOREM 1.3
Let us first introduce some notations which will be used in the rest of the paper. The set of $n\times n$ symmetric matrices is denoted by $S_n$. For $X\in S_n$, the notation $\lambda(X)$ always refers to the eigenvalues of $X$. For $X,Y\in S_n$, $X\succeq Y$ means that $X - Y$ is positive semi-definite. The vector space $S_n$ can be endowed with the trace inner product $\langle\cdot,\cdot\rangle$ defined by $\langle X, Y\rangle := \mathrm{Tr}(XY)$.
Let us now introduce a weak regularity assumption on the moments, which we denote by (MWR): we will say that $B$ satisfies (MWR) with parameter $p>1$ if for every $x\in S^{n-1}$ we have $E\langle Bx, x\rangle^p \le C_p$. Note that, by a simple integration of tails, (MSR) (with $P$ a rank-one projection) implies (MWR) for every $p < 1+\eta$. The proof of Theorem 1.3 is based on two theorems, Theorem 3.1 and Theorem 3.3, dealing respectively with the smallest and the largest eigenvalue of $\frac1N\sum_{i\le N}B_i$: roughly speaking, if $B$ satisfies $EB = I_n$ and (MWR), then for $\varepsilon\in(0,1)$ and $N$ larger than a constant (depending on $\varepsilon$ and on the parameters in (MWR)) times $n$, the smallest eigenvalue of $\frac1N\sum_{i\le N}B_i$ is, in expectation, at least $1 - C\varepsilon$, and its largest eigenvalue at most $1 + C\varepsilon$.

Remark 3.4. The proof yields a more general estimate: precisely, if $h = \frac{n}{N}$, then the bound can be stated in terms of $h$.
Combining this with the previous remark, for any independent $n\times n$ positive semi-definite random matrices $B_1,\dots,B_N$ verifying $EB_i = I_n$ and (MSR), we obtain the corresponding estimate. We will give the proofs of these two theorems in sections 4 and 5 respectively. We also need a simple lemma:

Lemma 3.5. Let $1 < r \le 2$ and let $Z_1,\dots,Z_N$ be independent positive random variables with $EZ_i = 1$ and satisfying $(EZ_i^r)^{1/r}\le C_r$. Then
$$E\Big|\frac1N\sum_{i\le N}Z_i - 1\Big| \le 2\,C_r\,N^{\frac1r - 1}.$$

Proof. Let $(\varepsilon_i)_{i\le N}$ be independent symmetric Bernoulli variables. By symmetrization and Jensen's inequality we can write
$$E\Big|\sum_{i\le N}(Z_i-1)\Big| \le 2\,E\Big|\sum_{i\le N}\varepsilon_iZ_i\Big| \le 2\Big(\sum_{i\le N}EZ_i^r\Big)^{1/r} \le 2\,C_r\,N^{1/r},$$
and it remains to divide by $N$.

Proof of Theorem 1.3. Take $B_1,\dots,B_N$ independent copies of $B$; they satisfy the conditions of Theorem 3.1 (with $p = 1+\frac{\eta}{2}$) and of Theorem 3.3. Note that by the triangle inequality the norm $\|\frac1N\sum_{i\le N}B_i - I_n\|$ is controlled by a maximum of two terms involving the largest and the smallest eigenvalue of $\frac1N\sum_{i\le N}B_i$. Since the two terms in the max are non-negative, one can bound the max by the sum of the two terms. More precisely, we get a quantity $\alpha \ge \lambda_{\max}\big(\frac1N\sum_{i\le N}B_i\big) - 1$, and by Theorem 3.1 and Theorem 3.3 we deduce that $E\alpha \le 2\varepsilon$.

Note that the random variables $Z_i$ involved satisfy the conditions of Lemma 3.5, and we deduce that $E\beta \le \varepsilon$ by the choice of $N$.
As a conclusion, combining these estimates yields $E\big\|\frac1N\sum_{i\le N}B_i - I_n\big\| \le C\varepsilon$, which proves Theorem 1.3 up to rescaling $\varepsilon$.

4. PROOF OF THEOREM 3.1

Given $A$ an $n\times n$ positive semi-definite matrix such that all eigenvalues of $A$ are greater than a lower barrier $l_A = l$, i.e. $A \succ l\cdot I_n$, define the corresponding potential function to be
$$\varphi_l(A) = \mathrm{Tr}\,(A - l\cdot I_n)^{-1}.$$
The proof of Theorem 3.1 is based on the following result, which will be proved in section 6:

Theorem 4.1. Let $A \succ l\cdot I_n$ with $\varphi_l(A)\le\varphi$, and let $B$ be a positive semi-definite random matrix with $EB = I_n$ satisfying (MWR). If $\varphi$ is small enough (depending on $\varepsilon$ and on the constants in (MWR)), then there exists a random variable $l'$ such that $A + B \succ l'\cdot I_n$, $\varphi_{l'}(A+B)\le\varphi_l(A)$ and $El' \ge l + 1 - \varepsilon$.

Proof of Theorem 3.1. We start with $A_0 = 0$ and $l_0 = -\frac{n}{\varphi}$, so that $\varphi_{l_0}(A_0) = \varphi$. Applying Theorem 4.1, one can find $l_1$ such that $A_1 = B_1 \succ l_1\cdot I_n$, $\varphi_{l_1}(A_1)\le\varphi$ and $El_1 \ge l_0 + 1 - \varepsilon$. Now apply Theorem 4.1 conditionally on $A_1$ to find $l_2$ such that $A_2 = A_1 + B_2 \succ l_2\cdot I_n$ and $E_{B_2}l_2 \ge l_1 + 1 - \varepsilon$. Iterating and using Fubini's Theorem, we have $El_N \ge l_0 + N(1-\varepsilon)$. Taking $N = \frac{n}{\varepsilon\varphi}$, we get
$$E\lambda_{\min}\Big(\frac1N\sum_{i\le N}B_i\Big) \ge \frac{E\,l_N}{N} \ge 1 - 2\varepsilon.$$
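As an illustration of the barrier method used here (our own sketch, not the paper's argument), the snippet below evaluates the lower potential $\varphi_l(A) = \mathrm{Tr}\,(A-l\cdot I_n)^{-1}$ and its upper analogue $\psi_u(A) = \mathrm{Tr}\,(u\cdot I_n-A)^{-1}$ on arbitrary test matrices, and checks that adding a positive semi-definite matrix decreases the former and increases the latter; this monotonicity is what creates room to shift the barriers.

```python
import numpy as np

# Illustration only: lower/upper barrier potentials of the Batson-Spielman-Srivastava method.
rng = np.random.default_rng(0)
n = 6

def phi(A, l):
    return np.trace(np.linalg.inv(A - l * np.eye(len(A))))   # requires A > l I_n

def psi(A, u):
    return np.trace(np.linalg.inv(u * np.eye(len(A)) - A))   # requires A < u I_n

A = np.diag(rng.uniform(1.0, 3.0, size=n))   # eigenvalues between 1 and 3
v = rng.standard_normal(n)
B = np.outer(v, v) / (v @ v)                 # rank-one PSD update of operator norm 1

l, u = 0.5, 5.0
print(phi(A, l), phi(A + B, l))   # adding B >= 0 decreases the lower potential
print(psi(A, u), psi(A + B, u))   # ... and increases the upper potential
```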

PROOF OF THEOREM 3.3
Given $A$ an $n\times n$ positive semi-definite matrix such that all eigenvalues of $A$ are less than an upper barrier $u_A = u$, i.e. $A \prec u\cdot I_n$, define the corresponding potential function to be
$$\psi_u(A) = \mathrm{Tr}\,(u\cdot I_n - A)^{-1}.$$
The proof of Theorem 3.3 is based on the following result, which will be proved in section 7:

Theorem 5.1. Let $\varepsilon\in(0,1)$, let $A \prec u\cdot I_n$ with $\psi_u(A)\le\psi$, and let $B$ be a positive semi-definite random matrix with $EB = I_n$ satisfying (MWR). If $\psi$ is small enough (depending on $\varepsilon$ and on the constants in (MWR)), then there exists a random variable $u'$ such that $A + B \prec u'\cdot I_n$, $\psi_{u'}(A+B)\le\psi_u(A)$ and $Eu' \le u + 1 + \varepsilon$.

Proof of Theorem 3.3. Let $\psi$ satisfy the condition of Theorem 5.1. We start with $A_0 = 0$ and $u_0 = \frac{n}{\psi}$, so that $\psi_{u_0}(A_0) = \psi$. Applying Theorem 5.1, one can find $u_1$ such that $A_1 = B_1 \prec u_1\cdot I_n$, $\psi_{u_1}(A_1)\le\psi$ and $Eu_1 \le u_0 + 1 + \varepsilon$. Now apply Theorem 5.1 conditionally on $A_1$ to find $u_2$ such that $A_2 = A_1 + B_2 \prec u_2\cdot I_n$ and $E_{B_2}u_2 \le u_1 + 1 + \varepsilon$. Iterating and using Fubini's Theorem, we have $Eu_N \le u_0 + N(1+\varepsilon)$, which gives the desired upper bound on $E\lambda_{\max}\big(\frac1N\sum_{i\le N}B_i\big)$.

6. PROOF OF THEOREM 4.1

Notations.
We work under the assumptions of Theorem 4.1. We are looking for a random variable l ′ of the form l + δ where δ is a positive random variable playing the role of the shift.
If in addition $A \succ (l+\delta)\cdot I_n$, we will write $L_\delta := A - (l+\delta)\cdot I_n$; $\lambda_1,\dots,\lambda_n$ will denote the eigenvalues of $A$ and $v_1,\dots,v_n$ the corresponding normalized eigenvectors. Note that $(v_i)_{i\le n}$ are also the eigenvectors of $L_\delta^{-1}$, corresponding to the eigenvalues $\frac{1}{\lambda_i - (l+\delta)}$.

Finding the shift.
To find sufficient conditions for such a $\delta$ to exist, we need a matrix extension of Lemma 3.4 in [7] which, up to a minor change, is essentially contained in Lemma 20 in [10]; we formulate it here as Lemma 6.2. The method uses the Sherman-Morrison-Woodbury formula:

Lemma 6.1. Let $E$ be an $n\times n$ invertible matrix, $C$ a $k\times k$ invertible matrix, $U$ an $n\times k$ matrix and $V$ a $k\times n$ matrix. Then we have:
$$(E + UCV)^{-1} = E^{-1} - E^{-1}U\big(C^{-1} + VE^{-1}U\big)^{-1}VE^{-1}.$$

Rearranging the hypothesis, we get $\varphi_{l+\delta}(A+B) \le \varphi_l(A)$.
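A quick numerical sanity check of the Sherman-Morrison-Woodbury formula of Lemma 6.1 (ours, on arbitrary random matrices of compatible sizes):

```python
import numpy as np

# Illustration only: (E + U C V)^{-1} = E^{-1} - E^{-1} U (C^{-1} + V E^{-1} U)^{-1} V E^{-1}.
rng = np.random.default_rng(0)
n, k = 6, 2
E = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned invertible E
C = rng.standard_normal((k, k)) + k * np.eye(k)   # invertible C
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

lhs = np.linalg.inv(E + U @ C @ V)
Einv = np.linalg.inv(E)
rhs = Einv - Einv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Einv @ U) @ V @ Einv
print(np.allclose(lhs, rhs))                      # True
```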
Since $\langle L_\delta^{-1}, B\rangle \ge 0$, in order to satisfy the conditions of Lemma 6.2 we may search for $\delta$ satisfying two conditions, which we now make explicit. For $t \le \frac{1}{\varphi}$, let us introduce the quantities $q_1(t,B)$ and $q_2(t,B)$. We have already seen in Lemma 6.2 that if $t \le \frac{1}{\varphi}$ then $A \succ (l+t)\cdot I_n$, so these definitions make sense. In order to have (3), it will then be sufficient to choose $\delta$ satisfying $\delta \le \frac{1}{\varphi}$ together with an inequality involving $q_1$ and $q_2$. Note that $q_1$ and $q_2$ can be expressed in terms of the eigenvalues $(\lambda_i)_{i\le n}$ and the eigenvectors $(v_i)_{i\le n}$ of $A$.

Estimating the random shift.
Now that we have found $\delta$, we will estimate $E\delta$ using property (MWR). We will start by stating some basic facts about $q_1$ and $q_2$.

Proposition 6.4. Let, as above, $A \succ l\cdot I_n$ with $\varphi_l(A)\le\varphi$, and let $B$ satisfy (MWR) with $EB = I_n$. Then we have the following:

Proof. Since $EB = I_n$, we have $Eq_1(0,B) = \varphi_l(A)$ and $Eq_2(0,B) = 1$. Now, using the triangle inequality and property (MWR), we obtain the moment bound on $q_1(0,B)$. With the same argument we prove that $E\,q_2(0,B)^p \le C_p$. The third part of the proposition follows by Markov's inequality.

Lemma 6.5. Let $\delta$ be as in Lemma 6.3. Then the following estimate on $E\delta$ holds.

Proof. Using the above proposition and Hölder's inequality with $\frac1p + \frac1q = 1$, we get the stated estimate.

It now remains to make a good choice of $s$ and $\varphi$ in order to finish the proof of Theorem 4.1. Take $l' = l + \delta$, the choice of $\delta$ being as before with $s = \frac{\varepsilon}{4}$.

As we have seen, we get $A + B \succ l'\cdot I_n$ and $\varphi_{l'}(A+B)\le\varphi_l(A)$. Moreover, $El' \ge l + 1 - \varepsilon$ by the assumption on $\varphi$. This ends the proof of Theorem 4.1.

7. PROOF OF THEOREM 5.1

Notations.
We work under the assumptions of Theorem 5.1. We are looking for a random variable $u'$ of the form $u + \Delta$, where $\Delta$ is a positive random variable playing the role of the shift.
We will denote, as before, by $\lambda_1,\dots,\lambda_n$ the eigenvalues of $A$ and by $v_1,\dots,v_n$ the corresponding normalized eigenvectors. To find sufficient conditions for such a $\Delta$ to exist, we need a matrix extension of Lemma 3.3 in [7] which, up to a minor change, is essentially contained in Lemma 19 in [10]. For the sake of completeness, we include the proof.

Lemma 7.1. Let $A$ be as above, satisfying $A \prec u\cdot I_n$. Suppose that one can find $\Delta > 0$ verifying the corresponding conditions; then $A + B \prec (u+\Delta)\cdot I_n$ and $\psi_{u+\Delta}(A+B) \le \psi_u(A)$.

Estimating ∆ 1 .
We may write $\Delta_1$ in terms of the eigenvectors of $A$. Denote by $P_S$ the orthogonal projection on $\mathrm{span}(\{v_i\}_{i\in S})$; clearly $\mathrm{rank}(P_S) = |S|$. Then $\mu$ is the smallest positive number such that the corresponding family of inequalities holds. We will need an analogue of Lemma 3.5 appearing in [24], which we extend to a matrix setting:

Lemma 7.2. Let $(\xi_i)_{i\le n}$ be positive random variables satisfying the above tail estimate for all subsets $S\subset[n]$ and some constants $c,\eta>0$. Consider positive numbers $\mu_i$ as above, and let $\mu$ be the minimal positive number such that the corresponding inequalities hold, for some $K \ge C = 4c$. Then $E\mu \le \frac{c(\eta)}{K^{1+\eta}}$.

Proof. For any $j\ge 0$, denote by $I_j$ the corresponding set of indices. Define $\mu'$ as the minimal positive number such that the analogous inequalities hold for all $j\ge 0$; since $\mu$ is the minimal positive number satisfying the inequality above, we have $\mu\le\mu'$. We may now estimate $E\mu'$; to this aim, we need to look at $P\{\mu'\ge t\}$. For $t\ge 0$, a union bound over $j$ gives the estimate, where the last inequality comes from the choice of the $\varepsilon_j$ and the fact that $K \ge 4c$, together with the hypothesis satisfied by the $\xi_i$. Now, since the $\varepsilon_j$ are summable, by integration we get the required bound on $E\mu'$, and hence on $E\mu$.

Turning to $\Delta_2$, first note that $P_2(t,B)$ can be written as a weighted combination of the quadratic forms $\langle Bv_i, v_i\rangle$. Having this in mind, one can easily check that $EP_2(t,B) = 1$ and that its $p$-th moment is bounded, where for the last inequality we used the fact that $B$ satisfies (MWR) with $p = 1 + \frac{3\eta}{4}$. In order to estimate $\Delta_2$, we will divide it into two parts as follows. Since $\psi_u(A)\le\psi$, we have $(u-\lambda_i)\cdot\psi \ge 1$ for all $i$, and therefore $u + x - \lambda_i \le (1+x\psi)(u-\lambda_i)$. This implies that $P_2(x,B) \ge \frac{P_2(0,B)}{1+x\psi}$.

By integration, this implies the required estimate on $E\Delta_2$, and the proof of Theorem 5.1 is complete.

ISOTROPIC LOG-CONCAVE MATRICES
A natural way to define a log-concave matrix is to ask that it has a log-concave distribution. We will define the isotropic condition as follows:

Definition 8.1. Let $A$ be an $n\times m$ random matrix and denote by $(C_i)_{i\le m}$ its columns. We will say that $A$ is an isotropic log-concave matrix if $A'$, defined by $A'^{\,t} = \sqrt{m}\,(C_1^t,\dots,C_m^t)$, is an isotropic log-concave random vector in $\mathbb{R}^{nm}$.

Remark 8.2. Let $(a_{i,j})$ be the entries of $A$. Saying that $A'$ is isotropic means that
$$E\,a_{i,j}\,a_{k,l} = \frac{1}{m}\,\delta_{(i,j),(k,l)}.$$
This implies that for any $n\times m$ matrix $M$ we have
$$E\,\langle A, M\rangle\,A = \frac{1}{m}\,M,$$
where $\langle\cdot,\cdot\rangle$ denotes the trace inner product.

One can view this as an analogue to the isotropic condition in the vector case: in fact if
$A = X$ is a vector (i.e. an $n\times 1$ matrix), the above condition would read $E\langle X, y\rangle X = y$ for all $y\in\mathbb{R}^n$, which means that $X$ is isotropic in $\mathbb{R}^n$.
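A concrete instance (our illustration, not from the paper): a matrix with i.i.d. $N(0,\frac1m)$ entries satisfies $E\,a_{i,j}a_{k,l} = \frac1m\delta_{(i,j),(k,l)}$, hence $EAA^t = I_n$; the snippet below checks the latter empirically, with arbitrary dimensions and sample size.

```python
import numpy as np

# Illustration only: a Gaussian matrix with i.i.d. N(0, 1/m) entries is isotropic, E AA^t = I_n.
rng = np.random.default_rng(0)
n, m, N = 10, 30, 10000
avg = np.zeros((n, n))
for _ in range(N):
    A = rng.standard_normal((n, m)) / np.sqrt(m)   # entries of variance 1/m
    avg += A @ A.T / N                             # running average of A A^t
print(np.linalg.norm(avg - np.eye(n), 2))          # close to 0
```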
In [19] and [20], Paouris established a large deviation inequality and a small ball probability estimate satisfied by isotropic log-concave vectors. Moreover, Guédon-Milman obtained in [12] what is known as a thin-shell estimate for isotropic log-concave vectors. We will derive analogous properties for isotropic log-concave matrices using the results above.

Proposition 8.3. Let $A$ be an $n\times m$ isotropic log-concave matrix and denote $B = AA^t$. Then for every orthogonal projection $P$ on $\mathbb{R}^n$ we have the following large deviation estimate for $\mathrm{Tr}(PB)$:
$$P\big(\mathrm{Tr}(PB)\ge c_1 t\big) \le \exp(-\sqrt{tm}) \qquad \forall t\ge \mathrm{rank}(P),$$
as well as a small ball probability estimate and a thin-shell estimate for $\mathrm{Tr}(PB)$.

Proof. Let $P$ be an orthogonal projection on $\mathbb{R}^n$ and denote $P' = I_m\otimes P$. As we have seen before, $\mathrm{Tr}(PB) = \|PA\|_{HS}^2 = \frac1m\|P'A'\|_2^2$ and $\mathrm{rank}(P') = m\cdot\mathrm{rank}(P)$. Since $P'A'$ is an isotropic log-concave vector, we can use the large deviation inequality of [19] satisfied by $P'A'$. Let $t\ge\mathrm{rank}(P)$ and write $u = t\cdot m$. Since $u\ge m\cdot\mathrm{rank}(P) = \mathrm{rank}(P')$, applying this inequality at level $u$ gives the large deviation estimate stated above.

For the small ball probability estimate, we apply Paouris' result [20] to $P'A'$; writing this in terms of $B$ and $P$, we easily get the conclusion. Using the thin-shell estimate obtained in [12] and following the same procedure as above, we get the last part of the proposition.

Remark 8.4. Recently, it was shown in [11] that s-concave random vectors satisfy a thin-shell concentration similar to the log-concave one. Therefore, all results of this section extend to the case of s-concave random matrices.
In [24], it was shown that an isotropic log-concave vector satisfies (SR), and we showed in Proposition 2.3 how to pass from (SR) to (MSR*). Therefore, we may apply Theorem 1.3 to log-concave matrices and get the following:

Proposition 8.5. Let $A$ be an $n\times m$ isotropic log-concave matrix. Then $B = AA^t$ satisfies (MSR). Moreover, for every $\varepsilon\in(0,1)$, taking $N > c(\varepsilon)n$ independent copies $B_1,\dots,B_N$ of $B$, we have $E\big\|\frac1N\sum_{i\le N}B_i - I_n\big\|\le\varepsilon$.

Proof. Note first that since $A$ is isotropic in the sense of Definition 8.1, $B = AA^t$ satisfies $EB = I_n$. By Proposition 8.3, $B$ satisfies
$$P\big(\mathrm{Tr}(PB)\ge c_1 t\big) \le \exp(-\sqrt{tm}) \qquad \forall t\ge\mathrm{rank}(P) \text{ and } \forall P \text{ orthogonal projection of } \mathbb{R}^n,$$
and therefore (MSR*). Applying Theorem 1.3 we deduce the result.

Eigenvalues of the empirical sum of a log-concave matrix.
The concentration inequalities satisfied by log-concave matrices will allow us to obtain results holding with high probability, rather than in expectation as was the case before. Precisely, we can prove the following:

Proof. The proof of Theorem 8.6 follows the same ideas as developed in the previous sections. Let $\varepsilon\in(0,1)$; we only need the following property satisfied by our matrix $B = AA^t$, obtained by applying (13) for rank-one projections and looking only at the large deviation part. Define $\Delta$, $\psi$ and $\alpha$ as follows.

Recall some notations: the corresponding potential function is $\psi_{u_i}(A_i) = \mathrm{Tr}\,(u_i\cdot I_n - A_i)^{-1}$, defined when $A_i \prec u_i\cdot I_n$. Denote by $\Im_i$ the corresponding event; clearly $P(\Im_0) = 1$. Suppose now that $\Im_i$ is satisfied; as we have seen in Lemma 7.1, a certain condition is sufficient for the occurrence of the event $\Im_{i+1}$. Note that this condition involves $P_2$, where $P_2$ is defined in (7) but with $A_i$ instead of $A$. Now, denoting by $\lambda_j$ the eigenvalues of $A_i$ and by $v_j$ the corresponding eigenvectors, and taking the probability with respect to $B_{i+1}$, one can bound the probability that this sufficient condition fails. Keeping in mind that the $B_i$ are independent, we have shown that $\Im_N$ occurs with high probability; moreover, we have the corresponding bound on $u_N$. Therefore, Theorem 8.6 follows by the choice of $m$.
Remark 8.7. Note that in the previous proof, we only used the large deviation inequality given by the thin-shell estimate (13). If one uses the deviation inequality given by (11) instead, then by the same proof one obtains an analogous high-probability estimate under a similar condition on $m$. The advantage of using the thin-shell estimate is that we can get an estimate close to 1.
By the same techniques, we also get an estimate of the smallest eigenvalue.
Proof. Here we will use both the lower and the upper estimates given by the thin-shell inequality (13). Applying (13) for rank-one projections, we obtain the corresponding bounds on $\langle Bx, x\rangle$. Define $\delta$, $\varphi$ and $\alpha$ as follows. Recall some notations: $\varphi_{l_i}(A_i) = \mathrm{Tr}\,(A_i - l_i\cdot I_n)^{-1}$ denotes the corresponding potential function when $A_i \succ l_i\cdot I_n$. Note also that $\delta \le \frac{1}{\varphi}$. Denote by $\Im_i$ the corresponding event; clearly $P(\Im_0) = 1$. Suppose now that $\Im_i$ is satisfied; following what was done after Lemma 6.2, condition (4) is sufficient for the occurrence of the event $\Im_{i+1}$. Denoting by $\lambda_j$ the eigenvalues of $A_i$ and by $v_j$ the corresponding eigenvectors, and taking the probability with respect to $B_{i+1}$, one can bound the probability that this condition fails. Keeping in mind that the $B_i$ are independent, we have shown that $\Im_N$ occurs with high probability; moreover, we have the corresponding bound on $l_N$. Therefore, Theorem 8.8 follows by the choice of $m$.
Remark 8.9. Note that in the previous proof, we used the large deviation inequality alongside the small ball probability estimate given by the thin-shell inequality (13). If one uses the deviation inequality given by (11) alongside the small ball probability estimate given by (12), then by the same proof one obtains an analogous high-probability estimate under a similar condition on $m$. The advantage of using the thin-shell estimate is that we can get an estimate close to 1.
Combining the two previous results, we will be able to obtain, with high probability, a result similar to Proposition 8.5 for log-concave matrices:

Proof. Let $\varepsilon\in(0,1)$, $N \ge \frac{6n}{\varepsilon^2}$, and suppose that $m$ satisfies the assumption of the theorem. Note that the norm $\|\frac1N\sum_{i\le N}B_i - I_n\|$ is controlled by the largest and smallest eigenvalues of $\frac1N\sum_{i\le N}B_i$, and therefore it is sufficient to apply Theorem 8.6 and Theorem 8.8.
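As a closing numerical illustration (ours, with a Gaussian and hence log-concave choice of $A$, and arbitrary dimensions), the extreme eigenvalues of $\frac1N\sum_i A_iA_i^t$ concentrate near $1$ once $N$ is a moderate multiple of $n$:

```python
import numpy as np

# Illustration only: extreme eigenvalues of the empirical average of A_i A_i^t for Gaussian,
# isotropic A_i (i.i.d. N(0, 1/m) entries) cluster around 1 when N is a multiple of n.
rng = np.random.default_rng(0)
n, m, N = 40, 60, 20 * 40
S = np.zeros((n, n))
for _ in range(N):
    A = rng.standard_normal((n, m)) / np.sqrt(m)   # E A A^t = I_n
    S += A @ A.T
S /= N
eig = np.linalg.eigvalsh(S)
print(eig[0], eig[-1])                              # smallest and largest eigenvalues, both near 1
```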

Concrete examples of isotropic log-concave matrices.
For $x\in\mathbb{R}^k$, we denote by $x^*$ the vector with components $|x_i|$ arranged in non-increasing order. Let $f:\mathbb{R}^k\to\mathbb{R}$; we say that $f$ is absolutely symmetric if $f(x) = f(x^*)$ for all $x\in\mathbb{R}^k$. (For example, $\|\cdot\|_p$ is absolutely symmetric.) Define a function $F$ on $M_{n,m}$ by $F(A) = f(s_1(A),\dots,s_k(A))$ for $A\in M_{n,m}$ and $k = \min(n,m)$, where $s_1(A),\dots,s_k(A)$ denote the singular values of $A$. It was shown by Lewis [16] that $f$ is absolutely symmetric if and only if $F$ is unitarily invariant; moreover, $f$ is convex if and only if $F$ is convex. Let $A$ be an $n\times m$ random matrix whose density with respect to the Lebesgue measure is given by $G(A) = \exp(-f(s_1(A),\dots,s_k(A)))$, where $f$ is an absolutely symmetric convex function. By the remark above, $G$ is log-concave. This covers the case of random matrices with density of the form $\exp(-\sum_i V(s_i(A)))$, where $V$ is an increasing convex function on $\mathbb{R}_+$. When $V(x) = x^2$, this corresponds to a Gaussian random matrix.

Proposition 8.11. Let $A$ be an $n\times m$ random matrix whose density with respect to the Lebesgue measure is given by $G(A) = \exp(-f(s_1(A),\dots,s_k(A)))$, where $f$ is an absolutely symmetric convex function and $k = \min(n,m)$. Suppose that $E\|A\|_{HS}^2 = n$; then $A$ is an isotropic log-concave matrix, and $\sqrt{\frac{m}{n}}\,A^t$ is an $m\times n$ isotropic log-concave matrix.
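Before turning to the proof, here is a small worked illustration (ours) of the normalization $E\|A\|_{HS}^2 = n$ in the Gaussian case $V(x) = x^2$, where the entries are i.i.d. $N(0,\sigma^2)$:
$$E\|A\|_{HS}^2 = \sum_{i\le n}\sum_{j\le m}E\,a_{i,j}^2 = n\,m\,\sigma^2 = n \iff \sigma^2 = \frac1m,$$
which is consistent with the isotropy condition $E\,a_{i,j}a_{k,l} = \frac1m\,\delta_{(i,j),(k,l)}$ of Remark 8.2.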
Proof. Since $f$ is an absolutely symmetric convex function, $G$ is log-concave, as we have seen above. It remains to prove the isotropy condition.

Let $(a_{i,j})$ be the entries of $A$. Fix two different indices $(i,j)$ and $(k,l)$. Denote by $D_j = \mathrm{diag}(1,\dots,-1,\dots,1)$ the $m\times m$ diagonal matrix whose $-1$ sits in the $j$-th position. Let $E_{(i,k)}$ be the $n\times n$ matrix obtained by swapping the $i$-th and $k$-th rows of the identity matrix, and let $F_{(j,l)}$ be the $m\times m$ matrix obtained by swapping the $j$-th and $l$-th rows of the identity matrix.

It is easy to see that $\alpha := AD_j$ changes the $j$-th column of $A$ to its opposite and keeps the rest unchanged; note that $AD_j$ has the same singular values as $A$. Similarly, $\beta := E_{(i,k)}AF_{(j,l)}$ moves the entry $a_{k,l}$ to position $(i,j)$ (and $a_{i,j}$ to position $(k,l)$); note that $E_{(i,k)}AF_{(j,l)}$ also has the same singular values as $A$.
Finally, note that the maps $\alpha$ and $\beta$ have Jacobian equal to $1$, and since $f$ is absolutely symmetric, these transformations, which preserve the singular values, do not affect the density. As a conclusion, we have shown that $E\,a_{i,j}a_{k,l} = \frac{1}{m}\,\delta_{(i,j),(k,l)}$, which means that $A$ is isotropic.

The same discussion applies to an $n\times m$ random matrix whose density with respect to the Lebesgue measure is of the form above, where $f$ is an absolutely symmetric convex function, properly normalized as above, and $k = \min(n,m)$.
Suppose that m