Uniform Estimates for Averages of Order Statistics of Matrices

We prove uniform estimates for the expected value of averages of order statistics of matrices in terms of their largest entries. As an application, we obtain similar probabilistic estimates for $\ell_p$ norms via real interpolation.


Introduction and Main Results
Combinatorial and probabilistic inequalities play an important role in a variety of areas of mathematics, especially in Banach space theory. In [8] and [9], S. Kwapień and C. Schütt studied combinatorial expressions involving matrices and obtained inequalities in terms of the average of the largest entries of the matrix. To be more precise, they showed that
$$ \frac{1}{n!} \sum_{\pi \in S_n} \max_{1\le i \le n} |a_{i\pi(i)}| \;\simeq\; \frac{1}{n} \sum_{k=1}^{n} s(k), \qquad (1.1) $$
up to absolute constants, where $s(k)$ is the $k$-th largest entry of the matrix $a$ and $S_n$ is the symmetric group. This estimate is crucial if one wants to compute the projection constant of symmetric Banach spaces and related invariants. Among other things, the authors obtained estimates for the positive projection constant of finite dimensional Orlicz spaces and determined the order of the projection constant of the Lorentz spaces $\ell^n_{2,1}$. Also, the symmetric sublattices of $\ell_1(c_0)$ as well as the finite dimensional symmetric subspaces of $\ell_1$ were characterized. Further applications and extensions of (1.1) can be found in [9, 16, 17, 11, 13], just to mention a few. The main result of this paper is a generalization of (1.1) in the sense that we study the expected value of averages of higher order statistics of a matrix in a more general setting described below. Our method of proof is purely probabilistic in nature, whereas the proof of (1.1) in [8] uses non-trivial combinatorial arguments.
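For a quick numerical feel for (1.1), the following sketch (an illustration only, not part of any proof; the function name and the choice of random test matrix are ours) compares both sides of the equivalence by exhaustive enumeration of $S_n$ for a small matrix:

```python
import itertools
import random

def kwapien_schutt_sides(a):
    """Return (E_pi max_i a[i][pi(i)], (1/n) * sum of the n largest entries),
    i.e. the two sides of (1.1), by exhaustive enumeration of S_n."""
    n = len(a)
    perms = list(itertools.permutations(range(n)))
    # left-hand side: average over all permutations of the maximal path entry
    lhs = sum(max(a[i][pi[i]] for i in range(n)) for pi in perms) / len(perms)
    # right-hand side: average of the n largest entries s(1), ..., s(n)
    s = sorted((a[i][j] for i in range(n) for j in range(n)), reverse=True)
    rhs = sum(s[:n]) / n
    return lhs, rhs

random.seed(0)
n = 5
a = [[random.random() for _ in range(n)] for _ in range(n)]
lhs, rhs = kwapien_schutt_sides(a)
```

For nonnegative matrices one always has `lhs <= rhs` (a union-bound argument), while the content of (1.1) is that `lhs` cannot be much smaller than `rhs`.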
In what follows, given a finite set G, we denote the normalized counting measure on G by P, i.e., $P(E) = |E|/|G|$ for $E \subset G$, where $|\cdot|$ denotes the cardinality. E will always denote the expectation with respect to the normalized counting measure. Moreover, for a vector $x \in \mathbb{R}^n$ with nonnegative entries, we denote its $k$-th largest entry by $k\text{-}\max_{1\le i\le n} x_i$.
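In this notation, $k\text{-}\max$ is simply the $k$-th entry of the decreasing rearrangement; a one-line helper (ours, for illustration only) makes the convention concrete:

```python
def k_max(x, k):
    """k-th largest entry of x: 1-max is the maximum, len(x)-max the minimum."""
    return sorted(x, reverse=True)[k - 1]
```

For example, `k_max([3, 1, 2], 1)` is the maximum `3` and `k_max([3, 1, 2], 3)` is the minimum `1`.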
In particular, $1\text{-}\max_{1\le i\le n} x_i$ is the maximal value and $n\text{-}\max_{1\le i\le n} x_i$ the minimal value of $x$. Our main result is the following:

Date: November 26, 2014.

Theorem 1.1. Let $n, N \in \mathbb{N}$ and $a \in \mathbb{R}^{n\times N}$. Let G be a collection of maps from $I = \{1, \dots, n\}$ to $J = \{1, \dots, N\}$ and $C_G \ge 1$ be a constant only depending on G. Assume that for all $i \in I$, $j \in J$ and all different pairs $(i_1, j_1), (i_2, j_2) \in I \times J$,
(i) $P(g(i) = j) = 1/N$,
(ii) $P(g(i_1) = j_1,\ g(i_2) = j_2) \le C_G/(N(N-1))$.
Then, for any $\ell \le n$,
$$ \frac{c_1}{\ell N} \sum_{k=1}^{\ell N} s(k) \;\le\; E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} |a_{ig(i)}| \;\le\; \frac{c_2}{\ell N} \sum_{k=1}^{\ell N} s(k), \qquad (1.2) $$
where $c_1, c_2 > 0$ are constants depending only on $C_G$. Observe that estimate (1.1) [8, Theorem 1.1] is a special case of our result with the choice $\ell = 1$ and $G = S_n$, and that for $\ell = 1$ and $G = \{1, \dots, n\}^{\{1,\dots,n\}}$ we directly obtain [2, Lemma 7]. Note that in this general setting $E \max_{1\le i\le n} |a_{ig(i)}|$ was already studied in [9]. In a slightly different setting, order statistics were also considered in [2, 3, 4, 5, 6].
We will now present two natural choices for the set G that appear frequently in the literature (cf. [8, 9, 16, 17, 15, 13, 2, 1, 7, 12]).

Example 1. If $N = n$ and $G = S_n$ is the group of permutations of the numbers $\{1, \dots, n\}$, then $P(\pi(i) = j) = 1/n$ for all $i, j$, and for $(i_1, j_1) \ne (i_2, j_2)$,
$$ P(\pi(i_1) = j_1,\ \pi(i_2) = j_2) = \frac{1}{n(n-1)} \quad \text{if } i_1 \ne i_2 \text{ and } j_1 \ne j_2, $$
while $P(\pi(i_1) = j_1,\ \pi(i_2) = j_2) = 0$ if $i_1 = i_2$ or $j_1 = j_2$. This means that $C_G = 1$. Hence, Theorem 1.1 implies
$$ E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} |a_{i\pi(i)}| \;\simeq\; \frac{1}{\ell n} \sum_{k=1}^{\ell n} s(k). $$

Another combinatorial inequality that was obtained in [8, Theorem 1.2], and which turned out to be crucial for studying and characterizing symmetric subspaces of $L_1$ (cf. [16, 17, 13]), states that for all $1 \le p \le \infty$,
$$ \frac{1}{n!} \sum_{\pi \in S_n} \Big( \sum_{i=1}^{n} |a_{i\pi(i)}|^p \Big)^{1/p} \;\simeq\; \frac{1}{n} \sum_{k=1}^{n} s(k) + \Big( \frac{1}{n} \sum_{k=n+1}^{n^2} s(k)^p \Big)^{1/p}. \qquad (1.3) $$
In Section 5, we will use Theorem 1.1 to generalize this result and show that the lower bound in (1.3) can be naturally derived via real interpolation. The upper bound is quite easily obtained and we just follow [8]. Note that averages of order statistics of matrices appear naturally here, as they are strongly related to the K-functional of the interpolation couple $(\ell_1, \ell_\infty)$. Again, two typical choices for the set of maps G are $S_n$ and $\{1, \dots, n\}^{\{1,\dots,n\}}$. We will prove the following result:

Theorem 1.2. Let $n, N \in \mathbb{N}$, $a \in \mathbb{R}^{n\times N}$, and $1 \le p < \infty$. Let G be a collection of maps from $I = \{1, \dots, n\}$ to $J = \{1, \dots, N\}$ and $C_G \ge 1$ be a constant only depending on G, and assume that G satisfies conditions (i) and (ii) of Theorem 1.1. Then
$$ E \Big( \sum_{i=1}^{n} |a_{ig(i)}|^p \Big)^{1/p} \;\simeq\; \frac{1}{N} \sum_{k=1}^{N} s(k) + \Big( \frac{1}{N} \sum_{k=N+1}^{nN} s(k)^p \Big)^{1/p}, $$
where the equivalence constants depend only on $C_G$.

The organization of the paper is as follows. In Section 3, we prove the lower estimate in (1.2). This is done by reducing the problem to the case of matrices taking values only in $\{0, 1\}$ and showing the estimate for this subclass of matrices. In Section 4, we establish the upper bound in (1.2) by passing from averages of order statistics to equivalent Orlicz norms and using an extreme point argument. Section 5 contains the proof of Theorem 1.2.
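The single and pairwise probabilities of Example 1 can be confirmed by brute force for small n. The sketch below (an illustration only; the function names are ours) enumerates all permutations and computes both quantities in exact rational arithmetic:

```python
import itertools
from fractions import Fraction

def marginal(n, i, j):
    """P(pi(i) = j) under the uniform measure on S_n (should be 1/n)."""
    perms = list(itertools.permutations(range(n)))
    hits = sum(1 for pi in perms if pi[i] == j)
    return Fraction(hits, len(perms))

def pairwise(n, i1, j1, i2, j2):
    """P(pi(i1) = j1 and pi(i2) = j2) under the uniform measure on S_n:
    1/(n(n-1)) for disjoint pairs, 0 when rows or columns collide."""
    perms = list(itertools.permutations(range(n)))
    hits = sum(1 for pi in perms if pi[i1] == j1 and pi[i2] == j2)
    return Fraction(hits, len(perms))
```

For $n = 4$, for instance, `marginal(4, 0, 2)` gives $1/4$ and `pairwise(4, 0, 1, 1, 2)` gives $1/12 = 1/(n(n-1))$.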

Notation and Preliminaries
Throughout this paper we will use |E| to denote the cardinality of a finite set E. By S n we denote the symmetric group on the set {1, . . . , n}. We will denote by ⌊x⌋ and ⌈x⌉ the largest integer m ≤ x and the smallest integer m ≥ x, respectively.
For an arbitrary matrix $a = (a_{ij})_{i,j=1}^{n,N}$, we denote by $(s(k))_{k=1}^{nN}$ the decreasing rearrangement of $(|a_{ij}|)_{i,j=1}^{n,N}$. To avoid confusion, in certain cases we write $(s_a(k))_{k=1}^{nN}$ to emphasize the underlying matrix $a$. Please also recall that the Paley-Zygmund inequality for non-negative random variables $Z$ and $0 < \theta < 1$ states that
$$ P(Z \ge \theta\, E Z) \;\ge\; (1-\theta)^2\, \frac{(E Z)^2}{E Z^2}. \qquad (2.1) $$
A convex function $M : [0,\infty) \to [0,\infty)$ with $M(0) = 0$ that is not identically zero is called an Orlicz function, and the Orlicz space $\ell^n_M$ is $\mathbb{R}^n$ equipped with the Luxemburg norm
$$ \|x\|_M = \inf\Big\{ \rho > 0 : \sum_{i=1}^{n} M(|x_i|/\rho) \le 1 \Big\}. $$
For example, the classical $\ell_p$ spaces are Orlicz spaces with $M(t) = p^{-1} t^p$. The closed unit ball of the space $\ell^n_M$ will be denoted by $B^n_M$. We write $\mathrm{ext}(B^n_M)$ for the set of extreme points of $B^n_M$, and $\mathrm{s\text{-}conv}(M)$ shall denote the set of points of strict convexity of $M$. We will make use of a characterization of the extreme points of $B^n_M$ (Lemma 2.1). For a detailed and thorough introduction to Orlicz spaces we refer the reader to [14] or [10].
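The Paley-Zygmund bound (2.1) can be verified exactly for any finite, uniformly distributed random variable. The helper below (ours, for illustration only) computes both sides in rational arithmetic:

```python
from fractions import Fraction

def paley_zygmund_gap(values, theta):
    """For Z uniform on `values` (non-negative numbers), return the pair
    (P(Z >= theta * EZ), (1 - theta)^2 * (EZ)^2 / E(Z^2))."""
    n = len(values)
    ez = Fraction(sum(values), n)
    ez2 = Fraction(sum(v * v for v in values), n)
    lhs = Fraction(sum(1 for v in values if v >= theta * ez), n)
    rhs = (1 - theta) ** 2 * ez ** 2 / ez2
    return lhs, rhs
```

For a fair die, `paley_zygmund_gap([1, 2, 3, 4, 5, 6], Fraction(1, 2))` returns $P(Z \ge 7/4) = 5/6$ against a Paley-Zygmund lower bound of $147/728$, so the inequality holds with plenty of room.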

The lower bound
In this section we will prove the lower bound in (1.2). We begin by recalling some notation and assumptions given in Theorem 1.1. Let $a \in \mathbb{R}^{n\times N}$, $I = \{1, \dots, n\}$, $J = \{1, \dots, N\}$, and let G be a collection of maps from I to J. The matrix $a$ will be fixed throughout the entire section. By P we denote the normalized counting measure on G, i.e., $P(E) = |E|/|G|$ for $E \subset G$. We assume a uniform distribution of the random variable $g \mapsto g(i)$ for each $i \in I$, i.e., $P(g(i) = j) = 1/N$ for all $j \in J$. Moreover, we assume for all different pairs $(i_1, j_1), (i_2, j_2) \in I \times J$ that
$$ P(g(i_1) = j_1,\ g(i_2) = j_2) \;\le\; \frac{C_G}{N(N-1)}, $$
with a constant $C_G \ge 1$ that depends on G, but not on n or N. Without loss of generality, we will assume that $a$ has only non-negative entries. It is enough to show the lower estimate in (1.2) for matrices $a$ that consist only of the $\ell N$ largest entries, while all other entries are equal to zero. This is because if we change any entry $a_{i_0 j_0} \le s(\ell N + 1)$ by setting $a_{i_0 j_0} = 0$, the left hand side in (1.2) remains the same, while $k\text{-}\max_{1\le i\le n} |a_{ig(i)}|$ does not increase for any $g \in G$.
3.1. The key ingredients. We will now introduce a bijective function h that determines the ordering of the values of $a$. The crucial point is that this function does not depend on the actual values of the matrix, but merely on their relative size. So let $h : \{1, \dots, n \cdot N\} \to I \times J$ be a bijective function satisfying
$$ a_{h(1)} \ge a_{h(2)} \ge \dots \ge a_{h(nN)}, \quad \text{i.e.,} \quad a_{h(j)} = s(j) \text{ for all } 1 \le j \le nN. $$
Observe that there is possibly more than one choice for h, since some of the entries of the matrix $a$ might have the same value. For all $j \in \mathbb{N}$, $1 \le j \le n \cdot N$, define the random variables
$$ Y_j(g) = \mathbf{1}_{\{h(j) \in g\}}, \qquad X_m = \sum_{j=1}^{m} Y_j, \quad \text{i.e.,} \quad X_m(g) = |h(\{1, \dots, m\}) \cap g|, $$
where we identify g with its graph $\{(i, g(i)) : i \in I\}$. $X_m$ counts the number of elements in the path $\{(i, g(i)) : i \in I\}$ that intersect the positions of the m largest entries of $a$. As we will see in Subsection 3.2, the random variables $X_m$ are strongly related to order statistics. In Lemma 3.1, Lemma 3.2, and Lemma 3.3, we investigate crucial properties of the distribution function of $X_m$.
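The counting variable $X_m$ is straightforward to compute for a concrete map g. The sketch below (an illustration only; the representation of h as a list of positions and of g as a list of values is ours) makes the definition explicit:

```python
def X(m, h, g):
    """X_m(g) = |h({1,...,m}) ∩ graph(g)|: how many of the positions
    h[0], ..., h[m-1] of the m largest entries lie on the path {(i, g[i])}."""
    path = {(i, j) for i, j in enumerate(g)}
    return sum(1 for pos in h[:m] if pos in path)
```

For instance, with `h = [(0, 1), (1, 0), (0, 0)]` and the map `g = [1, 0]` (so $g(0) = 1$, $g(1) = 0$), the path hits both of the two largest positions, and adding the third position does not increase the count.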
Lemma 3.1. For all $m \in \mathbb{N}$, $1 \le m \le n \cdot N$,
$$ P(X_m \ge 1) \;\ge\; \frac{m}{N}\Big(1 - \frac{C_G (m-1)}{2(N-1)}\Big). $$
In particular, $P(X_m \ge 1) \ge m/(2N)$ whenever $m \le N/C_G$.

Proof. By using the inclusion-exclusion principle (more precisely, the Bonferroni inequality), we obtain
$$ P(X_m \ge 1) = P\Big(\bigcup_{j=1}^{m} \{Y_j = 1\}\Big) \;\ge\; \sum_{j=1}^{m} P(Y_j = 1) - \sum_{1 \le j < j' \le m} P(Y_j = Y_{j'} = 1) \;\ge\; \frac{m}{N} - \binom{m}{2} \frac{C_G}{N(N-1)}, $$
where the latter inequality is a direct consequence of conditions (i) and (ii) in Theorem 1.1.

Lemma 3.2. For all $m \in \mathbb{N}$, $1 \le m \le n \cdot N$, and all $\theta \in (0,1)$, we have
$$ P\Big(X_m \ge \theta\, \frac{m}{N}\Big) \;\ge\; (1-\theta)^2\, \frac{(m/N)^2}{\frac{m}{N} + C_G\, \frac{m(m-1)}{N(N-1)}}. $$
Proof. The result follows as a consequence of the Paley-Zygmund inequality (cf. (2.1)). Therefore, we need to compute $E X_m$ and $E X_m^2$. Note that $E Y_j = P(Y_j = 1) = 1/N$ and thus $E X_m = \sum_{j=1}^{m} E Y_j = m/N$. Moreover, since $Y_j = Y_j^2$, we have
$$ E X_m^2 = E X_m + \sum_{j \ne j'} E\, Y_j Y_{j'} \;\le\; \frac{m}{N} + C_G\, \frac{m(m-1)}{N(N-1)}, $$
where the latter inequality is a direct consequence of conditions (i) and (ii) in Theorem 1.1. Inserting those estimates into (2.1), we obtain the result.
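The identity $E X_m = m/N$ can be checked by exhaustive enumeration when $G = S_n$ (so $N = n$). The sketch below (an illustration only; names ours) averages $X_m$ over all permutations in exact arithmetic:

```python
import itertools
from fractions import Fraction

def expected_X(m, h, n):
    """E X_m for G = S_n with the uniform measure: average over all
    permutations pi of |{h[0], ..., h[m-1]} ∩ {(i, pi(i))}|."""
    total, count = 0, 0
    for pi in itertools.permutations(range(n)):
        path = {(i, pi[i]) for i in range(n)}
        total += sum(1 for pos in h[:m] if pos in path)
        count += 1
    return Fraction(total, count)
```

Since each fixed position lies on the path with probability $1/n$, the result is exactly $m/n$, matching $E X_m = m/N$ with $N = n$.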
Lemma 3.3. For all $m \in \mathbb{N}$ with $1 \le m \le nN$,
$$ P(X_m \ge 1) \;\ge\; \frac{1}{4} \min\Big(\frac{m}{N}, \frac{1}{C_G}\Big), \qquad (3.3) $$
and for all $k, m \in \mathbb{N}$ with $2kN \le m \le nN$,
$$ P(X_m \ge k) \;\ge\; \frac{1}{10\, C_G}. \qquad (3.4) $$
Proof. We first prove (3.3). If $m \le N/C_G$, this follows from Lemma 3.1. On the other hand, if $m \ge N/C_G$, then, since $X_m$ is non-decreasing in m, Lemma 3.1 implies
$$ P(X_m \ge 1) \;\ge\; P\big(X_{m'} \ge 1\big) \;\ge\; \frac{m'}{2N} \;\ge\; \frac{1}{4\, C_G}, \qquad m' = \lfloor N/C_G \rfloor. $$
Now we prove (3.4). Let $k \le n/2$ and m such that $2kN \le m \le n \cdot N$. Then Lemma 3.2 with $\theta = 1/2$ implies
$$ P\Big(X_m \ge \frac{m}{2N}\Big) \;\ge\; \frac{1}{4}\, \frac{(m/N)^2}{\frac{m}{N} + 2 C_G\, \frac{m^2}{N^2}} \;=\; \frac{1}{4}\, \frac{1}{\frac{N}{m} + 2 C_G} \;\ge\; \frac{1}{10\, C_G}, $$
since $m(m-1)/(N(N-1)) \le 2m^2/N^2$, $N/m \le 1/2$, and $C_G \ge 1$. As $m/(2N) \ge k$, the claim follows.

3.2. Reduction to two-valued matrices. We will now reduce the problem of estimating the expected value of averages of order statistics of general matrices to matrices taking only one value different from zero. To do so, we need some more definitions.

Let $A_h$ be the collection of all non-negative real $n \times N$ matrices b that satisfy
$$ b_{h(1)} \ge b_{h(2)} \ge \dots \ge b_{h(nN)} \ge 0. $$
For $1 \le m \le n \cdot N$, let $a_m$ denote the $\{0,1\}$-valued matrix whose entries equal 1 exactly in the positions $h(1), \dots, h(m)$. Observe that $a_m \in A_h$ for all $1 \le m \le n \cdot N$. For $b \in A_h$ and $g \in G$ we put
$$ S(b)(g) := \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} b_{ig(i)}. $$

Lemma 3.4. For all $1 \le m \le \ell N$,
$$ E\, S(a_m) \;\ge\; \frac{1}{40\, C_G}\, \frac{m}{\ell N}. $$
Proof. Observe that for every integer k with $1 \le k \le \ell$,
$$ k\text{-}\max_{1\le i\le n} (a_m)_{ig(i)} = 1 \quad \Longleftrightarrow \quad X_m(g) \ge k, $$
so that $E\, S(a_m) = \ell^{-1} \sum_{k=1}^{\ell} P(X_m \ge k)$. Thus, in order to prove the lemma, it is enough to show that
$$ \sum_{k=1}^{\ell} P(X_m \ge k) \;\ge\; \frac{1}{40\, C_G}\, \frac{m}{N}. $$
For $m \le 2N$, estimate (3.3) of Lemma 3.3 gives
$$ \sum_{k=1}^{\ell} P(X_m \ge k) \;\ge\; P(X_m \ge 1) \;\ge\; \frac{1}{4} \min\Big(\frac{m}{N}, \frac{1}{C_G}\Big) \;\ge\; \frac{1}{8\, C_G}\, \frac{m}{N}, $$
i.e., the assertion of the lemma for $m \le 2N$. Now, let $m \ge 2N+1$ and choose the integer $t \ge 1$ such that $2tN + 1 \le m \le 2(t+1)N$. The sequence $k \mapsto P(X_m \ge k)$ is decreasing, hence, noting that $t \le \ell$,
$$ \sum_{k=1}^{\ell} P(X_m \ge k) \;\ge\; \sum_{k=1}^{t} P(X_m \ge k) \;\ge\; t\, P(X_m \ge t). $$

Then, since $2tN \le m$ and $t \ge m/(4N)$, estimate (3.4) of Lemma 3.3 implies that the last expression is at least $\frac{t}{10\, C_G} \ge \frac{1}{40\, C_G} \frac{m}{N}$,
and the result follows.

Theorem 3.5. Let $\bar a \in A_h$ be the matrix with $\bar a_{h(j)} = (\ell N)^{-1} \sum_{i=1}^{\ell N} s_a(i)$ for $1 \le j \le \ell N$ and $\bar a_{h(j)} = 0$ otherwise. Then
$$ E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} a_{ig(i)} \;\ge\; \frac{c}{C_G}\, E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} \bar a_{ig(i)}, $$
where $c > 0$ is an absolute constant.

Proof. Recall that $X_j(g) = |h(\{1, \dots, j\}) \cap g|$. Hence, for all $b \in A_h$ (setting $b_{h(\ell N + 1)} := 0$ and $X_0 := 0$), we can write
$$ \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} b_{ig(i)} \;=\; \sum_{j=1}^{\ell N} \big(b_{h(j)} - b_{h(j+1)}\big) \min\big(X_j(g), \ell\big). $$
Since $a, \bar a \in A_h$, $a_{h(j)} = s_a(j)$ and $\bar a_{h(j)} = (\ell N)^{-1} \sum_{i=1}^{\ell N} s_a(i)$ for all $j \le \ell N$, we obtain
$$ E \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} a_{ig(i)} \;=\; \sum_{j=1}^{\ell N} \big(s_a(j) - s_a(j+1)\big)\, E \min(X_j, \ell) $$
and
$$ E \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} \bar a_{ig(i)} \;=\; \frac{1}{\ell N} \sum_{i=1}^{\ell N} s_a(i)\; E \min(X_{\ell N}, \ell) \;\le\; \frac{1}{N} \sum_{i=1}^{\ell N} s_a(i), $$
where for all $1 \le j \le \ell N$, Lemma 3.4 yields $E \min(X_j, \ell) \ge j/(40\, C_G N)$. Note that $\sum_{j=1}^{\ell N} (s_a(j) - s_a(j+1))\, j = \sum_{j=1}^{\ell N} s_a(j)$ by summation by parts, so the two expectations differ by a factor of at most $40\, C_G$, which proves the claim with $c = 1/40$.

3.3. Conclusion. As we have seen, we can reduce the case of a general matrix $a$ to multiples of matrices taking only the values zero and one. Before we finally prove the lower bound in the main theorem, we will need another simple lemma.

Now we conclude with
Lemma 3.6. Let $b \in A_h$ be an $(n \times N)$-matrix consisting of $\ell N$ ones and $(n - \ell)N$ zeros. Then, for all $1 \le k \le \ell/2$,
$$ P(X_{\ell N} \ge k) \;\ge\; \frac{1}{12\, C_G}. $$
Proof. Let $k \le \ell/2$. Using Lemma 3.2 with $\theta = 1/2$ and $m = \ell N$, we obtain
$$ P(X_{\ell N} \ge k) \;\ge\; P\Big(X_{\ell N} \ge \frac{\ell}{2}\Big) \;\ge\; \frac{1}{4}\, \frac{\ell^2}{\ell + 2 C_G \ell^2} \;\ge\; \frac{1}{12\, C_G}. $$

Proof of the lower estimate in Theorem 1.1. By Theorem 3.5 we obtain
$$ E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} |a_{ig(i)}| \;\ge\; \frac{c}{C_G}\, E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} \bar a_{ig(i)}. $$
Now take $b \in A_h$ consisting of $\ell N$ ones and $(n - \ell)N$ zeros such that
$$ \bar a = \Big( \frac{1}{\ell N} \sum_{i=1}^{\ell N} s_a(i) \Big)\, b. $$
Then, by Lemma 3.6 (for $\ell = 1$ one uses estimate (3.3) of Lemma 3.3 instead),
$$ E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} \bar a_{ig(i)} \;=\; \frac{1}{\ell N} \sum_{i=1}^{\ell N} s_a(i) \cdot \frac{1}{\ell} \sum_{k=1}^{\ell} P(X_{\ell N} \ge k) \;\ge\; \frac{\lfloor \ell/2 \rfloor}{12\, C_G\, \ell} \cdot \frac{1}{\ell N} \sum_{i=1}^{\ell N} s_a(i). $$
Combining the above estimates yields
$$ E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} |a_{ig(i)}| \;\ge\; \frac{c'}{C_G^2}\, \frac{1}{\ell N} \sum_{k=1}^{\ell N} s(k) $$
for an absolute constant $c' > 0$, which is the lower estimate in Theorem 1.1.

The upper bound
We will now prove the upper bound of Theorem 1.1 via an extreme point argument. To do so, we first use the fact that the average of the $j \le n \cdot N$ largest entries of a matrix $a \in \mathbb{R}^{n\times N}$ is equivalent to an Orlicz norm $\|a\|_{M_j}$ (cf. Lemma 4.1). Then, since the expected value of the average of order statistics defines a norm on $\mathbb{R}^{n\times N}$ as well, it is enough to prove the upper bound in Theorem 1.1 for the extreme points of $B^{nN}_{M_j}$. Recall that, for a vector $(x_i)_{i=1}^{n} \in \mathbb{R}^n$, we denote the decreasing rearrangement of $(|x_i|)_{i=1}^{n}$ by $(x_i^*)_{i=1}^{n}$. We start with the approximation of sums of decreasing rearrangements of vectors $x \in \mathbb{R}^n$ by equivalent Orlicz norms.
The following result is due to C. Schütt (private communication). With his permission we include it here. Lemma 4.1. Let $j \in \mathbb{N}$, $1 \le j \le n$. Then, for all $x \in \mathbb{R}^n$, the average $j^{-1} \sum_{i=1}^{j} x_i^*$ and the Orlicz norm $\|x\|_{M_j}$ are equivalent up to absolute constants. Proof. Let $x \in \mathbb{R}^n$. We start with the right hand side inequality, which holds since for all $k \ge j$ the sum of the k largest entries dominates the sum of the j largest ones. The left hand side inequality then holds for all $\alpha < 1/2$. We are now able to prove the upper bound of Theorem 1.1.
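The Orlicz norms in Lemma 4.1 are Luxemburg norms as recalled in Section 2. Since $\|\cdot\|_M$ is defined by an infimum over a monotone constraint, it can be computed numerically by bisection. The sketch below (ours, for illustration only; the function name is an assumption) does this for the $\ell_2$ example $M(t) = t^2/2$ from Section 2:

```python
def luxemburg_norm(x, M, iters=200):
    """Luxemburg norm ||x||_M = inf{rho > 0 : sum_i M(|x_i| / rho) <= 1},
    computed by bisection; M is convex and increasing with M(0) = 0."""
    if all(v == 0 for v in x):
        return 0.0
    # find an upper bracket: for large rho the constraint holds
    hi = max(abs(v) for v in x)
    while sum(M(abs(v) / hi) for v in x) > 1:
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(M(abs(v) / mid) for v in x) > 1:
            lo = mid
        else:
            hi = mid
    return hi

# Example: M(t) = t^2 / 2 gives ||x||_M = ||x||_2 / sqrt(2)
norm = luxemburg_norm([3.0, 4.0], lambda t: t * t / 2.0)
```

For $x = (3, 4)$ the constraint $\sum_i (x_i/\rho)^2/2 \le 1$ gives $\rho = \|x\|_2/\sqrt{2} = 5/\sqrt{2}$.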
Proposition 4.2. Let $a \in \mathbb{R}^{n\times N}$. Then, for all $\ell \le n$,
$$ E\, \frac{1}{\ell} \sum_{k=1}^{\ell} k\text{-}\max_{1\le i\le n} |a_{ig(i)}| \;\le\; \frac{c_2}{\ell N} \sum_{k=1}^{\ell N} s(k). \qquad (4.2) $$
Proof. It is sufficient to show (4.2) for all $a \in \mathrm{ext}(B^{nN}_{M_\ell})$. Therefore, by Lemma 2.1 (2), we only need to consider matrices $a \in \mathbb{R}^{n\times N}$ of the special form provided there. On the other hand, we also have the reverse estimate, which yields the upper estimate in Theorem 1.1. Inequalities (3.6) and (4.4) together complete the proof.
An application of Theorem 1.1

We now present an application and use Theorem 1.1 to prove Theorem 1.2. The proof uses real interpolation and is, in our view, a natural approach to combinatorial inequalities such as (1.3) obtained in [8]. Note that [8, Theorem 1.2] is a special case of Theorem 1.2 when $G = S_n$.
Let us first recall some basic notions from interpolation theory. A pair $(X_0, X_1)$ of Banach spaces is called a compatible couple if there is some Hausdorff topological space H in which each of $X_0$ and $X_1$ is continuously embedded. For example, $(L_1, L_\infty)$ is a compatible couple, since $L_1$ and $L_\infty$ are continuously embedded into the space of measurable functions that are finite almost everywhere. Of course, any pair $(X, Y)$ for which one of the spaces is continuously embedded in the other is a compatible couple.
For a compatible couple $(X_0, X_1)$ (with corresponding Hausdorff space H), we equip $X_0 + X_1$ with the norm
$$ \|x\|_{X_0 + X_1} = \inf\big\{ \|x_0\|_{X_0} + \|x_1\|_{X_1} : x = x_0 + x_1,\ x_0 \in X_0,\ x_1 \in X_1 \big\}, $$
under which this space becomes a Banach space. This definition is independent of the particular space H. For $x \in X_0 + X_1$ and $t > 0$, the K-functional is defined by
$$ K(x, t; X_0, X_1) = \inf\big\{ \|x_0\|_{X_0} + t\, \|x_1\|_{X_1} : x = x_0 + x_1 \big\}, $$
and for $0 < \theta < 1$ and $1 \le p < \infty$, the real interpolation space $(X_0, X_1)_{\theta,p}$ consists of all $x \in X_0 + X_1$ for which the norm
$$ \|x\|_{\theta,p} = \Big( \int_0^\infty \big( t^{-\theta} K(x, t; X_0, X_1) \big)^p\, \frac{dt}{t} \Big)^{1/p} $$
is finite.
Proof of Theorem 1.2. To show the upper bound we use the same argument as in [8]; for the sake of completeness we include it here. Let $a \in \mathbb{R}^{n\times N}$ and write $a = a' + a''$, where $a'$ contains the N largest entries of $a$ and zeros elsewhere, and $a''$ contains $s(N+1), \dots, s(nN)$ and zeros elsewhere. Then, using the triangle and Jensen's inequalities, we obtain the upper bound. We will now prove the lower bound. Let $1 \le p < \infty$ and $\theta = 1 - 1/p$. First, recall that
$$ \|a\|_{\theta,p} = \Big( \int_0^\infty \big( t^{-\theta} K(a, t) \big)^p\, \frac{dt}{t} \Big)^{1/p}, $$
where, writing $a(g) = (a_{ig(i)})_{i=1}^{n}$,
$$ K(a, t) = \inf_{a = b + c} \int_G \Big( \|b(g)\|_{\ell_1^n} + t\, \|c(g)\|_{\ell_\infty^n} \Big)\, dP(g) = \int_G \inf_{a(g) = b(g) + c(g)} \Big( \|b(g)\|_{\ell_1^n} + t\, \|c(g)\|_{\ell_\infty^n} \Big)\, dP(g). $$
Hence, using Theorem 1.1, we obtain the desired estimate, where $c_2$ is a positive constant depending only on $C_G$. Taking the p-th root concludes the proof.
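The K-functional of the couple $(\ell_1, \ell_\infty)$ that appears in the proof admits a classical closed form: for $t \le n$ it equals the sum of the $\lfloor t \rfloor$ largest moduli plus the fractional part times the next one, and it coincides with the infimum over the threshold decompositions $c = \mathrm{sign}(x)\min(|x|, \lambda)$, $b = x - c$. The sketch below (ours, an illustration only) checks this agreement numerically:

```python
import math

def K_formula(x, t):
    """K(x, t; l1, l_inf) for t <= len(x): sum of the floor(t) largest
    moduli plus the fractional part of t times the next one."""
    xs = sorted((abs(v) for v in x), reverse=True) + [0.0]
    k = int(math.floor(t))
    return sum(xs[:k]) + (t - k) * xs[k]

def K_threshold(x, t):
    """Equivalent form: min over thresholds lam of
    sum_i (|x_i| - lam)_+ + t * lam; the optimum is attained at
    lam = |x_i| for some i, or at lam = 0."""
    xs = [abs(v) for v in x]
    cands = set(xs) | {0.0}
    return min(sum(max(v - lam, 0.0) for v in xs) + t * lam for lam in cands)

v1 = K_formula([3, -1, 2, 0.5], 2.5)
v2 = K_threshold([3, -1, 2, 0.5], 2.5)
```

Here `v1` and `v2` both equal $3 + 2 + 0.5 \cdot 1 = 5.5$: the objective $\lambda \mapsto \sum_i (|x_i| - \lambda)_+ + t\lambda$ is piecewise linear, so its minimum is attained at one of the kink points $|x_i|$.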