Concentration inequalities for order statistics

This note describes non-asymptotic variance and tail bounds for order statistics of samples of independent, identically distributed random variables. These bounds are shown to be asymptotically tight when the sampling distribution belongs to a maximum domain of attraction. If the sampling distribution has a non-decreasing hazard rate (this includes the Gaussian distribution), we derive an exponential Efron-Stein inequality for order statistics: an inequality connecting the logarithmic moment generating function of centered order statistics with exponential moments of Efron-Stein (jackknife) estimates of variance. We use this general connection to derive variance and tail bounds for order statistics of a Gaussian sample. These bounds are not within the scope of the Tsirelson-Ibragimov-Sudakov Gaussian concentration inequality. Proofs are elementary and combine Rényi's representation of order statistics with the so-called entropy approach to concentration inequalities popularized by M. Ledoux.


Introduction
The purpose of this note is to develop non-asymptotic variance and tail bounds for order statistics. In the sequel, X_1, ..., X_n are independent random variables distributed according to some probability distribution F, and X_(1) ≥ X_(2) ≥ ... ≥ X_(n) denote the corresponding order statistics (the non-increasing rearrangement of X_1, ..., X_n). The cornerstone of Extreme Value Theory (EVT), the Fisher-Tippett-Gnedenko Theorem, describes the asymptotic behavior of X_(1) after centering and normalization [2]. The median X_(⌊n/2⌋) is a widely used location estimator. Its asymptotic properties are well documented (see, for example, [11] for a review). Much less seems to be available when the sample size n is fixed. The distribution function of X_(1) is of course explicitly known (it is F^n), but simple and useful variance or tail bounds do not seem to be publicized.
Our main tools are the Efron-Stein inequalities, which assert that the jackknife estimates of the variance of a function of independent random variables are, on average, upper bounds on that variance, together with extensions of those inequalities that allow us to derive exponential tail bounds (see Theorem 2.1).
We refer to [8,9] and references therein for an account of the interplay between jackknife estimates, order statistics, extreme value theory and statistical inference.
The search for non-asymptotic variance and tail bounds for extreme order statistics is motivated not only by the possible applications of EVT to quantitative risk management, but also by our desire to understand some aspects of the concentration of measure phenomenon [4,7]. Concentration of measure theory tells us that a function of many independent random variables that does not depend too much on any of them is almost constant. The best known results in that field are the Poincaré and Gross logarithmic Sobolev inequalities and the Tsirelson-Ibragimov-Sudakov tail bounds for functions of Gaussian random vectors. If X_1, ..., X_n are independent standard Gaussian random variables and f : R^n → R is L-Lipschitz, then Z = f(X_1, ..., X_n) satisfies Var(Z) ≤ L², log E[exp(λ(Z − EZ))] ≤ λ²L²/2, and P{Z − EZ ≥ t} ≤ exp(−t²/(2L²)). If we apply these bounds to X_(1) (resp. to X_(⌊n/2⌋)), that is, to the maximum (resp. the median) of X_1, ..., X_n, the Lipschitz constant is (almost surely) L = 1, so the Poincaré inequality establishes Var(X_(1)) ≤ 1 (resp. Var(X_(⌊n/2⌋)) ≤ 1). This easy upper bound is far from satisfactory: it is well known in EVT that Var(X_(1)) = O(1/log n) and Var(X_(⌊n/2⌋)) = O(1/n). Naive use of off-the-shelf concentration bounds does not work when handling maxima or order statistics at large. This situation is not uncommon. The analysis of the largest eigenvalue of random matrices from the Gaussian Unitary Ensemble (GUE) [5] provides a setting where the derivation of sharp concentration inequalities requires ingenuity, combining concentration/hypercontractivity with special representations.
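The gap between the dimension-free Lipschitz bound and the true order of magnitude is easy to observe numerically. The following Python sketch (a Monte Carlo illustration of our own, not part of the note's argument; the trial count and seed are arbitrary choices) estimates Var(X_(1)) for growing n and shows it falling well below the Poincaré bound of 1:

```python
import random
import statistics

def max_of_gaussians_variance(n, trials=2000, seed=0):
    """Monte Carlo estimate of Var(X_(1)) for n i.i.d. standard Gaussians."""
    rng = random.Random(seed)
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]
    return statistics.variance(maxima)

# The Poincare inequality gives Var(X_(1)) <= 1 for every n, but the
# empirical variance shrinks as n grows, in line with the O(1/log n) rate.
for n in (10, 100, 1000):
    print(n, max_of_gaussians_variance(n))
```

On typical runs the estimates decrease roughly like 1/log n, as predicted by EVT.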
Our purpose is to show that the tools and methods used to investigate the concentration of measure phenomenon are relevant to the analysis of order statistics. When properly combined with Rényi's representation for order statistics (see Theorem 2.5), the so-called entropy method developed and popularized by Ledoux [4] allows us to recover sharp variance and tail bounds. Proofs are elementary and parallel the approach followed in [5] in a much more sophisticated setting: whereas Ledoux builds on the determinantal structure of the joint density of the eigenvalues of random matrices from the GUE to bound tail probabilities by sums of Gaussian integrals that can be handled by concentration/hypercontractivity arguments, in the sequel we build on Rényi's representation of order statistics: X_(1), ..., X_(n) can be represented as the image of the order statistics of a sample of the exponential distribution under a monotone function, and the order statistics of an exponential sample can in turn be represented as partial sums of independent random variables.
In Section 2, using Efron-Stein inequalities and modified logarithmic Sobolev inequalities, we derive simple relations between the variance or the entropy of the order statistic X_(k) and moments of the spacings ∆_k = X_(k) − X_(k+1). When the sampling distribution has a non-decreasing hazard rate (a condition satisfied by the Gaussian, exponential, Gumbel, and logistic distributions, among others; see Definition 2.6), we are able to build on the connection between the fluctuations of the order statistic X_(k) and the spacings. Combining Proposition 2.3 and Rényi's representation for order statistics, we connect the variance and the logarithmic moment generating function of X_(k) with moments of spacings; Theorem 2.9 may be considered as an exponential Efron-Stein inequality for order statistics.
In the framework of EVT, those relations are checked to be asymptotically tight (see Section 3).
In Section 4, using explicit bounds on the Gaussian hazard rate, we derive Bernstein-like inequalities for the maximum and the median of a sample of independent Gaussian random variables with correct variance and scale factors (Proposition 4.6). We provide non-asymptotic variance bounds for order statistics of Gaussian samples with the right order of magnitude in Propositions 4.2 and 4.4.

Order statistics and spacings
Efron-Stein inequalities [3] allow us to derive upper bounds on the variance of functions of independent random variables.
The quantity ∑_{i=1}^{n} (Z − Z_i)² is called a jackknife estimate of variance. Efron-Stein inequalities form a special case of a more general collection of inequalities that encompasses the so-called modified logarithmic Sobolev inequalities [4]. Henceforth, the entropy of a non-negative random variable X is defined by Ent[X] = E[X log X] − E[X] log E[X]. The next inequality from [6] has been used to derive a variety of concentration inequalities [1].
Then for all λ ∈ R, and for all n/2 < k ≤ n and all λ ∈ R,

Proof. Let Z = X_(k) and, for k ≤ n/2, define Z_i as the rank-k statistic of the subsamples of size n − 1 obtained by deleting one observation. For k ≤ n/2, define Z and Z_i as before and apply Theorem 2.2. The proof of the last statement proceeds by the same argument.
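To make the jackknife estimate of variance concrete, here is a short Python sketch (an illustration under our own conventions, not code from the note; sample size, seed, and trial count are arbitrary) that computes ∑_i (Z − Z_i)² for Z = max, with Z_i the maximum of the sample with the i-th observation deleted. For the maximum, only the largest observation contributes, so the estimate collapses to the squared first spacing (X_(1) − X_(2))²:

```python
import random
import statistics

def jackknife_variance_proxy(sample):
    """Jackknife variance proxy sum_i (Z - Z_i)^2 for Z = max(sample),
    where Z_i recomputes the statistic with the i-th observation removed."""
    z = max(sample)
    total = 0.0
    for i in range(len(sample)):
        z_i = max(sample[:i] + sample[i + 1:])
        total += (z - z_i) ** 2
    return total

rng = random.Random(1)
n, trials = 50, 4000
proxies, maxima = [], []
for _ in range(trials):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proxies.append(jackknife_variance_proxy(x))
    maxima.append(max(x))

# Efron-Stein: Var(max) is upper bounded by the mean jackknife proxy.
print(statistics.variance(maxima), statistics.mean(proxies))
```

Averaged over many samples, the proxy dominates the empirical variance of the maximum, as the Efron-Stein inequality predicts.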

Proposition 2.3 can be fruitfully complemented by Rényi's representation of order statistics (see [2] and references therein).
In the sequel, if f is a monotone function from (a, b) (where a and b may be infinite), f← denotes its generalized inverse (see [2] for properties of this transformation).
Rényi's representation asserts that the order statistics of a sample of independent exponentially distributed random variables are distributed as partial sums of independent exponentially distributed random variables.
Theorem 2.5 (Rényi's representation). Let Y_(n) ≤ ... ≤ Y_(1) be the order statistics of an independent sample of the standard exponential distribution. Then (Y_(n), ..., Y_(1)) is distributed as (E_n/n, E_n/n + E_{n−1}/(n−1), ..., ∑_{i=1}^{n} E_i/i), where E_1, ..., E_n are independent and identically distributed standard exponential random variables; in particular, Y_(k) is distributed as ∑_{i=k}^{n} E_i/i.

We may readily test the tightness of Proposition 2.3. By Theorem 2.5, Var[Y_(k)] = ∑_{i=k}^{n} 1/i². Hence, for any sequence (k_n)_n with lim_n k_n = ∞ and lim_n k_n/n = 0, lim_{n→∞} k_n Var[Y_(k_n)] = 1, which matches the order of magnitude delivered by Proposition 2.3. Observe that if the hazard rate h = f/(1 − F) (where f is the density of F) is non-decreasing, then for all t > 0 and x > 0, 1 − F(t + x) ≤ (1 − F(t)) e^{−x h(t)}. Moreover, assuming that the hazard rate is non-decreasing warrants negative association between spacings and the related order statistics.
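Rényi's representation is easy to check by simulation. The sketch below (illustrative code of our own, with arbitrary sample sizes and seeds) draws Y_(k) both directly, by sorting an exponential sample, and through the partial-sum representation, and compares both empirical variances with the exact value ∑_{i=k}^{n} 1/i²:

```python
import random
import statistics

def renyi_order_stat(n, k, rng):
    """Y_(k) via Renyi's representation: sum_{i=k}^{n} E_i / i,
    with E_i i.i.d. standard exponential."""
    return sum(rng.expovariate(1.0) / i for i in range(k, n + 1))

rng = random.Random(2)
n, k, trials = 100, 10, 20000
direct = [sorted(rng.expovariate(1.0) for _ in range(n))[n - k]
          for _ in range(trials)]          # k-th largest, drawn directly
represented = [renyi_order_stat(n, k, rng) for _ in range(trials)]

# Both constructions share the same law; in particular
# Var[Y_(k)] = sum_{i=k}^{n} 1 / i^2.
exact_var = sum(1.0 / i ** 2 for i in range(k, n + 1))
print(statistics.variance(direct), statistics.variance(represented), exact_var)
```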
Proposition 2.8. If F has a non-decreasing hazard rate, then the kth spacing ∆_k = X_(k) − X_(k+1) and X_(k+1) are negatively associated: for any pair of non-decreasing functions g_1 and g_2, E[g_1(∆_k) g_2(X_(k+1))] ≤ E[g_1(∆_k)] E[g_2(X_(k+1))].

Proof. Let Y_(n), ..., Y_(1) be the order statistics of an exponential sample and let E_k = Y_(k) − Y_(k+1) be the kth spacing of the exponential sample. By Theorem 2.5, E_k and Y_(k+1) are independent. Let g_1 and g_2 be two non-decreasing functions. The function g_1 ∘ U ∘ exp is non-decreasing. Almost surely, as the conditional distribution of kE_k given Y_(k+1) is the standard exponential distribution, and as F has a non-decreasing hazard rate, the conditional expectation E[g_1(∆_k) | Y_(k+1)] is a non-increasing function of Y_(k+1). Hence, the conclusion follows from Chebyshev's association inequality.

Negative association between order statistics and spacings allows us to establish our main result.

Theorem 2.9. Let X_1, ..., X_n be independently distributed according to F, let X_(1) ≥ ... ≥ X_(n) be the order statistics, and let ∆_k = X_(k) − X_(k+1) be the kth spacing. Let V_k = k∆_k² denote the Efron-Stein estimate of the variance of X_(k) (for k = 1, ..., n/2).
If F has a non-decreasing hazard rate h, then for 1 ≤ k ≤ n/2, the variance of X_(k) is bounded in terms of E[V_k], while for k > n/2 a symmetric bound holds. For λ ≥ 0 and 1 ≤ k ≤ n/2, the logarithmic moment generating function of X_(k) is bounded in terms of exponential moments of √V_k (Inequality (2.10)).

Inequality (2.10) may be considered as an exponential Efron-Stein inequality for order statistics: it connects the logarithmic moment generating function of the kth order statistic with the exponential moments of the square root of the Efron-Stein estimate of variance k∆_k². This connection provides correct bounds for the exponential distribution, whereas the exponential Efron-Stein inequality described in [1] does not. This comes from the fact that negative association between spacings and order statistics leads to an easy decoupling argument; there is no need to resort to the variational representation of entropy as in [1]. It is then possible to carry out the so-called Herbst argument in an effortless way.
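As a numerical sanity check on the role of V_k = k∆_k², the following sketch (our own illustration; the sample size, rank, trial count, and seed are arbitrary choices) compares the empirical variance of a Gaussian order statistic X_(k) with the average of the Efron-Stein estimate k∆_k²:

```python
import random
import statistics

def order_stats_and_proxy(n, k, rng):
    """Return (X_(k), V_k) for one Gaussian sample, where
    V_k = k * (X_(k) - X_(k+1))^2 is the Efron-Stein variance estimate."""
    x = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
    delta_k = x[k - 1] - x[k]          # Delta_k = X_(k) - X_(k+1)
    return x[k - 1], k * delta_k ** 2

rng = random.Random(3)
n, k, trials = 100, 10, 5000
stats, proxies = zip(*(order_stats_and_proxy(n, k, rng) for _ in range(trials)))

# On average the Efron-Stein estimate dominates the empirical variance of X_(k).
print(statistics.variance(stats), statistics.mean(proxies))
```

For the Gaussian distribution, whose hazard rate is non-decreasing, the average of V_k dominates Var(X_(k)), in line with the variance bound of Theorem 2.9.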

Asymptotic assessment
Assessing the quality of the variance bounds from Proposition 2.3 in full generality is not easy. However, EVT provides a framework in which the Efron-Stein estimates of variance are asymptotically of the right order of magnitude.
Membership in a maximum domain of attraction is characterized by the extended regular variation property of U = (1/(1 − F))←: F ∈ MDA(γ) with auxiliary function a if and only if, for all x > 0, lim_{t→∞} (U(tx) − U(t))/a(t) = (x^γ − 1)/γ, where the right-hand side should be read as log x when γ = 0 [2]. Using Theorem 2.1.1 and Theorem 5.3.1 from [2] and performing simple calculus, we readily obtain lim_{n→∞} Var[X_(1)]/a(n)² = (Γ(1 − 2γ) − Γ(1 − γ)²)/γ² (for γ < 1/2). For γ = 0, the last expression should be read as π²/6.
The asymptotic ratio between the Efron-Stein upper bound and the variance of X_(1) converges toward a limit that depends only on γ (for γ = 0 this limit is 12/π² ≈ 1.21). When the tail index γ is negative, the asymptotic ratio degrades as γ → −∞: it scales like −4γ.

Order statistics of Gaussian samples
We now turn to the Gaussian setting. We will establish Bernstein inequalities for order statistics of absolute values of independent Gaussian random variables.
A real-valued random variable X is said to be sub-gamma on the right tail with variance factor v and scale parameter c if log E[exp(λ(X − EX))] ≤ λ²v/(2(1 − cλ)) for every λ such that 0 < λ < 1/c.
Such a random variable satisfies a so-called Bernstein inequality: for t > 0, P{X ≥ EX + √(2vt) + ct} ≤ exp(−t). A real-valued random variable X is said to be sub-gamma on the left tail with variance factor v and scale parameter c if −X is sub-gamma on the right tail with variance factor v and scale parameter c. A Gamma random variable with shape parameter p and scale parameter c (expectation pc and variance pc²) is sub-gamma on the right tail with variance factor pc² and scale factor c, while it is sub-gamma on the left tail with variance factor pc² and scale factor 0. The Gumbel distribution (with distribution function exp(−exp(−x))) is sub-gamma on the right tail with variance factor π²/6 and scale factor 1, and it is sub-gamma on the left tail with scale factor 0 (note that this statement is not sharp; see Lemma 4.3 below).
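The Gamma case can be verified directly from the closed-form log-MGF. The sketch below (a self-contained check of our own; the parameter values are arbitrary) confirms that the centered log-MGF of a Gamma(p, c) variable stays below the sub-gamma bound λ²v/(2(1 − cλ)) with v = pc² on the interval (0, 1/c):

```python
import math

def gamma_clgf(p, c, lam):
    """Centered log-MGF of a Gamma(p, c) random variable (mean pc):
    log E exp(lam (X - EX)) = -p log(1 - c lam) - lam p c."""
    assert 0 < lam < 1.0 / c
    return -p * math.log(1.0 - c * lam) - lam * p * c

def sub_gamma_bound(v, c, lam):
    """Right-tail sub-gamma bound lam^2 v / (2 (1 - c lam))."""
    return lam ** 2 * v / (2.0 * (1.0 - c * lam))

# Gamma(p, c) is sub-gamma on the right tail with v = p c^2 and scale c:
# the centered log-MGF sits below the sub-gamma bound throughout (0, 1/c).
p, c = 3.0, 0.5
for lam in (0.1, 0.5, 1.0, 1.5, 1.9):
    assert gamma_clgf(p, c, lam) <= sub_gamma_bound(p * c ** 2, c, lam)
print("sub-gamma bound verified on a grid of lambda values")
```

The underlying elementary inequality is −log(1 − u) − u = ∑_{j≥2} u^j/j ≤ u²/(2(1 − u)) for 0 < u < 1, applied with u = cλ.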
Order statistics of Gaussian samples provide an interesting playground for assessing Theorem 2.9. Let Φ and φ denote respectively the standard Gaussian distribution function and density. Throughout this section, let U : [1, ∞) → [0, ∞) be defined by U(t) = Φ←(1 − 1/(2t)); U(t) is the 1 − 1/t quantile of the distribution of the absolute value of a standard Gaussian random variable, or equivalently the 1 − 1/(2t) quantile of the Gaussian distribution.
By Theorem 5.3.1 from [2], lim_n 2 log n Var[X_(1)] = π²/6, while the upper bound on Var[X_(1)] described above is equivalent to (8/log 2)/log n. If lim_n k_n = ∞ while lim_n k_n/n = 0, Smirnov's lemma [2] implies that lim_n k_n (U(n/k_n))² Var[X_(k_n)] = 1. For the asymptotically normal median of absolute values, lim_n (4φ(U(2)))² n Var[X_(n/2)] = 1 [11]. Again, the bound in Proposition 4.2 has the correct order of magnitude.

Lemma 4.3. Let Y_(k) be the kth order statistic of a sample of n independent standard exponential random variables, and let log 2 < z < log(n/k). Then P{Y_(k) ≤ log(n/k) − z} is the probability that a binomial random variable with parameters n and ke^z/n is less than k, which is sub-gamma on the left tail with variance factor less than ke^z and scale factor 0.
Our next goal is to establish that the order statistics of absolute values of independent Gaussian random variables are sub-gamma on the right-tail with variance factor close to the Efron-Stein estimates of variance derived in Proposition 4.1 and scale factor not larger than the square root of the Efron-Stein estimate of variance.
Before describing the consequences of Theorem 2.9, it is interesting to look at what can be obtained from Rényi's representation and exponential inequalities for sums of Gamma-distributed random variables.
This inequality looks like what we are looking for: U(n)(X_(1) − EX_(1)) converges in distribution, but also in quadratic mean, and even with respect to the Orlicz norm defined by x ↦ exp(|x|) − 1, toward a centered Gumbel distribution. As the centered Gumbel distribution is sub-gamma on the right tail with variance factor π²/6 and scale factor 1, we expect X_(1) to satisfy a Bernstein inequality with variance factor of order 1/U(n)² and scale factor 1/U(n). Up to the shift δ_n, this is the content of the next proposition. Note that the shift is asymptotically negligible with respect to the typical order of magnitude of the fluctuations. The constants in the next proposition are not sharp enough to make it competitive with Proposition 4.4; nevertheless, it illustrates that Theorem 2.9 captures the correct order of growth for the right tail of Gaussian maxima.

Proposition 4.5. Let v_n be the solution of the equation 16/x + log(1 + 2/x + 4 log(4/x)) = log(2n). For n such that v_n < 1, for all 0 ≤ λ < 1/√v_n, and for all t > 0,

Proof. By Theorem 2.9, where E_1 is exponentially distributed and independent of Y_(2), which is distributed like the second largest order statistic of an exponential sample. On the one hand, the conditional expectation is a non-increasing function of Y_(2); the maximum is achieved at Y_(2) = 0 and is equal to: On the other hand, by Proposition 4.1,
Letting τ = log n − log(1 + 2λ² + 4 log(4/v_n)), the announced bounds follow. We may also use Theorem 2.9 to provide a Bernstein inequality for the median of the absolute values of a Gaussian sample. We assume that n/2 is an integer.
By the law of total variance, Var(X_(k)) = E[Var(X_(k) | σ(N))] + Var(E[X_(k) | σ(N)]). Conditionally on N = m, if k ≤ m, X_(k) is distributed as the kth order statistic of a sample of m independent absolute values of Gaussian random variables. If k > m, X_(k) is negative, and its conditional variance is equal to the variance of the statistic of order n − k + 1 in a sample of size n − m. Hence, letting V_m(k) denote the variance of the kth order statistic of a sample of m independent absolute values of Gaussian random variables,