Comparison inequalities for suprema of bounded empirical processes

In this Note we provide comparison moment inequalities for suprema of bounded empirical processes. Our methods rely only on a martingale decomposition and on comparison results for martingales proved by Bentkus and Pinelis.


Introduction
Let $X_1, \ldots, X_n$ be a sequence of independent random variables valued in some measurable space $(\mathcal{X}, \mathcal{F})$ with common distribution $P$. For every integer $n$, let $P_n$ denote the empirical probability measure $P_n := n^{-1}(\delta_{X_1} + \ldots + \delta_{X_n})$. Let $\mathscr{F}$ be a countable class of measurable functions $f : \mathcal{X} \to \mathbb{R}$ such that $P(f) = 0$ for all $f \in \mathscr{F}$. In this Note, we are concerned with concentration properties around the mean of the random variable
\[ Z := \sup\{ n P_n(f) : f \in \mathscr{F} \}, \tag{1.1} \]
when $\mathscr{F}$ satisfies a two-sided or a one-sided (from above) boundedness condition. Our approach is based on a decomposition of $Z - \mathbb{E}[Z]$ into a sum of martingale increments, together with comparison inequalities for martingales with (two-sided or one-sided) bounded increments proved by Bentkus [1] and Pinelis [7]. Before going further, let us introduce some notation.

Definition 1.1. (i) Let $\alpha, \beta$ be two reals such that $\alpha < \beta$. We say that a random variable $\theta$ follows a Bernoulli distribution if it assumes exactly two values, and we write $\theta \sim \mathcal{B}_m(\alpha, \beta)$ if
\[ \mathbb{P}(\theta \in \{\alpha, \beta\}) = 1 \tag{1.2} \]
and $\mathbb{E}[\theta] = m$.
(ii) For any $a \ge 0$, $\Gamma_a$ stands for any centered Gaussian random variable with variance $a$.
(iii) For any $a > 0$, $\Pi_a$ stands for any Poisson random variable with parameter $a$. We also denote by $\tilde{\Pi}_a$ the centered Poisson random variable $\tilde{\Pi}_a := \Pi_a - a$.
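For concreteness, the two-point law of Definition 1.1 (i) can be written out explicitly. The following is a short worked computation, not a formula quoted from the source; it only assumes that $\theta$ takes the two values $\alpha$ and $\beta$ and has mean $m \in [\alpha, \beta]$.

```latex
% Solving p*beta + (1-p)*alpha = m for p := P(theta = beta):
\[
  \mathbb{P}(\theta = \beta) = \frac{m - \alpha}{\beta - \alpha},
  \qquad
  \mathbb{P}(\theta = \alpha) = \frac{\beta - m}{\beta - \alpha}.
\]
% Variance of the two-point law:
\[
  \operatorname{Var}(\theta)
  = (\beta - m)^2 \,\frac{m - \alpha}{\beta - \alpha}
  + (m - \alpha)^2 \,\frac{\beta - m}{\beta - \alpha}
  = (m - \alpha)(\beta - m).
\]
% Special case used throughout: alpha = -a, beta = 1, m = 0.
\[
  \theta \sim \mathcal{B}_0(-a, 1)
  \;\Longrightarrow\;
  \mathbb{P}(\theta = 1) = \frac{a}{1 + a},
  \qquad
  \operatorname{Var}(\theta) = a.
\]
```

In particular, the first parameter of $\mathcal{B}_0(-a, 1)$ is, up to sign, the variance of the law, which is how the distributions $\mathcal{B}_0(-s^2, 1)$ and $\mathcal{B}_0(-v^2, 1)$ appearing later should be read.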
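Let us introduce the class of convex functions in which the comparison inequalities stated in this Note are valid.

Definition 1.2. Let $k \in \mathbb{N}^*$. As usual, we denote by $\mathcal{C}^k$ the space of $k$-times continuously differentiable functions from $\mathbb{R}$ to $\mathbb{R}$. We define the following class of functions: [...] (1.4)
Furthermore, we use the classical notations $x_+ := \max(0, x)$ and $x_+^\alpha := (x_+)^\alpha$.

We now recall the following two comparison results, proved by Bentkus [1] and Pinelis [7], which are the main tools in our proofs.

Proposition 1.3. Let $M_n := \sum_{k=1}^n X_k$ be a martingale with respect to a nondecreasing filtration $(\mathcal{F}_k)$ such that $M_0 = 0$,
\[ X_k \le 1 \quad \text{and} \quad \mathbb{E}[X_k^2 \mid \mathcal{F}_{k-1}] \le s_k^2 \quad \text{a.s.} \tag{1.5} \]
(i) Let $s^2 := n^{-1}(s_1^2 + \ldots + s_n^2)$ and let $S_n := \theta_1 + \ldots + \theta_n$ be a sum of $n$ independent copies of a random variable $\theta$ with distribution $\mathcal{B}_0(-s^2, 1)$ (defined by (1.2)). Then for any $\varphi \in \mathcal{G}^2$,
\[ \mathbb{E}[\varphi(M_n)] \le \mathbb{E}[\varphi(S_n)]. \]
(ii) In addition to (1.5), assuming that $\mathbb{E}[X_{k+}^3 \mid \mathcal{F}_{k-1}] \le \beta_k$ a.s., and setting $\beta := n^{-1}(\beta_1 + \ldots + \beta_n)$, we have for any $\varphi \in \mathcal{G}^3$,
\[ \mathbb{E}[\varphi(M_n)] \le \mathbb{E}[\varphi(\Gamma_{n(s^2 - \beta)} + \tilde{\Pi}_{n\beta})], \]
where $\Gamma_{n(s^2 - \beta)}$ and $\tilde{\Pi}_{n\beta}$ are independent and respectively defined by Definition 1.1 (ii) and (iii).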
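Remark 1.4. In fact, the results in the original papers are stated in the following slightly smaller class of functions:
\[ \mathcal{H}^\alpha_+ := \Big\{ \varphi : \varphi(u) = \int_{-\infty}^{\infty} (u - t)_+^\alpha \, \mu(dt) \text{ for some Borel measure } \mu \ge 0 \text{ on } \mathbb{R} \text{ and all } u \in \mathbb{R} \Big\}, \]
for $\alpha \in \{2, 3\}$. The extensions to $\mathcal{G}^\alpha$ follow from a result of Pinelis [8, Corollary 5.8] (see also [6, Section 2]).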
Remark 1.5. From moment comparison inequalities in $\mathcal{H}^\alpha_+$, such as those in the above Proposition, one can derive tail comparison inequalities. We refer the reader to [4, 5, 6] for the statements of these results and for more details.
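To illustrate Remark 1.5 in the simplest case: the function $u \mapsto (u - t)_+^2$ is of the form considered in Remark 1.4 (take $\mu$ to be the Dirac mass at $t$), so a moment comparison $\mathbb{E}[\varphi(M_n)] \le \mathbb{E}[\varphi(S_n)]$ combined with Markov's inequality gives $\mathbb{P}(M_n \ge x) \le \inf_{t < x} \mathbb{E}[(S_n - t)_+^2] / (x - t)^2$. The Python sketch below evaluates this right-hand side numerically when $S_n$ is a sum of $n$ independent $\mathcal{B}_0(-s^2, 1)$ variables. It is only an illustration and not one of the sharper tail bounds of [4, 5, 6]; the function names, the grid search over $t$ and the sample parameters are all assumptions made for this example.

```python
from math import comb


def bernoulli_sum_pmf(n, s2):
    """Law of S_n = theta_1 + ... + theta_n, theta_i i.i.d. with values -s2 and 1, mean 0."""
    q = s2 / (1.0 + s2)  # P(theta = 1); forced by E[theta] = 0
    # With k variables equal to 1 and n - k equal to -s2: S_n = k * (1 + s2) - n * s2.
    return [
        (k * (1.0 + s2) - n * s2, comb(n, k) * q**k * (1.0 - q) ** (n - k))
        for k in range(n + 1)
    ]


def truncated_moment(pmf, t, alpha=2):
    """E[(S - t)_+^alpha] for a finite law given as (value, probability) pairs."""
    return sum(p * max(v - t, 0.0) ** alpha for v, p in pmf)


def tail_bound(n, s2, x, grid=2000):
    """Minimum over a grid of t < x of E[(S_n - t)_+^2] / (x - t)^2, capped at 1."""
    pmf = bernoulli_sum_pmf(n, s2)
    span = x + n * s2 + 1.0  # illustrative search range: t runs from x down to -n*s2 - 1
    best = 1.0
    for i in range(1, grid + 1):
        t = x - span * i / grid
        best = min(best, truncated_moment(pmf, t) / (x - t) ** 2)
    return best


if __name__ == "__main__":
    n, s2, x = 50, 0.25, 10.0  # hypothetical sample values
    pmf = bernoulli_sum_pmf(n, s2)
    exact_tail = sum(p for v, p in pmf if v >= x)
    print("variance of S_n:", sum(p * v * v for v, p in pmf))
    print("P(S_n >= x):   ", exact_tail)
    print("moment bound:  ", tail_bound(n, s2, x))
```

By construction the computed bound always dominates the exact tail of $S_n$ itself; under the hypotheses of Proposition 1.3 (i) it would also bound the tail of the martingale $M_n$.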
Finally, we use the notations:

Two-sided boundedness condition
Here, by two-sided boundedness condition, we mean that $\mathscr{F}$ is a countable class of measurable functions with values in $[-a, 1]$ for some positive real $a$. Let $\psi$ be the function defined on $[0, 1]$ by [...] (2.1)

ECP 23 (2018), paper 33.

Theorem 2.1. Let $\mathscr{F}$ be a countable class of measurable functions from $\mathcal{X}$ into $[-a, 1]$ such that $P(f) = 0$ for all $f \in \mathscr{F}$. Let $Z$ be defined by (1.1).
(i) Case $a \ge 1$. Let $\theta$ be a Bernoulli random variable with distribution $\mathcal{B}_0(-a, 1)$ (defined by (1.2)). Let $\theta_1, \ldots, \theta_n$ be $n$ independent copies of $\theta$ and let $S_n := \theta_1 + \ldots + \theta_n$. Then for any function $\varphi \in \mathcal{G}^2$,
\[ \mathbb{E}[\varphi(Z - \mathbb{E}[Z])] \le \mathbb{E}[\varphi(S_n)]. \]
(ii) Case $a < 1$.
Let $\vartheta$ be a Bernoulli random variable with distribution $\mathcal{B}_0\big(-(a + 1)^2 \psi((a + \bar{E})/(a + 1)), 1\big)$, where $\psi$ is defined by (2.1), and $\bar{E}$ is defined by (1.7). Let $\vartheta_1, \ldots, \vartheta_n$ be $n$ independent copies of $\vartheta$ and let [...]

[...] [10]) and so $\bar{E}$ also decreases to 0. This ensures that $n \mapsto (a + 1)^2 \psi((a + \bar{E})/(a + 1))$ (which is also the variance of $\vartheta$) is nonincreasing and tends to $a$ as $n$ tends to infinity.

Example 2.3 (Set-indexed empirical processes).
Let $\mathscr{S}$ be a countable class of measurable sets of $\mathcal{X}$. We consider the class of functions [...] Let $p := \sup\{P(S) : S \in \mathscr{S}\}$ and assume that $p < 1$. Since $x \mapsto x/(1 - x)$ is increasing on $[0, 1[$, we can apply Theorem 2.1 with $a = p/(1 - p)$. Hence, with the notation of Theorem 2.1: [...], and $\theta$ has the distribution $\mathcal{B}_0(-p/(1 - p), 1)$.
Theorem 3.1 of Rio [9], when applied to $Z$ (see also his Theorem 4.2 (a)), provides a Bennett-type inequality for classes of sets with small measures under $P$. Precisely, the condition is $E_n + p(1 - E_n) \le 1/2$. Hence, since $\mathcal{G}^2$ contains all increasing exponential functions $x \mapsto e^{tx}$, $t > 0$, Case (ii) above completes Rio's result in this situation.
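The comparison law $\mathcal{B}_0(-p/(1 - p), 1)$ of Example 2.3 has a simple binomial description. The computation below is a worked illustration under the sole assumption that $\theta$ takes the values $-p/(1 - p)$ and $1$ with mean $0$; the auxiliary variables $\xi$ and $B_n$ are introduced here and do not appear in the source.

```latex
% With a = p/(1-p) we have 1 + a = 1/(1-p), hence
\[
  \mathbb{P}(\theta = 1) = \frac{a}{1 + a} = p,
  \qquad
  \operatorname{Var}(\theta) = a = \frac{p}{1 - p}.
\]
% Equivalently, if xi is a Bernoulli(p) indicator, then
\[
  \theta \overset{d}{=} \frac{\xi - p}{1 - p},
  \qquad
  S_n = \theta_1 + \ldots + \theta_n \overset{d}{=} \frac{B_n - np}{1 - p},
  \quad B_n \sim \mathrm{Binomial}(n, p).
\]
```

So in this example the generalized moments of $S_n$ are those of a centered and rescaled binomial variable with success probability $p$.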

One-sided boundedness condition
Here, by one-sided boundedness condition, we mean that $\mathscr{F}$ is a countable class of measurable functions with values in $]-\infty, 1]$. Let $\rho$ be the function defined on $[0, 1]$ by [...] (2.3)

Theorem 2.4. Let $\mathscr{F}$ be a countable class of measurable functions from $\mathcal{X}$ into $]-\infty, 1]$ such that $P(f) = 0$ for any $f \in \mathscr{F}$. Let $Z$ be defined by (1.1). Define also [...]
(i) Let $\theta$ be a Bernoulli random variable with distribution $\mathcal{B}_0(-v^2, 1)$ (defined by (1.2)).
One can find in Pinelis [6] a thorough study of the comparison between the right-hand sides of (b) and (a'), as well as with other classical bounds. Let us mention here some facts.
(i) If $\beta = v^2$, then the right-hand side of (b) is equal to the right-hand side of (a'). Thus, since $\mathcal{G}^3 \subset \mathcal{G}^2$, Inequality (b) is relevant with respect to (a) if and only if β 3 + $< v^2$ (see also point (ii) below).

Proofs
The starting point of the proofs is a martingale decomposition of $Z$, which we briefly recall. First, by virtue of the monotone convergence theorem, we can suppose that $\mathscr{F}$ is a finite class of functions. Set $\mathcal{F}_0 := \{\emptyset, \Omega\}$, $\mathcal{F}_k := \sigma(X_1, \ldots, X_k)$ and $\mathcal{F}_n^k := \sigma(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_n)$ for all $k = 1, \ldots, n$. Let $\mathbb{E}_k$ (respectively $\mathbb{E}_n^k$) denote the conditional expectation operator associated with $\mathcal{F}_k$ (resp. $\mathcal{F}_n^k$). Set also [...]

Let us number the functions of the class $\mathscr{F}$ and consider the random variables [...] We notice that [...] and let $r_k$ be the nonnegative random variable such that [...]

The important point is that $\mathbb{E}_{k-1}[r_k]$ is a corrective term which is essentially small. This is the statement of the following lemma: [...]

The proof is based on a property of exchangeability of variables, proved in Marchina [3]. Since it is the fundamental tool of the paper, we give the proof again for the sake of completeness.
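As a reminder of what such a decomposition looks like, here is the standard telescoping (Doob) identity. It is a generic sketch, written under the assumption that $\Delta_k$ denotes the increment $\mathbb{E}_k[Z] - \mathbb{E}_{k-1}[Z]$; the finer splitting of $\Delta_k$ through $\xi_k$ and $r_k$ used in the proofs is not reproduced.

```latex
% F_0 is trivial, so E_0[Z] = E[Z]; Z is F_n-measurable, so E_n[Z] = Z.
\[
  Z - \mathbb{E}[Z]
  = \sum_{k=1}^{n} \big( \mathbb{E}_k[Z] - \mathbb{E}_{k-1}[Z] \big)
  = \sum_{k=1}^{n} \Delta_k .
\]
% Each Delta_k is F_k-measurable and, by the tower property,
\[
  \mathbb{E}_{k-1}[\Delta_k]
  = \mathbb{E}_{k-1}\big[\mathbb{E}_k[Z]\big] - \mathbb{E}_{k-1}[Z] = 0 ,
\]
% so (Delta_k) is a sequence of martingale increments with respect to (F_k).
```

With this reading, Proposition 1.3 applies to $M_n = Z - \mathbb{E}[Z]$ once the increments are bounded above by 1 and their conditional variances are controlled.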

Proof of Lemma
We start with the following property of exchangeability of variables.
Proof of Lemma 3.2. By the definition of the random variable $\tau$, for every permutation $\sigma$ on $n$ elements, $\tau(X_1, \ldots, X_n) = \tau \circ \sigma(X_1, \ldots, X_n)$ almost surely. Applying this fact to $\sigma = (k\ j)$ (the transposition which exchanges $k$ and $j$), it suffices to use Fubini's theorem (recalling that $j \ge k$) to complete the proof.
Hence, [...] We are now in a position to prove Theorem 2.1.
Proof of Theorem 2.1. Observe first that $\Delta_k \le 1$ by (3.1) and the uniform boundedness condition on $\mathscr{F}$. Then, in view of Proposition 1.3 (i), it remains to bound the conditional variance of $\Delta_k$ with respect to $\mathcal{F}_{k-1}$. This is the subject of the following lemma:

Lemma 3.3. For any $k = 1, \ldots, n$, [...] $(m + a)(1 - m)$ a.s.
Proof of Lemma 3.3. A classical result due to Hoeffding [2] (see his Inequalities (4.1) and (4.2)) states that any bounded random variable $X$, such that $a \le X \le b$ for some reals $a$ and $b$, satisfies [...] where $\theta$ is a Bernoulli random variable with distribution $\mathcal{B}_{\mathbb{E}[X]}(a, b)$ (defined by (1.2)).
In particular, $\operatorname{Var}(X)$ is at most $\operatorname{Var}(\theta)$. We apply this result conditionally on $\mathcal{F}_{k-1}$ to the variable $\xi_k + r_k$, which takes its values in $[-a, 1]$ by (3.1). Recalling now (1.3), one immediately obtains [...] Furthermore, since $\psi$ is concave, [...] (3.9)

Therefore, from (3.9) and the fact that (3.1) implies $\xi_k + r_k \le 1$, we get [...] From this, recalling that $r_k \ge 0$ and $\xi_k + r_k \le 1$, we get [...] where the last inequality follows from Lemma 3.1 and $\rho$ is the nondecreasing function defined in (2.3). Since $\rho$ is concave, we complete the proof in the same way as in Case (i), using Proposition 1.3 (ii) in place of Proposition 1.3 (i).
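The variance comparison used in the proof of Lemma 3.3 can also be checked by hand. The argument below is the standard elementary one and is not taken from [2]; here $m := \mathbb{E}[X]$ and $\theta$ denotes a two-point variable on $\{a, b\}$ with mean $m$, whose variance is $(m - a)(b - m)$.

```latex
% Since a <= X <= b, the product (X - a)(b - X) is nonnegative, i.e.
\[
  X^2 \le (a + b) X - ab .
\]
% Taking expectations and subtracting m^2:
\[
  \operatorname{Var}(X)
  = \mathbb{E}[X^2] - m^2
  \le (a + b) m - ab - m^2
  = (m - a)(b - m)
  = \operatorname{Var}(\theta).
\]
% For a variable with values in [-a, 1] and mean m this reads
\[
  \operatorname{Var}(X) \le (m + a)(1 - m),
\]
% which is the expression appearing in Lemma 3.3.
```

The same computation applies verbatim to conditional expectations, which is how it is used with respect to $\mathcal{F}_{k-1}$.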

Remark 2.5.
Inequalities (a) and (b) of Theorem 2.4 are not easy to compare directly. To this end, we start by observing that Inequality (a) yields the following (see [1, Theorem 1.1]): [...]

[...] which, combined with (3.3), ends the proof of Lemma 3.1.
[...] which, combined with Lemma 3.1, concludes the proof of Lemma 3.3. Let us now complete the proof of Theorem 2.1. Define the function $V$ by $V(m) := (m + a)(1 - m)$ for any $m \in [0, E_{n-k+1}]$.
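The shape of $V$ explains why Theorem 2.1 distinguishes the cases $a \ge 1$ and $a < 1$. The following elementary computation is offered as a sketch of that mechanism; the variable $x$ is introduced here for illustration, and the link with $\psi$ is only suggested, since the definition (2.1) is not reproduced above.

```latex
\[
  V(m) = (m + a)(1 - m), \qquad V'(m) = 1 - a - 2m .
\]
% Case a >= 1: V'(m) <= 0 for all m >= 0, so V is nonincreasing on [0, 1] and
\[
  V(m) \le V(0) = a ,
\]
% which is the variance of the law B_0(-a, 1) of Theorem 2.1 (i).
% Case a < 1: V increases on [0, (1-a)/2] and decreases afterwards, with
\[
  \max_{m \in [0, 1]} V(m) = V\Big(\frac{1 - a}{2}\Big) = \frac{(1 + a)^2}{4} .
\]
% Change of variable matching the argument of psi in Theorem 2.1 (ii):
\[
  V(m) = (a + 1)^2 \, x (1 - x), \qquad x := \frac{a + m}{a + 1} .
\]
```

In the case $a < 1$ the bound therefore depends on how far $m$ can move away from $0$, which is consistent with the appearance of $\bar{E}$ in the variance $(a + 1)^2 \psi((a + \bar{E})/(a + 1))$ of $\vartheta$.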