A MULTIVARIATE VERSION OF HOEFFDING’S INEQUALITY

In this paper a multivariate version of Hoeffding's inequality is proved about the tail distribution of homogeneous polynomials of Rademacher functions with an optimal constant in the exponent of the upper bound. The proof is based on an estimate about the moments of homogeneous polynomials of Rademacher functions which can be considered as an improvement of Borell's inequality in a most important special case.

The following result will be proved.
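Theorem 1. (The multivariate version of Hoeffding's inequality). The random variable $Z$ defined in formula (2) satisfies the inequality
$$P(|Z| > u) \le A \exp\left\{-\frac{1}{2}\left(\frac{u}{V}\right)^{2/k}\right\} \quad \text{for all } u \ge 0 \qquad (4)$$
with the constant $V$ defined in (3) and some constant $A > 0$ depending only on the parameter $k$ in the expression $Z$.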
I make some comments about this result. The condition that the coefficients $a(j_1,\dots,j_k)$ are symmetric functions of their variables is not a real restriction: replacing all coefficients $a(j_1,\dots,j_k)$ by $a_{\mathrm{Sym}}(j_1,\dots,j_k) = \frac{1}{k!}\sum_{\pi\in\Pi_k} a(j_{\pi(1)},\dots,j_{\pi(k)})$ in formula (2), where $\Pi_k$ denotes the set of all permutations of the set $\{1,\dots,k\}$, does not change the random variable $Z$. Besides, this symmetrization of the coefficients in formula (2) does not increase the number $V$ introduced in formula (3). The identities $EZ = 0$, $EZ^2 = k!V^2$ hold. Thus Theorem 1 yields an estimate on the tail behaviour of a homogeneous polynomial of order $k$ of independent random variables $\varepsilon_1,\dots,\varepsilon_n$, $P(\varepsilon_j = 1) = P(\varepsilon_j = -1) = \frac{1}{2}$, $1 \le j \le n$, with the help of the variance of this polynomial.
Such an estimate may be useful in the study of degenerate $U$-statistics. For instance, in paper [10] a weaker form of Theorem 1 played an important role. In Lemma 2 of that paper a weaker version of the estimate (4) was proved, where the constant $\frac{1}{2}$ in the exponent on its right-hand side was replaced by the number $\frac{k}{2e(k!)^{1/k}}$. This estimate, which is a fairly simple consequence of Borell's inequality, was satisfactory in that paper. (Borell's inequality together with its relation to the problem of this paper will be discussed in Section 3.) However, the question arose whether it can be improved. In particular, I was interested in whether the estimate suggested by a comparison with the Gaussian case holds. In the case $k = 1$ it is natural to compare the tail behaviour of $Z$ with that of $V\eta$, where $\eta$ is a random variable with standard normal distribution. Theorem A gives an estimate suggested by such a comparison.
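The two claims above (symmetrization leaves $Z$ unchanged, and $EZ = 0$, $EZ^2 = k!V^2$ for symmetric coefficients) can be checked by brute force for small $n$ and $k$. The following Python sketch is only an illustration: the helper names `symmetrize` and `evaluate` and the particular integer coefficients are hypothetical choices, not taken from the paper. It enumerates all $2^n$ sign vectors and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial, prod


def symmetrize(a, k):
    """Average a(j_1, ..., j_k) over all k! permutations of its arguments."""
    return {
        idx: sum(Fraction(a[p]) for p in permutations(idx)) / factorial(k)
        for idx in a
    }


def evaluate(a, signs):
    """Value of Z = sum a(j_1..j_k) * eps_{j_1} ... eps_{j_k} for one sign vector."""
    return sum(c * prod(signs[j] for j in idx) for idx, c in a.items())


n, k = 4, 3
# Hypothetical, deliberately non-symmetric coefficients on k-tuples of distinct indices.
a = {idx: idx[0] + 2 * idx[1] - 3 * idx[2] ** 2 for idx in permutations(range(n), k)}
a_sym = symmetrize(a, k)

all_signs = list(product((-1, 1), repeat=n))
values = [evaluate(a_sym, s) for s in all_signs]

V2 = sum(c * c for c in a.values())          # V^2 before symmetrization
V2_sym = sum(c * c for c in a_sym.values())  # V^2 after symmetrization
mean = sum(values) / len(values)             # E Z under uniform random signs
second_moment = sum(v * v for v in values) / len(values)  # E Z^2

print(mean, second_moment, factorial(k) * V2_sym, V2_sym <= V2)
```

In this small case the printed second moment coincides with $k!$ times the symmetrized $V^2$, and the symmetrized $V^2$ does not exceed the original one.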
If $Z$ is a homogeneous random polynomial of order $k$ defined in (2), then it is natural to compare its tail distribution with that of $V H_k(\eta)$, where $\eta$ has standard normal distribution, and $H_k(\cdot)$ is the $k$-th Hermite polynomial with leading coefficient 1. Theorem 1 yields an estimate suggested by such a comparison. The next example shows that this estimate is sharp. It also explains why it is natural to compare the random variable $Z$ with $V H_k(\eta)$.
For the sake of simplicity let us assume that the random variables $\varepsilon_j$, $j = 1,\dots,n$, in formula (2) are given in the form $\varepsilon_j = h(\zeta_j)$, $1 \le j \le n$, where $\zeta_1,\dots,\zeta_n$ are independent random variables, uniformly distributed in the interval $[0,1]$, and h( (Such a representation of the random variables $\varepsilon_j$ is useful for us, because it enables us to apply the subsequent limit theorem about degenerate $U$-statistics of i.i.d. random variables with non-atomic distribution.) In this example $\frac{\sqrt{n(n-1)\cdots(n-k+1)}}{k!} Z_n$ are degenerate $U$-statistics with kernel function and a sequence $\zeta_1,\dots,\zeta_n$ of i.i.d. random variables with uniform distribution on the interval $[0,1]$. $EZ_n^2 = k!V^2$, and a limit theorem about degenerate $U$-statistics (see e.g. [4]) implies that the random variables $Z_n$ converge in distribution to the $k$-fold Wiener-Itô integral as $n \to \infty$, where $W(\cdot)$ is a Wiener process on the interval $[0,1]$. Moreover, the random variable $Z^{(0)}$ has a simpler representation. Namely, by Itô's formula for multiple Wiener-Itô integrals (see e.g. [6]) it can be written in the form where $H_k(\cdot)$ is the $k$-th Hermite polynomial with leading coefficient 1, and $\eta = \int h(x)\,W(dx)$ is a random variable with standard normal distribution. Simple calculation shows that there are some constants $C > 0$ and $D > 0$ such that $P(H_k(\eta) > u) \ge C u^{-1/k} e^{-u^{2/k}/2}$ if $u > D$.
(Actually, this estimate is proved in [11].) Hence with some appropriate constants $C > 0$ and $D > 0$. This inequality implies that the estimate (4) is essentially sharp. It does not hold with a constant larger than $\frac{1}{2}$ in the exponent on its right-hand side; this upper bound can be improved at most by a pre-exponential factor.
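For $k = 2$ the lower bound on $P(H_k(\eta) > u)$ can be made concrete, since $H_2(x) = x^2 - 1$ and the tail of $\eta^2 - 1$ is given by the complementary error function. The Python sketch below is a numerical illustration only; the function names `hermite2_tail` and `reference_rate` are hypothetical. It prints the ratio of the exact tail to $u^{-1/k} e^{-u^{2/k}/2}$, which should stay bounded away from zero for large $u$.

```python
from math import erfc, exp, sqrt


def hermite2_tail(u):
    """P(H_2(eta) > u) for H_2(x) = x^2 - 1, eta standard normal, u > -1."""
    # eta^2 - 1 > u  <=>  |eta| > sqrt(u + 1), and P(|eta| > x) = erfc(x / sqrt(2)).
    return erfc(sqrt((u + 1) / 2))


def reference_rate(u, k=2):
    """The comparison function u^(-1/k) * exp(-u^(2/k) / 2)."""
    return u ** (-1 / k) * exp(-(u ** (2 / k)) / 2)


ratios = [hermite2_tail(u) / reference_rate(u) for u in (5, 10, 20, 50)]
print(ratios)
```

The standard Gaussian tail bound $P(|\eta| > x) \le 2\varphi(x)/x$ shows that these ratios stay below $2e^{-1/2}/\sqrt{2\pi} \approx 0.484$, so in this case the constant $C$ can be taken somewhat below that value.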
Theorem 1 will be proved in Section 2. It is a fairly simple consequence of a good estimate on the moments of the random variable $Z$ formulated in Theorem 2. These moments will be estimated by means of two lemmas. The first of them, Lemma 1, enables us to bound the moments of $Z$ by those of an appropriate polynomial of independent standard Gaussian random variables. There is a diagram formula to calculate the moments of polynomials of Gaussian random variables. This makes the estimation of the moments of such polynomials relatively simple. This is done in Lemma 2. Actually, it turned out to be simpler to rewrite these polynomials in the form of a multiple Wiener-Itô integral and to apply the diagram formula for multiple Wiener-Itô integrals. To make the explanation complete I give a more detailed description of the diagram formula at the end of Section 2. In the final part of this work, in Section 3, I try to explain the background of the proof of Theorem 1 in more detail. In particular, I make some comments about the role of the Gaussian bounding of moments in Lemma 1 and compare the moment estimates obtained by means of the method of this paper with the estimates supplied by Borell's inequality.
2 The proof of Theorem 1.
Theorem 1 will be obtained as a consequence of the following Theorem 2.
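Theorem 2. The random variable $Z$ defined in formula (2) satisfies the inequality
$$EZ^{2M} \le 1\cdot 3\cdot 5\cdots(2kM-1)\,V^{2M} \quad \text{for all } M = 1, 2, \dots \qquad (5)$$
with the constant $V$ defined in formula (3).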
Theorem 2 will be proved with the help of two lemmas. To formulate them, first the following random variable $\bar Z$ will be introduced.
where $\eta_1,\dots,\eta_n$ are i.i.d. random variables with standard normal distribution, and the numbers $a(j_1,\dots,j_k)$ agree with those in formula (2). Now we state
Lemma 1. The random variables $Z$ and $\bar Z$ defined in formulas (2) and (6) satisfy the inequality
$$EZ^{2M} \le E\bar Z^{2M} \quad \text{for all } M = 1, 2, \dots \qquad (7)$$
and
Lemma 2. The random variable $\bar Z$ defined in formula (6) satisfies the inequality
$$E\bar Z^{2M} \le 1\cdot 3\cdot 5\cdots(2kM-1)\,V^{2M} \quad \text{for all } M = 1, 2, \dots \qquad (8)$$
with the constant $V$ defined in formula (3).
Theorem 2 is a straightforward consequence of Lemmas 1 and 2. So to get this result it is enough to prove Lemmas 1 and 2.
Proof of Lemma 1. Carrying out the multiplications in the expressions $EZ^{2M}$ and $E\bar Z^{2M}$, and exploiting the additive and multiplicative properties of the expectation for sums and products of independent random variables together with the identities $E\varepsilon_j^{2p+1} = 0$ and $E\eta_j^{2p+1} = 0$ for all $p = 0, 1, \dots$, we can write $EZ^{2M}$ and $E\bar Z^{2M}$ in the form of formulas (9) and (10), with coefficients $A(\cdot,\cdot,\cdot)$ and $B(\cdot,\cdot,\cdot)$ respectively. The coefficients $A(\cdot,\cdot,\cdot)$ and $B(\cdot,\cdot,\cdot)$ could have been expressed in an explicit form, but we do not need such a formula. What is important for us is that $A(\cdot,\cdot,\cdot)$ can be expressed as the sum of certain terms, and $B(\cdot,\cdot,\cdot)$ as the sum of the absolute values of the same terms, hence relation (11) holds. (There may be indices $(j_1,\dots,j_l,m_1,\dots,m_l)$ for which the sum defining $A(\cdot,\cdot,\cdot)$ and $B(\cdot,\cdot,\cdot)$ is empty. The value of an empty sum is defined as zero. As empty sums appear for the same indices in (9) and (10) simultaneously, their appearance causes no problem.) Since $E\varepsilon_j^{2m} \le E\eta_j^{2m}$ for all parameters $j$ and $m$, formulas (9), (10) and (11) imply Lemma 1.
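The mechanism of this proof (expand the power, use independence, drop the terms containing an odd moment, and compare $E\varepsilon^{2m} = 1$ with $E\eta^{2m} = 1\cdot 3\cdots(2m-1)$) can be imitated exactly on a small example. The Python sketch below is an illustration under stated assumptions: the polynomial representation, the helper names and the coefficients are hypothetical, and the Gaussian polynomial is built from the absolute values $|a(j_1,j_2)|$, an assumption made here so that its expansion contains only nonnegative terms.

```python
from math import prod


def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ep, cp in p.items():
        for eq, cq in q.items():
            e = tuple(x + y for x, y in zip(ep, eq))
            out[e] = out.get(e, 0) + cp * cq
    return out


def poly_pow(p, power, n):
    """Raise a polynomial in n variables to a nonnegative integer power."""
    result = {(0,) * n: 1}
    for _ in range(power):
        result = poly_mul(result, p)
    return result


def rademacher_moment(p):
    """E eps^p for P(eps = 1) = P(eps = -1) = 1/2."""
    return 1 if p % 2 == 0 else 0


def gaussian_moment(p):
    """E eta^p for standard normal eta: (p - 1)!! for even p, 0 for odd p."""
    return 0 if p % 2 else prod(range(1, p, 2))


def expectation(poly, moment):
    """Expectation of a polynomial of independent variables, term by term."""
    return sum(c * prod(moment(e) for e in exps) for exps, c in poly.items())


def homogeneous_poly(coeffs, n):
    """Build sum a(j_1, j_2) x_{j_1} x_{j_2} from {(j_1, j_2): a(j_1, j_2)}."""
    poly = {}
    for idx, c in coeffs.items():
        exps = tuple(idx.count(j) for j in range(n))
        poly[exps] = poly.get(exps, 0) + c
    return poly


n = 3
# Hypothetical symmetric coefficients of mixed sign, k = 2.
a = {(0, 1): 1, (1, 0): 1, (0, 2): -2, (2, 0): -2, (1, 2): 3, (2, 1): 3}
Z = homogeneous_poly(a, n)
Z_bar = homogeneous_poly({idx: abs(c) for idx, c in a.items()}, n)

for M in (1, 2, 3):
    lhs = expectation(poly_pow(Z, 2 * M, n), rademacher_moment)
    rhs = expectation(poly_pow(Z_bar, 2 * M, n), gaussian_moment)
    print(M, lhs, rhs, lhs <= rhs)
```

For $M = 1$ both sides equal $k!V^2 = 56$ in this example; the gap between the Rademacher and the Gaussian moments opens up only for $M \ge 2$.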
Proof of Lemma 2. I found it simpler to construct an appropriate multiple Wiener-Itô integral $\tilde Z$ whose distribution agrees with that of the random variable $\bar Z$ defined in (6) and to estimate its moments. To do this, let us consider a white noise $W(\cdot)$ on the unit interval $[0,1]$, i.e. let us take a set of (jointly) Gaussian random variables W n , js n , and $j_s = j_{s'}$ for some $s \ne s'$, $1 \le j_s \le n$, $1 \le s \le k$ and the $k$-fold Wiener-Itô integral of this (elementary) function $f$. (For the definition of Wiener-Itô integrals see e.g. [6] or [8].) Observe that the above defined random variables $\eta_1,\dots,\eta_n$ are independent with standard normal distribution. Hence the definition of the Wiener-Itô integral of elementary functions and the definition of the function $f$ imply that the distributions of the random integral $\tilde Z$ and of the random variable $\bar Z$ introduced in (6) agree. Besides, the identity also holds with the number $V$ defined in formula (3). Since the distributions of the random variables $\tilde Z$ and $\bar Z$ agree, formulas (12), (13) together with the following estimate about the moments of Wiener-Itô integrals complete the proof of Lemma 2.
In this estimate a function $f$ of $k$ variables and a $\sigma$-finite measure $\mu$ on some measurable space $(X, \mathcal{X})$ are considered which satisfy the inequality with some $\sigma^2 < \infty$. The moments of the $k$-fold Wiener-Itô integral of the function $f$ with respect to a white noise $\mu_W$ with reference measure $\mu$ satisfy inequality (14) for all $M = 1, 2, \dots$. This result can be obtained relatively simply from the diagram formula for the product of Wiener-Itô integrals, and it is actually proved in Proposition A of paper [11]. It can also be obtained as a straightforward consequence of the results in Lemma 7.31 and Theorem 7.33 of the book [7]. For the sake of completeness I explain this result at the end of this section.
Now that Theorem 2 has been proved with the help of the diagram formula, it remains to derive Theorem 1 from it.
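Proof of Theorem 1. By the Stirling formula we get from the estimate of Theorem 2 that
$$EZ^{2M} \le 1\cdot 3\cdot 5\cdots(2kM-1)\,V^{2M} \le K\left(\frac{2kM}{e}\right)^{kM} V^{2M} \qquad (15)$$
for any $K > \sqrt{2}$ if $M \ge M_0(K)$. Hence the Markov inequality yields the estimate
$$P(|Z| > u) \le \frac{EZ^{2M}}{u^{2M}} \le K\left(\frac{2kM}{e}\left(\frac{V}{u}\right)^{2/k}\right)^{kM} \qquad (16)$$
for all $u > 0$ if $M \ge M_0(K)$. Choose $M$ as the integer part of $\frac{1}{2k}\left(\frac{u}{V}\right)^{2/k}$. Then $2kM\left(\frac{V}{u}\right)^{2/k} \le 1$ and $kM > \frac{1}{2}\left(\frac{u}{V}\right)^{2/k} - k$, hence relation (16) gives
$$P(|Z| > u) \le K e^{-kM} \le K e^{k} \exp\left\{-\frac{1}{2}\left(\frac{u}{V}\right)^{2/k}\right\} \qquad (17)$$
if $u$ is so large that $M \ge M_0(K)$. Formula (17) means that relation (4) holds for $u \ge u_0$ with the constant $A = Ke^k$. Hence relation (4) holds with a sufficiently large constant $A > 0$ for all $u \ge 0$.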
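The passage from the moment bound to the tail bound can be followed numerically. The Python sketch below is an illustration only: it takes the moment bound in the form $EZ^{2M} \le 1\cdot 3\cdots(2kM-1)V^{2M}$, applies the Markov inequality, and chooses $M$ as the integer part of $\frac{1}{2k}(u/V)^{2/k}$, which is one natural choice and an assumption of this sketch. The function names and the sample values of $u$, $V$, $k$ are hypothetical.

```python
from math import e, exp, floor, lgamma, log, sqrt


def log_odd_double_factorial(N):
    """log of 1*3*5*...*(2N-1), computed as log((2N)! / (2^N N!))."""
    return lgamma(2 * N + 1) - N * log(2) - lgamma(N + 1)


def stirling_ratio(N):
    """1*3*...*(2N-1) divided by (2N/e)^N; Stirling's formula gives the limit sqrt(2)."""
    return exp(log_odd_double_factorial(N) - N * (log(2 * N) - 1))


def markov_bound(u, V, k, M):
    """Upper bound 1*3*...*(2kM-1) * V^(2M) / u^(2M) for P(|Z| > u)."""
    return exp(log_odd_double_factorial(k * M) + 2 * M * (log(V) - log(u)))


def tail_bound(u, V, k):
    """Markov bound with M = integer part of (u/V)^(2/k) / (2k); trivial bound 1 if M = 0."""
    M = floor((u / V) ** (2 / k) / (2 * k))
    return markov_bound(u, V, k, M) if M >= 1 else 1.0


def target(u, V, k, K=sqrt(2)):
    """K e^k exp(-(u/V)^(2/k) / 2), the form of the right-hand side aimed at."""
    return K * e ** k * exp(-((u / V) ** (2 / k)) / 2)


V = 1.5
for k in (1, 2, 3):
    for u in (0.5, 2, 5, 10, 30):
        print(k, u, tail_bound(u, V, k), target(u, V, k))
```

In the range checked here `stirling_ratio(N)` stays below $\sqrt{2}$ and approaches it from below, which is why the comparison already works with $K = \sqrt{2}$ for these sample values.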

Estimation of the moments of a Wiener-Itô integral by means of the diagram formula.
Let us have $m$ real-valued functions $f_j(x_1,\dots,x_{k_j})$, $1 \le j \le m$, on a measurable space $(X, \mathcal{X}, \mu)$ with some $\sigma$-finite non-atomic measure $\mu$ such that relation (18) holds. A white noise $\mu_W$ with reference measure $\mu$ can be introduced on $(X, \mathcal{X})$. It is an ensemble of jointly Gaussian random variables $\mu_W(A)$ indexed by the measurable sets $A \in \mathcal{X}$ such that $\mu(A) < \infty$ with the property $E\mu_W(A) = 0$ and $E\mu_W(A)\mu_W(B) = \mu(A \cap B)$. Also the Wiener-Itô integrals of these functions with respect to the white noise $\mu_W$ can be defined if they satisfy relation (18). The definition of these integrals is rather standard (see e.g. [6] or [8]). First they are In the present paper only this consequence of the diagram formula will be needed, hence only this result will be described.
This result will be formulated by means of the notion of (closed) diagrams. The class of closed diagrams will be denoted by $\Gamma = \Gamma(k_1,\dots,k_m)$. A diagram $\gamma \in \Gamma(k_1,\dots,k_m)$ consists of vertices of the form $(j,l)$, $1 \le j \le m$, $1 \le l \le k_j$, and edges $((j,l),(j',l'))$, $1 \le j, j' \le m$, $1 \le l \le k_j$, $1 \le l' \le k_{j'}$. The set of vertices of the form $(j,l)$ with a fixed number $j$ is called the $j$-th row of the diagram. All edges $((j,l),(j',l'))$ of a diagram $\gamma \in \Gamma$ connect vertices from different rows, i.e. $j \ne j'$. It is also demanded that exactly one edge starts from each vertex of a diagram $\gamma$. The class $\Gamma(k_1,\dots,k_m)$ of (closed) diagrams contains the diagrams $\gamma$ with the above properties. If $j < j'$ for an edge $((j,l),(j',l')) \in \gamma$, then $(j,l)$ is called the upper and $(j',l')$ the lower end point of this edge. Let $U(\gamma)$ denote the set of upper and $L(\gamma)$ the set of lower end points of a diagram $\gamma \in \Gamma(k_1,\dots,k_m)$. Define the function $\alpha_\gamma(j,l) = (j,l)$ if $(j,l)$ is the upper end point and $\alpha_\gamma(j,l) = (j',l')$ if $(j,l)$ is the lower end point of an edge $((j,l),(j',l'))$ of a diagram $\gamma \in \Gamma(k_1,\dots,k_m)$.
For the sake of simpler notation let us rewrite the functions $f_j$ with reindexed variables in the form $f_j(x_{j,1},\dots,x_{j,k_j})$, $1 \le j \le m$, and define the function Define with the help of the functions $F$ and $\alpha_\gamma$ the constants for all $\gamma \in \Gamma(k_1,\dots,k_m)$. The expected value of the product of Wiener-Itô integrals $k_j! J_{\mu,k}(f_j)$, $1 \le j \le m$, can be expressed with the help of the above quantities $F_\gamma$. The following result holds. $F_\gamma$ with the numbers $F_\gamma$ defined in (19). These numbers satisfy the inequality Let us consider the above result in the special case $m = 2M$ and $f_j = f$ for all $1 \le j \le m$ with a square integrable function $f$ of $k$ variables. Let $\Gamma(k, M)$ denote the class of diagrams $\Gamma(k_1,\dots,k_m)$ in this case, and $|\Gamma(k, M)|$ the number of diagrams it contains. The above result yields the estimate It is not difficult to see that $|\Gamma(k, M)| \le 1\cdot 3\cdot 5\cdots(2kM-1)$. Indeed, if we omit the restriction that the edges of a diagram can connect only vertices from different rows, then the number of diagrams with $2M$ rows and $k$ vertices in each row equals $1\cdot 3\cdot 5\cdots(2kM-1)$. Relation (20) together with this observation implies (14). It is also worth mentioning that the estimate (20) is sharp in the following sense. If with some square integrable function $f$, then relation (20) holds with equality. In this case $k! J_{\mu,k}(f)$ equals $\mathrm{const.}\, H_k(\eta)$ with some standard normal random variable $\eta$ and the $k$-th Hermite polynomial $H_k(\cdot)$ because of Itô's formula for multiple Wiener-Itô integrals.
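The counting bound $|\Gamma(k, M)| \le 1\cdot 3\cdot 5\cdots(2kM-1)$ can be checked by enumeration for small $k$ and $M$. In the Python sketch below a closed diagram is treated as a perfect matching of the vertices in which no edge joins two vertices of the same row, as in the definition above; the function names are hypothetical and the brute-force recursion is only meant for a handful of vertices.

```python
from math import prod


def count_closed_diagrams(row_sizes):
    """Number of perfect matchings of the vertices (j, l) with no edge inside a row."""
    vertices = tuple((j, l) for j, size in enumerate(row_sizes) for l in range(size))

    def count(remaining):
        if not remaining:
            return 1
        first, rest = remaining[0], remaining[1:]
        total = 0
        # Pair the first free vertex with every free vertex of a different row.
        for i, other in enumerate(rest):
            if other[0] != first[0]:
                total += count(rest[:i] + rest[i + 1:])
        return total

    return count(vertices)


def odd_double_factorial(N):
    """1 * 3 * 5 * ... * (2N - 1): the number of all perfect matchings of 2N vertices."""
    return prod(range(1, 2 * N, 2))


for k, M in ((1, 2), (2, 1), (2, 2), (3, 1), (3, 2)):
    print(k, M, count_closed_diagrams([k] * (2 * M)), odd_double_factorial(k * M))
```

For $k = 1$ the row restriction is vacuous and the two numbers agree; for $k = 2$, $M = 2$ the count is 60, which equals $E(\eta^2 - 1)^4$ for a standard normal $\eta$, against the upper bound 105.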
3 Some remarks about the results.
The proof of Theorem 1 was based on an estimate of the (high) moments of the homogeneous random polynomial $Z$ of Rademacher functions defined in (2). Although bounds on the tail distribution of sums of independent random variables are generally proved by means of a good estimate on the moment generating function, in the present problem it was more natural to estimate the moments for the following reason.
As the example discussed in Section 1 shows, if $Z$ is a random polynomial of order $k$, then the tail distribution $P(Z > u)$ should behave for large numbers $u$ as $e^{-\mathrm{const.}\, u^{\alpha(k)}}$ with $\alpha(k) = \frac{2}{k}$. In the case $k \ge 3$ a random variable with such a tail distribution has no finite moment generating function. Hence the estimation of the moment generating function does not work in such cases. On the other hand, a good estimate of the (high) moments of the random variable $Z$ is sufficient to prove Theorem 1. It has to be shown that the high moments of $Z$ are not greater than a constant times the appropriate moments of a random variable with tail distribution $e^{-\mathrm{const.}\, u^{\alpha(k)}}$. Here the constant in the exponent is the same as in the exponent of the upper bound in Theorem 1. Theorem 2 contains a good estimate on all even moments of a homogeneous polynomial of Rademacher functions of order $k$, and it can be considered as a Gaussian type estimate. (It has the same order as the moments of a $k$-th order Hermite polynomial of a standard normal random variable multiplied by a constant.)
The moments of degenerate $U$-statistics were also studied. Proposition B of paper [11] contains a result in this direction. It turned out that high moments of degenerate $U$-statistics show a worse behaviour. Only their not too high moments satisfy a good 'Gaussian type' estimate. This difference has a deeper cause. There are degenerate $U$-statistics which have a relatively bad tail behaviour at high levels. Such examples can be found in Example 2.4 for sums of independent random variables and in Example 4.5 for degenerate $U$-statistics of order 2 in paper [9]. In such cases much worse moment estimates hold than in Theorem 2. Lemma 1 made it possible to reduce the estimation of the moments (and as a consequence of the tail distribution) of a homogeneous polynomial of Rademacher functions to the estimation of the moments of a homogeneous polynomial of Gaussian random variables.
This result provided a good tail distribution estimate at all high levels. It can be generalized to other polynomials of independent random variables with good moment behaviour. On the other hand, general $U$-statistics may have a much worse tail behaviour at high levels than the behaviour suggested by a Gaussian comparison. It would be interesting to get a better understanding of the question when a $U$-statistic has the good tail behaviour at all levels which a Gaussian comparison suggests, and when it has a relatively bad tail behaviour at very high levels. At any rate, the fact that homogeneous polynomials of Rademacher functions satisfy a good 'Gaussian type' estimate at all levels $u > 0$ has an important consequence. This property was needed for the application of an important symmetrization argument in paper [10]. This symmetrization argument made it possible to get a good estimate on the supremum of degenerate $U$-statistics also in cases when other methods do not work.
There is another result, called Borell's inequality, which makes it possible to bound the high moments, and as a consequence the tail distribution, of a homogeneous polynomial of Rademacher functions. Actually, this estimate is a simple consequence of the hypercontractive inequality for Rademacher functions proved by A. Bonami [1] and L. Gross [5] independently of each other. It may be interesting to compare the estimates provided by Borell's inequality with those of the present paper. Borell's inequality (see e.g. [2]) states the following estimate.
Theorem B. (Borell's inequality). The moments of the random variable $Z$ defined in formula (2) satisfy the inequality
$$\left(E|Z|^p\right)^{1/p} \le \left(\frac{p-1}{q-1}\right)^{k/2} \left(E|Z|^q\right)^{1/q} \quad \text{if } 1 < q \le p < \infty.$$
Let us apply Borell's inequality with the choice $p = 2M$ and $q = 2$ for the random variable $Z$ defined in (2). It gives the bound $EZ^{2M} \le (2M-1)^{kM} (EZ^2)^M \le A(k)(2M)^{kM} (k!)^M V^{2M}$ with the constant $A(k) = e^{-k/2}$. (The expression in the last part of this inequality is slightly larger than the middle term, but this has no importance in the subsequent consideration.) On the other hand, Theorem 2, more precisely its consequence relation (15), yields the bound $EZ^{2M} \le K(2M)^{kM} \left(\frac{k}{e}\right)^{kM} V^{2M}$ with some appropriate constant $K = K(k) > 0$ not depending on $M$. It can be seen that the inequality $\left(\frac{k}{e}\right)^k < k!$ holds for all integers $k \ge 1$. This means that the estimate of the present paper yields a $\mathrm{const.}\cdot\alpha^M$-times smaller bound for the moment $EZ^{2M}$ than the estimate given by Borell's inequality, where $\alpha = \frac{1}{k!}\left(\frac{k}{e}\right)^k < 1$. As a consequence, Borell's inequality can give the right type of estimate for the tail distribution of the random variable $Z$, but it cannot give the optimal constant in the exponent. In such large deviation type estimates the moment estimates based on the diagram formula seem to work better.
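The size of the gap between the two approaches can be tabulated. The Python sketch below is an illustration with hypothetical function names: it evaluates $\alpha = \frac{1}{k!}\left(\frac{k}{e}\right)^k$ and the constant $\frac{k}{2e(k!)^{1/k}}$ that replaces $\frac{1}{2}$ in the exponent when the tail bound is derived from Borell's inequality. A short calculation shows that this constant equals $\frac{1}{2}\alpha^{1/k}$, so the inequality $\alpha < 1$ is exactly what keeps it below $\frac{1}{2}$.

```python
from math import e, factorial


def alpha(k):
    """(1/k!) * (k/e)^k: the per-M factor between the two moment bounds."""
    return (k / e) ** k / factorial(k)


def borell_exponent_constant(k):
    """k / (2 e (k!)^(1/k)): the constant in the exponent obtained via Borell's inequality."""
    return k / (2 * e * factorial(k) ** (1 / k))


for k in (1, 2, 3, 5, 10, 50):
    print(k, alpha(k), borell_exponent_constant(k))
```

For $k = 1$ the constant is $\frac{1}{2e} \approx 0.18$, and by Stirling's formula it increases towards $\frac{1}{2}$ as $k$ grows without reaching it.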