Exponential inequalities for martingales with applications

The paper is devoted to establishing some general exponential inequalities for supermartingales. The inequalities improve or generalize many exponential inequalities of Bennett, Freedman, de la Peña, Pinelis and van de Geer. Moreover, our concentration inequalities also improve some known inequalities for sums of independent random variables. Applications associated with linear regressions, autoregressive processes and branching processes are provided. In particular, an interesting application of de la Peña's inequality to self-normalized deviations is also provided.


Introduction
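Assume that we are given a sequence of real supermartingale differences $(\xi_i, \mathcal{F}_i)_{i=0,...,n}$ defined on some probability space $(\Omega, \mathcal{F}, P)$, where $\xi_0 = 0$ and $\{\emptyset, \Omega\} = \mathcal{F}_0 \subseteq ... \subseteq \mathcal{F}_n \subseteq \mathcal{F}$ are increasing σ-fields. So we have $E(\xi_i | \mathcal{F}_{i-1}) \le 0$, $i = 1, ..., n$, by definition. Set
\[ S_0 = 0, \qquad S_k = \sum_{i=1}^{k} \xi_i, \quad k = 1, ..., n. \tag{1} \]
Then $S = (S_k, \mathcal{F}_k)_{k=1,...,n}$ is a supermartingale. Let $\langle S\rangle$ and $[S]$ be respectively the quadratic characteristic and the squared variation of the supermartingale $S$:
\[ \langle S\rangle_k = \sum_{i=1}^{k} E(\xi_i^2 | \mathcal{F}_{i-1}) \quad \text{and} \quad [S]_k = \sum_{i=1}^{k} \xi_i^2. \tag{2} \]
The following exponential inequality for supermartingales can be found in Freedman [17].
Theorem A. Suppose that $\xi_i \le \epsilon$ for a positive constant $\epsilon$. Then, for all $x, v > 0$,
\[ P\big(S_k \ge x \text{ and } \langle S\rangle_k \le v^2 \text{ for some } k \in [1, n]\big) \le \exp\left\{ -\frac{x^2}{2(v^2 + x\epsilon)} \right\}. \tag{3} \]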
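The three processes attached to a supermartingale can be computed directly from a path. The following is a minimal sketch, assuming the standard definitions $S_k = \sum_{i \le k} \xi_i$, $\langle S\rangle_k = \sum_{i \le k} E(\xi_i^2 | \mathcal{F}_{i-1})$ and $[S]_k = \sum_{i \le k} \xi_i^2$; the function names are hypothetical, and the conditional second moments must be supplied by the caller because they depend on the model.

```python
from itertools import accumulate


def partial_sums(values):
    """Return the running sums (v_1, v_1 + v_2, ...) as a list."""
    return list(accumulate(values))


def martingale_processes(xi, cond_second_moments):
    """Return (S, <S>, [S]) for the differences xi_1, ..., xi_n.

    xi                  -- observed differences xi_i
    cond_second_moments -- values of E(xi_i^2 | F_{i-1}), supplied by the caller
    """
    if len(xi) != len(cond_second_moments):
        raise ValueError("xi and cond_second_moments must have the same length")
    S = partial_sums(xi)                             # S_k
    angle = partial_sums(cond_second_moments)        # quadratic characteristic <S>_k
    square = partial_sums([x * x for x in xi])       # squared variation [S]_k
    return S, angle, square


# Rademacher-type path: xi_i = +-1, so E(xi_i^2 | F_{i-1}) = 1 and <S>_k = [S]_k = k.
S, angle, square = martingale_processes([1, -1, 1, 1], [1, 1, 1, 1])
print(S, angle, square)
```

For bounded differences the two variance processes are comparable, but in general $[S]_k$ is observable from the path while $\langle S\rangle_k$ is not, which is why the sketch takes the conditional moments as an input.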
After Freedman's seminal work, many interesting Bernstein type exponential inequalities for martingales have been established. For continuous-time martingales with bounded jumps, Freedman's inequality (3) has been established by Shorack and Wellner [35]. By imposing certain moment conditions, Van de Geer [36] relaxed the condition of Shorack and Wellner and generalized inequality (3) to martingales with unbounded jumps. Under the following conditional Bernstein condition: for a positive constant $\epsilon$, De la Peña [9] obtained the following Bernstein type inequality for martingales, for all $x, v > 0$, Inequality (6) has also been obtained by Van de Geer [36]. In particular, when $(\xi_i)_{i=1,...,n}$ are independent, the inequalities (5) and (6) reduce respectively to the inequalities of Bennett [2] and Bernstein [6]. Many other generalizations of Freedman's inequality can be found in Haeusler [19], Pinelis [29], Dzhaparidze and van Zanten [14], Delyon [13] and Khan [23]. Following the work of Freedman [17], Shorack and Wellner [35], Van de Geer [36] and De la Peña [9], we develop some new methods, based on changes of probability measure, to establish some general Bernstein type exponential inequalities for supermartingales. The methods are user-friendly and efficient.
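In the independent case a Bernstein type bound can be compared with an exact tail probability. The sketch below assumes the Bernstein-type form $\exp\{-x^2/(2(v^2 + x\epsilon))\}$ for the bound and takes $S_n$ to be a sum of $n$ independent Rademacher variables, so that $\epsilon = 1$ and $\langle S\rangle_n = n$; the function names are hypothetical.

```python
from math import comb, exp


def rademacher_tail(n, x):
    """Exact P(S_n >= x) for S_n a sum of n independent Rademacher variables."""
    # S_n = 2*B - n with B ~ Binomial(n, 1/2), so we sum the binomial weights
    # of all b with 2*b - n >= x.
    total = sum(comb(n, b) for b in range(n + 1) if 2 * b - n >= x)
    return total / 2 ** n


def bernstein_type_bound(x, v2, eps):
    """exp{-x^2 / (2 (v^2 + x*eps))}, the Bernstein-type form assumed here."""
    return exp(-x * x / (2.0 * (v2 + x * eps)))


n = 20
for x in (2, 6, 10, 14):
    # Each line shows the deviation level, the exact tail and the upper bound.
    print(x, rademacher_tail(n, x), bernstein_type_bound(x, v2=n, eps=1.0))
```

The comparison only covers the terminal value $S_n$; the martingale inequalities discussed in the text control the larger event that $S_k \ge x$ for some $k \le n$.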
If $\xi_i \ge -\epsilon$, then our result implies that, for all $x, v > 0$, This inequality is similar to the one of Freedman (3). To highlight the differences between (3) and (9), notice that the condition $\xi_i \le \epsilon$ and the conditional variance $\langle S\rangle_k$ in Freedman's inequality (3) are respectively replaced by the condition $\xi_i \ge -\epsilon$ and the squared variation $[S]_k$ in our inequality (9). Moreover, inequality (9) completes Freedman's inequality (3) by giving an estimate of the deviation probabilities on the left side: if the martingale differences $(\xi_i, \mathcal{F}_i)_{i=1,...,n}$ satisfy $\xi_i \le \epsilon$ for all $i$, then, for all $x, v > 0$, If the martingale differences satisfy the canonical assumption (which means $g(\lambda) = \lambda^2/2$ and $f(\lambda) = 0$), then (8) implies the following De la Peña inequality [9], for all $x, v > 0$, Moreover, we find that (11) implies the following self-normalized deviation result associated with independent and symmetric random variables, for all $x > 0$, If $E|\xi_i|^3 < \infty$, then (8) implies the following Bernstein type inequality, for all $x, v, w > 0$, where $\Upsilon(S_k) = \sum_{i=1}^{k} E(|\xi_i|^3 | \mathcal{F}_{i-1})$; see Corollary 1. Compared to the inequalities (5) and (6), the advantage of the last two inequalities (13) and (14) is that we do not assume the existence of moments of all orders.
In the i.i.d. case, bound (15) significantly improves bound (5) on the large deviation tail probabilities $P(S_n \ge nx)$ by adding a factor with exponential decay rate $\exp\{-n c_x\}$, where $c_x > 0$ does not depend on $n$. In the application to linear regression, we find that this type of refinement is useful; see Theorem 3.
, inequality (17) reduces to Theorem 4.2 of Khan [23]. Inequality (17) implies the following result, where $V_{i-1}$ is not equal to the conditional variance. where Then we show that (18) implies a generalization of Azuma-Hoeffding's inequality for martingales due to Van de Geer [37]. Moreover, we also show that (18) significantly improves some recent inequalities of Bentkus [3] and Pinelis [30,31] by adding an exponential decay factor in the case $\|\sum_{i=1}^{n} C_{i-1}^2\|_\infty < \sum_{i=1}^{n} \|C_{i-1}^2\|_\infty$; see (42) and Example 1 for details. We find that such improvements are important in the applications to the linear regression model and autoregressive processes; see Remarks 2 and 3.
The paper is organized as follows. We present our theoretical results in Section 2, give applications of our results in Section 3, and devote Sections 4-6 to the proofs. The proofs of the theorems and of their corollaries are given in the same sections.

Main results
Our first result is given under a very general condition.
Next we show that Theorem 1 is very useful for obtaining concentration inequalities for supermartingales. Introducing the third moments of the martingale differences, we have the following Bernstein type inequalities.
Since S k ≤ Υ(S k ), the inequalities (23), (24) and (25) hold true if S k is replaced by Υ(S k ). To the best of our knowledge, such inequalities have not been established for the sums of independent random variables.
Notice that (24) and (25) are respectively the bounds of Bennett and Bernstein. Compared to the conditional Bernstein condition (4), the condition of Corollary 1 does not assume the existence of the moments of all orders.
For supermartingales with differences bounded from below, we still have the following Bernstein type inequality.
Inequality (26) is similar to Freedman's inequality (3). However, there are two differences between (26) and (3). First, we assume that $\xi_i$ is bounded from below instead of from above. Second, the quadratic characteristic $\langle S\rangle_k$ in Freedman's inequality is replaced by the squared variation $[S]_k$ in our inequality (26). Such an inequality could be useful for estimating tail probabilities when the variances of $(\xi_i)$ do not exist.
Under the conditional Bernstein condition, we have Then, for all $x, v > 0$, where In the independent case, inequality (29) is known as Bennett's inequality [2]. To highlight how the bound where $\psi(t) = t - \log(1 + t)$ is a nonnegative convex function in $t \ge 0$. It is easy to see that, in the i.i.d. case with $v^2 = n\sigma_1^2$ (or more generally when ǫ v = σ 1 √ n for a constant $\sigma_1 > 0$), we have where on tail probabilities $P(S_n \ge nx)$ is strengthened by adding a factor with exponential decay rate $\exp\{-n c_{x,\sigma_1,\epsilon}\}$ as $n \to \infty$. Since the conditional Bernstein condition (4) implies condition (27), inequality (28) strengthens De la Peña's inequality (5).
The following result is a Fuk-Nagaev type inequality [18,28] for martingales with conditionally symmetric differences. Its proof is based on the truncation argument on martingale differences.
Then, for all $x, y, v > 0$ and $v^2 \le n y^2$, where Inequality (31) is the best that can be obtained from the exponential Markov inequality: if $(\xi_i)_{i=1,...,n}$ are i.i.d. and satisfy the following distribution then the bound (31) equals $\inf_{\lambda \ge 0} E e^{\lambda(S_n - x)}$. In this sense, inequality (31) is a version of Hoeffding's inequality (cf. (2.8) of [21]) for martingales with conditionally symmetric differences. For martingales with bounded conditionally symmetric differences, Sason [34] has obtained (31) under the conditions $|\xi_i| \le y$ and $E(\xi_i^2 | \mathcal{F}_{i-1}) \le v^2/n$. He has also obtained (32) under the assumption $|\xi_i| \le y$. Thus (31) generalizes Sason's inequalities under a more general condition.
For martingales with square integrable differences, several Nagaev type inequalities based on a truncation argument on the martingale differences can be found in Haeusler [19] and Courbot [7]. For the optimal exponential convergence speed of bounds of this type, we refer to Lesigne and Volný [24] and Fan et al. [15,16].
Consider the case where the differences $(\xi_i, \mathcal{F}_i)_{i=1,...,n}$ are sub-Gaussian. We have the following very general result.
Using Theorem 2, we generalize Azuma-Hoeffding's inequality (cf. [1,21]) to the case that the differences are only bounded from above.
Corollary 5 Assume that $U_{i-1}$ are nonnegative and $\mathcal{F}_{i-1}$-measurable random variables. Denote by and, for all $x, v > 0$, In Notice that if $(\xi_i)_{i=1,...,n}$ are independent and satisfy the conditions $\xi_i \le c_i$ and $E\xi_i^2 \ge c_i^2$ for some constants $(c_i)_{i=1,...,n}$, then (38) holds with $v^2 = \sum_{i=1}^{n} E\xi_i^2$. It is obvious that the Rademacher random variables satisfy this assumption.
For martingale differences $(\xi_i, \mathcal{F}_i)_{i=1,...,n}$, inequality (37) generalizes the following inequality due to Van de Geer (cf. Theorem 2.5 of [37]): Indeed, since and which together with (37) implies (39). Under the assumption of Corollary 5, Pinelis [30,31] (see also Bentkus [3]) proved the following inequality, for all $x > 0$, where $c$ is an absolute constant and (37) improves Pinelis's inequality (41) by adding an exponential decay factor of order where To illustrate this factor, consider the following example. For a much more significant improvement, we refer to Remark 2.
Example 1: Assume that $(\varepsilon_i)_{i=1,...,n}$ is a sequence of Rademacher random variables, and that $N$ is a random variable independent of $(\varepsilon_i)_{i=1,...,n}$. Set Hence, for even $n$, it is easy to see that $\hat{v}^2 = 1 > \frac{1}{2} = \sum_{i=1}^{n} C_{i-1}^2 = \langle S\rangle_n$. Then Pinelis's inequality (41) shows that: while our inequality (37) implies that: Thus our inequality (37) improves Pinelis's inequality (41) by adding a factor with the exponential decay rate $(1 + x)\exp\{-x^2/2\}$.

Remark 1 Corollary 5 implies a simple proof of the following self-normalized deviation inequality.
Assume that $(\xi_i)_{i=1,...,n}$ are independent and symmetric. Then, for all $x > 0$,
\[ P\left( \frac{S_n}{\sqrt{[S]_n}} \ge x \right) \le \exp\left\{ -\frac{x^2}{2} \right\}, \tag{43} \]
where by convention $\frac{0}{0} = 0$. A similar result can be found in Hitczenko [20], who obtained the same upper bound on the tail probabilities $P\big(S_n \ge x \sqrt{\|[S]_n\|_\infty}\big)$. For more precise results, we refer to Wang and Jing [38]. In particular, Cramér type large deviations have been established by Jing, Shao and Wang [22], without assuming that $(\xi_i)_{i=1,...,n}$ are symmetric (or that $(\xi_i)_{i=1,...,n}$ have exponential moments).
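The self-normalized bound of Remark 1 can be checked exactly on a small symmetric example. The sketch below assumes that the bound has the form $\exp\{-x^2/2\}$ (the value obtained when $\lambda = x$ is chosen in the proof) and takes $\xi_i = a_i e_i$ with fixed magnitudes $a_i$ and independent Rademacher signs $e_i$, so that $[S]_n = \sum a_i^2$ does not depend on the signs. The function name and the magnitudes are illustrative assumptions.

```python
from itertools import product
from math import exp, sqrt


def self_normalized_tail(magnitudes, x):
    """Exact P(S_n >= x * sqrt([S]_n)) when xi_i = a_i * e_i with Rademacher e_i."""
    norm = sqrt(sum(a * a for a in magnitudes))  # sqrt([S]_n), same for every sign pattern
    if norm == 0.0:
        return 0.0  # convention 0/0 = 0: the event {0 >= x} is empty for x > 0
    hits = 0
    # Enumerate all 2^n equally likely sign patterns.
    for signs in product((-1, 1), repeat=len(magnitudes)):
        s = sum(a * e for a, e in zip(magnitudes, signs))
        if s >= x * norm:
            hits += 1
    return hits / 2 ** len(magnitudes)


magnitudes = [1.0, 2.0, 3.0, 0.5, 1.5]
for x in (0.5, 1.0, 1.5, 2.0):
    # Exact tail next to the assumed bound exp{-x^2/2}.
    print(x, self_normalized_tail(magnitudes, x), exp(-x * x / 2.0))
```

Enumeration costs $2^n$ evaluations, so this is only meant for small $n$; for larger samples a Monte Carlo estimate would be the natural replacement.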

Applications to statistical estimation
Exponential concentration inequalities for martingales have many applications. McDiarmid [27] and Rio [32] applied inequalities of this type to estimate the concentration of separately Lipschitz functions. Van de Geer [36] found that such inequalities can be used for maximum likelihood estimation for counting processes. Liu and Watbled [26] considered the free energy of directed polymers in a random environment. Dedecker and Fan [12] gave an application of these inequalities to the Wasserstein distance between the empirical measure and the invariant distribution. We refer to Bercu [5] for more interesting applications of the concentration inequalities for martingales.
In the sequel, we discuss how to apply our results to the linear regression model, autoregressive processes and branching processes. These models can be found in Liptser and Spokoiny [25] and Bercu and Touati [4].
1. Linear regression model. Consider the stochastic linear regression model given, for all $1 \le k \le n$, by
\[ X_k = \theta \phi_k + \varepsilon_k, \tag{44} \]
where $X_k$, $\phi_k$ and $\varepsilon_k$ are the observations, the regression variables and the driving noise, respectively. We assume that $(\phi_k)$ is a sequence of independent random variables. We also assume that $(\varepsilon_k)$ is a sequence of independent and identically distributed (i.i.d.) random variables, with mean zero and variance $\sigma^2 > 0$. Moreover, we suppose that $(\phi_k)$ and $(\varepsilon_k)$ are independent. Our interest is to estimate the unknown parameter $\theta$. The well-known least-squares estimator $\theta_n$ is given by
\[ \theta_n = \frac{\sum_{k=1}^{n} \phi_k X_k}{\sum_{k=1}^{n} \phi_k^2}. \tag{45} \]
When $(\phi_k)$ and $(\varepsilon_k)$ are sub-Gaussian, exponential inequalities on the convergence of $\theta_n - \theta$ have been established by Bercu and Touati [4]. When $(\varepsilon_k)$ are normal random variables, Liptser and Spokoiny [25] have established the following estimate: for all $x \ge 1$, Here, we would like to give a generalization of this inequality. Consider the case where $(\varepsilon_k)$ satisfy the Bernstein condition.
If $a \le |\phi_k| \le b$ for two positive constants $a$ and $b$, then the condition of Theorem 1 is satisfied with $\epsilon_1 = \frac{b}{a\sqrt{n}}$. Indeed, it is easy to see that In this case, bound (47) behaves like $\exp\{-x^2/2\}$ when $x = o(\sqrt{n})$ as $n \to \infty$. When $x$ is large, bound (47) behaves like $\exp\{-x\}$. If $(\varepsilon_k)$ are bounded from above, we have the following sub-Gaussian tail bound from Corollary 5.
Then, for all $x, v \ge 0$, where When the condition of Theorem 5 is satisfied with $\alpha = 2$, $(\varepsilon_i)$ are known as sub-Gaussian random variables. It is known that bounded random variables and normal random variables are all sub-Gaussian. In particular, if $(\varepsilon_i)$ are standard normal random variables, then bound (49) is valid with $\alpha = 2$ and $c = C(2) = 1/2$.
2. Autoregressive processes. The autoregressive model can be stated as follows: for all $1 \le k \le n$,
\[ X_k = \theta X_{k-1} + \varepsilon_k, \]
where $(X_k)$ and $(\varepsilon_k)$ are the observations and the driving noise, respectively. We assume that $(\varepsilon_k)$ is a sequence of i.i.d. centered random variables with variance $\sigma^2 > 0$. The process is said to be stable if $|\theta| < 1$, unstable if $|\theta| = 1$ and explosive if $|\theta| > 1$. We can estimate the unknown parameter $\theta$ by the least-squares estimator given by, for all $n \ge 1$,
\[ \theta'_n = \frac{\sum_{k=1}^{n} X_{k-1} X_k}{\sum_{k=1}^{n} X_{k-1}^2}. \]
When $X_0$ and $(\varepsilon_k)$ are normal random variables, the convergence rate of $\theta'_n - \theta$ has been established by Bercu and Touati [4]. Here, we would like to give an almost sure convergence rate of $(\theta'_n - \theta) \sum_{k=1}^{n} X_{k-1}^2$. By an argument similar to that of Theorem 5, we have the following result.
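For the linear regression model, the least-squares estimator is a ratio of two sums and is easy to compute from data. The sketch below assumes the scalar model $X_k = \theta\phi_k + \varepsilon_k$ with estimator $\theta_n = \sum_k \phi_k X_k / \sum_k \phi_k^2$; the helper names, the Gaussian noise and the uniform regressors are illustrative choices, not part of the results above.

```python
import random


def simulate_regression(theta, phi, sigma=1.0, seed=0):
    """Draw X_k = theta * phi_k + eps_k with i.i.d. N(0, sigma^2) noise (illustrative choice)."""
    rng = random.Random(seed)
    return [theta * p + rng.gauss(0.0, sigma) for p in phi]


def least_squares_theta(phi, X):
    """Least-squares estimator sum(phi_k * X_k) / sum(phi_k^2)."""
    denom = sum(p * p for p in phi)
    if denom == 0:
        raise ValueError("all regression variables are zero; the estimator is undefined")
    return sum(p * x for p, x in zip(phi, X)) / denom


# Regressors bounded away from 0 and infinity, as in the condition a <= |phi_k| <= b.
rng = random.Random(1)
phi = [rng.uniform(1.0, 2.0) for _ in range(2000)]
X = simulate_regression(theta=0.7, phi=phi, sigma=1.0, seed=2)
print(least_squares_theta(phi, X))
```

Since $\theta_n - \theta = \sum_k \phi_k \varepsilon_k / \sum_k \phi_k^2$, the estimation error is a normalized sum of martingale differences, which is what makes the exponential inequalities applicable.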

Theorem 6 Assume the condition of Theorem 5. Then bound (49) holds true on the tail probabilities
In particular, if (ε i ) are bounded, then we have Then, for all x, v > 0, where

Remark 3 We can obtain some similar bounds by using Corollary 2.3 or Van de Geer's inequality. However, those bounds are less tight than (53). For instance, by Van de Geer's inequality, we can obtain the bound (53) with a larger
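The autoregressive estimator can be sketched in the same way. The code below assumes the first-order model $X_k = \theta X_{k-1} + \varepsilon_k$ and the estimator $\theta'_n = \sum_k X_{k-1}X_k / \sum_k X_{k-1}^2$; the function names and the Gaussian noise are illustrative assumptions.

```python
import random


def simulate_ar1(theta, n, x0=0.0, sigma=1.0, seed=0):
    """Return the path [X_0, ..., X_n] of X_k = theta * X_{k-1} + eps_k."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        path.append(theta * path[-1] + rng.gauss(0.0, sigma))
    return path


def ar1_least_squares(path):
    """Least-squares estimator sum(X_{k-1} * X_k) / sum(X_{k-1}^2)."""
    num = sum(path[k - 1] * path[k] for k in range(1, len(path)))
    den = sum(path[k - 1] ** 2 for k in range(1, len(path)))
    if den == 0:
        raise ValueError("sum of X_{k-1}^2 is zero; the estimator is undefined")
    return num / den


path = simulate_ar1(theta=0.5, n=5000, seed=1)
print(ar1_least_squares(path))
```

The quantity $(\theta'_n - \theta)\sum_k X_{k-1}^2$ equals $\sum_k X_{k-1}\varepsilon_k$, a martingale with respect to the natural filtration, which is the object controlled by the tail bounds.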
3. Branching processes. Consider the Galton-Watson process starting from $X_0 = 1$ and given, for all $n \ge 1$, by
\[ X_n = \sum_{k=1}^{X_{n-1}} Y_{n,k}, \]
where $(Y_{n,k})$ is a sequence of i.i.d. nonnegative integer-valued random variables. The distribution of $(Y_{n,k})$, with finite mean $m$ and variance $\sigma^2$, is commonly called the offspring or reproduction distribution. We are interested in the estimation of the offspring mean $m$. The Lotka-Nagaev estimator is given by
\[ m_n = \frac{X_n}{X_{n-1}}. \]
Assume $X_n > 0$ a.s., so that the Lotka-Nagaev estimator $m_n$ is always well defined. Our goal is to establish exponential inequalities for $m_n$. Denote $\xi_{n,k} = Y_{n,k} - m$. Then
\[ (m_n - m) X_{n-1} = X_n - m X_{n-1} = \sum_{k=1}^{X_{n-1}} \xi_{n,k}. \]
Thus, conditionally on $X_{n-1}$, $(m_n - m) X_{n-1}$ is a sum of independent random variables. By Corollary 3, we easily obtain the following exponential inequalities. Then, for all $x, v > 0$, it holds In particular, it implies that, for all $x > 0$, Since $\xi_{n,k} \ge -m$, we have the following one-sided sub-Gaussian bound by Corollary 5. This bound cannot be obtained from Azuma-Hoeffding's inequality, since $\xi_{n,k}$ need not be bounded from above.
In particular, it implies that, for all $x > 0$, For more precise estimates of the tail probabilities $P(|m_n - m| \ge x)$, we refer to Bercu and Touati [4]. In particular, Bercu and Touati have established Bernstein bounds associated with the cumulant generating function of $\xi_{n,k}$.
Define the conjugate probability measure Denote by $E_\lambda$ the expectation with respect to $P_\lambda$.
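The branching model and the Lotka-Nagaev estimator can be illustrated by a short simulation. The sketch assumes the recursion $X_n = \sum_{k=1}^{X_{n-1}} Y_{n,k}$ with $X_0 = 1$ and the estimator $m_n = X_n / X_{n-1}$; the function names and the offspring law (uniform on {1, 2, 3}, which keeps $X_n > 0$) are illustrative assumptions.

```python
import random


def galton_watson(offspring, generations, seed=0):
    """Return [X_0, ..., X_generations]; offspring(rng) draws one nonnegative integer."""
    rng = random.Random(seed)
    sizes = [1]  # X_0 = 1
    for _ in range(generations):
        # Each of the X_{n-1} individuals reproduces independently.
        sizes.append(sum(offspring(rng) for _ in range(sizes[-1])))
    return sizes


def lotka_nagaev(sizes, n):
    """Lotka-Nagaev estimator m_n = X_n / X_{n-1} of the offspring mean."""
    if sizes[n - 1] == 0:
        raise ValueError("X_{n-1} = 0: the estimator is undefined after extinction")
    return sizes[n] / sizes[n - 1]


# Offspring uniform on {1, 2, 3}: mean m = 2 and no extinction.
sizes = galton_watson(lambda rng: rng.choice((1, 2, 3)), generations=8, seed=1)
print(sizes, lotka_nagaev(sizes, 8))
```

Because $X_{n-1}$ grows geometrically in the supercritical case, the estimator averages over more and more individuals, which is the source of the fast concentration of $m_n$ around $m$.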
Proof of Theorem 1. For any $x, v, w > 0$, define the stopping time with the convention that $\min \emptyset = 0$. Then By the change of measure (54), we deduce, for all $x, \lambda, v, w > 0$, where Using Jensen's inequality and the condition E(exp λξ i − g(λ)ξ 2 Thus (55) implies that, for all $x, \lambda, v, w > 0$, By the fact that $S_k \ge x$, $[S]_k \le v^2$ and $\sum_{i=1}^{k} V_{i-1} \le w$ on the set $\{T(x, v, w) = k\}$, we find that, for all $x, \lambda, v, w > 0$, This gives the desired inequalities (21) and (22), and completes the proof of Theorem 1.
Proof of Corollary 1. To prove Corollary 1, we use the following basic inequality: By the last inequality, it follows that, for all $\lambda > 0$, Applying the inequalities (21) and (22) where $\lambda = 2x/(v^2 + \sqrt{v^4 + 4wx})$. By a simple calculation, we find that, for all $v, w > 0$ and all $0 < \lambda < \frac{3v^2}{w}$, Thus, for all $x, v, w > 0$, Combining this inequality with (23), we obtain the desired inequalities (24) and (25) of the corollary.
Since Eξ ≤ 0, it follows that which gives the desired inequality.
Proof of Corollary 4. Assume that $(\xi_i, \mathcal{F}_i)_{i=1,...,n}$ are conditionally symmetric. For any $y > 0$, let ..,n is a sequence of bounded and conditionally symmetric martingale differences. Using Taylor's expansion, we obtain the following estimate of the moment generating function of $\eta_i$, Since Using Theorem 1, we obtain, for all $x, v > 0$, By some simple calculations, we find that (75) and (76) attain their minima at λ and λ of Corollary 4, respectively. It is easy to see that Substituting (75) and (76) into (77), we get the desired inequalities (31) and (32).

Proof of Theorem 2 and its corollaries
The proof of Theorem 2 is similar to the argument of Theorem 1.
In the proof of Corollary 5, we shall need the following two lemmas.
Proof of Corollary 5. Inequality (36) follows immediately from Lemma 3. Using Theorem 2, we obtain, for all x, λ, v > 0, Minimizing the right hand side of the last inequality with respect to λ ≥ 0, we easily obtain (37).
Proof of Remark 1. Assume that $(\xi_i)_{i=1,...,n}$ are independent and symmetric. Set Since $\xi_i$ is symmetric, Using the inequality $\frac{1}{2}(e^{t} + e^{-t}) \le e^{t^2/2}$, we obtain, for all $\lambda \ge 0$, Since $\xi_i^2$ is measurable with respect to $\mathcal{F}_{i-1}$, it follows that for all $k \in [1, n]$. By Theorem 2 with $V_{i-1} = \xi_i^2/[S]_n$, it follows that, for all $x, \lambda \ge 0$, The right-hand side of the last inequality attains its minimum at $\lambda = x$. Substituting $\lambda = x$ into (82), we easily get (43) of Remark 1.
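The two elementary facts used in the proof of Remark 1 can be written out as follows. This is a sketch: the first display is the standard series comparison for the hyperbolic cosine, and the second assumes that the right-hand side being minimized has the form $\exp\{-\lambda x + \lambda^2/2\}$, which is consistent with the minimizer $\lambda = x$ stated above.

```latex
% Series comparison: (2k)! >= 2^k k! for every integer k >= 0.
\[
\frac{1}{2}\left(e^{t} + e^{-t}\right)
  = \sum_{k=0}^{\infty} \frac{t^{2k}}{(2k)!}
  \le \sum_{k=0}^{\infty} \frac{t^{2k}}{2^{k}\,k!}
  = e^{t^{2}/2},
  \qquad t \in \mathbb{R}.
\]
% Minimization of the quadratic exponent (assumed form of the bound).
\[
\frac{d}{d\lambda}\left(-\lambda x + \frac{\lambda^{2}}{2}\right) = -x + \lambda = 0
  \iff \lambda = x,
  \qquad
\inf_{\lambda \ge 0} \exp\left\{-\lambda x + \frac{\lambda^{2}}{2}\right\}
  = \exp\left\{-\frac{x^{2}}{2}\right\}.
\]
```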

Proof of Theorems 3 -7
We make use of Corollary 3 to prove Theorem 3. Proof of Theorem 3. From (44) and (45), it is easy to see that For any $i = 1, ..., n$, set Then $(\xi_i, \mathcal{F}_i)_{i=1,...,n}$ is a sequence of martingale differences and satisfies Notice that Applying Corollary 3 to $(\xi_i, \mathcal{F}_i)_{i=1,...,n}$, we prove the claim of Theorem 3.
Applying Theorem 2 to $(\xi_i, \mathcal{F}_i)_{i=1,...,n}$, we obtain, for all $x, \lambda, v \ge 0$, The right-hand side of the last inequality takes its minimum at Substituting $\lambda = \lambda(x)$ into (88), we obtain the desired inequality.
Applying Corollary 5 to $(\xi_i, \mathcal{F}_i)$, we obtain the desired inequality.