A note on the approximate admissibility of regularized estimators in the Gaussian sequence model

We study the problem of estimating an unknown vector $\theta$ from an observation $X$ drawn according to the normal distribution with mean $\theta$ and identity covariance matrix, under the knowledge that $\theta$ belongs to a known closed convex set $\Theta$. In this general setting, Chatterjee (2014) proved that the natural constrained least squares estimator is "approximately admissible" for every $\Theta$. We extend this result by proving that the same property holds for all convex penalized estimators as well. Moreover, we simplify and shorten the original proof considerably. We also provide explicit upper and lower bounds for the universal constant underlying the notion of approximate admissibility.


Introduction
The Gaussian sequence model is a commonly used model for theoretical investigations in nonparametric and high-dimensional statistical problems. Here one models the data vector X ∈ R^n as an observation having the normal distribution with unknown mean θ ∈ R^n and identity covariance matrix, i.e., X ∼ N(θ, I_n). Often one assumes some structure on the unknown mean θ in the form of a convex constraint. Specifically, it is common to assume that θ ∈ Θ for some closed convex subset Θ of R^n. A natural estimator for θ under the constraint θ ∈ Θ is the least squares estimator (LSE) defined as
$$\hat{\theta}(X; \Theta) := \mathop{\mathrm{argmin}}_{\alpha \in \Theta} \|X - \alpha\|_2^2, \qquad (1.1)$$
where ‖·‖₂ denotes the usual Euclidean norm on R^n. It is easy to see that many common estimators in nonparametric and high-dimensional statistics, such as shape constrained estimators (see, for example, Groeneboom and Jongbloed (2014)) and those based on the constrained LASSO (see, for example, Bühlmann and van de Geer (2011)), are special cases of the LSE (1.1) for various choices of Θ.
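Estimators of the form (1.1) are Euclidean projections onto Θ and are often exactly computable. For instance, when Θ is the monotone cone {α ∈ R^n : α_1 ≤ · · · ≤ α_n} (isotonic regression), the projection is given by the pool-adjacent-violators algorithm. The following is a minimal sketch of this algorithm (our illustration; the function name and test values are not from the paper):

```python
def isotonic_lse(x):
    """Constrained LSE over the monotone cone {a : a_1 <= ... <= a_n},
    i.e. the Euclidean projection of x onto that cone, computed by
    pool-adjacent-violators: merge adjacent blocks whose means violate
    monotonicity and replace them by their average."""
    blocks = [[v, 1] for v in x]  # each block stores [sum, count]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            blocks[i][0] += blocks[i + 1][0]
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            if i > 0:
                i -= 1  # a merge can create a new violation to the left
        else:
            i += 1
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

print(isotonic_lse([3.0, 1.0, 2.0]))  # the decreasing run is averaged out
```

The fitted vector is non-decreasing and preserves block averages, which is exactly the KKT characterization of the projection onto the monotone cone.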
In this abstract setting, Chatterjee (2014) asked the following question: Does the estimator θ̂(X; Θ) satisfy a general optimality property that holds for every closed convex set Θ? This is a non-trivial question; obvious guesses for the optimality property might be admissibility and minimaxity, but the LSE does not satisfy either of these for every Θ. Indeed, θ̂(X; Θ) is not minimax (even up to multiplicative factors that do not depend on the dimension n) when Θ := {α ∈ R^n : Σ_{i<n} α_i² + n^{−1/2} α_n² ≤ 1}, as noted by Zhang (2013) (a more elaborate counterexample for minimaxity is given in Chatterjee (2014)). Also, θ̂(X; Θ) is not admissible when Θ = R^n, where the James-Stein estimator dominates θ̂(X; R^n) = X (see, for example, Lehmann and Casella (1998)). Chatterjee (2014) answered the general optimality question of the constrained LSE in the affirmative by proving that θ̂(X; Θ) is approximately admissible over Θ for every Θ. The precise statement of Chatterjee's theorem is described below. Let us say that, for a constant C > 0, an estimator d̂(X) is C-admissible over Θ if for every other estimator d(X), there exists θ ∈ Θ such that
$$C \, E_\theta \|\hat{d}(X) - \theta\|_2^2 \le E_\theta \|d(X) - \theta\|_2^2. \qquad (1.2)$$
In words, the above definition means that for every estimator d(X), there exists a point θ ∈ Θ at which the estimator d̂(X) performs as well as the estimator d(X) up to the multiplicative factor C. Note that the point at which d̂(X) performs comparably to d(X) would depend on the estimator d(X) as well as on the constraint set Θ. Essentially, an estimator d̂(X) being C-admissible over Θ means that it is impossible for any estimator to dominate d̂(X) uniformly over Θ by more than the multiplicative factor C. Chatterjee (2014) proved that there exists a universal constant 0 < C ≤ 1 such that for every n ≥ 1 and closed convex subset Θ ⊆ R^n, the LSE θ̂(X; Θ) is C-admissible over Θ. Theorem 1.1.
[Chatterjee (2014)] There exists a universal constant 0 < C ≤ 1 (independent of n and Θ) such that for every n ≥ 1 and closed convex subset Θ ⊆ R n , the least squares estimator θ(X; Θ) is C-admissible over Θ.
Remarkable features of the above theorem are that it is true for every Θ and that the constant C does not depend on n or Θ. We would like to mention here that Theorem 1.1 is a rather difficult result (in Chatterjee's own words, "from a purely mathematical point of view, this is the deepest result of this paper") and the original proof in Chatterjee (2014) is quite complex.
Our paper has the following twin goals: (a) we extend Theorem 1.1 (which only involves constrained estimators) to penalized estimators, which are more commonly used in practice, and (b) we considerably simplify the proof of Theorem 1.1 given in Chatterjee (2014); our proof is also much more intuitive. To describe our main result, let us first introduce penalized estimators. Given a closed convex set Θ ⊆ R^n and a real-valued convex function f on Θ, let
$$\hat{\theta}(X; \Theta, f) := \mathop{\mathrm{argmin}}_{\alpha \in \Theta} \left( \frac{1}{2}\|X - \alpha\|_2^2 + f(\alpha) \right). \qquad (1.3)$$
Strictly speaking, θ̂(X; Θ, f) is a least squares estimator that is both constrained and penalized. We can of course write it as a pure penalized estimator with the penalty function f̃(x) = f(x) + I_Θ(x), where I_Θ(x) is the indicator function that takes the value 0 when x ∈ Θ and +∞ otherwise. We choose to separate the constraint and penalty as it is more natural for many statistical applications. In doing so, note that we have required that f is real-valued (i.e., f does not take the value +∞) on Θ. For Θ = R^n in (1.3), we obtain penalized estimators, for which the LASSO is the most common example. For f ≡ 0, we get back the constrained LSEs of (1.1). There are examples where one uses both a non-trivial constraint set Θ and a non-trivial penalty function f(·); for example, in isotonic regression, it is common to use Θ := {α ∈ R^n : α_1 ≤ · · · ≤ α_n} and f(α) := λ(α_n − α_1) for some λ ≥ 0. This estimator fits non-decreasing sequences to the data while constraining the range of the estimator so as to prevent the spiking effect from which the usual isotonic LSE suffers; see, for example, Woodroofe and Sun (1993).
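As a concrete instance of (1.3): for Θ = R^n and f(α) = λ‖α‖₁, the minimization separates across coordinates and θ̂(X; R^n, f) is given by coordinatewise soft thresholding at level λ. A quick numerical sketch of this (our illustration, assuming the criterion ½‖X − α‖₂² + f(α); under the normalization ‖X − α‖₂² + f(α) the threshold would be λ/2 instead):

```python
import numpy as np

def soft_threshold(x, lam):
    """Coordinatewise minimizer of 0.5*(x_i - a_i)^2 + lam*|a_i|,
    i.e. the LASSO-type penalized estimator with identity design."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# brute-force check of one coordinate against a fine grid
x0, lam = 1.3, 0.5
grid = np.linspace(-3.0, 3.0, 60001)
brute = grid[np.argmin(0.5 * (x0 - grid) ** 2 + lam * np.abs(grid))]
print(soft_threshold(np.array([x0]), lam)[0], brute)  # both ~0.8
```

The grid search agrees with the closed form, confirming that the penalty simply shrinks each coordinate toward zero by λ and sets small coordinates exactly to zero.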
Because of the presence of the penalty function f , it is clear that the class of estimators given by θ(X; Θ, f) is larger compared to the class given by the LSEs in (1.1). The main result of our paper is the following.
Theorem 1.2. There exists a universal constant 0 < C ≤ 1 (independent of n, Θ and f ) such that for every n ≥ 1, closed convex set Θ ⊆ R n and real-valued convex function f on Θ, the estimator θ(X; Θ, f) is C-admissible over Θ.
The above theorem generalizes Theorem 1.1 by showing that all estimators θ̂(X; Θ, f) have the C-admissibility property over Θ for a universal constant C. In words, this means that given any estimator d(X), there exists a point θ ∈ Θ at which the estimator θ̂(X; Θ, f) performs as well as the estimator d(X) up to the multiplicative factor C. This point θ ∈ Θ would depend on the estimator d(X) as well as on the constraint set Θ and the penalty function f. Remark 1.1 (Generalization to Linear Regression). Because Theorem 1.2 applies to arbitrary closed convex sets Θ ⊆ R^n and arbitrary real-valued convex functions f on Θ, it can be generalized to deal with prediction error for convex-regularized estimators in linear regression. Indeed, if β̂ satisfies
$$\hat{\beta} \in \mathop{\mathrm{argmin}}_{\beta \in R^p} \left( \frac{1}{2}\|X - D\beta\|_2^2 + \Lambda(\beta) \right)$$
for a real-valued convex function Λ and an n × p design matrix D, then
$$D\hat{\beta} = \mathop{\mathrm{argmin}}_{\nu \in \Theta} \left( \frac{1}{2}\|X - \nu\|_2^2 + f(\nu) \right) \quad \text{with } \Theta := \{D\beta : \beta \in R^p\} \text{ and } f(\nu) := \inf\{\Lambda(\beta) : D\beta = \nu\}.$$
It is now easy to check that f is a real-valued convex function on {Dβ : β ∈ R^p}. This means therefore that Dβ̂ is an estimator of the form (1.3), so that Theorem 1.2 applies directly to it. We therefore obtain that the estimator β̂ is C-admissible in terms of prediction error (defined as ‖Dβ̂ − Dβ‖₂²) for every choice of the convex regularizer Λ(·). Dealing with estimation error (defined as ‖β̂ − β‖₂²) in the regression context is more complicated and might involve assumptions on the design matrix D.
Let us conclude this introduction with a brief description of the significance of our main result. The class of estimators θ̂(X; Θ, f) is used very frequently in applications and, from a practical perspective, the fact that these estimators may be inadmissible might be slightly disconcerting. Our Theorem 1.2 shows that although these estimators may be inadmissible, they are always C-admissible for a universal positive constant C. Informally, this means that it is impossible for other estimators to uniformly dominate these estimators by more than a universal multiplicative constant.

Connections to the normalized minimax risk
There is a restatement of Theorem 1.2 that is illuminating and gives it a minimax flavor. Given Θ and f, let us define the normalized minimax risk over Θ by
$$R_{nor}(\Theta; f) := \inf_{d} \sup_{\theta \in \Theta} \frac{E_\theta \|d(X) - \theta\|_2^2}{E_\theta \|\hat{\theta}(X; \Theta, f) - \theta\|_2^2}, \qquad (1.4)$$
where the infimum is over all estimators d(X). Here we use the conventions 0/0 = 1 and a/0 = +∞ for a > 0. Note that R_nor(Θ; f) is defined just like the usual minimax risk over the parameter space Θ except that the risk of every estimator d(X) is rescaled (normalized) by the risk of θ̂(X; Θ, f). This is therefore a reasonable measure of comparison of arbitrary estimators d(X) to our estimator θ̂(X; Θ, f).
It is clear that R_nor(Θ; f) ≤ 1, as can be seen by bounding the infimum in (1.4) by the term corresponding to d(X) = θ̂(X; Θ, f). A small value of R_nor(Θ; f) means that there exists an estimator d(X) whose risk is smaller than that of θ̂(X; Θ, f) by a large factor, uniformly over θ ∈ Θ.
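To get a feel for the ratio appearing in (1.4): for Θ = R^n and f ≡ 0, so that θ̂(X; R^n, 0) = X, the James-Stein estimator dominates X everywhere, but the domination factor tends to one as ‖θ‖₂ grows, so this particular choice of d(X) cannot drive the supremum in (1.4) to zero. A Monte Carlo sketch (our illustration; the dimension and parameter values are arbitrary):

```python
import numpy as np

def mc_risk(estimator, theta, n_sims=20000, seed=0):
    """Monte Carlo estimate of E_theta ||d(X) - theta||_2^2, X ~ N(theta, I_n)."""
    rng = np.random.default_rng(seed)
    X = theta + rng.standard_normal((n_sims, theta.size))
    return float(np.mean(np.sum((estimator(X) - theta) ** 2, axis=1)))

def james_stein(X):
    n = X.shape[1]
    return (1.0 - (n - 2) / np.sum(X ** 2, axis=1, keepdims=True)) * X

n = 50
near = np.zeros(n)                     # James-Stein shrinks aggressively here
far = np.full(n, np.sqrt(500.0 / n))   # ||theta||_2^2 = 500: little shrinkage
ratio_near = mc_risk(james_stein, near) / mc_risk(lambda X: X, near)
ratio_far = mc_risk(james_stein, far) / mc_risk(lambda X: X, far)
print(ratio_near, ratio_far)  # strong domination near 0, almost none far away
```

The same seed is reused for numerator and denominator (common random numbers), which makes the estimated ratios quite stable.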
Let us now define a universal constant C* by taking the worst possible value of R_nor(Θ; f) over all possible values of the dimension n, convex constraint set Θ and convex penalty function f. Specifically, let
$$C^* := \inf_{n \ge 1} \; \inf_{\Theta \in \mathcal{C}_n} \; \inf_{f \in \mathcal{F}(\Theta)} R_{nor}(\Theta; f),$$
where C_n denotes the class of all closed convex subsets of R^n and F(Θ) denotes the class of all real-valued convex functions on Θ. Note first that C* is a universal constant and, a priori, it is not clear whether C* is zero or strictly positive.
It is now straightforward to verify that Theorem 1.2 is equivalent to the statement that C* is strictly positive. Another contribution of our paper is to provide explicit lower and upper bounds for C*. Theorem 1.3. The universal constant C* satisfies
$$6.05 \times 10^{-6} \le C^* \le \frac{1}{2}.$$
The lower bound of 6.05×10⁻⁶ for C* comes from our argument in the proof of Theorem 1.2. It must be noted here that Chatterjee (2014) does not provide any explicit value for C in his C-admissibility result. Even if the constant C were tracked down in the proof of Chatterjee (2014), it appears that it would be smaller than 6.05 × 10⁻⁶ by several orders of magnitude. This improvement in the lower bound further illustrates the advantage of the new arguments in our proof of admissibility.
The upper bound of 1/2 for C * is a consequence of an explicit construction of Θ and f such that θ(X; Θ, f) is uniformly dominated over Θ by a factor of 2 by another estimator. We believe that this example is non-trivial. Please see Section 4 for the proof of Theorem 1.3.
The determination of the exact value of the constant C* is likely to be a very challenging problem, which we leave for future work. It will also be interesting to develop techniques to accurately bound the quantity R_nor(Θ; f) for specific choices of Θ and f.

Proof sketch
As a summary, the contributions of the paper include: (1) a novel and intuitive proof of a generalization of a result of Chatterjee (2014) on C-admissibility in Theorem 1.2; (2) explicit bounds for the worst possible value of the normalized minimax risk C * in Theorem 1.3. In this subsection, we provide an outline of our proofs of these results.
Admissibility results are almost always proved via Bayesian arguments involving priors. Analogous to the notion of C-admissibility, we can define a notion of C-Bayes as follows. For C > 0 and a proper prior w over Θ, we say that an estimator d(X) is C-Bayes with respect to w if
$$C \int_\Theta E_\theta \|d(X) - \theta\|_2^2 \, w(d\theta) \le R_{Bayes}(w),$$
where R_Bayes(w) := inf_{d̃} ∫_Θ E_θ‖d̃(X) − θ‖₂² w(dθ) is the Bayes risk with respect to w and the infimum in the definition of R_Bayes(w) above is over all estimators d̃.
It is now trivial to see that an estimator d(X) is C-admissible over Θ if it is C-Bayes for some proper prior w supported on Θ. As a result, in order to prove that θ(X; Θ, f) is C-admissible over Θ, it is sufficient to construct a proper prior w on Θ such that θ(X; Θ, f) is C-Bayes with respect to w. We construct such a prior w by modifying the construction of Chatterjee (2014) (which only applied to the LSE) appropriately. The prior w will concentrate in the vicinity of a suitably chosen point θ * ∈ Θ (see the proof of Theorem 1.2 for the exact description of w).
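For intuition about R_Bayes(w), consider the one-dimensional model X ∼ N(θ, 1) with the prior w = N(0, τ²): the Bayes estimator is the posterior mean τ²X/(1 + τ²) and R_Bayes(w) = τ²/(1 + τ²). This standard fact, not specific to our setting, is easy to confirm by simulation (our illustration):

```python
import numpy as np

# X ~ N(theta, 1) with theta drawn from the prior w = N(0, tau^2).
# The Bayes estimator is the posterior mean tau^2 * X / (1 + tau^2),
# whose Bayes risk equals the posterior variance tau^2 / (1 + tau^2).
tau2 = 2.0
rng = np.random.default_rng(1)
n_sims = 200000
theta = rng.normal(0.0, np.sqrt(tau2), n_sims)   # theta ~ w
X = theta + rng.standard_normal(n_sims)          # X | theta ~ N(theta, 1)
bayes_risk = float(np.mean((tau2 / (1.0 + tau2) * X - theta) ** 2))
print(bayes_risk)   # close to tau^2/(1 + tau^2) = 2/3
```

No other estimator can beat this average risk under w, which is what makes Bayes risks useful as lower bounds in admissibility arguments.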
For our chosen prior w, in order to prove that θ̂(X; Θ, f) is C-Bayes, we need to 1. bound ∫_Θ E_θ‖θ̂(X; Θ, f) − θ‖₂² w(dθ) from above, and 2. bound R_Bayes(w) from below, and make sure that the two bounds differ only by the multiplicative factor C. More precisely, we shall prove that both quantities are of order t²_{θ*} up to universal positive multiplicative constants. Here θ* is the point near which w concentrates and t_{θ*} is a quantity which controls the risk behavior of θ̂(X; Θ, f) at θ* (see Section 2 for the precise definition of t_{θ*}). For the first step above, we need to study the risk properties of the estimator θ̂(X; Θ, f). This is done in Section 2. For the second step, we apply a recent general Bayes risk lower bound from Chen, Guntuboyina and Zhang (2016). The application of this risk bound shortens the proof considerably. In contrast, Chatterjee (2014) used a bare-hands approach for lower bounding the Bayes risk via "a sequence of relatively complicated technical steps involving concentration inequalities and second moment lower bounds".
For proving Theorem 1.3, we first observe that our proof of Theorem 1.2 also yields the lower bound of 6.05 × 10 −6 for the constant C * . For proving that C * ≤ 1/2, we explicitly construct a convex set Θ over which the normalized minimax risk is arbitrarily close to 1/2 (see Section 4 for details).
The rest of this paper is structured as follows. In Section 2, we describe some results on the risk of the estimator θ(X; Θ, f). These can be seen as an extension of the results of Chatterjee (2014) for penalized estimators. We will also discuss the connection of our risk bounds to a recent work by van de Geer and Wainwright (2015). Section 3 contains the proof of Theorem 1.2 while Section 4 contains the proof of Theorem 1.3. Section 5 contains the proofs for the risk results of penalized estimators from Section 2.

Risk behavior of θ(X; Θ, f)
Throughout this section, we fix a closed convex set Θ in R^n and a real-valued convex function f on Θ. The data vector X will be generated according to the normal distribution with mean θ and identity covariance. We study the risk of the estimator θ̂(X; Θ, f). The main risk result is Theorem 2.1 below, which will be used in the proof of Theorem 1.2 to bound the quantity ∫_Θ E_θ‖θ̂(X; Θ, f) − θ‖₂² w(dθ) for a suitable prior w.
The basic fact about the estimator θ̂(X; Θ, f) (proved in Theorem 2.1 below) is that the loss ‖θ̂(X; Θ, f) − θ‖₂ is concentrated around a deterministic quantity t_θ which depends on θ, the constraint set Θ and the regularizer f. The quantity t_θ is defined as the maximizer of the function
$$G_\theta(t) := m_\theta(t) - \frac{t^2}{2}, \qquad t \in [0, \infty), \qquad (2.1)$$
where
$$m_\theta(t) := E \sup\left\{ \langle X - \theta, \alpha - \theta \rangle - f(\alpha) : \alpha \in \Theta, \|\alpha - \theta\|_2 \le t \right\}. \qquad (2.2)$$
The quantity m_θ(t) can be viewed as an extension of the notion of (localized) Gaussian width that incorporates the penalty function f(α) (note that X − θ is a standard Gaussian random vector). Indeed, when f ≡ 0, m_θ(t) is the Gaussian width of the set {α − θ : α ∈ Θ, ‖α − θ‖₂ ≤ t}. The existence of t_θ as a unique maximizer of G_θ(t) over t ∈ [0, ∞) is proved in Lemma 2.2 (see the end of this section). We also note that t_θ depends on the choice of the penalty f. Theorem 2.1. Fix θ ∈ Θ and consider the estimator θ̂(X; Θ, f) constructed from X generated according to the model X ∼ N(θ, I_n). Then for every δ ≥ 0,
$$P_\theta\left\{ \left| \|\hat{\theta}(X; \Theta, f) - \theta\|_2 - t_\theta \right| \ge \delta \right\} \le 3 \exp\left( -\frac{\delta^4}{32\,(t_\theta + \delta)^2} \right). \qquad (2.3)$$
When f ≡ 0, i.e., when the estimator θ̂(X; Θ, f) becomes the LSE over Θ, the above result was proved by Chatterjee (2014, Theorem 1.1). Therefore, Theorem 2.1 can be seen as an extension of Chatterjee (2014, Theorem 1.1) to penalized estimators. Muro and van de Geer (2015) also studied concentration for penalized estimators; however, their result (see Muro and van de Geer (2015, Theorem 1)) proves concentration for (in our notation) the quantity τ(θ̂(X; Θ, f)) := ‖θ̂(X; Θ, f) − θ‖₂² + 2f(θ̂(X; Θ, f)). More recently, van de Geer and Wainwright (2015) studied concentration of the loss of empirical risk minimization estimators in a very general setting. Two of their results are relevant to Theorem 2.1. In van de Geer and Wainwright (2015, Theorem 2.1), it is proved that ‖θ̂(X; Θ, f) − θ‖₂ concentrates around its expectation E_θ‖θ̂(X; Θ, f) − θ‖₂ at a rate that is faster than that given by Theorem 2.1. However, to prove our admissibility result, we require concentration of ‖θ̂(X; Θ, f) − θ‖₂ around t_θ and not around E_θ‖θ̂(X; Θ, f) − θ‖₂.
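The quantities m_θ, G_θ and t_θ are easy to approximate by Monte Carlo for simple sets. For example, when Θ is the nonnegative orthant, f ≡ 0 and θ = 0, the supremum inside the expectation equals t‖Z₊‖₂ for Z = X − θ, so m_0(t) = t·E‖Z₊‖₂ and the maximizer of G_0(t) is t_0 = E‖Z₊‖₂ ≈ √(n/2). A sketch of this computation (our illustration, with an arbitrary choice of n):

```python
import numpy as np

# Theta = nonnegative orthant, f = 0, theta = 0. The supremum defining
# m_0(t) is attained at alpha = t * Z_+ / ||Z_+||, so m_0(t) = t * E||Z_+||,
# and G_0(t) = m_0(t) - t^2/2 is maximized at t_0 = E||Z_+|| ~ sqrt(n/2).
n, n_sims = 400, 4000
rng = np.random.default_rng(2)
Z = rng.standard_normal((n_sims, n))
width = float(np.mean(np.linalg.norm(np.maximum(Z, 0.0), axis=1)))  # E||Z_+||

ts = np.linspace(0.0, 3.0 * np.sqrt(n / 2.0), 4001)
G = ts * width - ts ** 2 / 2.0          # G_0 on a grid
t0 = float(ts[np.argmax(G)])
print(t0, np.sqrt(n / 2.0))             # both close to sqrt(200)
```

The loss ‖θ̂(X; Θ, f) − θ‖₂ = ‖Z₊‖₂ in this example, so its concentration around t_0 can also be seen directly from the same simulation.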
The relation between t_θ and E_θ‖θ̂(X; Θ, f) − θ‖₂ is not completely clear, although it has very recently been observed in Bellec (2017, Section 5) that the difference between E_θ‖θ̂(X; Θ, f) − θ‖₂ and t_θ is bounded from above by a universal positive constant. Another result from van de Geer and Wainwright (2015) that is relevant to us is their Theorem 4.1. However, it also gives concentration for the quantity τ(θ̂(X; Θ, f)), while we require concentration for ‖θ̂(X; Θ, f) − θ‖₂. It is also worthwhile to note that van de Geer and Wainwright (2015) studied concentration in models more general than the Gaussian sequence model.
We would also like to point out that results similar to Theorem 2.1 (and some parts of Lemma 2.2 below) have also recently appeared in Bellec (2017) (this latter paper appeared two days after our paper on arXiv).
In addition to Theorem 2.1, we shall require some additional facts about t θ and the function m θ . These are summarized in the following result which also includes a statement on the existence and uniqueness of t θ . For the case of the LSE (i.e., when f ≡ 0), the facts stated in the lemma below are observed in Chatterjee (2014) and most of the results in the following lemma are straightforward extensions of the corresponding facts in Chatterjee (2014). Lemma 2.2. Recall the functions G θ (·) and m θ (·) from (2.1) and (2.2) respectively.

Proof of Theorem 1.2
We follow the program outlined in the introduction. For proving that θ̂(X; Θ, f) is C-admissible for a constant C, it is enough to demonstrate the existence of a prior w on Θ such that θ̂(X; Θ, f) is C-Bayes with respect to w. As described in the introduction, a key step for proving that θ̂(X; Θ, f) is C-Bayes involves bounding from below the Bayes risk R_Bayes(w) with respect to w. For this purpose, we shall use the following result from Chen, Guntuboyina and Zhang (2016, Corollary 4.4). This result states that the following inequality holds for every prior w on Θ:
$$R_{Bayes}(w) \ge \sup_{t > 0} \frac{t}{2}\left( 1 - \sup_{a \in \Theta} \sqrt{ w\{\theta \in \Theta : \|\theta - a\|_2^2 \le t\} \, (1 + I) } \right), \qquad (3.1)$$
where I is any nonnegative number satisfying
$$\inf_Q \int_\Theta \chi^2(P_\theta \,\|\, Q) \, w(d\theta) \le I. \qquad (3.2)$$
Here P_θ denotes the n-dimensional normal distribution with mean θ and identity covariance and the infimum in (3.2) is over all probability measures Q on R^n. Also, χ²(P ‖ Q) denotes the chi-square divergence defined as ∫(p²/q) dμ − 1, where p and q are densities of P and Q respectively with respect to a common dominating measure μ.
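The chi-square divergence between Gaussians with identity covariance has the closed form χ²(P_θ ‖ P_{θ*}) = exp(‖θ − θ*‖₂²) − 1, which is used in the proof below. A one-dimensional Monte Carlo check of this identity (our illustration), using χ²(P ‖ Q) = E_P[p(X)/q(X)] − 1:

```python
import math, random

mu1, mu2 = 0.7, 0.0               # two unit-variance Gaussian means
rng = random.Random(3)
n_sims = 200000

def log_density(x, mu):
    return -0.5 * (x - mu) ** 2   # unnormalized; constants cancel in the ratio

# chi^2(P || Q) = E_P[p(X)/q(X)] - 1 with X ~ P = N(mu1, 1), Q = N(mu2, 1)
acc = 0.0
for _ in range(n_sims):
    x = rng.gauss(mu1, 1.0)
    acc += math.exp(log_density(x, mu1) - log_density(x, mu2))
chi2_mc = acc / n_sims - 1.0
print(chi2_mc, math.exp((mu1 - mu2) ** 2) - 1.0)  # both ~ e^{0.49} - 1
```

In n dimensions with identity covariance the divergence factorizes across coordinates, which yields the stated formula with ‖θ − θ*‖₂² in the exponent.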
We are now ready to prove Theorem 1.2.
Proof of Theorem 1.2. We break the proof into two separate cases: the case when inf_{θ∈Θ} t_θ ≤ b for some constant b, and the case when inf_{θ∈Θ} t_θ > b. The first case is the easy case, where we show that θ̂(X; Θ, f) is C-Bayes with respect to a simple two-point prior via Le Cam's classical two-point testing inequality. The second case is harder; there we use a more elaborate prior w together with inequality (3.1) to lower bound R_Bayes(w). Easy Case: Here we assume that inf_{θ∈Θ} t_θ ≤ b (the precise value of the constant b will be specified later). Choose θ* ∈ Θ such that t_{θ*} ≤ b (note that θ ↦ t_θ is continuous from (2.8) and (2.9) and that Θ is closed, so that such a θ* exists). Let θ_1 ∈ Θ be any maximizer of ‖θ* − θ‖₂ as θ varies over {θ ∈ Θ : ‖θ − θ*‖₂ ≤ 1}. Let w be the uniform prior over the two-point set {θ*, θ_1}. The Bayes risk with respect to w can easily be bounded from below by Le Cam's inequality (from Le Cam (1973)), which gives
$$R_{Bayes}(w) \ge \frac{\|\theta^* - \theta_1\|_2^2}{4} \left( 1 - \|P_{\theta^*} - P_{\theta_1}\|_{TV} \right),$$
where ‖P_{θ*} − P_{θ_1}‖_TV denotes the total variation distance between the probability measures P_{θ*} and P_{θ_1}. Pinsker's inequality (see, for example, Tsybakov (2009, Lemma 2.5)) now implies
$$R_{Bayes}(w) \ge \frac{\|\theta^* - \theta_1\|_2^2}{4} \left( 1 - \frac{\|\theta^* - \theta_1\|_2}{2} \right). \qquad (3.3)$$
By the definition of θ_1, we have ‖θ_1 − θ*‖₂ ≤ 1. We consider two cases according to the value of ‖θ_1 − θ*‖₂. 1. ‖θ* − θ_1‖₂ = 1: Here inequality (3.3) gives R_Bayes(w) ≥ 1/8. Further, by the assumption t_{θ*} ≤ b together with inequalities (2.4) and (2.7), the average risk of θ̂(X; Θ, f) under w can be bounded from above in terms of b alone. This risk bound together with R_Bayes(w) ≥ 1/8 allows us to obtain the constant C in (3.4).

X. Chen et al.
This means that θ̂(X; Θ, f) is C-Bayes with respect to w with the constant C given in (3.4). 2. ‖θ* − θ_1‖₂ < 1: In this case, γ := diam(Θ) ≤ 2 and ‖θ* − θ_1‖₂ ≥ γ/2. Inequality (3.3) then gives R_Bayes(w) ≥ γ²/32. Also, for every θ ∈ Θ, we have E_θ‖θ̂(X; Θ, f) − θ‖₂² ≤ γ² (because both θ̂(X; Θ, f) and θ are constrained to take values in Θ, whose diameter is at most γ). These two inequalities imply that θ̂(X; Θ, f) is C-Bayes with respect to w with C = 1/32. Therefore, in this easy case, we have proved that θ̂(X; Θ, f) is C-Bayes for some C that is at least the minimum of (3.4) and 1/32. Hard Case: We now work with the situation when inf_{θ∈Θ} t_θ > b. We fix a specific θ* ∈ Θ and choose w as a specific prior that is supported on the set
$$U(\theta^*) := \{ \alpha \in \Theta : \|\alpha - \theta^*\|_2 \le \rho t_{\theta^*} \}$$
for some constant ρ > 0 (to be specified later) which satisfies ρ² + 4ρ < 1. More precisely, for a fixed small constant η > 0, let θ* be chosen so that
$$m_{\theta^*}(\rho t_{\theta^*}) \ge \sup_{\theta \in \Theta} m_\theta(\rho t_\theta) - \eta, \qquad (3.6)$$
where m_θ(·) is defined in (2.2). Let Ψ : R^n → Θ be any measurable mapping such that Ψ(z) is a maximizer of ⟨z, α − θ*⟩ − f(α) as α varies in U(θ*). Let w be the prior given by the distribution of Ψ(Z) for a standard Gaussian vector Z in R^n. Now, because of inequalities (2.4) and (2.7) and the fact that t_{θ*} ≥ inf_{θ∈Θ} t_θ > b, for every θ ∈ U(θ*) the risk E_θ‖θ̂(X; Θ, f) − θ‖₂² can be bounded from above by a constant multiple of t²_{θ*}; we refer to this bound as inequality (3.7).
The goal now is to provide a lower bound for R_Bayes(w). We shall use inequality (3.1) for this purpose. Because the prior w is concentrated on the convex set U(θ*), we can replace the supremum over a ∈ Θ in (3.1) by the supremum over a ∈ U(θ*). This gives the following lower bound for R_Bayes(w):
$$R_{Bayes}(w) \ge \sup_{t > 0} \frac{t}{2}\left( 1 - \sup_{a \in U(\theta^*)} \sqrt{ w\{\theta \in \Theta : \|\theta - a\|_2^2 \le t\} \, (1 + I) } \right). \qquad (3.8)$$
Here P_θ is the n-dimensional normal distribution with mean θ and identity covariance and the infimum (in the condition (3.2) defining I) is over all probability measures Q.
To obtain a suitable value for I, we use
$$\inf_Q \int_\Theta \chi^2(P_\theta \,\|\, Q)\, w(d\theta) \le \int_\Theta \chi^2(P_\theta \,\|\, P_{\theta^*})\, w(d\theta) \le \exp(\rho^2 t_{\theta^*}^2) - 1,$$
where, in the last inequality, we used the expression χ²(P_θ ‖ P_{θ*}) = exp(‖θ − θ*‖₂²) − 1 and the fact that ‖θ − θ*‖₂ ≤ ρt_{θ*} for all θ ∈ U(θ*). We can therefore take 1 + I to be exp(ρ²t²_{θ*}) in (3.8), which gives
$$R_{Bayes}(w) \ge \sup_{t > 0} \frac{t}{2}\left( 1 - \sup_{a \in U(\theta^*)} \sqrt{ w\{\theta \in \Theta : \|\theta - a\|_2^2 \le t\} \, \exp(\rho^2 t_{\theta^*}^2) } \right). \qquad (3.9)$$
We shall now bound from above
$$\sup_{a \in U(\theta^*)} w\{\theta \in \Theta : \|\theta - a\|_2^2 \le t\} \quad \text{with } t := \beta \rho^2 t_{\theta^*}^2 \qquad (3.10)$$
for a constant β ∈ (0, 1). The goal is to show that the above quantity is smaller than exp(−ρ²t²_{θ*})/4. Because w is defined as the distribution of Ψ(Z), where Ψ(z) maximizes ⟨z, α − θ*⟩ − f(α) over α ∈ U(θ*), the identity w(A) = P{Ψ(Z) ∈ A} holds for every measurable subset A of R^n. Therefore, for every a ∈ U(θ*), the prior probability w{θ ∈ Θ : ‖θ − a‖₂² ≤ t} equals P{‖Ψ(Z) − a‖₂² ≤ t}, which can be bounded from above by P{M₂ + M₃ ≥ M₁}, where
$$M_1 := \sup_{\alpha \in U(\theta^*)} \left( \langle Z, \alpha - \theta^* \rangle - f(\alpha) \right), \quad M_2 := \sup_{\alpha \in \Theta: \|\alpha - a\|_2 \le \sqrt{t}} \left( \langle Z, \alpha - a \rangle - f(\alpha) \right), \quad M_3 := \langle Z, a - \theta^* \rangle.$$
Now, if γ ≥ 0 is such that EM₁ − EM₂ ≥ γ, then we can write
$$P\{M_2 + M_3 \ge M_1\} \le \exp\left( -\frac{\gamma^2}{2\left(\sqrt{t} + \|a - \theta^*\|_2 + \rho t_{\theta^*}\right)^2} \right),$$
where the last inequality follows by standard Gaussian concentration and the observations that (a) M₂, as a function of Z, is Lipschitz with Lipschitz constant √t, (b) M₃, as a function of Z, is Lipschitz with Lipschitz constant ‖a − θ*‖₂, and (c) M₁, as a function of Z, is Lipschitz with Lipschitz constant ρt_{θ*}.
We now use the fact that ‖a − θ*‖₂ ≤ ρt_{θ*} for every a ∈ U(θ*) to deduce that
$$P\{M_2 + M_3 \ge M_1\} \le \exp\left( -\frac{\gamma^2}{2\left(\sqrt{t} + 2\rho t_{\theta^*}\right)^2} \right). \qquad (3.11)$$
Here γ is any nonnegative lower bound on EM₁ − EM₂. To obtain a suitable value of γ, we argue as follows. Observe first that EM₁ = m_{θ*}(ρt_{θ*}) and EM₂ = m_a(√t), where m is defined in (2.2). Because θ* is chosen so that inequality (3.6) is satisfied, we have EM₁ = m_{θ*}(ρt_{θ*}) ≥ m_a(ρt_a) − η, and thus EM₁ − EM₂ ≥ m_a(ρt_a) − m_a(√t) − η. We now use inequality (2.10), which relates t_a to t_{θ*} for a ∈ U(θ*) (inequality (3.12)). From the expression for t given in (3.10) and inequality (3.12), it is clear that √t ≤ ρt_a for every a ∈ U(θ*). Therefore, using concavity of m_a(·) (proved in Lemma 2.2) and inequality (2.5), we deduce a lower bound on m_a(ρt_a) − m_a(√t), where the last step uses inequality (3.12) and the expression for t; this yields an explicit choice of γ. Inequality (3.11) then gives an explicit upper bound on P{M₂ + M₃ ≥ M₁}. By a straightforward computation, it can be seen that this upper bound is strictly smaller than (1/4)exp(−ρ²t²_{θ*}) if and only if a certain condition on ρ, β and t_{θ*} holds (inequality (3.13)). Now, because we are working under the condition inf_{θ∈Θ} t_θ > b, which implies t_{θ*} > b, a sufficient condition for (3.13) is inequality (3.14). We now make the choices of ρ, β and b recorded in (3.15). With these, the right hand side of (3.14) can be calculated to be strictly smaller than b², so that condition (3.14) holds because t_{θ*} > b. Therefore, we deduce from inequality (3.9) that R_Bayes(w) is bounded from below by a constant multiple of t²_{θ*}. Combining this with (3.7), we obtain that θ̂(X; Θ, f) is C-Bayes with respect to w, where the constant (for our choice of ρ, β and b in (3.15)) is at least 6.05×10⁻⁶. This holds in the case when inf_{θ∈Θ} t_θ > b = 51.53. It is also easy to check that for b = 51.53, the constant in (3.4) is also at least 6.05 × 10⁻⁶. Therefore, in every case, we have proved the existence of a prior w such that θ̂(X; Θ, f) is C-Bayes with respect to w for some C ≥ 6.05 × 10⁻⁶. This means that θ̂(X; Θ, f) is C-admissible for some constant C ≥ 6.05 × 10⁻⁶. This completes the proof of Theorem 1.2.
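As a numerical sanity check on the two-point argument used in the easy case above: for a uniform prior on two points at distance δ = 1 in one dimension, the exact Bayes risk (approximated by Monte Carlo with the posterior-mean estimator) indeed exceeds the Le Cam-Pinsker lower bound (δ²/4)(1 − δ/2) = 1/8 from (3.3). A sketch (our illustration):

```python
import math, random

delta = 1.0                      # distance between the two prior points
rng = random.Random(4)
n_sims = 200000

def bayes_est(x):
    """Posterior mean for the uniform prior on {0, delta} under N(theta, 1)."""
    p1 = 1.0 / (1.0 + math.exp(delta * (delta / 2.0 - x)))  # P(theta=delta | x)
    return delta * p1

risk = 0.0
for _ in range(n_sims):
    theta = delta if rng.random() < 0.5 else 0.0
    x = rng.gauss(theta, 1.0)
    risk += (bayes_est(x) - theta) ** 2
risk /= n_sims

lecam_pinsker = (delta ** 2 / 4.0) * (1.0 - delta / 2.0)   # as in (3.3)
print(risk, lecam_pinsker)       # the Bayes risk exceeds the bound 1/8
```

The simulated Bayes risk also stays below δ²/4 (the trivial bound from the posterior variance of a two-point distribution), so the Le Cam-Pinsker bound is loose only by a modest constant here.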
We now turn to the proof of (2.4). For convenience, let L := ‖θ̂(X; Θ, f) − θ‖₂. First assume that t_θ ≥ 1. Using (2.3), we can bound the upper tail probabilities of L − t_θ. As a result, via the identity E[X₊²] = 2∫₀^∞ x P{X ≥ x} dx, which holds for every random variable X, we obtain an upper bound on E(L − t_θ)₊², where we have also used the fact that the resulting integral is at most 21 (as can be verified by numerical computation). Note that this bound also implies an upper bound on E_θ(L − t_θ)₊. Thus the inequality
$$L^2 \le (L - t_\theta)_+^2 + t_\theta^2 + 2 t_\theta (L - t_\theta)_+ \qquad (5.7)$$
together with the above two bounds for E(L − t_θ)₊² and E_θ(L − t_θ)₊ proves (2.4) in the case when t_θ ≥ 1.
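The moment identity used above can be checked numerically: for a standard Gaussian Z, E[Z₊²] = 1/2, and the identity E[X₊²] = 2∫₀^∞ x P{X ≥ x} dx reproduces this value (our verification, via the trapezoidal rule):

```python
import math

def upper_tail(x):
    """P{Z >= x} for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# 2 * integral_0^inf x * P{Z >= x} dx, truncated at 10 (tail is negligible)
N, hi = 20000, 10.0
h = hi / N
xs = [i * h for i in range(N + 1)]
vals = [x * upper_tail(x) for x in xs]
integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(2.0 * integral)   # close to E[Z_+^2] = 1/2
```

The same identity, applied to X = L − t_θ with the tail bound from (2.3), is what produces the numerical constant 21 mentioned above.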