The Sharp Lower Bound of Asymptotic Efficiency of Estimators in the Zone of Moderate Deviation Probabilities

We establish the local asymptotic minimax lower bound on the asymptotic efficiency of estimators in the zone of moderate deviation probabilities. The parameter to be estimated is multidimensional. The lower bound admits an interpretation as a lower bound on the asymptotic efficiency in confidence estimation.


Introduction
The local asymptotic minimax Theorem [16,18,22,30,31] allows one to study the asymptotic efficiency of estimators in the zone of Central Limit Theorem (CLT) approximation. However, there is no guarantee that the values of estimators lie in this zone. Therefore the investigation of the asymptotic efficiency of estimators in the zones of large and moderate deviation probabilities is of interest as well.
In the zone of large deviation probabilities the analysis of estimator quality is based on the Bahadur asymptotic efficiency (see [3,18,31,25] and references therein). The moderate deviation probabilities of statistics are also the subject of numerous publications (see [5,1,8,15,27,19,20,11,14] and references therein). However, their asymptotic efficiency has been studied only in [10,26].
The study of the Bahadur asymptotic efficiency of estimators is rather difficult. This problem is often replaced with the study of the local Bahadur asymptotic efficiency. The local Bahadur asymptotic efficiency is a particular case of asymptotic efficiency in the moderate deviation zone.
Here we suppose that the Fisher information I(θ 0 ) is finite. We note that the calculation of moderate deviation probabilities is a simpler problem [11,14] than the calculation of large deviation probabilities. For a one-dimensional parameter the sharp lower bound for the asymptotics of moderate deviation probabilities of estimators has been established in [10]. This lower bound represents a version of the local asymptotic minimax Theorem [16,18,22,30,31] in the moderate deviation zone. The goal of this paper is to obtain similar results for a multidimensional parameter. Thus one can say that the local asymptotic minimax Theorem works in a wider zone than the zone of CLT approximation.
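For orientation, the classical local asymptotic minimax Theorem referred to above can be stated in the following standard form (a generic formulation with a bowl-shaped loss ℓ; the notation here is not taken from this paper):

```latex
% Hajek--Le Cam local asymptotic minimax bound (classical form):
% for a regular model with finite Fisher information I(\theta_0),
\lim_{\delta \to 0}\;\liminf_{n \to \infty}\;
\sup_{|\theta - \theta_0| < \delta}
\mathbf{E}_{\theta}\,\ell\bigl(\sqrt{n}\,(\hat{\theta}_n - \theta)\bigr)
\;\ge\; \mathbf{E}\,\ell(\xi),
\qquad \xi \sim \mathcal{N}\bigl(0,\; I(\theta_0)^{-1}\bigr).
```

The present paper replaces the n −1/2 normalization by thresholds of order b n with n 1/2 b n → ∞, so that a bound of this type continues to hold beyond the zone of CLT approximation.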
The study of large and moderate deviation probabilities of estimators is closely related to the problem of confidence estimation. For large samples the asymptotic normality of estimators is the key property allowing one to construct confidence sets. The inequalities of Berry-Esseen type and the Edgeworth expansions (see [13,5,27,15] and references therein) show that the convergence rate to the normal distribution has the order n −1/2 (here n is the sample size). The coverage errors α of confidence sets usually have small values (α = 0.1, 0.05, 0.01 are the standard values in practice). For such a slow rate of convergence and such small values of α, the implementation of the normal approximation requires additional arguments if the sample size is several hundred observations or smaller.
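As a numerical illustration of this point (not taken from the paper; a Rademacher sample is chosen only because the tail of the sum can be computed exactly via the binomial distribution), one can compare the exact tail probability of a standardized sum with the normal tail at a moderate deviation threshold:

```python
import math

def exact_tail(n: int, x: float) -> float:
    """Exact P(S_n >= x * sqrt(n)) for S_n a sum of n Rademacher (+-1)
    variables, using S_n = 2*B - n with B ~ Binomial(n, 1/2)."""
    k_min = math.ceil((n + x * math.sqrt(n)) / 2)
    total = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return total / 2 ** n

def normal_tail(x: float) -> float:
    """Standard normal tail 1 - Phi(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

n, x = 2500, 3.0
p_exact = exact_tail(n, x)
p_normal = normal_tail(x)
print(f"exact tail probability : {p_exact:.6f}")
print(f"normal approximation   : {p_normal:.6f}")
print(f"relative error         : {abs(p_exact - p_normal) / p_exact:.1%}")
```

Even for a sample of n = 2500 observations the relative error of the normal tail approximation at the level α ≈ 0.001 is of the order of several percent, which illustrates why additional arguments are needed when small coverage errors are targeted.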
Thus the problem of asymptotic efficiency of estimators in large and moderate deviation zones can be considered as the problem of asymptotic efficiency of confidence estimation.
The variances of estimators are usually unknown. Therefore it is natural to establish the lower bounds of asymptotic efficiency for pivotal statistics [21,17]. This is also a goal of the paper.
We make use of the letters C and c as generic notation for positive constants. Denote by χ(A) the indicator of the set A and by [a] the integral part of a. For any u, v ∈ R d denote by u ′ v the inner product of u and v, where u ′ is the transpose of u. For positive sequences a n , b n we write a n ≍ b n if c < a n /b n < C, and a n ≫ b n if a n /b n → ∞ as n → ∞. For any event B ... denote by A ... the complementary event. For any set D ⊂ R d denote by ∂D the boundary of D.
We make the following assumptions.
The constants C in (2.2)-(2.5) do not depend on θ, θ + u ∈ Θ. We say that a set Ω ⊂ R d is central-symmetric if x ∈ Ω implies −x ∈ Ω. We also make the following assumption.
Assumption 2.3. The set Ω is convex and central-symmetric.
The asymptotic of the risk depends on the geometry of the set Ω. Then for any estimator θ̂ n = θ̂ n (X 1 , . . . , X n ) the lower bound (2.6) holds with C n → ∞ as n → ∞.
If b n = n −1/2 , Theorem 2.1 is a particular case of the Local Asymptotic Minimax Theorem [16,18,22,30,31]. Wolfowitz [32] was the first to point out the relationship between lower bounds of the (2.6)-type and the problem of asymptotic efficiency in confidence estimation.
The statement (2.6) of Theorem 2.1 contains the infimum over θ 0 ∈ Θ 0 . In the Local Asymptotic Minimax Theorem [16,18,22,30,31] the value of θ 0 is fixed. This Theorem is valid if the Fisher information I(θ 0 ) is finite at the fixed point θ 0 . The one-dimensional version of Theorem 2.1 was also proved for a fixed point θ 0 (see [10]). The assumptions of the one-dimensional version of Theorem 2.1 suppose that the Fisher information I(θ 0 ) is finite at the fixed point θ 0 and that (2.2)-(2.4) hold at the point θ 0 as well. We can prove the multidimensional version of Theorem 2.1 for a fixed point θ 0 only if the Fisher information I(θ 0 ) is finite in some vicinity of the point θ 0 and (2.2)-(2.5) hold uniformly in some vicinity of the point θ 0 .
It suffices to suppose in Theorem 2.1 that Assumptions 2.4 and 2.5 hold in some vicinity of the set M .
In confidence estimation the set Ω is usually a ball Ω r centered at zero with radius r > 0. In this case Theorem 2.1 can be rewritten in a more explicit form.

M. Ermakov
The assumptions of Theorem 2.1 are rather weak. The sharp asymptotics of moderate deviation probabilities of the likelihood ratio were established under more restrictive assumptions (see [5,7,8,29] and references therein). The proofs of the lower bounds for the moderate deviation probabilities do not require such strong assumptions (see [2,10]) and they are usually obtained more easily than the upper bounds.
The assumptions of Theorem 2.1 differ from the traditional assumption of local asymptotic normality [16,18,22,30,31]. Thus Theorem 2.1 cannot be straightforwardly extended to models having this property. At the same time Assumptions 2.1 and 2.2 represent a slightly more stable form of the usual assumptions arising in the proof of local asymptotic normality. This allows us to make use of the technique arising in the proofs of local asymptotic normality and to obtain results similar to (2.6) for other models of estimation. This problem will be considered in the sequel.
For semiparametric estimation the local asymptotic minimax lower bounds in the zone of moderate deviation probabilities have been established in [12]. In [12] the statistical functionals take values in R 1 . The results were based on the assumption that (2.2)-(2.4) hold uniformly for the families of "least favourable" distributions. In the case of a multidimensional parameter only one additional assumption (2.5) arises. Thus the difference is not very significant.
The lower bound for the asymptotic efficiency of pivotal statistics is given below in Theorem 2.2.
Make the following assumptions.
Assumption 2.11. The principal curvatures at each point of ∂Ω are negative.
Here the sequence a n > 0 is such that a γ n b −λ n → ∞ and nb 2 n a γ n → 0 as n → ∞. In these assumptions we do not suppose that H(θ, ψ) is the covariance matrix of the limit distribution of n 1/2 (θ̂ n − θ).
Theorem 2.2 is easily deduced from Theorem 2.1 in section 4. The plan of the proof of Theorem 2.1 is the following. In section 3 we outline the basic steps of the proof of Theorem 2.1. After that the proof is given for a set Ω with the simplest geometry. For a set Ω with arbitrary geometry we point out the differences in the proof at the end of section 3. The key Lemmas 3.1 and 3.2 are proved in section 5. The proof of Lemma 3.2 is based on the new Theorems 5.1 and 5.2 on large deviation probabilities of sums of independent random vectors. The proofs of Theorems 5.1 and 5.2 are given in section 6. Section 7 contains the proofs of the technical lemmas of sections 3 and 5.

Notation
To simplify the notation we suppose that θ 0 equals zero. The estimates of all remainder terms are uniform with respect to θ 0 ∈ Θ 0 . Assume that the matrix I(θ 0 ) is the identity.

Plan of the proof
The reasoning is based on the standard proof of the local asymptotic minimax lower bound [16,18,22,30,31]. In particular we make use of the fact that the minimax risk exceeds the Bayes one and study the asymptotics of the Bayes risks. However, in this setup, the estimates of the residual terms of the asymptotics of the posterior Bayes risks should have the order o(exp{−cnb 2 n }). This does not allow us to implement the technique of local asymptotic normality in the zone |u n | ≤ Cb n of moderate deviation probabilities. This is the basic reason for the differences in the proof.
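The first step mentioned above relies on the standard fact that the minimax risk dominates the Bayes risk for any prior; in generic notation (not the paper's), for a risk R n and a prior π on Θ:

```latex
% Minimax risk dominates the Bayes risk for any prior \pi:
\sup_{\theta \in \Theta} R_n(\theta, \hat{\theta}_n)
\;\ge\; \int_{\Theta} R_n(\theta, \hat{\theta}_n)\, \pi(d\theta)
\;\ge\; \inf_{\tilde{\theta}_n} \int_{\Theta} R_n(\theta, \tilde{\theta}_n)\, \pi(d\theta).
```

Hence a lower bound on the Bayes risk for a suitably chosen prior yields the desired minimax lower bound.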
Instead of (3.1) we are compelled to prove that, for any ǫ > 0, the statement (3.2) holds, where U n is a fairly broad set of parameters. Therefore, the main problem is how to narrow down the set U n .
The following two facts allowed us to solve this problem.
• The normalized values of the posterior Bayes risks tend to a constant in probability.
• In the zone of moderate deviation probabilities the normal approximation [4,24] holds for the events ψ n ∈ n 1/2 Γ ni , where the domain Γ ni has diameter o(n −1 b −1 n ).

Thus we can find the asymptotics of the posterior Bayes risks independently for each event ψ n ∈ n 1/2 Γ ni , sum them over i and then obtain the lower bound. Fixing the set Γ ni allows us to replace the proof of (3.2) with the statement (3.3). To narrow down the sets U n we define the lattice Λ n in the cube K vn , v n = Cb n , and split Λ n into subsets Λ nile . The set Λ nile is a lattice in the union of a finite number of very narrow parallelepipeds K nij whose orientation is given by the position of the set Γ ni relative to θ 0 . The problem of Bayes risk minimization is solved independently for each set Λ nile and the results are added.
Note that the proof of (3.3) with U n = Λ nile is based on the "chaining method" together with the inequality (3.4). To prove (3.4) we apply the Chebyshev inequality to the first sum on the left-hand side of (3.4) and, simultaneously, a theorem on large deviation probabilities for ψ n . Thus we prove an anisotropic version of the theorem on large deviation probabilities (see Theorem 5.2).

Notation
Denote

We split the cube K κvn , 0 < κ < 1, into small cubes. For each x ni , 1 ≤ i ≤ m n , we define a partition of the cube K vn into subsets. Let us fix i. Suppose x ni is parallel to e 1 = (1, 0, . . . , 0) ′ ; this assumption does not cause serious differences in the reasoning. Denote by Π 1 the subspace orthogonal to e 1 . Suppose the points θ nij , 1 ≤ j ≤ m 1ni , are chosen so that they form a lattice. The asymptotic of the risk depends on the set M . We begin with the proof of Theorem 2.1 for the two-point case M = {−y, y}, y ∈ ∂Ω. For an arbitrary geometry of the set M we are compelled to use rather cumbersome constructions. At the same time the basic part of the proof is the same.

Proof for the simple geometry of the set Ω
In this case the problem of risk minimization on Λ n is reduced to the same problems on the subsets Λ nie . Thus we have (3.5). Therefore we can minimize the Bayes risk on each subset Λ nie independently and make use of its own linear approximation (3.1) of the logarithm of the likelihood ratio on each set U n = Λ nie . For an arbitrary geometry of the set M an additional summation over the index l, 1 ≤ l ≤ m 3ni , caused by the different points of M , arises in (3.5). Thus the right-hand side of (3.5) takes the form (3.6). The definition of the sets Λ nile is akin to that of Λ nie . The statement (3.5) with the right-hand side (3.6) is the basic difference of the proof for the arbitrary geometry of M . For completeness of the proof we shall keep the index l in the further reasoning. This index should be omitted in the two-point case.
The plan of the further proof is the following. First the basic reasoning will be given. After that we define the partitions of Λ n into the sets Λ nile for the arbitrary geometry of M . The basic reasoning is given on the set of events A 1n . The definition of the set A 1n is rather cumbersome. To simplify the understanding of the proof we have postponed the definition of the set A 1n to the end of the section.

Constructions for the arbitrary geometry of the set Ω
Let us single out in M the connectivity components M 1 , . . . , M s1 having the greatest dimension. These components define the asymptotic of the lower bound of the risks.
Define the linear manifold N of the smallest dimension d 1 such that M̄ ⊂ N . Define in R d a coordinate system such that N is induced by the first d 1 coordinates. Denote by e 1 , . . . , e d the vectors of this coordinate system. For each y nij ∈ Ỹ ni we set z nij ∈ b n M̄ such that |y nij − z nij | ≤ c 3n δ 1n . Define the set Z̃ ni = {z : z = z nij , y nij ∈ Ỹ ni }. Denote by m 4ni the number of points of Z̃ ni . We split Z̃ ni into subsets of points Z̃ nil = {z nil1 , . . . , z nild1 }, 1 ≤ l ≤ m 3ni , such that the vectors z nil1 , . . . , z nild1 induce N . Note that t < d 1 points may not enter these partitions, since m 4ni may not be a multiple of d 1 . However their exclusion is not essential for the further reasoning. Moreover, for the existence of such a partition we may have to define different constants c 3n in the definition of different sets K nij . However, this does not significantly affect the subsequent proof and we omit the reasoning.
For each z nile define the point y nile , y nile ∈Ỹ ni such that |y nile − z nile | ≤ c 3n δ 1n .
= m 2nil , 1 ≤ l ≤ m 3ni . This can always be achieved by choosing different constants c 3n defining the sets K nij . It will be convenient to number the sets K̄ nil (k 1 , . . . , k d−d1 ), denoting them K̄ nil1 , . . . , K̄ nilm 2nil . The further proof of Theorem 2.1 follows the reasoning for the two-point {y, −y} geometry of the set M given above.

Definition of the set A 1n and Estimate of P (A 1n )
Now the definition of the set A 1n = A 1nile and the complementary set B 1n = B 1nile = D nile ∪ B 4nile ∪ B 3nile will be given. The definitions of the sets D nile , B 4nile , B 3nile are given below.
Now we define the set B 2nile ⊂ B 4nile . For any θ 1 , θ 2 ∈ Θ denote η s (θ 1 , θ 2 ) = g(X s , θ 1 , θ 2 ) with 1 ≤ s ≤ n, and define the corresponding sets of events B 2s (θ 1 , θ 2 ). The estimates of P (B 2nile ) are based on the "chaining method". For simplicity we suppose that l n = 2 m ; this does not cause serious differences in the reasoning. For each θ ∈ Θ nile we define the sets Ψ j = Ψ j (θ), 1 ≤ j ≤ m, of points h k = θ + kδ 1n e 1 , h k ∈ Λ nile , such that |k| is divisible by 2 m−j and is not divisible by 2 m−j+1 . We say that the points h ∈ Ψ j and h 1 ∈ Ψ j−1 are neighbours if h 1 is the nearest point of Ψ j−1 to h. For any h ∈ Ψ j we denote Π(h) = {h 1 : h 1 ∈ Ψ j−1 and h, h 1 are neighbours}.
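The chaining step can be summarized by the following standard telescoping decomposition and union bound (written in generic notation for a process η over the levels Ψ j defined above; the thresholds t j are hypothetical and must only satisfy the stated summability):

```latex
% Telescoping along neighbours through the dyadic levels \Psi_m, \ldots, \Psi_1:
\eta(h) - \eta(\theta) \;=\; \sum_{j} \bigl( \eta(h_j) - \eta(h_{j-1}) \bigr),
\qquad h_j \in \Psi_j,\ \ h_{j-1} \in \Pi(h_j),
% so that, by the union bound, with \sum_j t_j \le t,
\mathbf{P}\Bigl( \sup_{h \in \Lambda_{nile}} |\eta(h) - \eta(\theta)| > t \Bigr)
\;\le\; \sum_{j=1}^{m} \sum_{h \in \Psi_j} \sum_{h_1 \in \Pi(h)}
\mathbf{P}\bigl( |\eta(h) - \eta(h_1)| > t_j \bigr).
```

Each increment probability on the right-hand side is then controlled by the large deviation estimates of Theorem 5.2.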
Therefore we can implement similar reasoning and obtain Theorem 2.2 in this case.

Proofs of Lemmas 3.1 and 3.2
We begin with the proof of Lemma 3.2.
Denote by µ n the probability measure of a Gaussian random vector ζ with E[ζ] = 0 and covariance matrix nI. For any Borel set W denote by W δ the δ-vicinity of W , δ > 0.
Theorem 5.1. Let the set W belong to a ball in R d having radius r = o(ǫ n n 1/2 b n ), where ǫ n → 0 as n → ∞. Let nb 2 n → ∞ and nb 2+λ n → 0 as n → ∞. The differences between the statements of Theorem 5.1 and the Osipov-von Bahr Theorem [4,24] are caused by the differences in the assumptions. In [4,24] the results have been proved under the assumption E[exp{c|Z|}] < ∞.
Let us check that the assumptions of Theorem 5.1 are fulfilled for the random vector Z = I −1/2 (θ 0 )τ χ(A 1n1 ).
with 0 < ǫ < 1. Suppose the covariance matrix of the random vector Z is positive definite. Let V 1 = (X 1 , Z 1 ), . . . , V n = (X n , Z n ) be independent copies of the random vector V . Let U be a bounded set in R d that is a difference of two convex sets.
Then, for all sufficiently large n, the bounds (5.7) and (5.8) hold, where ǫ 1n , r n are chosen such that nb 2+λ n c −3 n1 ǫ −2 1n → 0 as n → ∞ and r n > c n1 n −1/2 b −1 n . It is clear that ǫ 1n , r n can be chosen such that ǫ 1n → 0 and r n n 1/2 b n → 0 as n → ∞. In the proof of (5.7) and (5.8) we suppose that ǫ 1n and r n satisfy these assumptions.
Lemma 5.6. For all θ ∈ Λ nile the required estimate holds. This completes the proof of (5.7). The proof of (5.6) is akin to that of (5.7) and is omitted. For the estimate of V 6nh in (5.8) we choose Z = τ . Using the same reasoning as in the proof of (5.7) and Lemmas 5.7, 5.8 given below, we get (5.8).
If v h , we have
Since the reasoning is confined to a sufficiently small vicinity of the point n 1/2 y sj , the surface n 1/2 b n ∂Ω admits in this vicinity an approximation by an ellipsoid, where −α 2 , . . . , −α d are the principal curvatures of the surface ∂Ω at the point (1, 0, . . . , 0). Thus, in the further reasoning, we can replace the set n 1/2 b n ∂Ω with this ellipsoid. After the shift t = (t 1 , . . . , t d ) the ellipsoid is defined by an equation which intersects the line y = n 1/2 (θ sj + ux ni ), u ∈ R 1 , at the point n 1/2 y sj (t).
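The local approximation used here is the standard second-order (osculating) expansion of a smooth surface at a boundary point; in generic normalized coordinates (the exact scaling constants of the paper are not reproduced, and the sign convention is the one under which Ω lies locally below its tangent plane at (1, 0, . . . , 0)):

```latex
% Second-order approximation of \partial\Omega near the point (1, 0, \ldots, 0)
% with principal curvatures -\alpha_2, \ldots, -\alpha_d (\alpha_k > 0):
x_1 \;=\; 1 \;-\; \tfrac{1}{2} \sum_{k=2}^{d} \alpha_k\, x_k^{2}
\;+\; o\bigl(|x|^{2}\bigr),
\qquad (x_2, \ldots, x_d) \to 0 .
```

Replacing ∂Ω by the osculating quadric introduces only the o(|x|²) error, which is negligible in the vicinity considered.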
Let us consider the case c ≪ |t| ≪ Cn 1/2 b n . Note that, since all the principal curvatures at all points of ∂Ω are negative, we can enclose n 1/2 b n Ω in an ellipsoid Ξ passing through the points y nile and −y nile , 1 ≤ e ≤ d 1 , and such that ᾱ k < 1, d 1 + 1 ≤ k ≤ d. Denote by y sj (t) ∈ (n 1/2 b n ∂Ω − t) ∩ {y : y = θ sj + x ni u, u ∈ R 1 } and by ȳ sj (t) ∈ (Ξ − t) ∩ {y : y = θ sj + x ni u, u ∈ R 1 } the points to which y sj passes under the shift t.
For the points ȳ sj (t) we can derive estimates similar to the case |t| < C < ∞ and obtain (5.33). The statement (5.33) implies J(t) > J(0) for c ≪ |t| ≪ Cn 1/2 b n . Finally, after a shift t with |t| ≍ n 1/2 b n , one of the points y nile or −y nile , 1 ≤ e ≤ d 1 , will be located at a distance of order n 1/2 b n outside the ellipsoid Ξ and hence outside n 1/2 b n Ω. This implies J(t) > J(0).

Proofs of Theorems 5.1 and 5.2
The proof of Theorem 5.1 contains only a few new technical details in comparison with the proof of the similar theorem in [24]. The proof of Theorem 5.2 is based on a fairly new analytical technique (see [6,9]) and is more interesting. Thus we begin with the proof of Theorem 5.2.

Proof of Theorem 5.2
Proof. We begin with auxiliary estimates of moments of the random variable X and the random vector Z. Define twice continuously differentiable functions f 1n with 0 ≤ f 1n (x) ≤ 1. We slightly modify the setup of Theorem 5.2 in the proof. The reasoning will be given for r n = 1; Theorem 5.2 follows from the reasoning if we put r n = c n .
For any γ > 0 denote where the supremum is taken over all distributions of (X, Z) satisfying the assumptions of Theorem 5.2.

Proof.
It is clear that ∆ ≤ ∆ 1 + · · · + ∆ n . Using the Taylor expansion of f 1n and f 2n , we get (6.7). After opening the brackets on the right-hand side of (6.7) it remains to estimate each term independently. The estimates are performed in the same way, using (5.12), (5.13), (5.14) and (6.1)-(6.5); therefore we estimate only three of them. Using (6.4), we get (6.8). The first inequality in (6.8) is due to the reasoning that follows; the remaining two estimates follow from (6.1) and (5.14). We begin the proof of Theorem 5.2 with auxiliary estimates, the first of which involves ω n = γ n + n 1/2 b n − (n − 1) 1/2 b n−1 . The further reasoning is based on induction on n. We take a sufficiently large n = n 0 such that Cn 0 ǫ −2 1n0 c −3 n0,1 b 2+λ n0 < a with aa 0 a 1 < 1, and take C n0 accordingly. Suppose Theorem 5.2 has been proved for n − 1 ≥ n 0 ; let us prove it for n. We show (6.9) with C n = a 0 + C n−1 aa 1 . Then, since the C n form a geometric progression with ratio aa 0 a 1 < 1, Theorem 5.2 follows from (6.9). Applying (6.6) and the inductive assumption, we get the required bound.

Proof of Theorem 5.1
Proof. The basic reasoning in the proofs of Theorem 5.1 and of the Osipov Theorem [24] coincides. The difference is only in the preliminary estimates on which the basic reasoning is based.
A considerable part of the subsequent estimates is based on the following lemma.
Proof. By (2.2) and (2.4), we get the required estimate.
Proof. The proof of Lemma 5.2 is based on the following reasoning. Using the Taylor expansion of ξ n , we get the required representation with 0 ≤ κ ≤ 1.