On estimating a bounded normal mean with applications to predictive density estimation

Abstract: For a normally distributed X ∼ N(μ, σ²) and for estimating μ when restricted to an interval [−m, m] under general loss F(|d − μ|) with strictly increasing and absolutely continuous F, we establish the inadmissibility of the restricted maximum likelihood estimator δ_mle for a large class of F's and provide explicit improvements. In particular, we give conditions on F and m for which the Bayes estimator δ_BU with respect to the boundary uniform prior π(−m) = π(m) = 1/2 dominates δ_mle. Specific examples include L_s loss with s > 1, as well as reflected normal loss. Connections and implications for predictive density estimation are outlined, and numerical evaluations illustrate the results.


Introduction
For a normally distributed observable X ∼ N(μ, σ²), we consider the restricted parameter space point estimation problem with μ ∈ Θ(m) = {μ ∈ R : |μ| ≤ mσ}, where both m and σ² are known. We provide analysis for symmetric and strict bowl-shaped loss functions of the form

L(μ, d) = F(|d − μ|) , (1.1)

with F strictly increasing and absolutely continuous on [0, 2mσ], but otherwise general. The simple model addressed here remains relevant and influential. It is relevant to various situations, arising namely via limit theorems, where all the uncertainty lies in the mean signal, the noise is controlled, and X is a normally distributed summary statistic (that may be sufficient as well). The other key aspect of the model is the assumption that the mean is bounded to an interval, centered here about 0 without loss of generality. It is indeed often the case that bounds on the parameters can be stipulated and arise from a given practical context.

A central objective is to study the efficiency of estimators that take values on or close to the boundary of the parameter space, such as the benchmark maximum likelihood estimator

δ_mle(X) = (mσ ∧ |X|) sgn(X) , (1.2)

and to provide dominating estimators δ(X), including Bayes estimators, whenever δ_mle is inadmissible, in terms of the frequentist risk

R(μ, δ) = E_μ [ F(|δ(X) − μ|) ] . (1.3)

With much previous work focused on squared-error loss (i.e., F(t) = t²), our findings encompass many alternative convex losses, as well as loss functions with features that may well be more attractive for the decision-maker, such as boundedness and non-convexity. These include for instance reflected normal loss (e.g., Spiring, 1993), given by

L_γ(d, μ) = 1 − e^{−(d − μ)²/(2γ)} , (1.4)

γ being a positive constant. It is useful to observe that lim_{γ→∞} 2γ L_γ(d, μ) = |d − μ|², so that we can reasonably anticipate that results relative to L_γ with γ large mimic those for squared-error loss. Furthermore, as described in Section 6, additional sources of motivation and connections with reflected normal loss L_γ arise from a predictive density estimation perspective.
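As a minimal numerical companion (not part of the original development; names are illustrative and σ = 1 unless passed), the following sketch encodes the restricted MLE (1.2) and reflected normal loss (1.4), and checks the limiting relation lim_{γ→∞} 2γ L_γ(d, μ) = (d − μ)².

```python
# A minimal sketch (not from the paper; names are illustrative, sigma = 1 by default).
import numpy as np

def delta_mle(x, m, sigma=1.0):
    """Restricted MLE (1.2): X projected onto [-m*sigma, m*sigma]."""
    return np.sign(x) * np.minimum(m * sigma, np.abs(x))

def reflected_normal_loss(d, mu, gamma):
    """Reflected normal loss (1.4): L_gamma(d, mu) = 1 - exp(-(d - mu)^2 / (2*gamma))."""
    return 1.0 - np.exp(-(d - mu) ** 2 / (2.0 * gamma))

d, mu = 0.7, 0.2
for gamma in (1.0, 10.0, 100.0):
    # 2*gamma*L_gamma(d, mu) approaches (d - mu)^2 = 0.25 as gamma grows
    print(gamma, 2 * gamma * reflected_normal_loss(d, mu, gamma), (d - mu) ** 2)
```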
Remark 1.1. Other classes of loss functions for which our findings apply, and which also arise in a predictive density estimation setting (see Lemma 6.3), are cases where F is generated by a cdf on R+, such as a half-normal F(t) = Φ(t) − 1/2, Φ being the N(0, 1) cdf. Our results also apply to a vast collection of losses F which are reflected densities of the form F(t) = 1 − ψ(t)/ψ(0), t > 0, with ψ a decreasing density on R+, which include reflected normal loss, as well as losses F(t) = 1 − e^{−t^α/(2γ)}, α, γ > 0, which are reflections of exponential power densities and which we will study. Alternatively, these latter losses are generated by Weibull distribution functions, which relates them to the other losses of this example.
Using various conditional risk decompositions, including a technique introduced by Moors (1985), Marchand and Perron (2001) obtained estimators that dominate δ_mle under squared-error loss, including Bayesian dominators for small enough m. Namely, one of their results, which duplicates an earlier finding by Casella and Strawderman (1981), establishes that δ_BU(X) dominates δ_mle(X) for m ≤ σ, where δ_BU(X) = mσ tanh(mX/σ) is the Bayes estimator with respect to the two-point uniform prior on the boundary {−m, m} of Θ(m). Otherwise, for larger m, the projection δ_p(X) = (|δ_BU(X)| ∧ |δ_mle(X)|) sgn(X) was shown by Moors (1985) to dominate δ_mle(X) under squared-error loss. For other losses, the situation may differ. Indeed, for absolute value loss (i.e., F(t) = t), Iwasa and Moritani (1997), as well as Kucerovsky et al. (2009), show that δ_mle is a proper Bayes estimator and is thus admissible. On the other hand, Iwasa and Moritani (1997) establish the inadmissibility of δ_mle for a large class of convex losses, including L_s losses (F(t) = t^s) with s > 1.
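For illustration, here is a short sketch (assuming σ = 1; names are ours, not the paper's) of the squared-error boundary uniform Bayes estimator δ_BU(x) = m tanh(mx) and of Moors' projection δ_p described above.

```python
# Sketch (sigma = 1, illustrative names) of delta_BU under squared-error loss and of
# Moors' projection delta_p, which caps |delta_BU| by |delta_mle|.
import numpy as np

def delta_BU_L2(x, m):
    return m * np.tanh(m * x)                      # Bayes vs. uniform prior on {-m, m}

def delta_p(x, m):
    g_bu = np.abs(delta_BU_L2(x, m))
    g_mle = np.minimum(m, np.abs(x))
    return np.sign(x) * np.minimum(g_bu, g_mle)    # projection dominating delta_mle (Moors, 1985)

x = np.linspace(-4.0, 4.0, 9)
print(delta_BU_L2(x, m=2.0))
print(delta_p(x, m=2.0))
```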
As for squared-error loss, and as shown below using a conditional risk decomposition, we establish that δ_mle, as well as other estimators taking values on the boundary of the parameter space, are inadmissible under reflected normal loss L_γ and for a large class of losses. For such situations, an improvement is always provided by the above-defined projection δ_p(X), with δ_BU(X) the corresponding boundary uniform Bayes estimator. Moreover, despite not having in general an explicit expression for δ_BU(X), we obtain a simple sufficient condition (Theorem 3.4), applicable to a large class of losses F, for δ_BU(X) to dominate δ_mle(X), namely m ≤ c_0 σ, with c_0 the unique solution in c of c f(c)/f′(c) = 1 and f = F′ a.e. For L_s loss F(t) = t^s with s > 1, this reduces to m ≤ σ√(s − 1) (Example 4.1), extending the L_2 result mentioned above. For reflected normal loss, the above condition for δ_BU(X) to dominate δ_mle(X) reduces to m ≤ σ√(γ/(1 + γ)) (Example 4.2). In contrast to such losses, for which conditions are presented in Section 3 and which include strictly convex losses, we show in Section 3.1 that losses in (1.1) with concave F do not imply the deficiency of boundary-taking estimators such as δ_mle(X). In fact, in such cases, we show that the boundary uniform Bayes estimator is given by δ_BU(X) = mσ sgn(X).
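The cutoff c_0 above can be checked numerically; the sketch below (illustrative names, scipy assumed available) solves c f(c)/f′(c) = 1 for L_s loss and for reflected normal loss, recovering √(s − 1) and √(γ/(1 + γ)), respectively.

```python
# Numerical check (a sketch, not from the paper) of the cutoff c_0 solving c*f(c)/f'(c) = 1.
import numpy as np
from scipy.optimize import brentq

def c0(f, fprime, upper):
    return brentq(lambda c: c * f(c) / fprime(c) - 1.0, 1e-8, upper)

s, gamma = 3.0, 1.0
f_Ls  = lambda t: s * t ** (s - 1)
fp_Ls = lambda t: s * (s - 1) * t ** (s - 2)
f_rn  = lambda t: (t / gamma) * np.exp(-t ** 2 / (2 * gamma))
fp_rn = lambda t: (1 / gamma) * (1 - t ** 2 / gamma) * np.exp(-t ** 2 / (2 * gamma))

print(c0(f_Ls, fp_Ls, upper=10.0), np.sqrt(s - 1))            # ~1.414 in both cases
print(c0(f_rn, fp_rn, upper=0.99 * np.sqrt(gamma)),           # bracket kept below sqrt(gamma),
      np.sqrt(gamma / (1 + gamma)))                           # where f' changes sign; ~0.707
```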
The organization of the manuscript is as follows. Section 2 deals with preliminary results and definitions, namely those relative to the conditional risk where the conditioning is on |X| = r, as well as first properties of the best equivariant estimator δ g * λ when |μ| = λ. Section 3 begins with a further description of the optimization problem of determining g * λ (Theorem 3.1), while Subsection 3.1 deals with concave F , a non-shrinking property, and the lack of inefficiency of estimators that take values on the boundary of the parameter space. Subsections 3.2, 3.3 establish properties of δ g * λ and namely a shrinkage property for a wide class of losses, including convex losses. Subsection 3.4 contains dominance findings, while Section 4 is dedicated to examples, illustrations, numerical evaluations, and further observations. Finally, we expand on connections and implications for predictive density estimation in an Appendix (Section 6).

Notations, definitions, and preliminary results
For our model X ∼ N(μ, σ²) with |μ| ≤ mσ, we assume σ = 1 without loss of generality, and we set λ = |μ|, R = |X|, r = |x|, and φ as the N(0, 1) pdf. With F absolutely continuous, we can write F(t) = ∫_0^t f(x) dx, t ≥ 0, f representing (throughout) the Radon-Nikodym derivative of F, with f = F′ a.e. The problem is invariant under sign changes and equivariant estimators are odd functions of X. Equivariant estimators will thus possess constant risk on orbits S_λ = {−λ, λ}, and we can therefore extract a best equivariant estimator for (what we will refer to as) the local problem where |μ| = λ. This can be done by conditioning on the maximal invariant statistic R (e.g., Eaton, 1989). We hence study the conditional risks

ρ(λ, g(r), r) = E_μ ( F(|δ_g(X) − μ|) | R = r ) , r > 0 , (2.1)

for equivariant estimators δ_g of the form δ_g(x) = g(|x|) sgn(x), with defining multiplier function g : [0, ∞) → R.
The decomposition leads to the evaluation of the unconditional frequentist risk through R(μ, δ_g) = E[ρ(λ, g(R), R)], where R² ∼ χ²_1(λ²) and λ ∈ [0, m]. Such inferences are summarized in the following lemma and will be further applied in Section 3.4. The main idea used here to obtain dominance results, which goes back to the work of Moors (e.g., Moors, 1985), is that estimator δ_g will dominate estimator δ_{g_1} whenever it lowers the conditional risk for all r > 0 and for all λ (also see Marchand and Strawderman, 2004, for a presentation).
Remark 2.1. Alternatively and equivalently, the estimator δ_{g*_λ} is Bayes with respect to the two-point uniform prior P(μ = λ) = P(μ = −λ) = 1/2, which we denote as π_λ. This can be checked directly, but it also must be the case based on the fact that the best equivariant estimator can be represented as the Bayes estimator associated with the corresponding Haar measure, which is here the two-point uniform prior on {−λ, λ} (e.g., Eaton, 1989). For the direct verification, first observe that, working with (2.6), the expected posterior loss of the estimate δ is given by ρ(λ, δ, |x|). We thus infer that δ_{π_λ}(x) is an odd function of x and that δ_{π_λ}(x) = argmin_y {ρ(λ, y, |x|)} sgn(x) = δ_{g*_λ}(x) for all x.
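To make the conditional risk (2.1) and its minimization concrete, here is a small numerical sketch; the explicit weighting of the two sign values of X by φ(r − λ) and φ(r + λ) is our reconstruction of (2.6), and the names are illustrative. Under squared-error loss the numerical minimizer recovers the familiar λ tanh(λr).

```python
# Sketch of the conditional risk (2.1): given |X| = r and |mu| = lambda, an equivariant rule
# estimates y*sgn(X), and the two signs of X carry weights phi(r - lam) and phi(r + lam).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def cond_risk(lam, y, r, F):
    w_plus, w_minus = norm.pdf(r - lam), norm.pdf(r + lam)
    return (F(abs(y - lam)) * w_plus + F(y + lam) * w_minus) / (w_plus + w_minus)

def g_star(lam, r, F):
    """Best equivariant multiplier: minimizes the conditional risk over y in [0, lam]."""
    return minimize_scalar(lambda y: cond_risk(lam, y, r, F),
                           bounds=(0.0, lam), method="bounded").x

lam, r = 1.5, 0.8
print(g_star(lam, r, F=lambda t: t ** 2), lam * np.tanh(lam * r))   # squared error: both ~1.2505
```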

Main results
The main objective here is to study the efficiency of the maximum likelihood estimator δ_mle, as well as of other boundary-valued estimators, for losses F(|d − μ|) as measured by the corresponding risk in (1.3). In particular, there is the issue of inadmissibility and the question of providing dominating estimators of δ_mle whenever possible. In this regard, we will be particularly interested in the performance of the boundary uniform Bayes estimator δ_BU(X) and in conditions on (m, F) for which δ_BU(X) dominates δ_mle. Other inferences are not only useful, but also turn out to be of independent interest. These include monotonicity conditions, as well as conditions for which the best equivariant estimator for the local problem |μ| = λ takes values on {−λ, λ}.
Subsection 3.1 deals with losses, which include concave F , where the best equivariant estimator takes values on the boundary of the parameter space, and where improvement by shrinkage is not possible. Afterwards, we study losses where improvement by shrinkage is possible. Subsection 3.2 provides conditions and examples, while subsection 3.3 establishes key monotonicity conditions. These lead to dominance results presented in subsection 3.4, and examples and illustrations then follow in Section 4.
Before proceeding, we present the following general result which addresses the minimization problem of the conditional risks ρ(λ, y, r) as a function of y for fixed (λ, r).

Proof. The proof of parts (i) and (ii) is based on equation (2.7). Parts (iii), (iv), and (v) are established by using equation (2.6): for part (iii), we obtain that the function ρ(λ, ·, r) reaches its minimum at λ, while parts (iv) and (v) follow from the corresponding discriminating inequality.

Example 3.1. Theorem 3.1 covers a wide class of loss functions F, identifying the best point estimate of μ for fixed (λ, r). Several developments below deal with rather smooth choices of F, but Theorem 3.1 covers many more cases. Suppose, for the sake of illustration, that F(t) = max(t, t²) for t ∈ [0, 2λ]. We consider separately the cases: (A) λ > 1, (B) 1/2 < λ ≤ 1, (C) 0 < λ ≤ 1/2.

(A) λ > 1. Observe that F is not differentiable at 1, but is convex on [0, 2λ], so that we can focus on parts (iv) and (v) of Theorem 3.1 and its discriminating inequality, which yields the evaluation of g*_λ(r) for 0 < r < r_0.

From this, part (iv) of Theorem 3.1 yields, for 0 < r < r_0, the corresponding evaluation of g*_λ(r). We point out that parts (i) and (ii) of Theorem 3.1 can also be applied directly for the above analysis.

An application of part (iv) of Theorem 3.1 then leads to the same evaluation of g*_λ.

Situations where the best equivariant estimator is not a shrinker
Building on Theorem 3.1, we expand here on situations where the best equivariant estimator for |μ| = λ is not a shrinker. These include cases where F is a concave function on [0, 2λ].
Theorem 3.2. Suppose that f(λ − y) ≥ f(λ + y) for almost all y ∈ [0, λ]. Then: (a) the best equivariant estimator for |μ| = λ is given by δ_{g*_λ}(x) = λ sgn(x); and (b) δ_{g*_λ} is, for the local problem |μ| = λ, a Bayes and admissible estimator of μ.
Proof. In view of Remark 2.1 and part (a), the estimators δ_{g*_λ}(x) are (essentially) unique Bayes with finite Bayes risk, and thus admissible. This establishes part (b). Part (a) is a corollary of Theorem 3.1. Indeed, given that f(λ − y) ≥ f(λ + y) for almost all y ∈ [0, λ], part (ii) of Theorem 3.1 yields the result.
The above result tells us that estimators taking values on the boundary {−m, m} of the parameter space [−m, m], such as δ_mle, cannot be dominated by making use of the conditional risk method of Moors (1985), or Marchand and Perron (2001), for a large class of losses F, which include concave F on [0, 2λ]. Of course, for the specific case of absolute-value loss, we have the stronger result that δ_mle is admissible (Iwasa and Moritani, 1997; Kucerovsky et al., 2009), but the result here applies to a whole range of losses.

Remark 3.1. Examples of losses with concave F include choices indexed by a > 0 and 0 < c ≤ 1, and, more generally, cases where F is the cdf associated with a decreasing density on R+ (for instance, a half-logistic cdf with scale parameter γ > 0).

Situations where the best equivariant estimator is a shrinker
We now turn to situations where the best equivariant estimator δ_{g*_λ} for |μ| = λ takes values in the interior of the interval [−λ, λ]. As a consequence of Theorem 3.1, we obtain the following description of δ_{g*_λ}.

Corollary 3.1. Let λ > 0, r > 0 be fixed. Assume that condition (3.1) holds, that F is differentiable at 0 and 2λ, with F′(0) = 0 and F′(2λ) > 0. Then part (i) of Theorem 3.1 applies and the best equivariant estimator for |μ| = λ satisfies g*_λ(r) ∈ (0, λ), with g*_λ(r) the solution in y of equation (3.2), i.e., of f(λ − y)/f(λ + y) = e^{−2λr}. To justify that part (i) of Theorem 3.1 applies, we use equation (2.7) to derive that the set {t : ψ(λ, t, r) > 0} has positive Lebesgue measure, so that ess inf{f(λ − y)/f(λ + y) : y ∈ [0, λ]} < exp(−2λr).

Specific instances where condition (3.1) holds, and Corollary 3.1 is applicable, include cases where f is continuous on [0, 2λ] and where either: (I) F is convex on [0, 2λ]; or (II) the ratio f(λ − y)/f(λ + y) is nonincreasing in y ∈ [0, λ]; or this ratio is nonincreasing in y for y < y_1 and nondecreasing in y for y ≥ y_1, for some y_1 ∈ (0, λ). In such cases, in contrast to concave F cases, best equivariant estimators for the local problem |μ| = λ are shrinkers, i.e., |δ_{g*_λ}(x)| < λ for almost all x. This finding, along with further monotonicity properties still to be established, will be exploited below to identify complete classes of estimators and to obtain dominating estimators of boundary-taking estimators such as δ_mle(X) for the global problem with |μ| ≤ m. Our formulation highlights namely conditions (I) and (II), as these will play a key role later on for monotonicity and dominance results. We pursue with illustrations.

Example 3.2. As mentioned in the previous paragraph, the shrinkage conclusion of Corollary 3.1 applies to loss functions in (1.1) with F convex on [0, 2λ] (condition I), such as L_s loss F(t) = t^s with s > 1, and such as F(t) = e^{βt} − βt − 1 with β ≠ 0 (i.e., a symmetrized Linex loss function), among many other examples. For L_s loss with F(t) = t^s, s > 1, the best equivariant multiplier is obtained from (3.2) as g*_λ(r) = λ tanh(λr/(s − 1)). For L_2 loss, we recover the familiar g*_λ(r) = λ tanh(λr) and δ_{g*_λ}(x) = λ tanh(λx) (e.g., Casella and Strawderman, 1981).

Example 3.3. Investigating the conditions of Corollary 3.1 for reflected normal loss, and more generally for loss functions (1.1) with F(t) = 1 − e^{−t^α/(2γ)}, we find that they are satisfied for all λ > 0, γ > 0, and α ∈ (1, 3]. Finally, we point out that the stronger condition (II), namely that the ratio f(λ − y)/f(λ + y) is nonincreasing in y ∈ [0, λ], is seen by virtue of the above analysis to be satisfied if and only if (∂/∂y) log(f(λ − y)/f(λ + y)) is negative at y = 0+, which is equivalent to λ ≤ (2γ(α − 1)/α)^{1/α}. As an example of the best equivariant estimator δ_{g*_λ}, consider reflected normal loss L_γ (with α = 2 above), for which we obtain from (3.2) that

(λ − g*_λ(r))/(λ + g*_λ(r)) = exp(−2λ(r + g*_λ(r)/γ)) . (3.3)

We conclude this example by pointing out that the Bayes estimator δ_{g*_λ} and its defining equation (3.2) were studied by Towhidi and Behboodian (2002) with the objective of determining conditions for minimaxity (also see van Eeden, 2006, pp. 48–49). We will comment on this below in Example 4.3.
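As a numerical illustration of Example 3.3, the relation (3.3), equivalently g*_λ(r) = λ tanh(λ(r + g*_λ(r)/γ)), can be solved by simple fixed-point iteration; the sketch below uses illustrative names, and the iteration scheme and count are our choices.

```python
# Fixed-point sketch for the reflected normal best equivariant multiplier (Example 3.3).
import numpy as np

def g_star_reflected_normal(lam, r, gamma, n_iter=200):
    y = lam * np.tanh(lam * r)           # start from the squared-error solution
    for _ in range(n_iter):
        y = lam * np.tanh(lam * (r + y / gamma))
    return y

lam, r = 1.0, 0.6
print(g_star_reflected_normal(lam, r, gamma=1.0))
print(g_star_reflected_normal(lam, r, gamma=1e6), lam * np.tanh(lam * r))  # large gamma: ~L_2 case
```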

Monotonicity properties of g*_λ(r)
We require the following monotonicity properties of the best equivariant multiplier g*_λ(r).

Lemma 3.1. Consider the context of Corollary 3.1, with δ_{g*_λ} the best equivariant estimator under loss F(|d − μ|) for |μ| = λ, and g*_λ(r) the solution in y of equation (3.2). Suppose that f is continuous on [0, 2m], f(0) = 0, and f > 0 on (0, 2m]. Then: (a) if f is logconcave on (0, 2m] and the ratio f(λ − y)/f(λ + y) is nonincreasing in y ∈ [0, λ], g*_λ(r) is nondecreasing in λ for λ ∈ (0, m) and fixed r > 0; (b) under the same ratio condition, g*_λ(r) is nondecreasing in r > 0 for fixed λ.

Proof. For part (a), the logconcavity of f implies that the ratio f(λ − y)/f(λ + y) is nondecreasing in λ for λ > y and fixed y > 0. Therefore the left-hand side of (3.2) also is nondecreasing in λ for λ > y and fixed y > 0. Now, in view of this and the monotone decreasing condition in y for the ratio f(λ − y)/f(λ + y), it must be the case that g*_λ(r) is nondecreasing in λ for λ ∈ (0, m), given the equilibrium in (3.2). Part (b) is established analogously, as equation (3.2), along with the given monotonicity condition on f(λ − y)/f(λ + y), yields the monotone increasing property of g*_λ(r) in r.

Example 3.4 (L_s losses and other convex losses; continuation of Example 3.2). The logconcavity condition on f in part (a) of Lemma 3.1 is quite weak and satisfied by all the losses studied up to now, as well as many more. It may be interpreted as a condition stipulating that the loss does not rise too steeply from the origin as a function of the error of estimation d − μ. An example of a loss in (1.1) that does not satisfy such a condition, and which penalizes error of estimation severely, is F(t) = e^{t²} − 1. Among loss functions satisfying the condition are L_s loss with s > 1 and the symmetrized Linex loss given in Example 3.2. Furthermore, losses in (1.1) with convex F have ratios f(λ − y)/f(λ + y) which decrease in y, since f = F′ is increasing. We thus conclude that the monotonicity properties of Lemma 3.1 hold for all L_s losses with s > 1, symmetrized Linex losses, and more generally all strictly convex losses in (1.1) with logconcave F.
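The monotonicity properties of Lemma 3.1 can be visualized numerically; the sketch below does so for L_3 loss, with the closed form λ tanh(λr/(s − 1)) (our derivation from equation (3.2)) as a cross-check. Names are illustrative.

```python
# Numerical sketch of Lemma 3.1's monotonicity for L_s loss (here s = 3).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

s = 3.0
F = lambda t: t ** s

def g_star(lam, r):
    w_plus, w_minus = norm.pdf(r - lam), norm.pdf(r + lam)
    risk = lambda y: F(abs(y - lam)) * w_plus + F(y + lam) * w_minus
    return minimize_scalar(risk, bounds=(0.0, lam), method="bounded").x

print([round(g_star(1.0, r), 4) for r in (0.5, 1.0, 2.0, 4.0)])                  # increasing in r
print([round(g_star(lam, 1.0), 4) for lam in (0.5, 1.0, 1.5)])                   # increasing in lam
print([round(lam * np.tanh(lam * 1.0 / (s - 1)), 4) for lam in (0.5, 1.0, 1.5)]) # closed-form check
```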

Example 3.5 (Reflected normal loss and extensions; continuation of Example 3.3). As seen in Example 3.3, for loss functions in (1.1) with F(t) = 1 − e^{−t^α/(2γ)}, the ratio f(λ − y)/f(λ + y) is nonincreasing in y ∈ [0, λ] whenever λ ≤ (2γ(α − 1)/α)^{1/α}. Since f is logconcave on R+, these conditions are sufficient for the monotonicity properties of Lemma 3.1 to hold.
Remark 3.2. The monotonicity properties of Lemma 3.1 need not hold in the absence of the given conditions. An illustration is given by Example 3.1 where, for 1/2 < λ ≤ 1 and small enough values of r (r < r 0 as given), we have g * λ (r) = 1 − λ which is strictly decreasing in λ for 1/2 < λ ≤ 1.

Proof. Part (a) follows as a direct application of part (b) of Lemma 2.1 and of part (a) of Theorem 3.1, which tells us that the conditional local risk ρ(λ, y, r) is nonincreasing in y for 0 < y < g*_λ(r), and nondecreasing for y > g*_λ(r). Part (c) follows from part (b), since the additional assumptions and an application of Lemma 3.1 imply that ḡ_m ≡ g*_m. For part (b), Theorem 3.1 applies and tells us that g_1(r) > g*_λ(r) for all λ ∈ [0, m] whenever r ∈ A_{g_1}. An application of part (a) then implies the corresponding inequality for the conditional risks, and the dominance result follows.
The previous result is a complete class result in the spirit of those obtained by Moors (1985), as well as Marchand and Perron (2001, 2005, 2009), for squared error loss. The novelty here is that the results are presented for a class of loss functions which includes strictly convex loss functions, as well as reflected normal loss L_γ, the study of the latter having also been motivated by connections with predictive density estimation (see Section 6). Practically speaking, equivariant estimators δ_{g_1} that take values on, or too close to, the boundary {−m, m} are inadmissible and improved upon by projecting towards δ_{ḡ_m}, and even beyond δ_{ḡ_m}, as specified in the previous theorem.
We now apply the dominance results to the restricted maximum likelihood estimator of μ given in (1.2). For loss functions satisfying the required conditions, dominating estimators can be determined by applying part (b) of Theorem 3.3. We seek more, and search for conditions under which the boundary uniform Bayes estimator δ_BU itself satisfies the dominance conditions. For this, we will require that δ_BU shrinks δ_mle towards 0, but we obtain a quite simple condition on m for such shrinkage, and dominance, to occur.

Proof. With the given assumptions, we can apply Corollary 3.2, as well as part (c) of Theorem 3.3 for δ_g ≡ δ_{g*_m} (i.e., δ_g ≡ δ_BU). It remains to show, under the given condition m ≤ c_0, that δ_BU shrinks δ_mle towards 0, which is equivalent to g*_m(r)/r ≤ 1 for all 0 < r < m. Set h(t) = log f(t), so that the defining equation (3.2) for λ = m is written as

h(m + g*_m(r)) − h(m − g*_m(r)) = 2mr . (3.4)

Differentiating the above twice with respect to r, we obtain

{h′(m + g*_m(r)) + h′(m − g*_m(r))} (g*_m)′(r) = 2m , (3.5)

and

{h′(m + g*_m(r)) + h′(m − g*_m(r))} (g*_m)″(r) = − {h″(m + g*_m(r)) − h″(m − g*_m(r))} {(g*_m)′(r)}² . (3.6)

Now, observe that the right-hand side of (3.6) is negative by virtue of convexity condition C3 on h′ = (log f)′. With h′(m − g*_m(r)) + h′(m + g*_m(r)) ≥ 0 following from condition C2, and the monotone increasing in r property of g*_m(r) (Lemma 3.1), we conclude that g*_m(r) must be a concave function of r > 0 under the given assumptions.
The concavity of g*_m(r) implies that g*_m(r)/r is decreasing in r > 0, since g*_m(r) is increasing in r and since g*_m(0) = 0 (equation (3.4)). Therefore, a necessary and sufficient condition for the shrinkage property g*_m(r) ≤ r, for r > 0, to occur is

lim_{r→0+} g*_m(r)/r = (g*_m)′(0) = m f(m)/f′(m) ≤ 1 ,

where we have further made use of (3.5). To conclude the proof, it remains to observe that this condition is equivalent to m ≤ c_0 given condition C1, and since h′(c) > 0 for 0 < c ≤ m given condition C2, which tells us that h′(c − y) + h′(c + y) ≥ 0 for all y ∈ (0, c), and namely for y → 0+.
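A quick numerical check of the shrinkage property at the heart of the above proof, for reflected normal loss with γ = 1 (so that c_0 = √(γ/(1 + γ)) ≈ 0.707); the grid and tolerance below are illustrative choices.

```python
# For m <= c_0 the boundary uniform Bayes multiplier satisfies g*_m(r) <= r for all r > 0,
# while for larger m the property fails near r = 0.
import numpy as np

def g_star_m(m, r, gamma, n_iter=500):
    y = 0.0
    for _ in range(n_iter):
        y = m * np.tanh(m * (r + y / gamma))
    return y

gamma = 1.0
rs = np.linspace(1e-3, 3.0, 300)
for m in (0.6, 0.9):                      # 0.6 < c_0 < 0.9
    shrinks = all(g_star_m(m, r, gamma) <= r + 1e-9 for r in rs)
    print(m, shrinks)                     # expect True for 0.6, False for 0.9
```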

Examples and numerical comparisons
We illustrate here how our dominance results apply for various loss functions, providing also numerical risk function comparisons for reflected normal loss.

Example 4.1 (L_s losses). For L_s loss F(t) = t^s with s > 1, we have m f(m)/f′(m) = m²/(s − 1), and we conclude that the boundary uniform Bayes estimator δ_BU dominates δ_mle for m ≤ √(s − 1). For s = 2 (i.e., squared error loss), this becomes m ≤ 1, thus recovering a dominance result first obtained by Casella and Strawderman (1981), with the entire proof an L_s loss extension of Marchand and Perron's (2001) univariate squared error loss result.
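A Monte Carlo sketch of Example 4.1 follows (illustrative parameters; δ_BU is computed via g*_m(r) = m tanh(mr/(s − 1)), cf. Example 3.2): with s = 4 and m = 1.5 ≤ √(s − 1), the simulated risks of δ_BU should not exceed those of δ_mle.

```python
# Monte Carlo sketch for Example 4.1 (values are simulated, not the paper's).
import numpy as np

rng = np.random.default_rng(0)
s, m = 4.0, 1.5
z = rng.standard_normal(200_000)

def risk(delta, mu):
    x = z + mu                                     # X ~ N(mu, 1)
    return np.mean(np.abs(delta(x) - mu) ** s)

d_mle = lambda x: np.sign(x) * np.minimum(m, np.abs(x))
d_bu  = lambda x: m * np.tanh(m * x / (s - 1))

for mu in (0.0, 0.75, m):
    print(mu, round(risk(d_mle, mu), 3), round(risk(d_bu, mu), 3))
```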

Example 4.2 (Reflected normal loss and extensions; continuation of Examples 3.3 and 3.5). Consider again the class of loss functions F(t) = 1 − e^{−t^α/(2γ)}, with γ > 0 and α > 1, and label these as F_{α,γ}.
• As seen in Example 3.3, the conditions of Corollary 3.1 are satisfied for all γ > 0 and 1 < α ≤ 3. It thus follows (Corollary 3.2) that the estimator δ_mle is inadmissible for all such losses, with dominating estimators provided by part (b) of Theorem 3.3, including δ_g with g(r) = ḡ_m(r) ∧ g_mle(r). Consequently, part (c) of Theorem 3.3, as well as Corollary 3.2, apply with ḡ_m = g*_m whenever m ≤ c_1 := (2γ(α − 1)/α)^{1/α}.
• For applying Theorem 3.4, consider m ≤ c_1 (as above), so that conditions C1 and C2 are satisfied. For condition C3, the requirement is the convexity of h′ = (log f)′, which may be checked by direct calculation.

• As seen in Example 3.5, conditions C1 and C2 of Theorem 3.4 are satisfied, and equivalently Theorem 3.3(c)'s conditions, as long as m ≤ c_1.
Observe that 2γc_1² + αc_1^α > 2γ(α − 1), which implies that c_0 ≤ c_1, c_0 being the solution in c of 2γc² + αc^α = 2γ(α − 1) (equivalently, of c f(c)/f′(c) = 1). Theorem 3.4 thus implies that δ_BU dominates δ_mle whenever m ≤ c_0.
• For reflected normal loss with α = 2, we obtain explicitly c_0 = √(γ/(1 + γ)), and thus the dominance of δ_BU over δ_mle whenever m ≤ √(γ/(1 + γ)). Observe that this cut-off point approaches 1 for large γ, which corresponds to the cut-off point for dominance under squared error loss (see Example 4.1), and which is plausible given the heuristic argument following (1.4).
Remark 4.1. It is of interest to assess how a Bayes estimator relative to a loss function performs for other loss functions. Some of the theoretical developments are useful for addressing such a question. As an illustration, consider reflected normal loss L_γ in (1.4) and the corresponding boundary uniform Bayes estimator δ_BU,γ(x) = g_BU,γ(|x|) sgn(x), where g_BU,γ(r) solves (3.3) for λ = m. Taking logarithms on both sides, and since (1/2) log((1 + u)/(1 − u)) = tanh⁻¹(u) for u ∈ (0, 1), we obtain alternatively

g_BU,γ(r) = m tanh(m(r + g_BU,γ(r)/γ)) , for all m, r, γ > 0 .
Clearly then, we have g_BU,γ(r) ≥ m tanh(mr) for all m, r, γ > 0 which, in other words, tells us that δ_BU,γ always expands on the squared error loss boundary uniform Bayes estimator δ_BU(x) = m tanh(m|x|) sgn(x). It thus follows that m tanh(mr) ≤ g_BU,γ(r) ≤ g_mle(r) for all r > 0 whenever m ≤ √(γ/(1 + γ)). An application of Corollary 3.2 and Corollary 3.1 for squared error loss (d − μ)² therefore implies that δ_BU,γ dominates δ_mle under squared error loss when m ≤ √(γ/(1 + γ)).

A summary of inferences under reflected normal loss is as follows. In cases where δ_BU does not dominate δ_mle, its truncation onto δ_mle, given by δ_{g_1}(x) = (g_BU(|x|) ∧ g_mle(|x|)) sgn(x), necessarily dominates δ_mle (Corollary 3.2). However, gains in risk, as witnessed by Figure 2, are quite slim. The Bayes estimator under L_2 loss, none other than the posterior mean given by δ_{g_2}(x) = m tanh(mx), remains an interesting benchmark. As seen above in Remark 4.1, it shrinks more than δ_BU, with corresponding good relative risk performance for small or moderate values of |μ|, as suggested by numerical evaluations such as those illustrated by Figure 2.
To conclude, Figure 2 suggests that the risk of δ_BU attains its maximum on the boundary of the parameter space [−m, m] and that, consequently, it is minimax for m = 0.75 and γ = 1 (among other values of (m, γ) where m is small). This is not predicted by Towhidi and Behboodian's (2002) minimaxity condition m ≤ (√γ/2 ∧ √(γ/(1 + γ))) = 1/2 (for γ = 1), but it is plausible nevertheless.
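In the spirit of the Figure 2 comparison (m = 0.75, γ = 1), the sketch below simulates risks under reflected normal loss for δ_mle, δ_BU (computed from the fixed point of Remark 4.1), and the L_2 Bayes benchmark m tanh(mx); the simulated values are illustrative and are not the paper's exact figures.

```python
# Simulated reflected normal risks for the three estimators discussed around Figure 2.
import numpy as np

m, gamma = 0.75, 1.0
rng = np.random.default_rng(1)
z = rng.standard_normal(200_000)

def g_bu(r, n_iter=300):
    y = np.zeros_like(r)
    for _ in range(n_iter):
        y = m * np.tanh(m * (r + y / gamma))
    return y

def risk(delta, mu):
    x = z + mu
    return np.mean(1.0 - np.exp(-(delta(x) - mu) ** 2 / (2 * gamma)))

estimators = {
    "mle":               lambda x: np.sign(x) * np.minimum(m, np.abs(x)),
    "BU (refl. normal)": lambda x: np.sign(x) * g_bu(np.abs(x)),
    "BU (L2)":           lambda x: m * np.tanh(m * x),
}
for name, delta in estimators.items():
    print(name, [round(risk(delta, mu), 4) for mu in (0.0, 0.4, m)])
```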

Concluding remarks
This paper relates to the estimation of a bounded normal mean, with a distinctive feature being the analysis provided for a large class of strict bowl-shaped losses, including convex, non-convex, or concave choices, L_q and reflected normal losses. Non-convex penalties, as well as various strict bowl-shaped losses, are appealing and relevant to a wide class of statistical problems (e.g., penalized regression, penalized variable selection). The latter arise precisely in predictive density estimation problems, as further expanded upon in the Appendix. In assessing the frequentist risk efficiency of the restricted mle, as well as of other estimators taking values on the boundary of the parameter space, our findings clearly highlight the role of the loss, as well as the width of the parameter space. In many cases, which include convex choices of loss like L_q loss, as well as others like reflected normal loss, we establish the inadmissibility of the mle and provide improvements. Namely, we establish, using conditional risk techniques, conditions for which the Bayes estimator δ_BU with respect to the boundary uniform prior dominates the mle, despite the absence of explicit expressions for δ_BU and for the risks of the estimators. Further progress relative to the risk performance of other Bayes estimators, such as the fully uniform Bayes estimator, as well as relative to multivariate versions of the problem, represents a challenging and pertinent direction for future work.

Appendix: Connections and implications for predictive density estimation
Additional motivation for the point estimation problem considered here stems from further connections with predictive density estimation. There has been much recent Bayesian and decision theory analysis of predictive density estimators, in particular for multivariate normal or spherically symmetric settings, as witnessed by the work cited in this section, as well as Komaki (2001), George, Liang and Xu (2006), Brown, George and Xu (2008), Kato (2009), Maruyama and Strawderman (2012), Boisbunon and Maruyama (2014), Maruyama and Ohnishi (2016), among others. We describe here several such problems where the risk evaluation of a subclass of predictive density estimators, including plug-in density estimators, is equivalent to the risk evaluation of a point estimator under a dual loss.

Consider spherically symmetric and independently distributed

Y_1 ∼ p(‖y_1 − μ‖²) , Y_2 ∼ q(‖y_2 − μ‖²) , y_1, y_2, μ ∈ R^d , (6.1)

with p and q known Lebesgue densities. For predictive analysis purposes, one wishes to obtain a predictive density q̂(y_2; y_1), based on the observed y_1, as an estimate of q(‖y_2 − μ‖²), y_2 ∈ R^d. Several loss functions are at our disposal to measure efficiency, and these include the class of α-divergence loss functions (e.g., Csiszár, 1967) given by

L_α(q, q̂) = ∫_{R^d} h_α( q̂(y; y_1) / q(‖y − μ‖²) ) q(‖y − μ‖²) dy , (6.2)

with h_α(z) = 4(1 − z^{(1+α)/2})/(1 − α²) for |α| < 1, h_1(z) = z log z, and h_{−1}(z) = −log z. Notable examples in this class include Kullback-Leibler (h_{−1}), reverse Kullback-Leibler (h_1), and Hellinger (h_0/4). Integrated absolute error and squared error losses, referred to hereafter as L_1 and L_2 losses, provide other choices. These are L_s losses, s = 1, 2, given by

L_s(q, q̂) = ∫_{R^d} | q̂(y; y_1) − q(‖y − μ‖²) |^s dy . (6.3)

For all of the above losses, the performance of predictive densities q̂(·; Y_1) may be measured by the frequentist risk

R(μ, q̂) = E_μ [ L(q, q̂(·; Y_1)) ] . (6.4)

The dual relationships which we describe apply to predictive densities of the form

q̂_{c,μ̂}(y_2; y_1) = (1/c^d) q(‖y_2 − μ̂(y_1)‖²/c²) , (6.5)

where μ̂(Y_1) is an estimator of μ. Cases c = 1 correspond to plug-in predictive density estimators, while cases c > 1 correspond to scale expanded variants. As discussed in Fourdrinier et al. (2011) for Kullback-Leibler loss, and Kubokawa, Marchand and Strawderman (2015) for L_2 and L_1 losses, such scale expansions are interesting to consider and can provide significant risk improvement on plug-in procedures. As an illustration, consider the following multivariate normal version of (6.1):

Y_1 ∼ N_d(μ, σ²_{Y_1} I_d) independent of Y_2 ∼ N_d(μ, σ²_{Y_2} I_d) , (6.6)

for which the minimum risk equivariant (MRE, under changes of location) predictive density estimator, or equivalently the generalized Bayes predictive density estimator with respect to the prior π(μ) = 1, is of the form (6.5) with μ̂(Y_1) = Y_1 and a scale expansion depending on the loss (e.g., Ghosh, Mergel and Datta, 2008). With the exception of reverse Kullback-Leibler loss, all the above MRE predictive density estimators are, for α ∈ [−1, 1), indeed scale expansion variants and dominate the plug-in N_d(Y_1, σ²_{Y_2} I_d) density under the corresponding loss.

Here is an adapted version of Kubokawa, Marchand and Strawderman (2015, Theorem 3.1, part a).

Lemma 6.1 (Duality between integrated L_2 and reflected normal losses). For normal model (6.6), the frequentist risk of a predictive density estimator q̂_{c,μ̂} ∼ N_d(μ̂(Y_1), c²σ²_{Y_2} I_d) of the density of Y_2 ∼ N_d(μ, σ²_{Y_2} I_d) under integrated L_2 loss is dual to the frequentist risk of μ̂(Y_1) for estimating μ under reflected normal loss L_{γ_0} with γ_0 = (c² + 1)σ²_{Y_2}. Namely, q̂_{c,μ̂_1} ∼ N_d(μ̂_1(Y_1), c²σ²_{Y_2} I_d) dominates q̂_{c,μ̂_2} ∼ N_d(μ̂_2(Y_1), c²σ²_{Y_2} I_d) under integrated L_2 loss if and only if μ̂_1(Y_1) dominates μ̂_2(Y_1) under loss L_{γ_0}.

In the context of Lemma 6.1, it turns out that both Kullback-Leibler and reverse Kullback-Leibler losses have squared error as a dual point estimation loss (e.g., Fourdrinier et al., 2011, for KL loss).
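The duality in Lemma 6.1 can be verified numerically in the univariate case: the sketch below (illustrative names; quadrature via scipy) computes the integrated L_2 loss between N(μ̂, c²σ²) and N(μ, σ²) at several (μ̂, μ) pairs and confirms it is an affine increasing function of the reflected normal loss L_{γ_0} with γ_0 = (c² + 1)σ², so that risk comparisons under the two losses coincide.

```python
# Quadrature sketch of the Lemma 6.1 duality with d = 1 (values are illustrative).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

sigma, c = 1.0, 1.3
gamma0 = (c ** 2 + 1) * sigma ** 2

def l2_loss(mu_hat, mu):
    integrand = lambda y: (norm.pdf(y, mu_hat, c * sigma) - norm.pdf(y, mu, sigma)) ** 2
    return quad(integrand, -np.inf, np.inf)[0]

def reflected_loss(mu_hat, mu):
    return 1.0 - np.exp(-(mu_hat - mu) ** 2 / (2 * gamma0))

pairs = [(0.0, 0.0), (0.5, 0.0), (1.2, -0.3), (2.5, 0.5)]
pts = [(reflected_loss(a, b), l2_loss(a, b)) for a, b in pairs]
slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]
print(slopes)    # (near-)identical slopes: the two losses differ by an affine map
```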
For other α-divergence losses, it is again reflected normal loss which is dual for plug-in predictive density estimators and for scale expansion variants as in (6.5) (also see Ghosh, Mergel and Datta, 2008, for related work).

Lemma 6.2 (Duality between α-divergence and reflected normal losses). For normal model (6.6), the frequentist risk of a predictive density estimator q̂_{c,μ̂} ∼ N_d(μ̂(Y_1), c²σ²_{Y_2} I_d) of the density of Y_2 under α-divergence loss (6.2), with |α| < 1, is dual to the frequentist risk of μ̂(Y_1) for estimating μ under reflected normal loss L_{γ_0} with γ_0 = (c²/(1 + α) + 1/(1 − α)) σ²_{Y_2}. Namely, q̂_{c,μ̂_1} ∼ N_d(μ̂_1(Y_1), c²σ²_{Y_2} I_d) dominates q̂_{c,μ̂_2} ∼ N_d(μ̂_2(Y_1), c²σ²_{Y_2} I_d) under α-divergence loss if and only if μ̂_1(Y_1) dominates μ̂_2(Y_1) under loss L_{γ_0}.
With regard to the last result, the findings of this paper apply for d = 1, X = Y_1 ∼ N(μ, σ²), but general cdf Q. We obtain the following as a consequence of Theorem 3.2 for integrated L_1 loss.