On Bayesian credible sets, restricted parameter spaces and frequentist coverage

Abstract: For estimating a lower bounded parametric function in the framework of Marchand and Strawderman [6], we provide, through a unified approach, a class of Bayesian confidence intervals with credibility 1 − α and frequentist coverage probability bounded below by (1 − α)/(1 + α). In cases where the underlying pivotal distribution is symmetric, the findings represent extensions with respect to the specification of the credible set, achieved through the choice of a spending function, and include Marchand and Strawderman's HPD procedure result. For non-symmetric cases, the determination of such a class of Bayesian credible sets fills a gap in the literature and includes an "equal-tails" modification of the HPD procedure. Several examples are presented demonstrating wide applicability.


Introduction
Bayesian credible sets are not designed (e.g., Robert [7]), and are far from guaranteed (Fraser [3]), to have satisfactory, exact or precise frequentist coverage, but it is nevertheless of interest to investigate (Wasserman [9]) to what extent there is convergence or divergence in various situations. A historically resonating example where there is exact convergence arises for estimating the mean of a N(µ, σ²) distribution, where the use of the non-informative prior leads to a (1 − α) × 100% HPD credible set (i.e., the z or t confidence interval) with exact frequentist coverage. This, however, is very much the exception. Even in the simple presence of a lower bound on the mean parameter µ (e.g., Mandelkern [4]), with the prior taken to be the truncation of the non-informative prior onto the restricted parameter space, the frequentist coverage of the (1 − α) × 100% HPD credible set deviates from its credibility (or nominal coverage) 1 − α. Nevertheless, the HPD procedure does not fare poorly as a frequentist procedure for large 1 − α, as witnessed by the lower bound (1 − α)/(1 + α) on its frequentist coverage due to Roe and Woodroofe ([8], known σ²) and Zhang and Woodroofe ([10], unknown σ²), as well as by the better lower bound 1 − 3α/2 (for α < 1/3, known σ²) obtained by Marchand et al. [6].
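The Roe and Woodroofe bound can be checked numerically in the simplest case. The Python sketch below assumes a single observation X ∼ N(θ, 1) with θ ≥ 0 (σ² = 1 known), the flat prior truncated to [0, ∞), and the corresponding HPD credible interval. The function names `hpd_interval` and `coverage` are our own illustrative choices, and coverage is computed by deterministic numerical integration rather than simulation.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: Z.cdf is Phi, Z.inv_cdf is Phi^{-1}


def hpd_interval(x, alpha):
    """HPD credible interval for theta >= 0 when X ~ N(theta, 1) and the prior
    is the flat prior truncated to [0, inf).

    The posterior is N(x, 1) truncated to [0, inf), with normalizing constant Phi(x)."""
    px = Z.cdf(x)
    # Candidate symmetric interval [x - c, x + c] with posterior mass 1 - alpha:
    # (2 Phi(c) - 1) / Phi(x) = 1 - alpha.
    c = Z.inv_cdf((1.0 + (1.0 - alpha) * px) / 2.0)
    if x - c >= 0.0:
        return x - c, x + c
    # Otherwise the HPD set touches the boundary and equals [0, u], where
    # Phi(u - x) = 1 - alpha * Phi(x), i.e. u = x - Phi^{-1}(alpha * Phi(x)).
    return 0.0, x - Z.inv_cdf(alpha * px)


def coverage(theta, alpha, n=10000):
    """Frequentist coverage P_theta(theta in I(X)) for X = theta + Z, using a
    midpoint rule on the probability scale of Z (deterministic, no simulation)."""
    hits = 0
    for i in range(n):
        x = theta + Z.inv_cdf((i + 0.5) / n)
        lo, hi = hpd_interval(x, alpha)
        if lo <= theta <= hi:
            hits += 1
    return hits / n


alpha = 0.05
print("lower bound (1 - alpha)/(1 + alpha):", round((1 - alpha) / (1 + alpha), 4))
for theta in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(theta, round(coverage(theta, alpha), 4))
```

At θ = 0 the interval contains 0 exactly when Φ(X) ≤ 1/(1 + α), so the coverage there equals 1/(1 + α). For large θ the interval approaches the usual z interval and the coverage approaches 1 − α.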
In a generalization of the above, Marchand and Strawderman ([6]) introduced a unified framework in which the (1 − α) × 100% HPD credible set of a lower bounded parametric function has frequentist coverage greater than (1 − α)/(1 + α) for all values lying in the restricted parameter space. This framework, as well as its various applications, will be revisited in Sections 2 and 5, but let us consider, for the sake of illustration, the basic examples: (i) X ∼ f_0(x − θ) with known f_0, θ ≥ 0; and (ii) X ∼ Gamma(α, θ) with θ ≥ 1 and known α. For location family densities as in (i) with f_0 unimodal and symmetric, Marchand and Strawderman's results apply for the flat prior on the truncated parameter space [0, ∞) and the corresponding (1 − α) × 100% HPD credible set, with the guarantee that the actual frequentist coverage is bounded below by (1 − α)/(1 + α) for all θ ≥ 0. However, if f_0 is not symmetric, such a result does not hold in general ([6], Example 1). The same is true for a vast number of so-called non-symmetric situations arising in Marchand and Strawderman's framework, including the Gamma models in (ii), where the prior is given by (1/θ) I_[1,∞)(θ), that is, the truncation on [1, ∞) of the usual non-informative prior (1/θ) I_(0,∞)(θ). It is true that the bound holds for certain specific classes of f_0's ([6], Theorem 2, a), and numerical evaluations of a theoretical, but not explicit, lower bound for frequentist coverage provide further evidence of satisfactory coverage for a specific Gamma model in (ii) ([6], Example 2). Nevertheless, a clear analytical result or lower bound for frequentist coverage in such non-symmetric cases is lacking, and it is our motivation here to try to fill this gap.
For a large variety of situations with a lower bounded parametric constraint, we obtain here a class of Bayesian (1 − α) × 100% credible sets which provide minimal frequentist probability coverage exceeding (1 − α)/(1 + α). These Bayesian confidence intervals include an "equal-tails" modification, or approximation, of the HPD credible set, which also coincides with the latter in situations of underlying symmetry. Our findings are achieved by introducing and exploiting a spending function interpretation of Bayesian confidence intervals, and lead to a class of procedures (rather than a single one) which share the above lower bound for frequentist coverage. The rest of the paper is organized as follows. Preliminary results, definitions and model assumptions, including those related to the spending function associated with a Bayesian credible interval, are presented in Section 2, while Bayesian credible interval representations are outlined in Section 3. The main findings concerning frequentist coverage appear in Section 4, and various applications are presented and commented on in Section 5.

Assumptions, invariance, pivot, prior, and implications
As in basic examples (i) and (ii), we consider model densities f(x; θ), x ∈ 𝒳, θ ∈ Θ ⊂ ℝ^p, for an observable X, and we are concerned with interval estimation of a parametric function τ(θ) (τ: ℝ^p → ℝ) under the additional constraint τ(θ) ≥ 0. We assume there exists a pivot of the form T(X, θ) = a_1(X) − τ(θ)
We further assume that the unrestricted decision problem is invariant under a group G of transformations and that the pivot satisfies the invariance requirement T(x, θ) = T(gx, ḡθ) for all x ∈ 𝒳, θ ∈ Θ, g ∈ G, ḡ ∈ Ḡ, with 𝒳, Θ, G, and Ḡ being isomorphic. For instance, in basic example (i), invariance is achieved with the additive group G on ℝ^p, since T(x, θ) = x − θ = (x + g) − (θ + g) = T(gx, ḡθ) for all group elements g.
Collecting the above assumptions, we state the following for further reference.

1422É. Marchand and W. E. Strawderman
Lebesgue density g_0. We further assume that the decision problem is invariant under a group G of transformations and that the pivot satisfies the invariance requirement T(x, θ) = T(gx, ḡθ) for all x ∈ 𝒳, θ ∈ Θ, g ∈ G, ḡ ∈ Ḡ, with 𝒳, Θ, G, and Ḡ being isomorphic.
We consider prior measures π_H and π_0, where π_0(θ) = π_H(θ) I_[0,∞)(τ(θ)), and π_H is the right invariant Haar measure, which satisfies π_H(Aḡ) = π_H(A) for every measurable subset A of Θ and every ḡ ∈ Ḡ. The right Haar measure π_H exists and is unique up to a multiplicative constant for locally compact groups such as the location, scale, and location-scale groups. For the basic location and Gamma (or scale) model examples of the Introduction, right Haar invariant measures are given by π_H(θ) = 1 and π_H(θ) = 1/θ, respectively. For a sample X_1, . . . , X_n from a location-scale family with location θ_1 and scale θ_2 > 0, the common non-informative prior π(θ) = 1/θ_2 is right Haar invariant. We refer to [1] and [2] for detailed treatments of invariance and Haar invariant measures.
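A short change-of-variables computation shows why 1/θ_2 is right invariant for the location-scale group. It assumes the usual parametrization in which θ = (θ_1, θ_2) acts as x ↦ θ_1 + θ_2 x, so that composition is (θ_1, θ_2)(b, c) = (θ_1 + θ_2 b, θ_2 c) with c > 0; under a different convention the roles of left and right may be swapped.

```latex
% Right translation by \bar g = (b, c), c > 0:  \theta\bar g = (\theta_1 + \theta_2 b,\ \theta_2 c)
\left|\det \frac{\partial(\theta_1 + \theta_2 b,\ \theta_2 c)}{\partial(\theta_1, \theta_2)}\right|
  = \left|\det\begin{pmatrix} 1 & b \\ 0 & c \end{pmatrix}\right| = c,
\qquad
\pi_H(A\bar g) = \int_{A\bar g} \frac{d\theta_1'\, d\theta_2'}{\theta_2'}
  = \int_{A} \frac{c\, d\theta_1\, d\theta_2}{\theta_2\, c}
  = \int_{A} \frac{d\theta_1\, d\theta_2}{\theta_2} = \pi_H(A).
```

The same computation for the left translation (b + cθ_1, cθ_2) gives the Jacobian c², so that 1/θ_2² (and not 1/θ_2) is left invariant. The one-parameter version with θ ↦ cθ gives ∫_{cA} dθ/θ = ∫_A c dθ/(cθ), confirming π_H(θ) = 1/θ for the scale model.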
A key feature relative to Assumption 1 and the choice of the right Haar invariant measure is that the frequentist distribution of T(X, θ), which is free of θ by virtue of the pivot assumption, coincides with the posterior distribution of T(x, θ) under π_H for any given x, i.e.,

$$ T(X,\theta) \mid \theta \;\overset{d}{=}\; T(x,\theta) \mid x \quad \text{under } \pi_H, \qquad \text{for all } \theta \in \Theta,\ x \in \mathcal{X}. \qquad (1) $$

We will continue, after the next lemma, by illustrating the above and drawing implications of immediate interest. For the sake of completeness, we reproduce here a key lemma from [6] justifying (1), and we refer to their work for further details.
Lemma 1 ([6], Corollary 1). Suppose 𝒳, Θ, G, and Ḡ are all isomorphic, and that T(X, θ) is a function for which T(x, θ) = T(gx, ḡθ) for all x ∈ 𝒳, θ ∈ Θ, g ∈ G, ḡ ∈ Ḡ. Then condition (1) holds.

Now, for the basic unrestricted location family example with the flat prior π_H(θ) = 1, which is right Haar invariant, observe that the posterior density of θ is given by π(θ | x) = f_0(x − θ), so that the posterior density of −T(x, θ) = θ − x associated with π_H is given by g_0 as well. This correspondence for basic example (i) illustrates property (1), which is, of course, more general under Assumption 1.
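Property (1) can also be checked numerically in the scale model of basic example (ii). The sketch below assumes the special case of shape 1 (an exponential distribution with scale θ, our simplifying choice) and the right Haar prior π_H(θ) = 1/θ without truncation. It evaluates the posterior probability P(x/θ ≤ t | x) by brute-force integration and compares it with the frequentist cdf P_θ(X/θ ≤ t) = 1 − e^{−t} of the pivot X/θ, which is the same for every θ. The function names are hypothetical.

```python
import math


def posterior_cdf_of_pivot(x, t, width=20.0, n=40000):
    """P(x / theta <= t | x) under the prior 1/theta, for X ~ Exponential with mean theta.

    The unnormalized posterior density of theta is theta^{-2} exp(-x / theta).
    We integrate on the scale w = log(theta), where d(theta) = theta dw, with a
    midpoint rule over [log(x) - width, log(x) + width]."""
    h = 2.0 * width / n
    num = den = 0.0
    for i in range(n):
        theta = math.exp(math.log(x) - width + (i + 0.5) * h)
        weight = math.exp(-x / theta) / theta  # theta^{-2} e^{-x/theta} * theta
        den += weight
        if x / theta <= t:
            num += weight
    return num / den


def frequentist_cdf_of_pivot(t):
    """P_theta(X / theta <= t) = 1 - exp(-t), the same for every theta."""
    return 1.0 - math.exp(-t)


for x in (0.3, 2.0, 7.5):
    print(x, [round(posterior_cdf_of_pivot(x, t), 4) for t in (0.5, 1.0, 2.0)])
print("frequentist:", [round(frequentist_cdf_of_pivot(t), 4) for t in (0.5, 1.0, 2.0)])
```

The posterior rows agree with the frequentist row up to discretization error, whatever the observed x, which is the content of (1) for this model.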
In general, observe that the posterior cdf under π_H for τ(θ) is available from

Now, under the truncation π_0 of π_H, the above correspondence between the frequentist and posterior distributions of −T(X, θ) does not hold, and the posterior cdf of τ(θ) under π_0 differs. However, we can still express the posterior distribution of τ(θ) under π_0 in terms of π_H and G. Indeed, with π_0(θ) = π_H(θ) I_[0,∞)(τ(θ)) and −T(x, θ) | x ∼ G under π_H, we have, for a measurable set A ⊂ Θ_0 = {θ ∈ Θ : τ(θ) ≥ 0} and for any x:

$$ P^{\pi_0}(\theta \in A \mid x) = \frac{P^{\pi_H}(\theta \in A \mid x)}{P^{\pi_H}(\tau(\theta) \ge 0 \mid x)}. $$

In terms of the posterior survival function of τ(θ) under π_0, the above yields, for y ≥ 0,

$$ P^{\pi_0}(\tau(\theta) \ge y \mid x) = \frac{P^{\pi_H}(\tau(\theta) \ge y \mid x)}{P^{\pi_H}(\tau(\theta) \ge 0 \mid x)}, $$

where numerator and denominator can be evaluated in terms of G by means of (2). We will make use, in Section 3, of the above in setting and describing the bounds of Bayesian credible sets for τ(θ) under π_0.
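The fact that truncating the prior amounts to renormalizing the π_H posterior can be verified numerically in a non-symmetric location model. In the sketch below we assume, purely for illustration, that f_0 is the standard Gumbel density, τ(θ) = θ, π_H(θ) = 1 and π_0(θ) = I_[0,∞)(θ). Under π_H, θ = x − ε with ε ∼ f_0, so P^{π_H}(θ ≥ y | x) = F(x − y); the π_0 survival function is then F(x − y)/F(x), which we compare with a direct integration of likelihood times π_0. All function names are hypothetical.

```python
import math


def gumbel_cdf(t):
    """Standard Gumbel cdf F(t) = exp(-exp(-t)) (illustrative non-symmetric f_0)."""
    return math.exp(-math.exp(-t))


def gumbel_pdf(t):
    return math.exp(-t - math.exp(-t))


def survival_via_ratio(x, y):
    """P^{pi_0}(theta >= y | x) as a ratio of pi_H posterior probabilities.

    Under the flat prior pi_H, theta = x - eps with eps ~ Gumbel, hence
    P^{pi_H}(theta >= y | x) = P(eps <= x - y) = F(x - y)."""
    return gumbel_cdf(x - y) / gumbel_cdf(x)


def survival_direct(x, y, upper=30.0, n=60000):
    """Same quantity from likelihood f_0(x - theta) times the truncated prior
    pi_0(theta) = 1 on [0, upper] (midpoint rule; the mass beyond upper is
    negligible for moderate x)."""
    h = upper / n
    num = den = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        w = gumbel_pdf(x - theta)
        den += w
        if theta >= y:
            num += w
    return num / den


for x in (-1.0, 0.5, 3.0):
    print(x, round(survival_via_ratio(x, 1.0), 6), round(survival_direct(x, 1.0), 6))
```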

The spending function associated with a Bayesian credible set
With the objective of constructing a (1 − α) × 100% Bayesian credible set or region, a posterior distribution for τ(θ) supported on [0, ∞) leaves open many choices and various approaches (e.g., [1], Section 4.3.2). The HPD credible set is one such region, chosen to minimize volume and leading to intervals for unimodal posterior densities. In our set-up, (1 − α) × 100% Bayesian credible intervals are, more generally, of the form I_{π_0,α(·)}(x) = [l(x), u(x)], with P^{π_0}(l(x) ≤ τ(θ) ≤ u(x) | x) = 1 − α. An alternative (and equivalent) way to set or view the bounds l(x) and u(x), for a given x, is to focus on the complementary set [0, l(x)) ∪ (u(x), ∞) and to allocate (or "spend") probabilities α − α(x) and α(x), respectively, on its two disjoint parts, with α(x) ∈ [0, α]. It is clear (when the posterior distribution is absolutely continuous) that the choice of α(x) leads to a unique choice of [l(x), u(x)], and vice versa. Since we are interested in the frequentist properties of such Bayesian credible intervals, we will represent this allocation as a spending function. Moreover, our findings guaranteeing minimal frequentist coverage of at least (1 − α)/(1 + α) for a class of Bayesian credible sets will be conveniently expressed as conditions on the corresponding spending function.

For example, a lower-tailed credible interval for a given x corresponds to the selection α(x) = α, an upper-tailed credible interval corresponds to α(x) = 0, and an equal-tailed interval (with equal tails under the posterior distribution) corresponds to α(x) = α/2.
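The correspondence between a spending function and the bounds [l(x), u(x)] can be sketched for the normal case of the Introduction. The code assumes X ∼ N(θ, 1), τ(θ) = θ and π_0 the flat prior on [0, ∞), so that the posterior cdf is F_0(y | x) = {Φ(y − x) − Φ(−x)}/Φ(x) for y ≥ 0; l(x) and u(x) are then the posterior quantiles of orders α − α(x) and 1 − α(x). The names `posterior_quantile` and `credible_interval` are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def posterior_cdf(x, y):
    """Posterior cdf of theta under pi_0 (flat prior on [0, inf)) when X ~ N(theta, 1), y >= 0."""
    return (Z.cdf(y - x) - Z.cdf(-x)) / Z.cdf(x)


def posterior_quantile(x, p):
    """Inverse of posterior_cdf: solves Phi(y - x) = 1 - (1 - p) Phi(x)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return math.inf
    # Phi^{-1}(1 - q) = -Phi^{-1}(q) avoids cancellation when q is small.
    return x - Z.inv_cdf((1.0 - p) * Z.cdf(x))


def credible_interval(x, alpha, spend):
    """[l(x), u(x)] spending posterior mass alpha - spend(x) below l(x)
    and spend(x) above u(x)."""
    a = spend(x)
    if not 0.0 <= a <= alpha:
        raise ValueError("spending function must take values in [0, alpha]")
    return posterior_quantile(x, alpha - a), posterior_quantile(x, 1.0 - a)


alpha = 0.05
choices = {
    "lower-tailed, alpha(x) = alpha": lambda x: alpha,
    "upper-tailed, alpha(x) = 0": lambda x: 0.0,
    "posterior equal tails, alpha(x) = alpha/2": lambda x: alpha / 2,
}
for name, spend in choices.items():
    print(name, credible_interval(1.0, alpha, spend))
```

Passing the spending function as a callable keeps the construction separate from the choice of α(·), which is the form in which the coverage conditions are stated.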

Checklist
To facilitate the further presentation of the results, here is a list of definitions and notations used. In particular, the frequentist coverage probability of I_{π_0,α(·)}(X) is given by C(θ) = P_θ(I_{π_0,α(·)}(X) ∋ τ(θ)).

Bayesian credible intervals: Representations and properties
In this section, we expand upon two different, yet equivalent and instructive, approaches to constructing a credible set for τ(θ) associated with prior π_0. These are: (A) the spending function approach, and (B) the approach based on the quantiles of the pivot.

when l(x) > 0. The above bounds coincide with those of the HPD procedure (when l(x) > 0) in the symmetric case of Example 1, and they correspond to the spending function given in (4), as can be verified directly from (3). NOTE: We wish to emphasize that the terminology "equal tails" does not mean α(x) = α/2 (i.e., equal tails under the posterior distribution), but rather refers to the choice of (equal-tail) quantiles −γ_1 and γ_2 under G.
The next section's lower bound of (1 − α)/(1 + α) on frequentist coverage applies to a class of Bayesian credible intervals. This class will include an equal-tails credible interval I_{π_0,α_eqt(·)}, which relates to both approaches presented in this section. On one hand, it borrows the bounds (and hence the spending function) of the HPD procedure for unimodal densities symmetric about 0; on the other hand, it is defined through the above equal-tails choice (whenever l(x) > 0).
Proof. It suffices to show directly that (8) is satisfied for the selection α(x) = α_eqt(x) given in (4), for x such that t(x) ≥ y_0. Indeed, we have for such x's:

Remark 1. In cases where the underlying pivotal distribution is non-symmetric, Corollary 1 is a new result, generalizing Theorem 1 of [6], and it is widely applicable given the lack of assumptions on g_0. Also, the bounds of the equal-tails procedure are easier to evaluate than those of the HPD credible interval. Additionally, the findings of Theorem 1 go beyond a single procedure, even in the symmetric case, by providing a class of credible sets, specified through a spending function, with frequentist coverage bounded below by (1 − α)/(1 + α).
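The exact statements of Definition 2 and of α_eqt in (4) are not reproduced here, so the following Python sketch is our own reading of the equal-tails construction for a location model with θ ≥ 0 and flat prior. Assumptions: with p_0(x) = P^{π_H}(θ ≥ 0 | x), cut equal masses q(x) = {1 − (1 − α)p_0(x)}/2 from each tail of the untruncated posterior, which leaves posterior mass exactly 1 − α under π_0; when this lower bound is not positive, use [0, u(x)] with posterior mass α above u(x), as the HPD procedure does in the normal case. A standard Gumbel f_0 is taken as a hypothetical non-symmetric example, and all names are illustrative.

```python
import math


# Hypothetical non-symmetric example: standard Gumbel errors, X = theta + eps.
def gumbel_cdf(t):
    return math.exp(-math.exp(-t))


def gumbel_ppf(p):
    return -math.log(-math.log(p))


def equal_tails_interval(x, alpha):
    """Sketch of an equal-tails credible interval for theta >= 0 (flat prior on [0, inf))."""
    p0 = gumbel_cdf(x)                     # P^{pi_H}(theta >= 0 | x)
    q = (1.0 - (1.0 - alpha) * p0) / 2.0   # equal tail mass under pi_H
    lower = x - gumbel_ppf(1.0 - q)        # P^{pi_H}(theta < lower | x) = q
    if lower > 0.0:
        return lower, x - gumbel_ppf(q)    # P^{pi_H}(theta > upper | x) = q
    # Boundary case: [0, u] with posterior mass alpha above u under pi_0.
    return 0.0, x - gumbel_ppf(alpha * p0)


def posterior_prob(x, lo, hi):
    """P^{pi_0}(lo <= theta <= hi | x) for 0 <= lo <= hi < inf."""
    return (gumbel_cdf(x - lo) - gumbel_cdf(x - hi)) / gumbel_cdf(x)


def coverage(theta, alpha, n=10000):
    """P_theta(theta in I(X)) by a midpoint rule on the probability scale of eps."""
    hits = 0
    for i in range(n):
        x = theta + gumbel_ppf((i + 0.5) / n)
        lo, hi = equal_tails_interval(x, alpha)
        if lo <= theta <= hi:
            hits += 1
    return hits / n


alpha = 0.05
print("lower bound:", round((1 - alpha) / (1 + alpha), 4))
for theta in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(theta, round(coverage(theta, alpha), 4))
```

Under this reading the two cases switch exactly when p_0(x) = 1/(1 + α), the same threshold as for the normal HPD interval, and the coverage at θ = 0 equals 1/(1 + α). The printed coverages can be compared with the bound (1 − α)/(1 + α).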
We do not have a recommended prescription for the choice of the spending function among those specified by Theorem 1 as guaranteeing minimal frequentist coverage of at least (1 − α)/(1 + α). The G-equal-tails choice is simple, intuitively appealing and matches the HPD procedure under symmetry of the pivotal density, while upper-tailed and lower-tailed choices are not allowed for x such that t(x) ≥ y_0. The particular choices of spending function in (8): (i) , are other interesting choices that push I_{π_0,α(·)} as much as allowed by (8) towards one-sided credible intervals with upper bound equal to +∞ or lower bound equal to 0, respectively.

Examples
At the risk of some repetition with the examples provided by [6], it is still beneficial here to present various applications with accompanying commentary. Assumption 1 is satisfied in all of the examples below, with the underlying family of distributions being a location, scale, or location-scale family. In all of the examples, Theorem 1 and Corollary 1 provide conditions on the spending function α(·) so that the Bayesian intervals I_{π_0,α(·)}(X) have minimal frequentist coverage greater than (1 − α)/(1 + α) for all θ such that τ(θ) ≥ 0. These intervals include the equal-tails procedure given in Definition 2 and can be evaluated in general using the expression given in Lemma 2.
Remark 2. Results such as those in (A) are applicable as well for several observations by conditioning on a maximal invariant statistic V. Such a maximal invariant statistic V is ancillary; specifically, it is an invariant function such that every other invariant statistic is a function of V. Indeed, suppose that X = (X_1, . . . , X_n) ∼ f_0(x_1 − θ, . . . , x_n − θ), where f_0 is known and the X_i's are not necessarily independent. Here, V = (X_2 − X_1, . . . , X_n − X_1) is a maximal invariant statistic. One can then proceed, for a given value v of V, with an interval estimate I_{π_0,α(·,v)}(X_1, v) as given in Lemma 2, with G ≡ G_v representing the cdf of the pivot X_1 − θ conditional on V = v, and α(x, v) satisfying the conditions of Theorem 1 and (8). This is feasible by the pivot and ancillarity properties, the joint distribution of (X_1 − θ, V) being independent of θ. In such a case, Theorem 1 applies to the conditional frequentist coverage C(θ, v) = P_θ(I_{π_0,α(·,v)}(X, v) ∋ τ(θ) | V = v), yielding the inequality C(θ, v) > (1 − α)/(1 + α) for all θ ≥ 0. Since this is true for all v, the unconditional frequentist coverage C(θ) of the Bayes credible set I_{π_0,α(·,·)}(X, V) will also exceed (1 − α)/(1 + α) for all θ ≥ 0 (see [6] for more details related to a multivariate Student model). In the same vein, all the scenarios below (B to G), although presented for simplicity in the single-observation case, are also applicable in the presence of a sample by conditioning on a maximal invariant statistic.
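The conditional pivotal cdf G_v in Remark 2 can be computed numerically for any known joint density f_0. With T = X_1 − θ and V as above, the map (x_1, . . . , x_n) ↦ (t, v) has unit Jacobian, so the conditional density of T given V = v is proportional to f_0(t, t + v_1, . . . , t + v_{n−1}). The sketch below (function names and grid settings are our own) implements this on a grid. As a check, we assume i.i.d. N(θ, 1) observations, for which Basu's theorem gives X_1 − θ | V = v ∼ N(−(v_1 + · · · + v_{n−1})/n, 1/n).

```python
import math
from statistics import NormalDist


def conditional_pivot_cdf(v, joint_f0, lo=-10.0, hi=10.0, n=20000):
    """Grid approximation of the cdf G_v of X_1 - theta given V = v.

    The conditional density of t is proportional to
    joint_f0([t, t + v_1, ..., t + v_{n-1}]); cells use midpoints and the
    returned cdf values are evaluated at the right edge of each cell."""
    h = (hi - lo) / n
    weights = []
    for i in range(n):
        t = lo + (i + 0.5) * h  # midpoint of the i-th cell
        weights.append(joint_f0([t] + [t + vi for vi in v]))
    total = sum(weights)
    edges, cdf, acc = [], [], 0.0
    for i, w in enumerate(weights):
        acc += w
        edges.append(lo + (i + 1) * h)  # right edge of the i-th cell
        cdf.append(acc / total)
    return edges, cdf


def iid_standard_normal(z):
    """Joint density of i.i.d. N(0, 1) errors, up to a constant that cancels."""
    return math.prod(math.exp(-0.5 * zi * zi) for zi in z)


v = [0.7, -1.2, 2.0]  # an observed value of V, so n = 4 observations
edges, cdf = conditional_pivot_cdf(v, iid_standard_normal)
exact = NormalDist(mu=-sum(v) / (len(v) + 1), sigma=1 / math.sqrt(len(v) + 1))
for k in (9000, 9625, 10500):
    print(round(edges[k], 3), round(cdf[k], 5), round(exact.cdf(edges[k]), 5))
```

For a non-normal or dependent f_0, only `joint_f0` changes; the resulting G_v can then be used in place of G when forming I_{π_0,α(·,v)}(X_1, v).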