On the behaviour of Bayesian credible intervals in partially identified models

Abstract: Partially identified models typically involve set identification rather than point identification. That is, the distribution of observables is consistent with a set of values for the target parameter, rather than a single value. Interval estimation procedures therefore behave differently than in identified models. For instance, a Bayesian credible set arising from a proper prior distribution will tend to a non-degenerate set as the sample size goes to infinity. A natural question then arises: for which parameter values does the limit of the Bayesian credible interval fail to cover its target? Intuition suggests this occurs for parameter values that are not very consistent with the prior distribution. The aim of this paper is to quantify this intuition.


Introduction
Limitations in the form of discrepancies between ideal data and available data can lead to partial identification, such that learning the distribution of observable variables only reveals a set of possible values for the target parameter, not a single value. The set of target parameter values consistent with the distributional law of the observables is often termed the identification region. The concept of partial identification is surveyed by Manski [16], with recent literature on interval estimation in partially identified models including Imbens and Manski [12], Vansteelandt et al. [22], Romano and Shaikh [21], and Zhang [23]. An interesting distinction arises in this literature. In frequentist terms, interval estimators can be designed to have at least nominal probability of containing the true value of the target, or to have at least nominal probability of containing the entire identification region.
From the Bayesian perspective, starting with a proper prior distribution over all parameters one can interpret a posterior credible set as being likely to contain the true value of the target, given the combination of observed data plus prior beliefs. Work considering Bayesian interval estimators in partially identified models includes Moon and Schorfheide [17], Liao and Jiang [15], and Gustafson and Greenland [11]. One particular feature distinguishing the Bayesian approach is that the large-sample limit of the posterior distribution will generally assign more weight to some values in the identification region than others. The extent to which learning the shape of this distribution is inferentially useful, over and above learning the identification region, is taken up in Gustafson [9]. As emphasized in Gustafson and Greenland [11], the usual calibration property of Bayesian procedures is unaffected by the lack of identification. That is, the average frequentist coverage of the Bayesian credible set, taken with respect to the prior distribution over the parameter space, equals the nominal coverage. In contrast, approaches such as that of Vansteelandt et al. [22] aim to achieve nominal coverage at 'worst-case' spots in the parameter space, with better than nominal coverage, paid for via extra length, achieved elsewhere.
In the partially identified context, interval estimators generally, and Bayesian credible sets specifically, will tend to a non-degenerate set as the sample size increases. Toward a full understanding of Bayesian credible sets in partially identified contexts, a natural question to ask is: when does the limiting set contain its target? More particularly, the parameter space can be cleaved into parameter values under which the limiting credible set does and doesn't contain its target. Initial intuition suggests that the choice of prior distribution should play a role in this regard. Since the posterior density will be pushed upward (downward) in regions where the prior density is high (low), we might look to parameter values in the tails of the prior distribution as values under which generated datasets would produce credible intervals missing their targets. As well, the values of those parameter components which are less informed by the data might be suspected to play a bigger role in whether or not the limiting credible interval contains its target. Through theory and examples, this paper aims to quantify these intuitions.

Methodology
Let π(θ, D) denote the joint density of a parameter vector θ ∈ Θ and observable data D, arising as the product of a proper prior density π(θ) and a statistical model density π(D|θ). Assume that θ comprises a 'scientifically intuitive' parameterization of the model, such that investigators would feel comfortable specifying a prior distribution for θ, as opposed to specifying a prior in some other parameterization. Also assume that the primary inferential interest lies in some scalar aspect of θ, denoted as the estimand ψ = g(θ). As a point to bear in mind for later, we presume that the proper prior for θ is elicited with regard to the scientific interpretations of the elements of θ only, and without regard to whether or not the model π(D|θ) is identified. For instance, imagine an investigator must commit to a prior on the scientific quantities at hand before finding out whether he will receive (i) a large budget for data acquisition, which permits gold-standard measurements and consequently an identified model, or (ii) a small budget, which necessitates cheaper proxy measurements giving rise to only a partially identified model.
When useful, we write D_n to emphasize observable data comprised of n observations which are independent and identically distributed given θ. To consider interval estimation, let I_π^{(α)}(D_n) denote a credible set for ψ of some type, having posterior probability 1 − α of containing ψ based on prior π. The two primary examples of 'types' would be equal-tailed intervals and highest posterior density (HPD) sets. We do not consider 'pathological' choices of credible sets, such as sets which purposefully exclude values in the 'middle' of the posterior distribution. This proviso makes it more meaningful to discuss credible interval failure arising when true parameter values lie in the tail of the prior distribution.
The immutable calibration property of Bayesian interval estimation is as follows. With respect to the joint distribution π on (θ, D_n),

pr{g(θ) ∈ I_π^{(α)}(D_n)} = 1 − α.    (2.1)

This follows immediately upon writing the probability on the left-hand side as an iterated expectation, with the inner expectation taken with respect to the posterior distribution of (θ|D_n). We can interpret (2.1) in the following manner. With respect to a sequence of studies arising under different conditions (i.e., a different 'true value' of θ each time), the 1 − α credible interval contains its target in proportion 1 − α of the studies, under a strong proviso. Particularly, (2.1) assumes the distribution generating the sequence of true θ values coincides with the distribution used by investigators as a prior distribution. Put another way, if 'nature's prior' and the investigators' prior match, then the investigators' interval estimation procedure is well calibrated. It is worth reinforcing that (2.1) is exact for any n, and does not require model identification.
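The calibration identity (2.1) is easy to check by simulation. Below is a minimal sketch using a conjugate normal model (an illustrative identified model, not one of the paper's examples): θ is drawn from its N(0, 1) prior, data are generated given θ, and the equal-tailed 95% posterior interval is formed; the empirical coverage, averaged over prior draws, matches 1 − α for any fixed n.

```python
import numpy as np

# Monte Carlo check of (2.1) in an illustrative conjugate normal model:
# theta ~ N(0, 1); x_1..x_n | theta ~ N(theta, 1);
# posterior is N(n * xbar / (n + 1), 1 / (n + 1)).
rng = np.random.default_rng(0)
alpha, n, reps = 0.05, 5, 200_000
z = 1.959963984540054                        # N(0,1) quantile at 1 - alpha/2

theta = rng.normal(size=reps)                # draws from the prior
xbar = rng.normal(theta, 1.0 / np.sqrt(n))   # sufficient statistic given theta
post_mean = n * xbar / (n + 1)
half = z / np.sqrt(n + 1)                    # credible interval half-width
coverage = np.mean(np.abs(theta - post_mean) <= half)
print(coverage)                              # ~ 0.95 for any fixed n, per (2.1)
```

Here the identity holds exactly for every n; Monte Carlo noise is the only source of departure from 0.95.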
Of course identified and correctly specified parametric models yield nicely behaved credible intervals, under weak regularity conditions. As n increases, I_π^{(α)}(D_n) converges to the correct value of ψ, with the interval width behaving as n^{−1/2}. On the other hand, partially identified models typically have the feature that the posterior distribution on the target of inference does not shrink to a point-mass as the sample size grows. Rather, for a fixed underlying value of θ, the distribution of (ψ|D_n) will converge to a non-degenerate distribution as n increases. Commensurately, I_π^{(α)}(D_n) will converge to a non-degenerate set as n → ∞. Emphasizing that this limit depends on the underlying value of θ which spawns the data sequence D_n, we denote the limiting credible set as J_π^{(α)}(θ). Whereas model identification plus weak regularity conditions imply that Bayesian credible intervals have approximately correct frequentist coverage for large n, this cannot apply in the partially identified case. A non-degenerate J_π^{(α)}(θ) either contains ψ = g(θ) or it doesn't. Consequently, for a given θ value the large-n limit of the frequentist coverage of the Bayesian interval estimator is either 0% or 100%. On the other hand, from (2.1) it follows that

pr{g(θ) ∈ J_π^{(α)}(θ)} = 1 − α.    (2.2)

This is simply the limiting version of (2.1). Of course the event in question now involves parameters only, so the probability is with respect to the prior distribution on θ. We can interpret (2.2) as saying that the subset of the parameter space on which the limiting frequentist coverage of the Bayesian interval is zero must have probability mass α with respect to the prior π.
Let B_π^{(α)} = {θ : g(θ) ∉ J_π^{(α)}(θ)} be the 'bad' subset of the parameter space on which the limiting credible interval fails to cover its target. That is, for given prior π and choice of α, B_π^{(α)} is the set of all θ values such that if data were generated under θ, then the credible interval, in the large-sample limit, would fail to contain its target. Equivalently, this is the subset of parameter values under which the limiting frequentist coverage of the credible interval is zero. As stated above, it is immediate from (2.2) that B_π^{(α)} is small in the sense of having probability α under the prior π across the parameter space. What this paper investigates, however, is where B_π^{(α)} lies in the parameter space. A first intuition is that the bad set will lie in the tails of the prior distribution, i.e., failure-to-cover will arise when the true parameter values are not very compatible with the prior distribution. A second thought might be that compatibility of prior and true values may matter more for parameters that are not well informed by the data. The aim of this paper is to make these ideas more precise.
As emphasized by a number of authors (Barankin [1], Kadane [14], Dawid [4], Poirier [19], Gustafson [7]), the posterior structure in many partially identified models can be laid bare upon reparameterizing from θ to (φ, λ) such that (i) D_n ⊥ λ | φ, and (ii) (D_n|φ) comprises a 'regular' model admitting root-n consistent estimation of φ. Barankin [1] calls φ a sufficient parameter, and emphasizes the need to choose a 'minimal sufficient' φ, i.e., such that the distribution of the data is not completely determined by a non-invertible function of φ. Gustafson [7] calls (φ, λ) a transparent parameterization. Informally, the key feature is that φ appears in the likelihood function, while λ does not. The limiting posterior distribution decomposes into a point-mass at the true value of φ combined with the prior conditional distribution of (λ|φ). One can then determine the induced limiting posterior distribution on ψ = g(θ(φ, λ)). When useful we will use π*() to distinguish the prior density for (φ, λ) from the prior density π() for θ, i.e.,

π*(φ, λ) ∝ π{θ(φ, λ)}.    (2.3)

The limiting posterior density for λ is immediately obtained, up to a constant of proportionality, by viewing (2.3) as a function of λ with φ fixed at the true value.

As a motivating example in which a natural initial parameterization is already transparent, say that interest is focussed on λ = E(Y), where λ ∈ (0, 2). However, due to interval censoring, Y_L rather than Y is observed, where Y_L ≤ Y ≤ Y_L + 1 with probability one. Moreover, say the distribution of Y_L is known up to its mean φ = E(Y_L), and it is also known a priori that 0 < φ < 1. Clearly θ = (φ, λ) comprises a transparent parameterization, with ψ = g(θ) = λ being the inferential target. An identification region of the form λ ∈ (φ, φ + 1) results. Moreover, say any marginal prior π(φ) with full support is chosen, along with the conditional prior (λ|φ) ∼ Unif(φ, φ + 1).
Then the limit of the posterior distribution on the interest parameter λ is a uniform distribution on (φ_0, φ_0 + 1), where φ_0 is the true value of φ. Immediately then, B_π^{(α)} is the subset of the parameter space on which λ ∈ (φ, φ + α/2) or λ ∈ (φ + 1 − α/2, φ + 1). In the case of α = 0.2, this is displayed graphically in the left panel of Figure 1.
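This bad set is easily checked by simulation. The sketch below takes φ ∼ Unif(0, 1) as an illustrative full-support marginal prior (the particular marginal does not affect the limiting interval) and confirms that the prior probability of failure-to-cover is α, as (2.2) requires.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, reps = 0.2, 500_000
phi = rng.uniform(0.0, 1.0, reps)      # illustrative full-support marginal prior
lam = rng.uniform(phi, phi + 1.0)      # (lambda | phi) ~ Unif(phi, phi + 1)

# The limiting equal-tailed interval is (phi + alpha/2, phi + 1 - alpha/2),
# so failure-to-cover means lam lies within alpha/2 of either endpoint.
bad = (lam < phi + alpha / 2) | (lam > phi + 1.0 - alpha / 2)
frac = bad.mean()
print(frac)   # ~ alpha = 0.2, as (2.2) requires
```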
In Section 3 we pursue transparent parameterizations as a route to determining limiting credible intervals, and hence the bad set B_π^{(α)}.

Theorem 2.1. Let h(θ) be any identified quantity, i.e., with respect to a transparent parameterization, h(θ(φ, λ)) does not vary with λ. Then, for any value h_0 in the support of the prior induced on h(θ), we have

pr{θ ∈ B_π^{(α)} | h(θ) = h_0} = α.

That is, with respect to the conditional prior distribution of (θ | h(θ) = h_0), the failure-to-cover subset of the parameter space has probability α.
Proof. For a given h_0, let B_0^{(α)} be the set of θ values for which the limiting 1 − α credible interval arising from the prior π(θ | h(θ) = h_0) fails to cover the target. Now note that for any θ satisfying h(θ) = h_0, we have θ ∈ B_0^{(α)} if and only if θ ∈ B_π^{(α)}. This happens because for data generation under such a θ, the same limiting credible interval for ψ arises under π(θ) as under π(θ | h(θ) = h_0). That is, since asymptotically the data already reveal the value of φ, correct a priori conditioning on some function of φ has no effect whatsoever in the large-n limit. Thus

pr{θ ∈ B_π^{(α)} | h(θ) = h_0} = pr{θ ∈ B_0^{(α)} | h(θ) = h_0} = α,

as desired, where the second equality follows directly from (2.2) applied with the prior π(θ | h(θ) = h_0).
The theorem says that the performance of the limiting credible interval cannot be driven exclusively by the compatibility of identified parameters and their priors. Regardless of whether h_0 is an a priori likely or unlikely value of h(θ), the failure-to-cover set has the same size, where size is taken as probability under the conditional prior distribution of (θ | h(θ) = h_0). A consequence of this, crudely stated, is that coverage can never be guaranteed to occur solely because an identified quantity happens to lie near its prior mode, nor can failure-to-cover be guaranteed just because an identified quantity lies in the extreme tails of its prior. This provides a sense in which the values of unidentified quantities must play a role in determining whether coverage occurs.

Interval censoring, continued
Consider the interval censoring problem described in the previous section. We have already seen the form of the failure-to-cover set in this problem, for a prior under which φ has full support while (λ|φ) is uniformly distributed. We now examine the failure-to-cover set under a different prior specification, namely a uniform marginal prior for λ and a uniform conditional prior for (φ|λ). More specifically, λ ∼ Unif(0, 2) and (φ|λ) ∼ Unif[max(λ − 1, 0), min(λ, 1)], which results in a joint prior with full support on the (φ, λ) parameter space. We then have the limiting posterior distribution governed by the prior conditional density

π(λ|φ) ∝ {min(λ, 1) − max(λ − 1, 0)}^{−1},  for λ ∈ (φ, φ + 1).

Since this is readily integrated analytically, the α/2 and 1 − α/2 quantiles of π(λ|φ) are easily obtained for a given φ, and they constitute the limits of the 1 − α equal-tailed credible interval. Values of (φ, λ) for which λ falls outside the interval constitute B_π^{(α)}. This 'bad set,' as illustrated in the right panel of Figure 1, is seen to differ from that arising from the earlier choice of prior. Particularly, the top (bottom) failure band widens (narrows) as φ increases. We also note that the figure is consistent with Theorem 2.1, with different φ values giving rise to the same chance of failure to cover.
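Under this prior, the conditional density π(λ|φ) has kernel 1/{min(λ, 1) − max(λ − 1, 0)} = 1/min(λ, 2 − λ) on (φ, φ + 1) (our derivation from the stated uniforms). Integrating 1/λ on (φ, 1] and 1/(2 − λ) on (1, φ + 1) then gives closed-form quantiles, sketched below:

```python
import math

def limiting_interval(phi, alpha=0.2):
    """Equal-tailed limits of the limiting posterior pi(lambda | phi), whose
    kernel is 1/min(lambda, 2 - lambda) on (phi, phi + 1), for 0 < phi < 1."""
    norm = -math.log(phi * (1.0 - phi))            # total integral of the kernel
    def quantile(u):
        t = u * norm
        if t <= math.log(1.0 / phi):               # quantile falls in (phi, 1]
            return phi * math.exp(t)
        return 2.0 - 1.0 / (phi * math.exp(t))     # quantile falls in (1, phi + 1)
    return quantile(alpha / 2), quantile(1.0 - alpha / 2)

for phi in (0.2, 0.5, 0.8):
    print(phi, limiting_interval(phi))
```

By the symmetry of the kernel about λ = 1, the interval for φ mirrors that for 1 − φ.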

Prevalence estimation with nonignorable missingness
As a second example where the failure-to-cover set is readily characterized, consider estimating the prevalence of a binary trait in the face of nonignorable missingness. We seek to infer p = Pr(X = 1) based on iid observations of (R, XR), i.e., R is a binary variable with R = 0 indicating missingness of X. The unknown parameters are θ = (p, q_0, q_1), where q_i = Pr(R = 1|X = i) for i = 0, 1, while ψ = g(θ) = p is the inferential target. A transparent parameterization is obtained by taking φ = (s, t) and λ = p, where s = Pr(R = 1) = (1 − p)q_0 + pq_1 and t = Pr(X = 1|R = 1) = pq_1/{(1 − p)q_0 + pq_1}. Immediately we have

π*(p|s, t) ∝ π{θ(p, s, t)}.    (3.1)

For the sake of a very tractable illustration, say that the prior π on θ is chosen such that (p, q_0, q_1) are mutually independent, with p ∼ Beta(2, 2) and q_i ∼ Unif(0, 1). This can be interpreted as a 'flat' prior on the parameters governing the (R, X) association, along with a prior on the target prevalence which slightly downweights lower/higher values. Under this specification, (3.1) specializes to a limiting posterior distribution on p which is uniform on the identification interval (st, st + (1 − s)). Thus for an equal-tailed 1 − α credible interval, failure-to-cover occurs if p < st + (α/2)(1 − s) or p > st + (1 − α/2)(1 − s). Reexpressed in terms of the original parameterization θ, failure-to-cover occurs if

p < pq_1 + (α/2){(1 − p)(1 − q_0) + p(1 − q_1)}  or  p > pq_1 + (1 − α/2){(1 − p)(1 − q_0) + p(1 − q_1)}.

For fixed p then, the failure-to-cover region is described by linear boundaries in the (q_0, q_1) plane. This is depicted in Figure 2, which shades, for select values of the target prevalence p, the values of (q_0, q_1) for which failure-to-cover occurs. We see that an extreme value of p can lead to failure-to-cover for a 'majority' of q values. For a mid-range value of p, however, either q_0 or q_1 must be near one for failure to arise, with the requirement for nearness weaker if the other q is close to zero.
For this problem then, we have a fairly simple and full understanding of the way in which θ = (p, q_0, q_1) must be extreme in order for the credible interval to miss the target. This example also lends itself to investigation of how the failure-to-cover set B_π^{(α)} depends on the choice of prior π. Say we retain uniform prior distributions on q_0 and q_1 but generalize from p ∼ Beta(2, 2) to p ∼ Beta(a, b), so that the limiting posterior distribution on p follows a Beta(a − 1, b − 1) distribution truncated to the identification interval. Then failure-to-cover occurs if

{F(p) − F(st)} / {F(st + (1 − s)) − F(st)}  is below α/2 or above 1 − α/2,

where F() is the Beta(a − 1, b − 1) distribution function. For instance, say that background knowledge speaks to high prevalence being unlikely, so that investigators settle on hyperparameters (a, b) = (2, 8). Figure 3 indicates that the resulting failure-to-cover set is quite different than for (a, b) = (2, 2), with highly nonlinear boundaries. There is a temptation to make direct comparisons between Figures 2 and 3. For instance, one notes that the intersection of B_π^{(0.1)} with {p = 0.4} is smaller under the first prior than under the second, whereas the reverse is true when we intersect with {p = 0.15} instead. However, since such discrepancies must 'even out,' in the sense that (2.2) applies to both priors, such observations do not seem particularly useful.
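With (a, b) = (2, 8) the distribution function of Beta(1, 7) has the closed form F(x) = 1 − (1 − x)^7, which makes the failure criterion easy to evaluate. The sketch below codes the truncated-CDF form of the equal-tailed condition and confirms by simulation that the failure set has prior probability α:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, reps = 0.1, 200_000

p = rng.beta(2.0, 8.0, reps)           # target prevalence, prior Beta(2, 8)
q0 = rng.uniform(size=reps)            # Pr(R = 1 | X = 0)
q1 = rng.uniform(size=reps)            # Pr(R = 1 | X = 1)

s = (1 - p) * q0 + p * q1              # identified: Pr(R = 1)
t = p * q1 / s                         # identified: Pr(X = 1 | R = 1)

F = lambda x: 1.0 - (1.0 - x) ** 7     # Beta(1, 7) distribution function
lo, hi = F(s * t), F(s * t + 1 - s)    # F at the identification interval ends
u = (F(p) - lo) / (hi - lo)            # truncated-CDF position of the true p
frac = np.mean((u < alpha / 2) | (u > 1 - alpha / 2))
print(frac)                            # ~ alpha = 0.1, per (2.2)
```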

Imperfect compliance in a randomized trial
Next we consider a version of the imperfect compliance model with binary variables considered by various authors, including Chickering and Pearl [3], Imbens and Rubin [13], Pearl [18, Ch. 8], and Richardson, Evans and Robins [20]. Clinical trial subjects are randomly sampled from a population comprised of never-takers, always-takers, and compliers, in unknown proportions ω_NT, ω_AT, and ω_CO = 1 − ω_NT − ω_AT respectively. Each subject is randomly assigned to either control or treatment. As the labels suggest, never-takers will not take treatment regardless of their assignment, always-takers will take treatment regardless of their assignment, and compliers will follow their assignment. We exclude the possibility of defiers in the population, though the general version of the problem allows for them. Assume that a subject's binary response is Y_0 if treatment is not taken, and Y_1 if treatment is taken, regardless of treatment assignment. Then a subject's outcome is Y = (1 − X)Y_0 + XY_1, where X indicates reception of treatment, whereas Z indicates assignment to treatment. For compliance type C ∈ {NT, AT, CO}, let γ_{C,i} be the mean of Y_i amongst the sub-population of that type. The observed data consist of (Z, X, Y) for sampled subjects. Assuming that Z is based only on randomization (i.e., is independent of Y_0, Y_1 and compliance type), the observed distribution of (X, Y|Z), in terms of parameters θ = (ω_NT, ω_AT, γ_{NT,0}, γ_{NT,1}, γ_{AT,0}, γ_{AT,1}, γ_{CO,0}, γ_{CO,1}), is characterized by:

pr(X = 0|Z = 1) = ω_NT,
pr(X = 0, Y = 1|Z = 1) = ω_NT γ_{NT,0},
pr(X = 1, Y = 1|Z = 1) = ω_CO γ_{CO,1} + ω_AT γ_{AT,1},

along with the analogous expressions conditional on Z = 0.
Particularly, it is easy to verify that the map from φ to the (X, Y|Z) distribution is invertible, where φ collects all components of θ other than λ = (γ_{NT,1}, γ_{AT,0}). Thus a first thought is that the important consideration for interval coverage might be compatibility between the prior and the true values for (γ_{NT,1}, γ_{AT,0}), since these parameters are absent from the likelihood function. This is backed up with the intuition that these are precisely the counterfactual quantities that are inherently uninformed by the data, i.e., the mean response for never-takers if they take, and the mean response for always-takers if they don't take. Consider taking the prior π(θ) to be a uniform distribution, i.e., a Dirichlet(1, 1, 1) prior for ω = (ω_NT, ω_AT, ω_CO) and a uniform prior on (0, 1)^6 for the elements of γ, with a priori independence between ω and γ. Now, say the target of inference ψ = g(θ) is the (global) average causal effect (ACE), given as

ψ = ω_NT(γ_{NT,1} − γ_{NT,0}) + ω_AT(γ_{AT,1} − γ_{AT,0}) + ω_CO(γ_{CO,1} − γ_{CO,0}),

which depends on components of φ and components of λ.
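As a quick numerical sanity check of the ACE decomposition ψ = Σ_C ω_C(γ_{C,1} − γ_{C,0}), the sketch below compares the closed form against a direct simulation of potential outcomes, using illustrative parameter values not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
omega = np.array([0.3, 0.2, 0.5])        # (NT, AT, CO) proportions (illustrative)
g0 = np.array([0.4, 0.2, 0.3])           # gamma_{C,0} for C = NT, AT, CO
g1 = np.array([0.7, 0.5, 0.6])           # gamma_{C,1} for C = NT, AT, CO

ace = float(np.sum(omega * (g1 - g0)))   # the closed-form decomposition

# Direct simulation of potential outcomes for a large population.
N = 1_000_000
c = rng.choice(3, N, p=omega)            # compliance type per subject
y0 = rng.random(N) < g0[c]               # response if treatment not taken
y1 = rng.random(N) < g1[c]               # response if treatment taken
sim = y1.mean() - y0.mean()
print(ace, sim)                          # the two should agree closely
```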
Gustafson [10] shows that for n independent and identically distributed realizations of (Y, X, Z), as n → ∞, the posterior distribution of ψ converges to a symmetric, trapezoidal density. Particularly, there are identified quantities a(φ) and b(φ) such that the support of the limiting trapezoidal density is a(φ) ± b(φ), while the 'top' of the trapezoid extends along a(φ) ± b*(φ). Since the limiting posterior is symmetric, equal-tailed and HPD credible intervals agree (with HPD suitably interpreted in light of the 'flat-topped' density). Thus the limiting level 1 − α credible interval for the target will have the form a(φ) ± k_α(r)b(φ), where r = ω_AT/ω_NT and v(r) = |1 − r|/(1 + r) determines the shape of the trapezoid. Note that the limiting interval is narrowest relative to the support of the limiting posterior when ω_AT = ω_NT, with the limiting density becoming triangular and k_α(1) = 1 − √α. The interval is widest when ω_AT = 0 or ω_NT = 0, with the limiting density becoming rectangular and k_α(r) → 1 − α as r → 0 or r → ∞.
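The interval constant can be computed by working with a standardized trapezoid. The sketch below assumes the limiting density has been rescaled to support (−1, 1) with flat top (−v, v); this standardization is our assumption (a(φ) and b(φ) themselves are not needed), and the piecewise formula, derived from trapezoid geometry, reproduces the two limiting cases quoted above (v = 0 triangular, v → 1 rectangular).

```python
import math

def k_alpha(v, alpha):
    """Half-width of the central 1 - alpha interval of a symmetric trapezoidal
    density on (-1, 1) whose flat top spans (-v, v), for 0 <= v < 1."""
    if alpha * (1 + v) <= 1 - v:          # quantile lies on the sloped part
        return 1 - math.sqrt(alpha * (1 - v * v))
    return (1 + v) * (1 - alpha) / 2      # quantile lies on the flat top

alpha = 0.05
print(k_alpha(0.0, alpha))     # triangular case: equals 1 - sqrt(alpha)
print(k_alpha(0.999, alpha))   # near-rectangular case: approaches 1 - alpha
```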
To gain some intuition, we simulate draws from the joint prior density π(θ) and ascertain which realizations fall in B_π^{(α)}, for α = 0.05. We know from (2.2) that the proportion of draws falling in this 'bad' set is α, but the results in Figure 4 show that the location of B_π^{(α)} in the parameter space is not trivially understood. First, membership in B_π^{(α)} is not driven exclusively by the values of λ = (γ_{AT,0}, γ_{NT,1}). The plots show that a necessary, but not sufficient, condition for failure-to-cover is that these two parameters take on extreme values in opposite directions. That is, 'failures' are found to have λ values in the bottom-right or upper-left corners of the unit square. However, 'successes' are found in these regions as well, with no smooth boundary in λ-space separating B_π^{(α)} from its complement.
The other feature evident from Figure 4 is that failure-to-cover is not strongly driven by the value of the target. The prior distribution of (ψ | θ ∈ B_π^{(α)}) is more dispersed than that of (ψ | θ ∉ B_π^{(α)}), but not to a great extent. Put more succinctly, many θ values for which failure occurs have ψ = g(θ) values near the middle of the prior for ψ, and many of the θ values yielding ψ values in the tails of the prior correspond to successful coverage. Thus there is not a direct explanation for failure-to-cover in terms of the unidentified parameters alone, nor in terms of the target parameter alone. To explore in more depth, from the forms of the ACE and the limiting credible interval it is apparent that the failure-to-cover region, given by (3.2), is not determined exclusively by the values of the nonidentified parameters λ = (γ_{NT,1}, γ_{AT,0}), since it depends on identified parameters via the ratio r = ω_AT/ω_NT. Particularly, the failure region intersected with a fixed value of r ∈ (0, 1) corresponds to equal-sized triangles in the lower-right and upper-left corners of the unit square describing (γ_{AT,0}, γ_{NT,1}) ∈ [0, 1]^2. More particularly, these triangles are formed via parallel lines of slope r passing through the unit square. While the size of the triangles could be deduced from (3.2), direct appeal to Theorem 2.1 immediately gives the areas as α/2 each. Figure 5 gives a pictorial representation of how the failure-to-cover region for λ varies with r = ω_AT/ω_NT. In the r = 0 limit the region devolves to rectangles defined by γ_{NT,1} < α/2 and γ_{NT,1} > 1 − α/2. This is intuitively sensible: when there are fewer (or no) always-takers, success/failure of the interval will be determined more (or exclusively) by the value of γ_{NT,1}. Congruently, γ_{AT,0} plays a larger role as r becomes large.
This behavior explains the lack of separation between B_π^{(α)} and its complement in terms of λ alone, i.e., how it is possible to have successes lying 'south-east' of failures in the top right panel of Figure 4.
Roughly put, the situation with this example can be summarized as follows. Failure-to-cover arises when the nonidentified parameters lie in 'a tail' of their prior distribution, but 'which tail' depends on a particular aspect of the identified parameters. This interaction between nonidentified and identified components is more complicated than simply seeing failure-to-cover when the value of the inferential target is extreme compared to its prior. However the interaction is quite understandable, in terms of which nonidentified parameter is a bigger determinant of failure-to-cover when the relevant identified quantity is smaller or larger.

Example: Prevalence of misclassified trait
Say that X is a binary trait, with interest lying in its population prevalence p = Pr(X = 1). However, the observable binary variable is X̃, which is subject to misclassification. That is, if we let p̃ = Pr(X̃ = 1), then p̃ = pγ_N + (1 − p)(1 − γ_P), where γ_N = Pr(X̃ = 1|X = 1) and γ_P = Pr(X̃ = 0|X = 0) are the sensitivity and specificity of the classification scheme respectively. Thus the original scientific parameterization is θ = (p, γ_N, γ_P) and the inferential target is ψ = g(θ) = p.
Consider the situation where the investigator commits to lower bounds on sensitivity and specificity, but applies uniform prior distributions above the bounds, and also applies a uniform prior to the target parameter p. That is,

p ∼ Unif(0, 1),  γ_N ∼ Unif(a, 1),  γ_P ∼ Unif(b, 1),

independently of one another. Clearly, in the large-sample limit of observing independent and identically distributed realizations of X̃, conditioning on the observed data is equivalent to conditioning on the true value of p̃. More particularly, a transparent parameterization is obtained by taking φ = p̃ and λ = (γ_N, γ_P), and the large-sample limit of the posterior distribution over the target p is identically the prior conditional distribution of (p | p̃), where the conditioning is on the true value of p̃. This distribution is given in Gustafson [8] as having support

p ∈ (1 − (1 − p̃)/b*, p̃/a*),    (3.3)

where a* = max{a, p̃} and b* = max{b, 1 − p̃}. The interval (3.3) is then the identification region for the target parameter in this partially identified model. Moreover, the conditional prior density (and equivalently the limiting posterior density) over this support, given in Gustafson [8], is denoted (3.4).

Figure 7: Failure to cover in Example 4. The top panels report asymptotic success ('o') or failure ('x') to cover, for a sample of θ values drawn from the unconditional prior π(θ). The remaining panels report asymptotic success/failure to cover for a sample of θ values obtained via draws from the conditional prior π*(γ_N, γ_P | p̃), for p̃ = 0.05 (second row), p̃ = 0.15 (third row), and p̃ = 0.25 (fourth row). In each case, the right-hand plot is an enlargement of the shaded region in the left-hand plot, along with an addition of more sampled points.
Hasselt [2] give related expressions for limiting posterior distributions in this problem under different choices of prior distributions.
The form of (3.4) is such that limiting credible intervals are readily computed, but closed-form expressions for their endpoints would be cumbersome. We focus on 90% HPD credible sets. In the case of hyperparameters (a, b) = (0.7, 0.85), we draw values of θ = (p, γ_N, γ_P) from the joint prior distribution and check membership in B_π^{(α)}. The top panels of Figure 7 show that failure-to-cover can occur when one of sensitivity and specificity is large but the other is small. Echoing findings in the previous example, however, failure-to-cover is not completely determined by the extremity of λ = (γ_N, γ_P), i.e., B_π^{(α)} is not determined by a smooth boundary in λ-space.
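As a numerical cross-check of the support (3.3), the sketch below conditions prior draws on p̃ falling near a chosen p̃_0 and compares the observed range of p against the candidate interval (1 − (1 − p̃_0)/b*, p̃_0/a*). This closed form is our reconstruction from the stated definitions of a* and b*, not quoted from the paper:

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = 0.7, 0.85
pt0, tol = 0.15, 0.002               # condition on ptilde falling near pt0

p = rng.uniform(size=2_000_000)
gN = rng.uniform(a, 1.0, p.size)     # sensitivity, above its lower bound
gP = rng.uniform(b, 1.0, p.size)     # specificity, above its lower bound
pt = p * gN + (1 - p) * (1 - gP)     # induced ptilde for each prior draw

keep = np.abs(pt - pt0) < tol        # crude conditioning on ptilde = pt0
a_star = max(a, pt0)
b_star = max(b, 1.0 - pt0)
lo_cand = 1.0 - (1.0 - pt0) / b_star    # candidate lower end of the support
hi_cand = pt0 / a_star                  # candidate upper end of the support
print(p[keep].min(), lo_cand)
print(p[keep].max(), hi_cand)
```

The extremes of the retained p draws should approach the candidate endpoints as the conditioning window shrinks and the number of draws grows.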
Again we appeal to Theorem 2.1. Drawing values of θ by sampling from π*(γ_N, γ_P | p̃ = p̃_0) for selected values of p̃_0, we examine the intersection of B_π^{(α)} with {θ : p̃(θ) = p̃_0}. Theorem 2.1 guarantees this intersection to have probability α with respect to the conditional prior, while Figure 7 illustrates that this set is determined by a smooth boundary in λ-space. Moreover, this boundary is seen to vary with p̃_0. Thus again we see that failure-to-cover arises when λ lies in a tail of the prior distribution, but which tail depends on an identified quantity. As an aside, Figure 7 also illustrates the 'indirect learning' that can arise because the support of π*(γ_N, γ_P | p̃ = p̃_0) can, for some values of p̃_0, be smaller than the support of π(γ_N, γ_P).

Figure 8: Coverage is given as a function of n for the five selected values of θ described in the text. The coverage decreases as we move from values (a) through (e), i.e., further toward the corner of the prior support for (γ_N, γ_P). At each selected value of n and θ, coverage is approximated via 4000 realized data values, implying a Monte Carlo simulation standard error for coverage evaluation of 0.008 or less. Note the use of a logarithmic axis for n.
For each of the five θ values the frequentist coverage is plotted against sample size n in Figure 8. As is consistent with theory, the coverage is seen to tend to zero or one as n increases. Particularly, the limit is one for the less extreme cases (a) through (c), and zero for the more extreme cases (d) and (e). In these latter instances convergence is slow in practical terms.

Discussion
All four of our examples possess parameters which intuitively are not informed by the data. In the first example this is the location of the target mean inside the interval induced by censoring. In the second example these are the conditional probabilities of observation at the two levels of the binary trait. In the third example these are the counterfactual average responses of never-takers if they take treatment and always-takers if they do not. In the fourth example these are the sensitivity and specificity of the classification scheme. Intuition suggests we can only do sensitivity analysis with respect to such parameters, and indeed applying prior distributions to them and proceeding with Bayesian inference is often regarded as a probabilistic form of sensitivity analysis (see, for instance, Greenland [5, 6]). Thus we might only expect a 'wrong' answer to arise if the true values of the uninformed-by-data parameters are extreme with respect to the chosen priors. We have shown this to be correct in the main, but with some devil lurking in the details. Notably, what constitutes extreme can vary with identified parameters. While Theorem 2.1 precludes the size of the extreme set varying with an identified quantity, our examples illustrate that the location of the extreme set can vary with an identified quantity.
It should also be noted that failure-to-cover is inevitable, and cannot be made to go away via a particular choice of prior. As emphasized by Moon and Schorfheide [17], the limits of credible sets must lie inside the identification region, hence failure-to-cover can arise. Of course, this is not a bad thing, in that 1 − α credible intervals are supposed to miss their target sometimes (unless α = 0). The difficulty in thinking about this stems from the fact that in the identified case, 'sometimes' refers to some data realizations. In the nonidentified case, however, 'sometimes' refers to some values of the parameters.
where the expectations are with respect to the prior distribution of (γ_N, γ_P). Thus a Monte Carlo sample from this distribution can be used to numerically approximate both expectations in (4.2). Moreover, since this takes the form of a ratio estimator, a Monte Carlo standard error is readily obtained to quantify the approximation error. The results appearing in Figure 8 use 20,000 realizations of (γ_N, γ_P).
Thus for a given θ_0 value and given sample size n, the frequentist coverage of the 1 − α equal-tailed credible interval is obtained by repeatedly simulating a value Y = y and reporting the proportion of times for which pr(p < p_0 | Y = y) lies between α/2 and 1 − α/2. Note that for a given y this only requires numerical evaluation of (4.2) at a single value of p_0, whereas checking coverage for an HPD interval would require evaluation over a fine grid of values.
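This recipe can be sketched as follows. Since (4.2) is not reproduced above, the code implements the generic prior Monte Carlo ratio estimator pr(p < p_0 | y) ≈ Σ_j 1(p_j < p_0)L(y; θ_j) / Σ_j L(y; θ_j), with prior draws θ_j reused across simulated datasets; the true value θ_0 and sample size are illustrative, not the paper's five cases (a) through (e).

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.7, 0.85                   # lower bounds on sensitivity and specificity
alpha, n, M, sims = 0.1, 1000, 20_000, 500

p = rng.uniform(size=M)            # prior draws, reused across datasets
gN = rng.uniform(a, 1.0, M)
gP = rng.uniform(b, 1.0, M)
ptilde = p * gN + (1 - p) * (1 - gP)
lpt, l1pt = np.log(ptilde), np.log1p(-ptilde)

def post_prob_below(y, p0):
    """Ratio estimator of Pr(p < p0 | Y = y) for Y ~ Bin(n, ptilde)."""
    loglik = y * lpt + (n - y) * l1pt
    w = np.exp(loglik - loglik.max())        # stabilized likelihood weights
    return np.sum(w * (p < p0)) / np.sum(w)

p0, gN0, gP0 = 0.15, 0.8, 0.9                # illustrative true value theta_0
pt0 = p0 * gN0 + (1 - p0) * (1 - gP0)
hits = 0
for _ in range(sims):
    y = rng.binomial(n, pt0)
    u = post_prob_below(y, p0)               # posterior Pr(p < p0 | y)
    hits += alpha / 2 < u < 1 - alpha / 2    # equal-tailed interval covers p0
cov = hits / sims
print(cov)
```

Reusing the same prior draws across simulated datasets keeps the coverage curve smooth in n at the cost of a common Monte Carlo error component, as in the ratio-estimator scheme described above.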