Optimal Robustness Results for Relative Belief Inferences and the Relationship to Prior-Data Conflict

Abstract. The robustness to the prior of Bayesian inference procedures based on a measure of statistical evidence is considered. These inferences are shown to have optimal properties with respect to robustness. Furthermore, a connection between robustness and prior-data conflict is established. In particular, the inferences are shown to be effectively robust when the choice of prior does not lead to prior-data conflict. When there is prior-data conflict, however, robustness may fail to hold.


Introduction
Robustness to the choice of the prior is an issue of considerable importance in a Bayesian statistical analysis. If an inference is very sensitive to the choice of the prior, then this could be viewed as a negative for either the inference method being used or the choice of prior. In this paper it is shown that certain inferences are in a sense optimally robust to the choice of the prior. Furthermore, when the sensitivity of the inferences to the prior is measured quantitatively, it is shown that there is an intimate connection between the effective robustness of the inferences and whether or not there is prior-data conflict. So by choice of the inferential methodology and the avoidance of prior-data conflict, robustness of the inferences to the choice of prior is achieved. It is to be noted that, while the results derived here are for specific Bayesian inferences, the optimality of these inferences with respect to robustness implies that the effect of prior-data conflict applies to all Bayesian inferences.
The basic ingredients for a statistical analysis are taken here to be the data x, a statistical model {f_θ : θ ∈ Θ}, where each f_θ is a probability density with respect to volume measure μ on the sample space X, and a proper prior density π with respect to volume measure ν on Θ. Note that volume measure on a discrete set is taken to be counting measure. Furthermore, suppose that interest is in making inferences about the quantity ψ = Ψ(θ), where Ψ : Θ → Ψ is onto and we don't distinguish between the function and its range to save notation. Throughout the paper the spaces Θ and Ψ are assumed to have enough structure that the relevant marginal and conditional densities exist, which has some use in the developments here. Note that there is no need to coordinatize nuisance parameters as λ = Λ(θ), since all nuisance parameters are integrated out via π(· | ψ) and, in general, there need not exist a complementing function Λ such that (Ψ, Λ) is 1-1.
The approach taken here is to study robustness to the prior for relative belief inferences for ψ rather than all possible inferences. Relative belief inferences are based on the relative belief ratio defined as
RB_Ψ(ψ | x) = lim_{δ→0} Π_Ψ(N_δ(ψ) | x)/Π_Ψ(N_δ(ψ)),   (2)
when this limit exists, for a sequence of neighborhoods N_δ(ψ) of ψ converging nicely to ψ (see Rudin (1974) for the definition of "converging nicely"). Whenever there exist versions of the densities π_Ψ and π_Ψ(· | x), of the probability measures Π_Ψ and Π_Ψ(· | x) respectively, taken with respect to ν_Ψ, that are continuous at ψ with π_Ψ(ψ) > 0, then (2) exists and is given by RB_Ψ(ψ | x) = π_Ψ(ψ | x)/π_Ψ(ψ). A similar statement for the densities of the probability measures M and M(· | ψ) with respect to μ establishes that the relative belief ratio of x, having observed ψ, is given by m(x | ψ)/m(x) and, when both conditions hold, then
RB_Ψ(ψ | x) = π_Ψ(ψ | x)/π_Ψ(ψ) = m(x | ψ)/m(x)   (1)
obtains. Since RB_Ψ(ψ | x) measures the change in belief that ψ is the true value, it is a measure of evidence. Here RB_Ψ(ψ | x) > 1 means that there is evidence in favor of ψ being the true value, as belief in ψ has increased after seeing the data, and RB_Ψ(ψ | x) < 1 means that there is evidence against ψ being the true value, as belief in ψ has decreased after seeing the data. Section 2 provides some more details concerning relative belief inferences for both estimation and hypothesis assessment, but see also Baskurt and Evans (2013) and Evans (2015). The relevant mathematical details for the formulas provided for π_Ψ and π_Ψ(· | x) can be found in texts that deal with geometric measure theory such as Federer (1969), Hirsch (1976), Krantz and Parks (2008) and Tjur (1974).
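As a concrete illustration of the identity RB_Ψ(ψ | x) = π_Ψ(ψ | x)/π_Ψ(ψ) = f_θ(x)/m(x) in the discrete case, the following sketch computes relative belief ratios for a small hypothetical model; all model and prior values are invented for illustration.

```python
# Hypothetical discrete model: theta in {0.25, 0.50, 0.75} with a Bernoulli
# likelihood; all prior values are invented for illustration.
thetas = [0.25, 0.50, 0.75]
prior = [0.4, 0.4, 0.2]                           # proper prior pi on Theta
x = 1                                             # observed Bernoulli outcome
lik = [t if x == 1 else 1 - t for t in thetas]    # f_theta(x)

m_x = sum(p * l for p, l in zip(prior, lik))      # prior predictive m(x)
posterior = [p * l / m_x for p, l in zip(prior, lik)]

# Relative belief ratio: posterior/prior, which equals f_theta(x)/m(x) here.
rb = [po / pr for po, pr in zip(posterior, prior)]
for t, r in zip(thetas, rb):
    print(t, round(r, 3))   # r > 1: evidence for t; r < 1: evidence against t
```

Note that the ratios exceed 1 exactly for those θ whose likelihood exceeds the prior predictive m(x), matching the interpretation of evidence given above.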
Results in Section 3 establish that relative belief inferences have optimal robustness properties when the marginal prior for ψ is allowed to vary over all possibilities in the class of ε-contaminated priors. This generalizes results found in Wasserman (1989), Ruggeri and Wasserman (1993) and de la Horra and Fernandez (1994). Furthermore, an ambiguity concerning the interpretation of the results is resolved. As such this provides further justification for these inferences.
While inferences may be optimally robust, this does not imply that they are in fact robust. In Section 4 quantitative measures of the sensitivity of relative belief inferences to both the marginal prior of ψ and the conditional prior for θ given Ψ(θ) = ψ are derived. In Section 5 it is shown that these inferences are indeed robust when the base prior π does not suffer from prior-data conflict. This adds weight to arguments concerning the importance of checking for prior-data conflict before reporting inferences, as prior-data conflict can imply sensitivity of the inferences to the choice of the prior. Prior-data conflict is interpreted as the true value lying in the tails of the prior and consistent methods have been developed for assessing this in Evans and Moshonov (2006) and Evans and Jang (2011b). Methodology for modifying a prior when prior-data conflict is encountered, through the selection of a prior weakly informative with respect to the base prior, is developed in Evans and Jang (2011c). Other approaches to identifying and dealing with prior-data conflict can be found in O'Hagan (2003), Marshall and Spiegelhalter (2007), Dahl et al. (2007), Scheel et al. (2011) and Presanis et al. (2013).

Robustness to the prior has been considered by many authors and there are a number of different approaches. Many discussions are concerned with determining the range of values that some characteristic of interest takes when the prior is allowed to vary over some class. Berger (1990) and Berger (1994) contain broad reviews of work on this topic and Rios Insua and Ruggeri (2000) is a collection of papers by key contributors. Sivaganesan et al. (1993) develop γ-credible regions for θ that have posterior content at least γ for every prior in an ε-contaminated class and have smallest Euclidean volume amongst all such regions and so are optimal.
By contrast, the relative belief γ-credible region for a general ψ minimizes the prior content, with respect to the base prior, among all γ-credible regions for ψ and these regions are shown here to also possess optimal robustness properties. Dey and Birmiwal (1994) consider global robustness measures based upon measures of distance from the posterior distribution. While the concern here is with robustness to the prior, it is also relevant to be concerned with robustness to the likelihood. Given the similarities between likelihood inferences and relative belief inferences, see Evans (2015), the robustness results in Royall and Tsou (2003) suggest similar properties will hold, but this topic is not pursued further here. It is to be emphasized that the discussion here is only concerned with proper priors and it is unclear how this relates to the situation where improper priors are used.
Robustness to the prior is studied here via the Gâteaux derivative. Undoubtedly the Fréchet derivative is a more reasonable choice but it comes with some computational costs and so we have used the simpler directional derivative here. Robustness to the prior, as measured by the Fréchet derivative, will be pursued in another study.

Relative Belief Inferences
When RB Ψ (ψ | x) > 1 this is the factor by which prior belief in the truth of ψ has increased after seeing the data. Clearly the bigger RB Ψ (ψ | x) is, the more evidence there is in favor of ψ while, when RB Ψ (ψ | x) < 1, the smaller RB Ψ (ψ | x) is, the more evidence there is against ψ. This leads to a total preference ordering on Ψ, namely, ψ 1 is not preferred to ψ 2 whenever RB Ψ (ψ 1 | x) ≤ RB Ψ (ψ 2 | x) since there is at least as much evidence for ψ 2 as there is for ψ 1 . This in turn leads to unambiguous solutions to inference problems.
The best estimate of ψ is the value for which the evidence is greatest, namely, ψ(x) = arg sup_ψ RB_Ψ(ψ | x). The accuracy of this estimate can be assessed using the relative belief credible regions C_{Ψ,γ}(x) = {ψ : RB_Ψ(ψ | x) ≥ c_{Ψ,γ}(x)}, where c_{Ψ,γ}(x) is chosen so that Π_Ψ(C_{Ψ,γ}(x) | x) ≥ γ for every γ ∈ [0, 1] and so, for selected γ, the size of C_{Ψ,γ}(x) can be taken as a measure of the accuracy of the estimate ψ(x). The interpretation of RB_Ψ(ψ | x) as the evidence for ψ forces the use of the sets C_{Ψ,γ}(x) for our credible regions. For if ψ_1 is in such a region and RB_Ψ(ψ_2 | x) ≥ RB_Ψ(ψ_1 | x), then ψ_2 must be in the region as well, as there is at least as much evidence for ψ_2 as for ψ_1. Optimal properties for relative belief credible regions, in the class of all credible regions, have been established in Evans and Shakhatreh (2008). As previously mentioned, a relative belief γ-credible region for a general ψ minimizes the prior content and always has posterior content greater than or equal to its prior content. The latter generalizes a result of Piccinato (1984) to any ψ. Optimal properties for the estimator ψ(x) are established in Evans and Jang (2011a).
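The construction of ψ(x) and C_{Ψ,γ}(x) in the discrete case can be sketched as follows; the prior and posterior values are hypothetical, and the region simply collects ψ values in decreasing order of RB_Ψ(ψ | x) until the posterior content reaches γ.

```python
# Hypothetical prior/posterior values for a discrete psi.
psis = [0, 1, 2, 3]
prior = [0.10, 0.40, 0.40, 0.10]
post = [0.05, 0.25, 0.50, 0.20]
rb = [po / pr for po, pr in zip(post, prior)]

psi_hat = max(zip(psis, rb), key=lambda pair: pair[1])[0]   # arg sup RB

def credible_region(gamma):
    # accumulate psi values by decreasing evidence until content >= gamma
    order = sorted(range(len(psis)), key=lambda i: rb[i], reverse=True)
    region, content = [], 0.0
    for i in order:
        region.append(psis[i])
        content += post[i]
        if content >= gamma:
            break
    return sorted(region), content    # content is the exact posterior content

print(psi_hat, credible_region(0.5))
```

By construction the regions are monotonically increasing in γ and every region contains ψ(x), in line with the discussion above.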
For the assessment of the hypothesis H 0 : Ψ(θ) = ψ 0 , the evidence is given by RB Ψ (ψ 0 | x). One problem that both the relative belief ratio and the Bayes factor share as measures of evidence, is that it is not clear how they should be calibrated. Certainly the bigger RB Ψ (ψ 0 | x) is than 1, the more evidence there is in favor of ψ 0 while the smaller RB Ψ (ψ 0 | x) is than 1, the more evidence we have against ψ 0 . But what exactly does a value of RB Ψ (ψ 0 | x) = 20 mean? It would appear to be strong evidence in favor of ψ 0 because beliefs have increased by a factor of 20 after seeing the data. But what if other values of ψ had even larger increases? For example, the discussion in Baskurt and Evans (2013) of the Jeffreys-Lindley paradox makes it clear that the value of a relative belief ratio or a Bayes factor cannot always be interpreted as an indication of the strength of the evidence.
The value RB_Ψ(ψ_0 | x) can be calibrated by comparing it to the other possible values RB_Ψ(· | x) through its posterior distribution. For example, one possible measure of the strength is
Π_Ψ(RB_Ψ(ψ | x) ≤ RB_Ψ(ψ_0 | x) | x),   (3)
which is the posterior probability that the true value of ψ has a relative belief ratio no greater than that of the hypothesized value ψ_0. While (3) may look like a p-value, it has a very different interpretation. For when RB_Ψ(ψ_0 | x) < 1, so there is evidence against ψ_0, then a small value for (3) indicates a large posterior probability that the true value has a relative belief ratio greater than RB_Ψ(ψ_0 | x) and so there is strong evidence against ψ_0. If RB_Ψ(ψ_0 | x) > 1, so there is evidence in favor of ψ_0, then a large value for (3) indicates a small posterior probability that the true value has a relative belief ratio greater than RB_Ψ(ψ_0 | x) and so there is strong evidence in favor of ψ_0. Notice that, in the set {ψ : RB_Ψ(ψ | x) ≤ RB_Ψ(ψ_0 | x)}, the "best" estimate of the true value is given by ψ_0, simply because the evidence for this value is the largest in this set.
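A minimal sketch of the strength (3) in the discrete case, again with hypothetical prior and posterior values:

```python
# The strength (3) for a discrete psi, with hypothetical prior/posterior values.
psis = [0, 1, 2, 3]
prior = [0.10, 0.40, 0.40, 0.10]
post = [0.05, 0.25, 0.50, 0.20]
rb = [po / pr for po, pr in zip(post, prior)]     # [0.5, 0.625, 1.25, 2.0]

def strength(psi0):
    # posterior probability that RB(psi | x) <= RB(psi0 | x)
    rb0 = rb[psis.index(psi0)]
    return sum(po for po, r in zip(post, rb) if r <= rb0)

print(strength(3))   # rb = 2.0 > 1: evidence in favor, and the strength is large
print(strength(0))   # rb = 0.5 < 1: evidence against, and the strength is small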
Various results have been established in Baskurt and Evans (2013) supporting both RB_Ψ(ψ_0 | x), as the measure of the evidence for H_0, and (3) as a measure of the strength of that evidence. For example, simple inequalities derived there are useful in assessing the strength of the evidence and indicate when a quantity other than (3) can be a more appropriate measure of strength.
It is worth noting that formally RB_Ψ(· | x) is an integrated likelihood. The interpretation, however, is quite different as the value RB_Ψ(ψ | x) is a measure of the evidence that ψ is the true value and this is not the case for an integrated likelihood, which can be multiplied by any positive constant. In effect, likelihoods of any variety only give relative measures of evidence when comparing two values. The importance of this distinction is readily seen by noting the difference in the interpretation when RB_Ψ(ψ | x) < 1 and when RB_Ψ(ψ | x) > 1. By contrast an integrated likelihood does not provide a clear distinction between having evidence for or against a specific value and this has numerous consequences for the theory of inference. For example, based on a measure of evidence it is possible to consider the bias a prior induces into an analysis, as discussed in Baskurt and Evans (2013). Also, the profile likelihood is commonly used for inferences about ψ although it is not a likelihood in general. Piccinato (1984) and Liseo (1996) establish that Bayesian results concerning likelihood regions for θ can be easily generalized to profile likelihood regions for ψ; see Corollary 3. Our concern here, however, is with regions determined using the marginal prior and posterior distributions of ψ and, as noted, these can be considered as based on the integrated likelihood. Further properties of inferences based on the integrated likelihood are discussed in Berger et al. (1999).

Optimal Robustness with Respect to the Marginal Prior
When interest is in making inferences about ψ = Ψ(θ), it is reasonable to ask how sensitive the relative belief approach is to the ingredients given by the prior. This entails examining how sensitive ψ(x), C_{Ψ,γ}(x), RB_Ψ(ψ_0 | x) and the strength (3) are to changes in the prior, as these four objects represent the essential relative belief inferences.
The full prior π for θ can always be factored as π(θ) = π_Ψ(ψ)π(θ | ψ). In contrast to other discussions of robustness with respect to the prior, the sensitivity of the inferences to π_Ψ and the sensitivity of the inferences to π(· | ψ) are considered separately, as this leads to more information concerning where the lack of robustness arises when this occurs. Recall that RB_Ψ(ψ | x) = m(x | ψ)/m(x) and note that m(x | ψ) does not depend on π_Ψ while m(x) does not depend on ψ. From this it is immediate that ψ(x) = arg sup_ψ RB_Ψ(ψ | x) = arg sup_ψ m(x | ψ) and so the relative belief estimate is optimally robust to π_Ψ as the estimate has no dependence on the marginal prior. Furthermore, C_{Ψ,γ}(x) is of the form {ψ : m(x | ψ) ≥ k} for some k and so the form of relative belief regions for ψ is optimally robust to π_Ψ. The specific region chosen for the assessment of the accuracy of ψ(x) depends on the posterior and so is not independent of π_Ψ. It is now proved that C_{Ψ,γ}(x) has an optimal robustness property among all credible regions for ψ.
Consider ε-contaminated priors for θ of the form
Π_{ε,Q}(A) = (1 − ε)Π(A) + ε ∫_Ψ Π(A | ψ) Q(dψ),   (4)
where Q is a probability measure on Ψ and Π is the base prior as described in the Introduction. Note that the conditional prior of θ given Ψ(θ) = ψ is fixed and independent of ε.
To assess the robustness of the posterior content of a set A ⊂ Ψ, it makes sense to look at
δ(A) = sup_Q Π_{ε,Q}(A | x) − inf_Q Π_{ε,Q}(A | x),   (5)
where Π_{ε,Q}(· | x) denotes the posterior obtained from the prior (4) and the supremum/infimum is taken over all probability measures on Ψ. For this let
r(A) = sup_{ψ ∈ A} m(x | ψ)
and note that r(Ψ) = m(x | ψ(x)) and always one and only one of r(A), r(A^c) equals r(Ψ).
The following known result is needed.

Lemma 1. (Huber (1973)) Let Q denote a probability measure on Ψ. For the prior measure (4),
sup_Q Π_{ε,Q}(A | x) = ((1 − ε)m(x)Π_Ψ(A | x) + εr(A))/((1 − ε)m(x) + εr(A)),
inf_Q Π_{ε,Q}(A | x) = (1 − ε)m(x)Π_Ψ(A | x)/((1 − ε)m(x) + εr(A^c)).

Let γ*(x) = Π_Ψ(C_{Ψ,γ}(x) | x) be the exact posterior content of the γ-relative belief region. The following result generalizes results found in Wasserman (1989) and de la Horra and Fernandez (1994), who considered robustness to the prior of credible regions for the full parameter θ. In particular, this result applies to arbitrary parameters ψ = Ψ(θ) and does not require continuity.

This establishes that (5) is minimized by
and r(A^c) = r(Ψ). By part (i) this is minimized by taking A^c = C_{Ψ,1−γ*(x)}(x) and the result is proved.
(iii) The solutions to the optimization problems in parts (i) and (ii), namely, C_{Ψ,γ}(x) and C^c_{Ψ,1−γ}(x) respectively, both have posterior content equal to γ. As such one of these sets is the solution to the optimization problem stated in (iii).

The result follows from this.
It is interesting to consider the statistical meaning of the separate parts of Proposition 2 as the statements create a degree of ambiguity. If a system of credible regions is being used, say B_{Ψ,γ}(x), then it makes sense to require that these sets are monotonically increasing in γ and that the smallest set lim_{γ↓0} B_{Ψ,γ}(x) contains a single point which is taken as the estimate of ψ. The size of B_{Ψ,γ}(x), for some specific γ, can then be taken as an assessment of the accuracy of the estimate where size is measured in some application-dependent way. The relative belief regions satisfy this and the estimate, under the assumption of a unique maximizer of RB_Ψ(· | x), is ψ(x). So effectively (i) is saying that C_{Ψ,γ}(x) is optimally robust with respect to posterior content. Note that we have to exclude sets A with Π_Ψ(A | x) > γ*(x) because, for example, the set A = Ψ is always optimally robust with respect to content but does not provide a meaningful assessment of the accuracy of the estimate. Given that ψ(x) and the form of C_{Ψ,γ}(x) are optimally robust, this further supports the claim that relative belief estimation is optimally robust to the choice of the marginal prior. Note that the sets in (ii) do not satisfy the stated criteria for being a system of credible regions.
Part (iii) indicates that, when there are many sets with posterior content exactly equal to γ, and this is typically true in the continuous case, then C_{Ψ,γ}(x) is optimally robust among these sets with respect to content. It makes sense to require γ*(x) ≥ 1/2 for any credible region as, if γ*(x) < 1/2, then there is more belief that the true value is in C^c_{Ψ,γ}(x) than in C_{Ψ,γ}(x). It is also interesting to note an immediate consequence of Proposition 2 that is similar to a result in Liseo (1996) developed there in the context of a discussion of likelihood regions. Let C_γ(x) be a relative belief credible region for θ and note that this is a likelihood region. Put

Corollary 3. The following hold, (i) among all sets
So profile likelihood regions also have optimal robustness properties. The difference between the two results is that profile likelihood regions and relative belief regions can be quite different. For example, a γ-profile likelihood region need not contain the relative belief estimate ψ(x) and conversely, as the following example demonstrates.

Example 1. Profile and integrated likelihood regions differ substantially.
Consider the model-prior combination given in Table 1 with observed sample x = 1, where m_δ(1) = 4/15 + 3δ/10, and these posterior probabilities converge to 0, 3/8 and 5/8 respectively as δ → 0. So the limiting posterior probabilities for ψ are π_Ψ(0 | 1) = 5/8, π_Ψ(1 | 1) = 3/8. The relative belief ratio for ψ is given by (1). From this ψ(1) = 1 if and only if δ ∈ [2/9, 1/2]. In this case, when δ is small and γ = 3/8, a γ-credible region based on the profile likelihood does not contain the relative belief estimate ψ(1) = 0, while the corresponding relative belief region does not contain the profile likelihood estimate. The essential difference here is that the relative belief estimate has the interpretation that it is the value of ψ for which the data have led to the greatest increase in belief and no such interpretation is available for the profile likelihood estimate.
Applying Lemma 1 gives
sup_Q Π_{ε,Q}(C_{Ψ,γ}(x) | x) = ((1 − ε)γ*(x) + εRB_Ψ(ψ(x) | x))/((1 − ε) + εRB_Ψ(ψ(x) | x)),
and this can be close to 1 when RB_Ψ(ψ(x) | x) is large. So, while C_{Ψ,γ}(x) possesses an optimal robustness property with respect to posterior content, this does not imply that the posterior content is necessarily robust. This depends on other aspects of the particular problem which will be discussed.
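The effect can be sketched numerically, assuming the Huber (1973) form of the bounds in Lemma 1; all inputs below are hypothetical.

```python
# Lemma 1 style bounds (assumed Huber (1973) form) on the posterior content of
# A = C_{Psi,gamma}(x) under eps-contamination of the marginal prior.
def content_bounds(gamma_star, R_A, R_Ac, eps):
    """gamma_star = exact posterior content; R_A = r(A)/m(x); R_Ac = r(A^c)/m(x)."""
    sup = ((1 - eps) * gamma_star + eps * R_A) / ((1 - eps) + eps * R_A)
    inf = (1 - eps) * gamma_star / ((1 - eps) + eps * R_Ac)
    return inf, sup

# A relative belief region contains psi_hat(x), so R_A = RB(psi_hat(x) | x).
print(content_bounds(0.95, 5.0, 0.8, 0.1))      # modest RB: bounds stay tight
print(content_bounds(0.95, 2000.0, 0.8, 0.1))   # huge RB: sup is pushed toward 1
```

The second call shows how a very large RB_Ψ(ψ(x) | x) drives the supremum of the posterior content toward 1, i.e., a lack of robustness.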

Measuring Robustness Quantitatively
To measure the robustness of an inference to the prior π, when using the ε-contaminated class, it is natural to look at Gâteaux derivatives of the relevant quantity at π in various directions Q. The derivative is a measure of the sensitivity of the inference to small changes in the prior and so is local in nature. When the derivative is large for some Q, the inference is highly sensitive to the prior chosen and naturally this is viewed negatively. In this section this behavior of relative belief inferences is analyzed separately for ε-contaminated classes for the marginal π_Ψ and the conditional π(· | ψ).

Sensitivity to the Marginal Prior
Consider the family of priors given by (4) but now restricted to those Q that are also absolutely continuous with respect to ν_Ψ on Ψ and let q denote the density of Q. The posterior density of ψ based on the contaminated prior is
π_{Ψ,ε}(ψ | x) = ((1 − ε)m(x)π_Ψ(ψ | x) + εq(ψ)m(x | ψ))/((1 − ε)m(x) + εm_Q(x)),   (6)
where m_Q(x) = ∫_Ψ m(x | ψ)q(ψ) ν_Ψ(dψ).
The following result gives the Gâteaux derivative of the relative belief ratio.

Proposition 4. The Gâteaux derivative of RB_{Ψ,ε}(ψ | x) in the direction Q at ε = 0 equals
RB_Ψ(ψ | x)(1 − m_Q(x)/m(x)).   (7)

The value of (7) can be large simply because RB_Ψ(ψ | x) is large, so it makes more sense to look at the relative change. Therefore, for small ε,
(RB_{Ψ,ε}(ψ | x) − RB_Ψ(ψ | x))/RB_Ψ(ψ | x) ≈ ε(1 − m_Q(x)/m(x)).
The Gâteaux derivative of the strength of the evidence is now computed.
Proposition 5. The Gâteaux derivative of the strength Π_Ψ(RB_Ψ(ψ | x) ≤ RB_Ψ(ψ_0 | x) | x) in the direction Q at ε = 0 equals
(m_Q(x)/m(x))(Π_Q(RB_Ψ(ψ | x) ≤ RB_Ψ(ψ_0 | x) | x) − Π_Ψ(RB_Ψ(ψ | x) ≤ RB_Ψ(ψ_0 | x) | x)),
where Π_Q(· | x) denotes the posterior of ψ obtained when the marginal prior is Q. So, using (6), the absolute value of this derivative is bounded above by m_Q(x)/m(x). This implies that the strength is robust to the choice of the marginal prior π_Ψ whenever m_Q(x)/m(x) is small.
For both the measure of evidence RB Ψ (ψ 0 | x) and its strength, the ratio m Q (x)/m(x) plays a key role in determining the robustness. The implications of this are discussed in Section 5. Note that sup Q m Q (x)/m(x) = RB Ψ (ψ(x) | x) gives the worst case behavior of this ratio.
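A small sketch comparing the exact relative change in RB_Ψ(ψ | x) under marginal-prior contamination with the linearization ε(1 − m_Q(x)/m(x)); the ratios used are hypothetical.

```python
# Relative change in RB_Psi(psi | x) under eps-contamination of the marginal
# prior: it depends only on the ratio m_Q(x)/m(x). Numbers are hypothetical.
def rel_change_exact(eps, ratio):
    # RB_eps(psi | x) = m(x | psi) / ((1 - eps) m(x) + eps m_Q(x))
    return eps * (1 - ratio) / ((1 - eps) + eps * ratio)

def rel_change_approx(eps, ratio):
    return eps * (1 - ratio)          # the small-eps linearization

for ratio in [0.5, 1.0, 50.0]:        # ratio = m_Q(x)/m(x)
    print(ratio, rel_change_exact(0.01, ratio), rel_change_approx(0.01, ratio))
```

When m_Q(x)/m(x) is near 1 the change is negligible, while a large ratio produces a substantial relative change even for small ε, which is the worst-case behavior noted above.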
It is of interest to contrast these results with those for the commonly used MAP inferences which are based on the posterior density π_Ψ(· | x).

Proposition 6. The Gâteaux derivative of the posterior density of ψ in the direction Q at ψ_0 is given by
(m_Q(x)/m(x))(q(ψ_0 | x) − π_Ψ(ψ_0 | x)),
where q(· | x) denotes the posterior density of ψ obtained when the marginal prior is q.

Note that MAP-based inferences implicitly use π_Ψ(ψ_0 | x) as a measure of the evidence that ψ_0 is the true value. Comparing this with the relative belief ratio, we see that, for small ε,
(π_{Ψ,ε}(ψ_0 | x) − π_Ψ(ψ_0 | x))/π_Ψ(ψ_0 | x) ≈ ε(m_Q(x)/m(x))(q(ψ_0 | x)/π_Ψ(ψ_0 | x) − 1),
and the relative change in π_Ψ(ψ_0 | x) is dependent on the ratio of the posteriors as well as m_Q(x)/m(x). So if π_Ψ(ψ_0 | x) is small relative to q(ψ_0 | x) we will get a big relative change and this suggests that MAP inferences are much less robust than relative belief inferences. A similar result is obtained for the Bayesian p-value in Evans and Zou (2001). Consider also the following example.

Example 2. The MAP and relative belief estimates contrasted.
Suppose x = (x_1, . . . , x_n) is a sample from a Bernoulli(θ) distribution, θ ∼ beta(α, β) with α ≥ 1, β ≥ 1 and ψ = Ψ(θ) = θ^p for some p ≤ 1. In this case ψ(x) = x̄^p and the MAP estimate is given by ψ_MAP(x) = (α − p + nx̄)^p/(n + α + β − p − 1)^p. Note that the relative belief estimate of ψ is just the appropriate transform of the relative belief estimate of θ. On the other hand the MAP estimate of θ is (α − 1 + nx̄)/(n + α + β − 2) and ψ_MAP(x) is not the p-th power of this. It is the case, however, that the two estimates are essentially equivalent whenever n is large enough. How large n has to be, however, depends on α, β, p and the data. For example, when n = 1000, α = 1, β = 1, p = 0.1 and x̄ = 0, then ψ(x) = 0 but ψ_MAP(x) = 0.496 even though it would be natural to be quite certain that ψ ≈ 0 in such a case.
It is clear that ψ_MAP(x) is much less robust to the prior than ψ(x). For example, as α → ∞, then ψ_MAP(x) → 1, for any value of x̄. So if the prior were chosen to reflect virtual certainty that θ = 1 by choosing α very large, then ψ_MAP(x) will reflect this even when this is contradicted by the data x̄. This issue is discussed more fully in Section 5 where it is seen that prior-data conflict can play a key role in determining robustness to the prior.
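The numbers in Example 2 can be checked directly from the formulas stated there (a sketch, not part of the original analysis):

```python
# Numerical check of the estimates in Example 2, using the formulas in the text.
def rb_estimate(xbar, p):
    return xbar ** p                                    # psi_hat(x) = xbar^p

def map_estimate(n, xbar, alpha, beta, p):
    return ((alpha - p + n * xbar) / (n + alpha + beta - p - 1)) ** p

# n = 1000, alpha = beta = 1, p = 0.1, xbar = 0 as in the text.
print(rb_estimate(0.0, 0.1))                            # 0.0
print(map_estimate(1000, 0.0, 1.0, 1.0, 0.1))           # roughly 0.496
```

Even with 1000 observations all equal to 0, the MAP estimate of ψ remains near 1/2, illustrating the lack of robustness discussed above.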

Sensitivity to the Conditional Prior
Consider now priors for θ of the form
π_ε(θ) = π_Ψ(Ψ(θ))((1 − ε)π(θ | Ψ(θ)) + εq(θ | Ψ(θ))),
where, for each ψ ∈ Ψ, Q(· | ψ) is a probability measure on Ψ^{−1}{ψ}, absolutely continuous with respect to ν_{Ψ^{−1}{ψ}} with density q(· | ψ). So the marginal prior of ψ is now fixed and the conditional prior of θ is perturbed. The posterior of ψ based on this prior is Π_{Ψ,ε}(· | x) with density proportional to π_Ψ(ψ)m_ε(x | ψ), where m_ε(x | ψ) = (1 − ε)m(x | ψ) + εm_Q(x | ψ) and m_Q(x | ψ) = ∫_{Ψ^{−1}{ψ}} f_θ(x)q(θ | ψ) ν_{Ψ^{−1}{ψ}}(dθ). The relative belief ratio for ψ based on this prior equals RB_{Ψ,ε}(ψ | x) = m_ε(x | ψ)/m_ε(x), where m_ε(x) = (1 − ε)m(x) + εm_Q(x) and m_Q(x) = ∫_Ψ π_Ψ(ψ)m_Q(x | ψ) ν_Ψ(dψ). This leads to the following result.

Proposition 7. The Gâteaux derivative of RB_{Ψ,ε}(ψ | x) in the direction Q at ε = 0 equals
RB_Ψ(ψ | x)(m_Q(x | ψ)/m(x | ψ) − m_Q(x)/m(x)).
The implications of this result for robustness are discussed in Section 5.
Now consider the robustness of the strength of the evidence.
Proposition 8. If RB_Ψ(· | x) has a discrete distribution with support containing no limit points, then the Gâteaux derivative of the strength Π_Ψ(RB_Ψ(ψ | x) ≤ RB_Ψ(ψ_0 | x) | x) in the direction Q at ε = 0 equals 0. For, when the support of RB_Ψ(· | x) contains no limit points, the set {ψ : RB_{Ψ,ε}(ψ | x) ≤ RB_{Ψ,ε}(ψ_0 | x)} is unchanged for all ε > 0 small enough and for all ε < 0 close enough to 0, so the lower and upper bounds on the strength are equal for all small enough |ε| and the result follows. When RB_Ψ(· | x) has a continuous distribution with density g(· | x), then the derivative involves the value g(RB_Ψ(ψ_0 | x) | x). From this it is seen that in the discrete case the strength is insensitive to local changes in the prior.
Consider the continuous case. When there is strong evidence either for or against ψ_0, then RB_Ψ(ψ_0 | x) will be in the right or left tail, correspondingly, of the posterior distribution of RB_Ψ(· | x) and so g(RB_Ψ(ψ_0 | x) | x) will tend to be small. As such the strength will be robust to small changes in the prior provided m_Q(x)/m(x) is not large. When there is not strong evidence, however, then g(RB_Ψ(ψ_0 | x) | x) could be large and, if m_Q(x)/m(x) is not small, then the strength is not robust. This underscores a recommendation in Baskurt and Evans (2013) that in the continuous case the parameter be discretized when assessing the evidence and its strength. For this, when ψ is real-valued, let δ > 0 be the difference between two ψ values that is deemed to be of practical importance. The prior and posterior distributions of ψ, discretized to the intervals [ψ_0 + (2i − 1)δ/2, ψ_0 + (2i + 1)δ/2) for i ∈ Z, are then used to assess the hypothesis H_0, which corresponds to the interval [ψ_0 − δ/2, ψ_0 + δ/2). By Proposition 8 the strength is then insensitive to small changes in the prior.
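The discretization can be sketched as follows, assuming a hypothetical normal prior and posterior for a real-valued ψ; all numerical values are invented for illustration.

```python
# Sketch of the discretization: assess H_0 : psi = psi_0 via the interval
# [psi_0 - delta/2, psi_0 + delta/2), with hypothetical normal prior/posterior.
from math import sqrt, erf

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def interval_prob(lo, hi, mu, sigma):
    return Phi((hi - mu) / sigma) - Phi((lo - mu) / sigma)

def cell(i, psi0, delta):
    return psi0 + (2 * i - 1) * delta / 2, psi0 + (2 * i + 1) * delta / 2

prior, post = (0.0, 1.0), (0.3, 0.2)     # hypothetical (mean, sd) pairs
psi0, delta = 0.0, 0.1                   # delta = practically important difference

def rb_cell(i):
    lo, hi = cell(i, psi0, delta)
    return interval_prob(lo, hi, *post) / interval_prob(lo, hi, *prior)

rb0 = rb_cell(0)                          # evidence concerning H_0
strength = sum(
    interval_prob(*cell(i, psi0, delta), *post)
    for i in range(-50, 51) if rb_cell(i) <= rb0
)
print(rb0, strength)
```

The discretized evidence and its strength are then step functions of the prior, which is the source of the insensitivity asserted by Proposition 8.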
It is perhaps not surprising that the robustness behavior of the relative belief ratio and its strength is more complicated when considering the effect of the conditional prior than with the marginal prior. The optimality results concerning robustness to the marginal prior underscore this.

Robustness and Prior-Data Conflict
The existence of a prior-data conflict means that the data support certain values of ψ = Ψ(θ) being the true value but the prior places little or no mass there. While various measures can be used to determine whether or not such a conflict has occurred, a logical approach is based on the factorization of the joint probability measure for (θ, x) given by Π × P_θ = Π(· | T) × M_T × P(· | T), where T is a minimal sufficient statistic, Π(· | T) is the posterior probability measure for θ, M_T is the prior predictive probability measure of T and P(· | T) is the conditional probability measure of the data given T. The measure P(· | T) is then available for computing probabilities relevant to checking the model {f_θ : θ ∈ Θ}, the measure M_T is available for computing probabilities relevant to checking the prior and Π(· | T) is the relevant probability measure for computing probabilities for θ. A statistical analysis then proceeds by checking the model, perhaps via a tail probability based on a discrepancy statistic, and then checking the prior if the data do not contradict the model. If both the model and prior are not contradicted by the data, then we can proceed to inference about θ. The logic behind this sequence lies in part with the fact that it makes little sense to check a prior if the model fails. Furthermore, separating the check of the prior from that of the model provides more information in the event of a conflict arising, as it is then possible to identify where the failure lies, namely, with the model or with the prior.
In Evans and Moshonov (2006) this factorization was adhered to and the tail probability
M_T(m_T(t) ≤ m_T(T(x)))   (8)
was advocated for checking the prior, where m_T is the density of M_T with respect to some support measure. So if (8) is small, then the observed value T(x) of the minimal sufficient statistic lies in the tails of M_T and there is an indication of a prior-data conflict. In Evans and Jang (2011b) the validity of this approach was firmly established by the proof that (8) converges to Π(π(θ) ≤ π(θ_true)) under i.i.d. sampling and some additional weak conditions. Furthermore, it was shown how to modify (8) so as to achieve invariance under the choice of the minimal sufficient statistic. Also, Evans and Moshonov (2006) argued that (8) should be replaced by M_T(m_T(t) ≤ m_T(T(x)) | U(T(x))) for any maximal ancillary U(T), as the variation in T due to U(T) has nothing to do with θ and so reflects nothing about the prior. The tail probability (8) is a check on the full prior and Evans and Moshonov (2006) also developed methods for checking factors of the prior so a failure in the prior could be isolated to a particular aspect.
First, however, consider the case when Ψ(θ) = θ and interest is in the robustness of inferences to the whole prior. From the results in Section 4.1, it is seen that the ratio m_Q(x)/m(x) = m_{Q,T}(T(x))/m_T(T(x)), where m_{Q,T}(T(x)) = ∫_Θ f_{θ,T}(T(x)) Q(dθ), plays a key role in determining the local sensitivity, in the direction given by Q, of the inferences for given observed data x. This depends on Q and the worst case is given by
sup_Q m_{Q,T}(T(x))/m_T(T(x)) = f_{θ(x),T}(T(x))/m_T(T(x)),
and note that θ(x) is the maximum likelihood estimate (MLE) in this case as well as the relative belief estimate. Notice that when (8) is small, so there is an indication of a prior-data conflict existing, then m_T(T(x)) is relatively small when compared to other values of m_T(t), which are not influenced by the data. This implies that the prior is having a big influence relative to the data and so a lack of robustness can be expected.
This phenomenon is well-illustrated in the following examples where ancillaries play no role because of Basu's theorem.

Example 3. Location normal model.
Suppose that x = (x_1, . . . , x_n) is a sample from the N(μ, 1) distribution with μ ∼ N(μ_0, σ_0²). Then M_T is given by T(x) = x̄ ∼ N(μ_0, 1/n + σ_0²). When Q is the N(μ_1, σ_1²) distribution, then M_{Q,T} is given by x̄ ∼ N(μ_1, 1/n + σ_1²). This implies that
m_Q(x)/m(x) = ((1/n + σ_0²)/(1/n + σ_1²))^{1/2} exp{(x̄ − μ_0)²/(2(1/n + σ_0²)) − (x̄ − μ_1)²/(2(1/n + σ_1²))}   (9)
and, as a function of (μ_1, σ_1²), this is maximized when μ_1 = x̄, σ_1² = 0. Notice that this supremum converges to ∞ as x̄ → ±∞ and such values correspond to prior-data conflict with respect to the N(μ_0, σ_0²) prior.

Now, consider a numerical example. A sample of size n = 20 was generated from the N(0, 1) distribution obtaining x̄ = 0.2591. When the base prior is N(0.5, 1) then (8) equals 0.8141 and accordingly there is no indication of any prior-data conflict. Also, sup_Q(m_Q(x)/m(x)) = 4.7109 which seems modest as it describes the worst case robustness behavior. In Table 2 some values of m_Q(x)/m(x) are recorded when Q is a N(μ_1, σ_1²) distribution for various values of μ_1 and σ_1², as these might be expected to be realistic directions in which to perturb the base prior. In all cases the value of m_Q(x)/m(x) is quite modest and the maximum value of (9) is 1.0534. Overall it can be concluded here that the analysis is robust to local perturbations of the prior.

Now consider an example where there is prior-data conflict. In this case a sample of n = 20 is generated from a N(4, 1) distribution obtaining x̄ = 4.0867 and the same base prior is used. The value of (8) is 0.0005 and so there is a strong indication of prior-data conflict. Furthermore, sup_Q(m_Q(x)/m(x)) = 2096.85 which certainly indicates a lack of robustness. In Table 3 some values of m_Q(x)/m(x) are recorded when Q is a N(μ_1, σ_1²) distribution for various values of μ_1 and σ_1². It is seen that the value of m_Q(x)/m(x) can be relatively large and the maximum value of (9) is 468.86.
So it can be concluded that the analysis based on the model, prior and observed data, will not be robust to local perturbations of the prior when there is prior-data conflict.
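The numbers in Example 3 can be reproduced as follows; `check_prior` implements the tail probability (8) for this model and `worst_case_ratio` the supremum of m_Q(x)/m(x) over normal directions Q.

```python
# Reproducing the Example 3 numbers: the check (8) on the prior and the
# worst-case ratio sup_Q m_Q(x)/m(x) for the location normal model.
from math import sqrt, exp, erf

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def check_prior(xbar, n, mu0, s0sq):
    # xbar ~ N(mu0, 1/n + s0sq) under M_T, so m_T(t) <= m_T(xbar) iff
    # |t - mu0| >= |xbar - mu0|
    z = abs(xbar - mu0) / sqrt(1 / n + s0sq)
    return 2 * (1 - Phi(z))

def worst_case_ratio(xbar, n, mu0, s0sq):
    # supremum over Q = N(mu1, s1sq), attained at mu1 = xbar, s1sq = 0
    v = 1 / n + s0sq
    return sqrt(n * v) * exp((xbar - mu0) ** 2 / (2 * v))

print(check_prior(0.2591, 20, 0.5, 1.0), worst_case_ratio(0.2591, 20, 0.5, 1.0))
# no conflict: about 0.814 and 4.71
print(check_prior(4.0867, 20, 0.5, 1.0), worst_case_ratio(4.0867, 20, 0.5, 1.0))
# conflict: about 0.0005 and 2097
```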

Example 4. Bernoulli model.
Suppose that x = (x_1, . . . , x_n) is a sample from a Bernoulli(θ) distribution and the prior is θ ∼ beta(α_0, β_0) for some choice of (α_0, β_0). A minimal sufficient statistic is T(x) = Σ_{i=1}^n x_i ∼ binomial(n, θ) and then
m_T(t) = C(n, t)B(α_0 + t, β_0 + n − t)/B(α_0, β_0),
where C(n, t) is the binomial coefficient and B(·, ·) is the beta function. To illustrate the relationship between prior-data conflict and robustness, consider a numerical example. Suppose that α_0 = 5 and β_0 = 20. Generating a sample of size n = 20 from the Bernoulli(0.25) gave the value nx̄ = 3. In this case (8) equals 0.7100 and there is no indication of any prior-data conflict. Also, sup_Q(m_Q(x)/m(x)) = 1.4211 which indicates that the inferences will be generally robust to small deviations. If m_Q(x)/m(x) is computed for various Q, where Q is a beta(α_1, β_1), then in all cases it is readily seen that this ratio is quite reasonable in value as indeed it is bounded above by 1.4211.
A sample of n = 20 was also generated from a Bernoulli(0.9) with the same prior being used. In this case nx̄ = 17 and (8) equals 6.2 × 10⁻⁶, so there is a strong indication of prior-data conflict. Also, sup_Q(m_Q(x)/m(x)) = 46396.43, which indicates that the inferences will generally not be robust to small deviations. Table 4 provides some values of m_Q(x)/m(x) for Q given by a beta(α₁, β₁) for various choices of (α₁, β₁) and there are several large values.

Now consider the case where θ = (θ₁, θ₂) ∈ Θ₁ × Θ₂, so the prior factors as π(θ) = π₂(θ₂ | θ₁)π₁(θ₁). Presumably the conditional prior π₂(· | θ₁) and the marginal prior π₁ are elicited, and the goal is inference about some ψ = Ψ(θ). It is then preferable to check the prior by checking each individual component for prior-data conflict, as this leads to more information about where a conflict exists when it does.
In general, it is not clear how to check the individual components but in certain contexts a particular structure holds that allows for this. Suppose that all ancillaries are independent of the minimal sufficient statistic and so can be ignored. The more general situation is covered in Evans and Moshonov (2006).
As discussed in Evans and Moshonov (2006), suppose there is a statistic V(T) such that the marginal distribution of V(T) depends only on θ₁. Such a statistic is referred to as being ancillary for θ₂ given θ₁, and naturally we want V(T) to be a maximal ancillary for θ₂ given θ₁. An appropriate tail probability for checking π₁ is then given by

M_{V(T)}( m_{V(T)}(V(T)) ≤ m_{V(T)}(V(T(x))) ),   (10)

as M_{V(T)} does not depend on π₂(· | θ₁). A natural order is to check π₁ first and then check π₂(· | θ₁) for prior-data conflict whenever no prior-data conflict is found for π₁. The appropriate tail probability for checking π₂(· | θ₁) is given by

M_T( m_T(T | V(T(x))) ≤ m_T(T(x) | V(T(x))) | V(T(x)) ).   (11)

Note that this is assessing whether or not π₂(· | θ₁) is a suitable prior for θ₂ among those θ₁ values deemed to be suitable according to the prior π₁. If (11) were to be used before (10), then it would not be possible to assess whether a failure was due to where π₁ was placing the bulk of its mass or was caused by where the conditional priors were placing their mass. Notice that

m_{Q,T}(T(x))/m_T(T(x)) = [m_{Q,T}(T(x) | V(T(x)))/m_T(T(x) | V(T(x)))] × [m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x)))],   (12)

so prior-data conflict with either π₁ or π₂(· | θ₁) could lead to large values of the ratio on the left for certain choices of Q. When only the conditional prior of θ₂ given θ₁ is perturbed, then m_{Q,V(T)}(V(T(x))) = m_{V(T)}(V(T(x))).
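The factorization (12) is simply the multiplication rule for densities applied to the prior predictives of T and V(T); in the notation above:

```latex
% m_{Q,T} factorizes through the conditional predictive given V(T):
\begin{align*}
m_{Q,T}(T(x)) &= m_{Q,T}(T(x) \mid V(T(x)))\, m_{Q,V(T)}(V(T(x))), \\
m_{T}(T(x))   &= m_{T}(T(x) \mid V(T(x)))\, m_{V(T)}(V(T(x))), \\
\frac{m_{Q,T}(T(x))}{m_{T}(T(x))}
  &= \frac{m_{Q,T}(T(x) \mid V(T(x)))}{m_{T}(T(x) \mid V(T(x)))}
     \cdot \frac{m_{Q,V(T)}(V(T(x)))}{m_{V(T)}(V(T(x)))}.
\end{align*}
```

Dividing the first identity by the second gives (12) directly.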
Letting f_{θ₁,V} denote the density of V(T) when θ₁ is true, then

sup_Q m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x))) = sup_{θ₁} f_{θ₁,V}(V(T(x)))/m_{V(T)}(V(T(x))) = sup_{θ₁} RB₁(θ₁ | V(T(x))),

where RB₁(· | V(T(x))) gives the relative belief ratios for θ₁ based on having observed V(T(x)). The right-hand side gives the worst-case behavior of the second factor in (12). Now consider the robustness of relative belief inferences for a general ψ = Ψ(θ). The following result generalizes Propositions 4 and 7, as we consider a general perturbation to the prior, namely Π_ε = (1 − ε)Π + εQ, and the proof is the same as that of Proposition 7.
The factor RB_{Q,Ψ}(ψ | x) − RB_Ψ(ψ | x) can be large simply because we choose a prior Q that is very different from Π. For example, RB_Ψ(ψ | x) may be large (small) because there is considerable evidence in favor of (against) ψ being the true value, and we can choose a prior Q that doesn't (does) place mass near ψ. As such, it makes sense to standardize the derivative by dividing by this factor, and this leaves the robustness determined again by m_Q(x)/m(x).
Suppose now that Q and Π have the same marginal for ζ = Ξ(θ). Then

sup_Q m_Q(x)/m(x) = ∫_Ξ sup_{θ ∈ Ξ⁻¹{ζ}} RB(θ | x) Π_Ξ(dζ),

and the right-hand side gives the worst-case behavior of the first factor in (12) when Ξ(θ) = θ₁, which is related to prior-data conflict with the prior on θ₂.
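A sketch of why the worst case takes this integrated form, assuming Q ranges over all priors with the given marginal Π_Ξ for ζ:

```latex
% For any Q with marginal Pi_Xi for zeta = Xi(theta),
\begin{align*}
m_Q(x) = \int_{\Xi} \int_{\Xi^{-1}\{\zeta\}} f_\theta(x)\, Q(d\theta \mid \zeta)\, \Pi_\Xi(d\zeta)
\le \int_{\Xi} \sup_{\theta \in \Xi^{-1}\{\zeta\}} f_\theta(x)\, \Pi_\Xi(d\zeta),
\end{align*}
```

with equality approached by conditional priors concentrating near the maximizing θ in each fibre Ξ⁻¹{ζ}. Dividing by m(x) and using RB(θ | x) = f_θ(x)/m(x) gives the stated supremum.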
The following is a standard example where priors are specified hierarchically.
Suppose that x = (x₁, . . . , xₙ) is a sample from the N(μ, σ²) distribution, where μ | σ² ∼ N(μ₀, τ₀²σ²) and 1/σ² ∼ gamma_rate(α₀, β₀) (so the prior density of 1/σ² is proportional to (σ⁻²)^{α₀−1}e^{−β₀/σ²} for σ⁻² > 0 and 0 otherwise). Then T(x) = (x̄, ||x − x̄1||²) is a minimal sufficient statistic for the model. Note that the prior is chosen by eliciting values for μ₀, τ₀², α₀, β₀ and so there is interest in how sensitive inferences are to perturbations in each component separately. The posterior distribution of (μ, σ²) is then of standard conjugate form.

Consider first inferences for ψ = Ψ(θ) = σ². Note that V(T(x)) = ||x − x̄1||² has a distribution depending only on ψ and so is ancillary for μ given σ². Therefore, the prior on σ² is checked first using the prior predictive for V(T(x)). An easy calculation gives that the prior distribution of s² = V(T(x))/(n − 1) is (β₀/α₀)F(n − 1, 2α₀) and this specifies (10).

While the results of Section 4.1 apply here, consider the behavior of the relative belief ratio RB₁(σ² | V(T(x))), which is based on observing only V(T(x)) rather than T(x). By Proposition 4 this has Gâteaux derivative depending on m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x))). Notice, however, that relative belief ratios accumulate evidence in a simple way. For any statistic V(T(x)),

RB_Ψ(ψ | T(x)) = RB_Ψ(ψ | V(T(x))) × [π_Ψ(ψ | T(x))/π_Ψ(ψ | V(T(x)))],

where the first factor gives the evidence obtained after observing V(T(x)) and the second factor gives the additional evidence obtained after observing T(x), having already observed V(T(x)). In particular, RB₁(σ² | x) = RB₁(σ² | V(T(x)))[π₁(σ² | x)/π₁(σ² | V(T(x)))] with the same interpretation for the factors. As such, a lack of robustness of RB₁(σ² | V(T(x))), which can be connected to prior-data conflict through (10), implies a lack of robustness for RB₁(σ² | x).
When no prior-data conflict is obtained for the prior on σ², then it makes sense to look for prior-data conflict with the prior on μ, which is typically the parameter of primary interest. So now consider perturbations to the prior on μ and the relationship to prior-data conflict with this prior. The conditional distribution of T(x) given V(T(x)) is given by the conditional prior predictive of x̄ given s², which is distributed as μ₀ plus a scaled Student quantity.

Table 5: The ratio m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x))) in Example 5 when there is no conflict with the prior on σ².
A sample of size n = 20 was generated from the N(0, 1) distribution, obtaining x̄ = −0.1066, s² = 0.9087. So there should be no prior-data conflict with the prior on σ². Indeed, (10) equals 0.7626, so there is no indication of any problems with the prior on σ². Values of m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x))) are recorded in Table 5 when the marginal prior on σ² is perturbed by a gamma_rate(α₁, β₁) distribution for various values of α₁ and β₁. In all cases the ratio is small and indicates robustness to local perturbations of the prior on σ². Note that the worst-case behavior, over all possible directions, is given by the maximized relative belief ratio for σ² based on V(T(x)), which occurs at σ² = s² and equals RB₁(s² | V(T(x))) = 1.7479.
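Using the F-representation of the prior predictive of s² given above, perturbation ratios of the kind tabulated in Table 5 can be emulated numerically. The sketch below uses hypothetical base values (α₀, β₀) = (2, 2), since the actual elicited values are not recorded in this excerpt, and checks that every gamma_rate(α₁, β₁) perturbation ratio is bounded by the worst case RB₁(s² | V(T(x))).

```python
from math import exp, lgamma, log

def log_f_pdf(w, d1, d2):
    """Log density of the F(d1, d2) distribution at w > 0."""
    lb = lgamma(d1 / 2) + lgamma(d2 / 2) - lgamma((d1 + d2) / 2)
    return (0.5 * (d1 * log(d1 * w) + d2 * log(d2)
                   - (d1 + d2) * log(d1 * w + d2)) - log(w) - lb)

def m_V(s2, n, alpha, beta):
    """Prior predictive density of s^2 when s^2 ~ (beta/alpha) F(n-1, 2*alpha),
    as derived in the example (1/sigma^2 ~ gamma_rate(alpha, beta))."""
    c = beta / alpha
    return exp(log_f_pdf(s2 / c, n - 1, 2 * alpha)) / c

def sampling_density(s2, n, sigma2):
    """Sampling density of s^2 given sigma^2: (n-1)s^2/sigma^2 ~ chi-squared(n-1)."""
    d = n - 1
    w = d * s2 / sigma2
    return exp((d / 2 - 1) * log(w) - w / 2 - lgamma(d / 2) - (d / 2) * log(2)) * d / sigma2

# Hypothetical base prior (alpha0, beta0) = (2, 2): NOT the paper's elicited values.
n, s2, a0, b0 = 20, 0.9087, 2.0, 2.0
base = m_V(s2, n, a0, b0)
rb_max = sampling_density(s2, n, s2) / base      # worst case: RB_1(s^2 | V(T(x)))
for a1, b1 in [(1.0, 1.0), (2.0, 3.0), (4.0, 2.0), (3.0, 3.0)]:
    ratio = m_V(s2, n, a1, b1) / base            # m_{Q,V(T)}/m_{V(T)} for a perturbation
    assert ratio <= rb_max + 1e-9                # every mixture ratio is bounded by RB_max
```

The final assertion reflects the general bound: any prior predictive is a mixture of the sampling densities, so no perturbation ratio can exceed the maximized relative belief ratio, which here occurs at σ² = s².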
Next a sample of size n = 20 from the N(0, 25) was generated, obtaining x̄ = 0.0950, s² = 23.9593. So there is clearly prior-data conflict with the prior on σ². This is reflected in the value of (10), which equals 0.64 × 10⁻⁵. Table 6 shows that there is a serious lack of robustness. The worst-case behavior is given by RB₁(s² | V(T(x))) = 40484.68.

Table 6: The ratio m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x))) in Example 5 when there is conflict with the prior on σ².

Table 7: The ratio m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x))) in Example 5 when there is conflict with the prior on μ but not with the prior on σ².
It is also relevant to consider what happens concerning the robustness of inferences about σ² when there is prior-data conflict with the prior on μ but not with the prior on σ². A sample of n = 20 was generated from the N(10, 1) distribution, obtaining x̄ = 9.7041, s² = 1.0082, so there is clearly prior-data conflict with the prior on μ but not with the prior on σ². The value of (10) equals 0.6460, which gives no reason to doubt the relevance of the prior on σ². Table 7 shows that m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x))) is small and indicates robustness to local perturbations of the prior on σ². The worst-case behavior is given by RB₁(s² | V(T(x))) = 1.7218. This reinforces the claim that the tail probabilities (10) and (11) are measuring different aspects of the data conflicting with the prior.

Now consider perturbations to the prior on μ with the prior on σ² fixed. A sample of n = 20 was generated from a N(0, 1), obtaining x̄ = −0.1066, s² = 0.9087, so there is clearly no prior-data conflict with either component. This is reflected in the value of (11), which equals 0.9150. Table 8 shows that the first factor m_{Q,T}(T(x) | V(T(x)))/m_T(T(x) | V(T(x))) in (12) is small when the conditional prior on μ is perturbed by N(μ₁, τ₁²) priors and thus demonstrates robustness to perturbations in these directions. The worst-case behavior is given by ∫₀^∞ RB((x̄, σ²) | x) Π₁(dσ⁻²).

Table 9: The ratio m_{Q,T}(T(x) | V(T(x)))/m_T(T(x) | V(T(x))) in Example 5 when there is conflict with the prior on σ² but not with the prior on μ.

Table 10: The ratio m_{Q,T}(T(x) | V(T(x)))/m_T(T(x) | V(T(x))) in Example 5 when there is no conflict with the prior on σ² but there is with the prior on μ.

Table 9 gives some values of m_{Q,T}(T(x) | V(T(x)))/m_T(T(x) | V(T(x))) when a sample of n = 20 was generated from a N(0, 25), obtaining x̄ = 0.0950, s² = 23.9593. So in this case there is prior-data conflict with the prior on σ² but not with the prior on μ.
The value of (11) equals 0.9150, which gives no indication of prior-data conflict with the prior on μ. The tabulated values also indicate no serious robustness concerns, as does ∫₀^∞ RB((x̄, σ²) | x) Π₁(dσ⁻²) = 4.5838. This also reinforces the claim that the tail probabilities (10) and (11) are measuring different aspects of the data conflicting with the prior.

Table 10 gives some values of m_{Q,T}(T(x) | V(T(x)))/m_T(T(x) | V(T(x))) when a sample of n = 20 was generated from a N(10, 1), obtaining x̄ = 9.7941, s² = 1.0082. So in this case there is prior-data conflict with the prior on μ but not with the prior on σ². The value of (11) equals 0.1691 × 10⁻⁹, which gives a clear indication of prior-data conflict with the prior on μ. In this case the tabulated values indicate a clear lack of robustness with respect to the prior on μ. Also, ∫₀^∞ RB((x̄, σ²) | x) Π₁(dσ⁻²) = 8,046,933,962 indicates that the worst-case behavior with respect to robustness is terrible.
Note that RB(θ(x) | x), and the similar ratios discussed in this section, do not measure robustness directly through the values they assume. Rather, the results of this section show that, when prior-data conflict exists, these ratios tend to be larger than when there is no prior-data conflict. Since these values influence the relevant derivatives, this indicates an increased sensitivity to the choice of the prior when such a conflict arises. As such, these ratios are not to be interpreted as measures of surprise as discussed in Section 4.7.2 of Berger (1985).

Conclusions
Several optimal robustness results have been derived here for relative belief inferences. These results provide support for these inferences for estimation and hypothesis assessment. Even though relative belief inferences have optimal robustness properties with respect to choice of prior, this does not guarantee that they are robust in practice. The issue of practical robustness in a given problem is seen to be connected with whether or not there is prior-data conflict. With no prior-data conflict the inferences are robust to small changes in the prior, at least in the sense measured here. This adds support to the point-of-view that checking for prior-data conflict is an essential aspect of good statistical practice.
It is interesting that the worst case behavior of the measure of sensitivity is associated with the maximized value of a relative belief ratio. The actual maximum value attained is meaningless, however, as there is no way to calibrate this as opposed to calibrating the relative belief ratio at a fixed value via the strength. The relative belief estimate is consistent, however, and the relative belief ratio at this value will, at least in the continuous case, converge to infinity. So large values would seem to be associated with high evidence in favor. What has been shown here is that large values can be associated with prior-data conflict and a lack of robustness rather than providing high evidence. When prior-data conflict is encountered the prior can be modified, following Evans and Jang (2011c), to avoid this. While objections can be raised to taking such a step, it seems necessary if we want to report a valid characterization of the evidence obtained.