Measuring local sensitivity in Bayesian inference using a new class of metrics

Abstract Local sensitivity analysis is recognized for its computational simplicity and its potential use in multi-dimensional and complex problems. Unfortunately, its major drawback is its asymptotic behavior: the standard metrics (and the associated Fréchet derivatives) used as local sensitivity measures do not behave appropriately under prior-to-posterior convergence. The resulting local sensitivity measures do not converge to zero, and even diverge for most multidimensional classes of prior distributions. Restricting the class of priors, or using other $\phi$-divergence metrics, has been proposed as a way to resolve this issue, without success. We overcome this issue by proposing a new flexible class of metrics, so-called credible metrics, whose asymptotic behavior is far more promising and which require no restrictions on the class of priors. Using these metrics, we then investigate the stability of Bayesian inference with respect to the structure of the prior distribution. Under appropriate conditions, we present a uniform bound, in the sense that priors that are close in a credible metric yield posteriors that are close in that metric. As a result, we do not get the sort of divergence observed with other metrics. We finally show that the posterior predictive distributions are even more stable and robust.


Introduction
Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs such as the sampling model, the prior distribution, the loss function, or any combination of them. There are several reasons to examine the robustness of Bayesian answers to such misspecification: foundational motivation, practical Bayesian motivation, and acceptance of Bayesian analysis (Smith 2010; Wasserman 1992; Insua et al. 2016).
Sensitivity analysis can be divided into two broad categories: global and local sensitivity. The common approach to assessing sensitivity is to measure the size of the class of posteriors (or perhaps just a particular posterior quantity) that arises from a specified class of priors. This is referred to as global sensitivity analysis (Daneshkhah and Bedford 2008). Global sensitivity analysis does not require the perturbation to lie within a given parametric family (Smith and Daneshkhah 2010). Alternatively, an appropriate divergence measure is applied to first specify a neighborhood system around each model. Bounds are then computed for the maximum deviation in the inference that could be obtained by a model in this neighborhood. If this deviation is small, the model is considered robust (Gustafson and Wasserman 1995; Smith and Rigat 2012).
The fact that global analyses often entail a large and complex computational problem (Daneshkhah, Hosseinian-Far, and Chatrabgoun 2017) has led to local sensitivity analyses, originally introduced by Gustafson and Wasserman (1995) and developed further by Gustafson et al. (1996). The idea of a local analysis is to examine the rate at which the posterior changes relative to the prior. In local sensitivity analysis, a chosen base prior distribution is perturbed using a finitely parametrized modification. Hence, measures which are 'functionally close' to the chosen/elicited prior are considered, and the behavior of the posterior functional forms under infinitesimal departures from the prior is studied.
The local sensitivity analysis is recognized for its computational simplicity and its potential use in multi-dimensional and similarly complex problems where a global robustness investigation may be difficult (Daneshkhah 2004). The major drawback of this approach concerns its asymptotic behavior: it is reasonable to expect that, in most cases, the influence of the prior distribution on posterior quantities becomes less important as the sample size tends to infinity. In this article, we study the asymptotic behavior of local sensitivity methods, which measure the effect of infinitesimal perturbations of the prior on posterior quantities (Basu 2000; Gustafson and Wasserman 1995). We assume $x^n = (x_1, \dots, x_n)$, $n \ge 1$, is a random sample with observed sampling density $p(x^n \mid \theta)$, where $\theta = (\theta_1, \dots, \theta_k)$. Let $\mathcal{P}$ be the set of all probability measures on the parameter space $\Theta$, and, given a prior distribution $\pi(\theta)$, denote by $\pi(\theta \mid x)$ the corresponding posterior distribution. We denote by $T : \mathcal{P} \to \mathcal{T}$ some quantity of interest; for instance, the whole posterior distribution can be obtained by taking $T(\pi) = \pi(\theta \mid x)$ and $\mathcal{T} = \mathcal{P}$. We denote the predictive distribution of a new observation $x^*_n$ by $p(x^*_n \mid x^n) = \int p(x^*_n \mid \theta, x^n)\, \pi(\theta \mid x^n)\, d\theta$. Gustafson and Wasserman (1995) define the local sensitivity of a prior $\pi$ in the direction of another prior $\omega$ as
$$ s(\pi, \omega; x^n) = \lim_{\epsilon \to 0^+} \frac{d_2\big(\pi_\epsilon(\cdot \mid x^n),\, \pi(\cdot \mid x^n)\big)}{d_1(\pi_\epsilon, \pi)}, \qquad \pi_\epsilon = (1 - \epsilon)\pi + \epsilon\, \omega, \tag{1} $$
where $\omega$ is the perturbed prior distribution, and $d_1$ and $d_2$ denote the total variation distance unless otherwise stated. We denote the overall sensitivity by $s(\pi, \mathcal{C}; x^n) = \sup_{\omega \in \mathcal{C}} s(\pi, \omega; x^n)$ for some class of priors $\mathcal{C} \subseteq \mathcal{P}$. It was shown that, under mild regularity conditions, $s(\pi, \mathcal{P})$ (and $s(\pi, \mathcal{C})$ for many classes $\mathcal{C}$) increases at rate $n^{k/2}$ (Gustafson and Wasserman 1995). Therefore, if we use this quantity as a diagnostic measure, the posterior distribution becomes increasingly sensitive to the chosen prior distribution as the sample size becomes very large.
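A minimal numerical sketch can make this growth visible. The example below is our own illustration, not taken from the source: it uses a $N(0,1)$ base prior, a narrow contaminating prior, $N(\theta,1)$ data with the sample mean held at zero, and a small finite $\epsilon$ in place of the limit in Eq. (1); all of these choices are assumptions made for the sketch.

```python
import numpy as np

def tv(p, q, dx):
    """Total variation distance between two densities tabulated on a grid."""
    return 0.5 * np.sum(np.abs(p - q)) * dx

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

theta = np.linspace(-1.0, 1.0, 40001)        # grid over the parameter space
dx = theta[1] - theta[0]
base = normal_pdf(theta, 0.0, 1.0)           # base prior pi = N(0, 1)
contam = normal_pdf(theta, 0.0, 0.01)        # narrow contaminating prior omega
eps = 1e-4                                   # size of the linear perturbation

def posterior(prior, n, xbar=0.0):
    """Posterior density on the grid for N(theta, 1) data with sample mean xbar."""
    post = prior * np.exp(-0.5 * n * (xbar - theta) ** 2)
    return post / (np.sum(post) * dx)

def sensitivity(n):
    """Finite-eps approximation to s(pi, omega; x^n) in Eq. (1)."""
    mixed = (1.0 - eps) * base + eps * contam    # pi_eps = (1-eps) pi + eps omega
    num = tv(posterior(mixed, n), posterior(base, n), dx)
    den = tv(mixed, base, dx)                    # = eps * tv(omega, pi)
    return num / den

s_vals = [sensitivity(n) for n in (10, 100, 1000)]
# s_vals increases with n: the measure grows rather than settling down
```

On this grid the computed ratio grows with $n$ rather than settling down, consistent with the divergence described above for $k = 1$.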
This is because $\mathcal{P}$ contains unreasonable prior distributions (e.g., priors that put all their mass at one point, or priors with very noisy tail behavior). Restricting the class of priors to a subset $\mathcal{C}_R$ of $\mathcal{P}$ was then proposed by Gustafson et al. (1996). However, they showed that the issue remains as long as $\pi$ is an interior point of $\mathcal{C}_R$ with respect to the density ratio metric. This is a very severe constraint on any prior family, and despite it, the type of divergence discussed above still occurs.
Restricting to parametric prior distributions was another solution to the aforementioned issue, proposed in Gustafson and Wasserman (1995). The corresponding Bayesian inference under any prior in this class is still rather unsatisfactory, since the local sensitivity measure then depends on the prior lying in a particular parametric family, which should be avoided. Similar asymptotic behavior is also observed if other $\phi$-divergence distances, or geometric perturbations, are used for $d_1$ and $d_2$ in Eq. (1).
In this paper, we examine prior-to-posterior convergence and the sensitivity issues mentioned above in terms of a new class of metrics called credible metrics. Section 2 introduces this class, whose asymptotic behavior is more desirable and which, when used as a local sensitivity measure, does not require us to restrict ourselves to a particular class of priors. We also present some preliminary definitions, theorems, and lemmas which are used to study the asymptotic behavior of this metric in Sec. 3. The predictive performance of this metric is investigated in Sec. 4, where we show that the metric computed between posterior predictive distributions is more stable and robust than the metric computed between posterior distributions.

A new class of metrics
In this section, we introduce a new class of metrics called credible metrics. We illustrate that local sensitivity measures based on these metrics for the posterior distributions are more stable, in the sense that they at least do not diverge as we obtain more data. We first present some notation and preliminary results regarding the total variation distance which are required to introduce our new class of metrics.
We denote the total variation metric between probability distributions $(P, \Psi)$, defined over a common $\sigma$-algebra $\mathcal{C}$ on a parameter space $\Theta$, as follows:
$$ d(P, \Psi) = \sup_{A \in \mathcal{C}} |P(A) - \Psi(A)|. $$
This metric can also be written in terms of the respective densities $\pi$ and $\psi$ of $P$ and $\Psi$:
$$ d(P, \Psi) = \frac{1}{2} \int_{\Theta} |\pi(\theta) - \psi(\theta)|\, d\theta. $$
In addition, this metric is invariant with respect to transformations in the following sense. If $\varrho : \theta \mapsto \theta'$, $\theta \in \Theta$ and $\theta' \in \Theta'$, is bijective and measurable, and $(P(\theta), \Psi(\theta))$ and $(P(\theta'), \Psi(\theta'))$ are the probability distributions defined on $\theta$ and $\theta' = \varrho(\theta)$, then $d(P(\theta), \Psi(\theta)) = d(P(\theta'), \Psi(\theta'))$. It can also be shown that, for a fixed known family of sampling distributions, the total variation distance between two predictive distributions is no larger than the distance between their prior distributions, as discussed in Daneshkhah (2004); in other words, $d(P(X), \Psi(X)) \le d(P(\theta), \Psi(\theta))$, where $P(X)$ and $\Psi(X)$ are the probability distributions associated with the density functions $p(x) = \int p(x \mid \theta)\, \pi(\theta)\, d\theta$ and $\psi(x) = \int p(x \mid \theta)\, \psi(\theta)\, d\theta$. Despite all these nice properties, the following example verifies the result reported by Gustafson and Wasserman (1995) that this distance need not converge as $n \to \infty$.

Example 2.1. Suppose $X_1, X_2, \dots, X_n$ is a random sample from a normal distribution $N(\theta, 1)$, and define $S_i(n) = n^{-1/2} \sum_{j = n(i-1)+1}^{ni} X_j$. It can easily be concluded that, for two different prior densities $\pi_j(\theta)$, $j = 1, 2$, and all $n > 0$, $\pi_j(\theta \mid X^{(n)} = x^{(n)}) = \pi_j(\theta \mid S_1(n) = s_1(n))$, where $x^{(n)} = \{x_1, x_2, \dots, x_n\}$. We also have $\pi_j(s_2(n) \mid x^{(n)}) = \int \pi_j(s_2(n) \mid \theta')\, \pi_j(\theta' \mid s_1(n))\, d\theta'$. Since $\pi_j(s_2(n) \mid \theta') = \pi_j(s_2(1) \mid \theta') \sim N(\theta', 1)$ with $\theta' = n^{1/2}\theta$, we get $\pi_j(s_2(n) \mid x^{(n)}) = \int_{\theta'} \pi_j(s_2(1) \mid \theta')\, \pi_j(\theta' \mid s_1(n))\, d\theta'$. Thus, in a sense, this prediction problem does not depend on the number of observations $n$; in particular, $d(\pi_1(s_2(n) \mid x^{(n)}), \pi_2(s_2(n) \mid x^{(n)}))$ does not depend on $n$.
It can then be concluded that, for all $n > 0$,
$$ d(\pi_1(\theta \mid x^{(n)}), \pi_2(\theta \mid x^{(n)})) \ge d(\pi_1(s_2(n) \mid x^{(n)}), \pi_2(s_2(n) \mid x^{(n)})) = d(\pi_1(x_2 \mid x_1), \pi_2(x_2 \mid x_1)). $$
This looks counter-intuitive, since we know that, whatever the prior for $\theta$, given $x^{(n)}$ the distribution of $n^{1/2}(\theta - \bar{x})$ tends to the standard normal, so the posterior densities will be close to one another, spiking near $\bar{x}$. On the other hand, the variation metric is scale invariant, and it sees the difference between the posterior densities magnified up onto the shrinking region to which $\theta$ converges. However, if we have some way of fixing the scale of the deviation, this is no longer so. For example, $d(\pi_1(x_{n+1} \mid x^{(n)}), \pi_2(x_{n+1} \mid x^{(n)}))$ can be controlled in terms of an open ball $B(\bar{x}, \delta)$ with center $\bar{x}$ and diameter $\delta$, its dominating measure $\mu(B(\bar{x}, \delta))$, and the tail mass $g(\delta) = \int_{\theta \notin B(\bar{x}, \delta)} \pi_1(\theta \mid x^{(n)})\, d\theta = \int_{\theta \notin B(\bar{x}, \delta)} \pi_2(\theta \mid x^{(n)})\, d\theta$. Therefore, one-step-ahead prediction of the next observation certainly converges, and these predictions will be stable if prediction about $\theta$ is stable. So stability in terms of this (and many other) metrics is consistent with ideas about Bayesian sufficiency. Making inference in terms of posterior predictive distributions, or Bayesian predictive measures more generally, has been supported by several researchers (Cowell 1996; Smith 2010).
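The contrast between the unstable posterior distance and the stable one-step-ahead predictive distance can be sketched numerically. The example below is our own illustration (the two conjugate normal priors and the fixed sample mean are assumptions, not from the source); it uses the closed-form normal posterior and predictive updates:

```python
import numpy as np

def tv(p, q, dx):
    """Total variation distance between densities on a grid."""
    return 0.5 * np.sum(np.abs(p - q)) * dx

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def predictive_params(mu0, tau2, n, xbar):
    """Mean and variance of the one-step-ahead predictive N(m_n, 1 + v_n)
    for a N(mu0, tau2) prior on the mean of N(theta, 1) observations."""
    v = 1.0 / (1.0 / tau2 + n)
    m = v * (mu0 / tau2 + n * xbar)
    return m, 1.0 + v

x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]

pred_tv = []
for n in (1, 10, 100):
    m1, v1 = predictive_params(0.0, 1.0, n, xbar=0.5)   # prior 1: N(0, 1)
    m2, v2 = predictive_params(2.0, 4.0, n, xbar=0.5)   # prior 2: N(2, 4)
    p = normal_pdf(x, m1, np.sqrt(v1))
    q = normal_pdf(x, m2, np.sqrt(v2))
    pred_tv.append(tv(p, q, dx))
# pred_tv decreases: one-step-ahead predictions stabilise as n grows
```

Because the predictive variance is bounded below by the observation variance, the scale of the comparison is fixed and the distance shrinks as the posteriors merge.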
Before introducing the credible metric and studying its asymptotic behavior, we need to present some further notation and definitions.
Let $P \mid A$ and $Q \mid A$ denote, respectively, the conditional probability distributions associated with $P$ and $Q$ given an event $A \in \mathcal{C}$ with $P(A) > 0$, and define
$$ d_{A[P]}(P, Q) = \begin{cases} d(P \mid A,\ Q \mid A), & P(A) > 0, \\ 0, & \text{otherwise}, \end{cases} \qquad d_{\mathcal{A}}(P, Q) = \sup_{A \in \mathcal{A}} d_{A[P]}(P, Q), $$
where $\mathcal{C}$ is a common $\sigma$-algebra on the parameter (or sample) space $\Theta$ and $\mathcal{A} \subseteq \mathcal{C}$. Note that $d_{A[P]}$ is a pseudometric (i.e., all the metric axioms hold other than $d_{A[P]}(P, Q) = 0$ implying $P = Q$). Call $\mathcal{A}$ $P$-conditioning if $P(A) > 0$ for every $A \in \mathcal{A}$. Finally, denote by $\mathcal{P}(P)^+$ the set of all probability measures with the same support as $P$.
Lemma 2.2. If $\mathcal{A}$ is $P$-conditioning, then $d_{\mathcal{A}}(\cdot, \cdot)$ is a metric on $\mathcal{P}(P)^+$.

Proof. Let $P, Q, R \in \mathcal{P}(P)^+$. Since $d(\cdot, \cdot)$ is a metric on $\mathcal{P}(P)^+$, we can conclude that if $P \ne Q$ then $d_{\mathcal{A}}(P, Q) \ge d(P, Q) > 0$. Furthermore, since $d_{A[P]}(P, Q)$ is a pseudometric for all $A[P] \in \mathcal{A}$, we have symmetry, $d_{\mathcal{A}}(P, Q) = d_{\mathcal{A}}(Q, P)$, and, again since each $d_{A[P]}(P, Q)$ is a pseudometric, taking suprema over the individual triangle inequalities gives $d_{\mathcal{A}}(P, R) \le d_{\mathcal{A}}(P, Q) + d_{\mathcal{A}}(Q, R)$. □

Note that this result does not rely on $d(\cdot, \cdot)$ being the variation metric; in particular, it works with the Hellinger metric as well.
To clarify the nature of this new class of metrics, we make a few remarks in the lemmas below.
Lemma 2.3. If $P$ is discrete and $\mathcal{A}$ contains all two-point sets $\{i, j\}$, then the $d_{\mathcal{A}}(P, Q)$ neighborhoods of $P$ are contained in DeRobertis density-ratio spheres.

Proof. Suppose, without loss of generality, that $p_i q_j \ge p_j q_i$, and write $c = (p_i q_j)/(p_j q_i) \ge 1$. Then
$$ d_{\{i,j\}}(P, Q) = \frac{p_i q_j - p_j q_i}{(p_i + p_j)(q_i + q_j)}, $$
and clearly $d_{\{i,j\}}(P, Q)$ is increasing in $c \ge 1$. Thus, for discrete variables, the topology defined by such a metric is at least as refined as the topology defined by density-ratio spheres. □

Now assume that $\mathcal{A} = \mathcal{C}$, and note that, for any set $A \in \mathcal{C}$ on which $|\log p(\theta) - \log q(\theta)| \le s$, the density ratio of the conditionals $P \mid A$ and $Q \mid A$ lies between $e^{-2s}$ and $e^{2s}$, which implies $2 d_{A}(P, Q) \le \exp\{2s\} - 1$. Thus we have proved the following lemma.
Lemma 2.4. Suppose probability measures $P$ and $Q$ have respective densities $p$ and $q$ with respect to the same dominating measure, strictly positive on their shared support. Then, for all $\epsilon > 0$, there exists a (small) value $s(\epsilon) > 0$ such that if $|\log p(\theta) - \log q(\theta)| \le s$ for all $\theta \in A$, $A \in \mathcal{C}$, then $d_A(P, Q) \le \epsilon$. It is therefore clear that, although these metrics are much fiercer than the variation metric, the open sets around $P$ are rich, provided that $\mathcal{A}$ does not contain sets which are too improbable. The following lemma presents a partial converse of this result.
Lemma 2.5. Suppose probability measures $P$ and $Q$ ($P \ne Q$) have respective continuous densities $p$ and $q$ with respect to the same dominating measure, non-zero on their shared support. For all $s > 0$, write $A_U(s) = \{\theta : \log p(\theta) - \log q(\theta) \ge s\}$, $A_L(s) = \{\theta : \log p(\theta) - \log q(\theta) \le -s\}$, and $A_M(s) = \{\theta : |\log p(\theta) - \log q(\theta)| < s\}$. Suppose there exists a value $\eta > 0$ such that, for all $s < \eta$, $\min\{P(A_U(s)), P(A_L(s))\} > 0$. Then there exist a value $s > 0$ and a set $C(s) \subseteq \Theta$ with $P(C) > 0$ such that $d_{C(s)}(P, Q) \ge 1 - e^{-s}$.

Proof. First note that $\int_{A_U(s)} (p(\theta) - q(\theta))\, d\theta \ge (1 - e^{-s}) P(A_U)$, while on $A_M(s)$ we have $\int_{A_M(s)} |p(\theta) - q(\theta)|\, d\theta \le e^{s} - 1$. If, for all $s$, $\min\{\mu(A_U(s)), \mu(A_L(s))\} = 0$, then $P = Q$, in contradiction to our hypothesis. Hence, provided $s$ is small enough, say $s < \eta$, we have $\min\{\mu(A_U(s)), \mu(A_L(s))\} > 0$, which, since $p$ is strictly positive, in turn implies $\min\{P(A_U(s)), P(A_L(s))\} > 0$. It follows from the above that $P(A_U(s)) - Q(A_U(s)) > 0$ and $Q(A_L(s)) - P(A_L(s)) > 0$. As a result, when $P(A_U) - Q(A_U) \ge Q(A_L) - P(A_L)$, one can choose a subset $B_U$ of $A_U$ such that $P(B_U) - Q(B_U) = Q(A_L) - P(A_L)$; this is clearly possible if $P$ and $Q$ are continuous. If, on the other hand, $Q(A_L) - P(A_L) \ge P(A_U) - Q(A_U)$, one can choose a subset $B_L$ of $A_L$ such that $Q(B_L) - P(B_L) = P(A_U) - Q(A_U)$. Therefore, under the conditions given above, we can construct subsets $B_U \subseteq A_U$ and $B_L \subseteq A_L$ on which the log-density gap is at least $s$ in absolute value, and taking $C(s)$ to be the union of the chosen subsets yields $d_{C(s)}(P, Q) \ge 1 - e^{-s}$, as required. □
It should be noted that the metric developed above with $\mathcal{A} = \mathcal{C}$ is not really new: it essentially demands that the log-densities of the two distributions are close everywhere. Furthermore, it is not very practical, because it demands proportionate closeness in the tails of the density, and it would be unrealistic to expect such levels of subjective certainty on sets with very small probability. In the following, we demonstrate that sets with large prior probability do not affect the topology of $d_{\mathcal{A}}(P, Q)$: the relevant bound, for fixed values of $c$, is continuous at zero and equals zero when $\delta = 0$.
It can be concluded that, when limits are considered, we gain nothing over the variation metric by including sets whose probability exceeds some threshold; it is the distances associated with small sets $A$ which might contribute something new. However, when we learn through Bayes rule, the posterior densities associated with different priors will typically tend to concentrate around the same small open balls as the sample size increases. It follows that there may be considerable gain in restricting our attention to the whole space together with small open balls. This provokes the following definition.
Definition 2.7. Call $d_{\mathcal{A}}(P, Q) = d_{\delta \mid C}(P, Q)$ the $(\delta, C)$-credibility metric if $\mathcal{A}$ consists of the credibility set $C$ together with the Euclidean open balls $B(\theta_0; \delta)$, where $B(\theta_0; \delta)$ is a ball with center $\theta_0$ and diameter $\delta$, and $d(\cdot, \cdot)$ is the total variation metric.
By writing $d_{\delta}(P, Q) = d_{\delta \mid \Theta}(P, Q)$, we shall see that, provided the space of densities we consider is smooth enough, this metric gives the sort of limiting results we require. Furthermore, the type of smoothness conditions we need to impose seems relatively benign and plausible from a subjective perspective.
Explicitly, we can write $d_{B(\theta_0; \delta)}(P, Q) = d(P \mid B(\theta_0; \delta),\ Q \mid B(\theta_0; \delta))$. We show that a sufficiently small ball $B(\theta_0; \delta)$ is not active in $d_{\mathcal{A}}(P, Q)$, provided the log-densities of $P$ and $Q$ are defined and continuous at $\theta_0$. So $P$ and $Q$ can be very different in the variation metric and still be close under this conditional metric; all we require is that both are sufficiently smooth.
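The behaviour just described is easy to see numerically. In the sketch below (our own illustration; the two normal densities and the ball size are assumptions, not from the source), two distributions that are far apart in the global variation metric become nearly indistinguishable once both are conditioned on a small ball on which their log-densities are continuous:

```python
import numpy as np

theta = np.linspace(-10.0, 10.0, 200001)
dx = theta[1] - theta[0]

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

p = normal_pdf(theta, 0.0, 1.0)      # P = N(0, 1)
q = normal_pdf(theta, 1.0, 2.0)      # Q = N(1, 4): globally far from P

def tv(f, g):
    """Global total variation distance on the grid."""
    return 0.5 * np.sum(np.abs(f - g)) * dx

def ball_tv(f, g, theta0, delta):
    """d_{B(theta0; delta)}(P, Q): TV between the conditionals on a ball
    of diameter delta centred at theta0."""
    mask = np.abs(theta - theta0) < 0.5 * delta
    fc = f[mask] / (np.sum(f[mask]) * dx)    # renormalised conditional densities
    gc = g[mask] / (np.sum(g[mask]) * dx)
    return 0.5 * np.sum(np.abs(fc - gc)) * dx

global_d = tv(p, q)                              # sizeable
local_d = ball_tv(p, q, theta0=0.0, delta=0.1)   # tiny: both are smooth at 0
```

Conditioning removes the overall difference in location and mass, leaving only the (small) difference in local log-density slopes.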
Proof. The first assertion follows since, for all $\theta \in B(\theta_0; \delta)$, it can easily be shown that $|\log p(\theta) - \log p(\theta_0)| < \omega \iff e^{-\omega} - 1 < p(\theta)/p(\theta_0) - 1 < e^{\omega} - 1$. To prove the second assertion, note that $P(B(\theta_0; \delta)) = \int_{B(\theta_0; \delta)} p(\theta)\, d\theta$, and substituting the first result gives $e^{-\omega} - 1 < \frac{P(B(\theta_0; \delta))}{p(\theta_0)\, \mu(B(\theta_0; \delta))} - 1 < e^{\omega} - 1$, which rearranges to the given expression. The last two inequalities hold simply by substituting $q$ for $p$. □

One immediate consequence of these inequalities is that they hold if and only if the corresponding conditions hold for the posterior distributions under a shared sampling model whose log-likelihood is smooth and continuous at $\theta_0$. Indeed,
$$ |\log \pi(\theta \mid x) - \log \pi(\theta_0 \mid x)| = \Big| \log \pi(\theta) + \log p(x \mid \theta) - \log \pi(\theta_0) - \log p(x \mid \theta_0) \Big| \le |\log \pi(\theta) - \log \pi(\theta_0)| + |\log p(x \mid \theta) - \log p(x \mid \theta_0)|, $$
where the normalizing constants $\log \int \pi(\theta) p(x \mid \theta)\, d\theta$ cancel. So, for all $\omega' > 0$, provided $\delta$ is chosen small enough, $|\log p(x \mid \theta) - \log p(x \mid \theta_0)| < \omega'$ for all $\theta \in B(\theta_0; \delta)$, and we obtain analogous inequalities for the posterior densities on $B(\theta_0; \delta)$. This means that, with a continuity condition on the likelihood, prior closeness with respect to this metric guarantees posterior closeness; we use this fact in the next section. This is very useful, and in particular implies that two strictly positive, unimodal, bounded prior densities with sub-exponential tails will look locally similar in the sense of this metric. In Sec. 3 we will use this to relate the metric above to well-known results about robust families of priors. We could also link this to Gustafson's idea of restricting the class of prior distributions to a parameterized class (as also closely discussed in Daneshkhah 2004). In a practical setting, it would be challenging to assert the condition of Corollary 2.10, which makes strong statements about the tail behavior of a prior density.
Fortunately, the required uniform continuity for the convergence of our metric can be obtained, provided we require closeness only for sets $B(\theta_0; \delta)$ for which $p(\theta_0) \ge c > 0$.

Corollary 2.11. Suppose that $P$ has a continuous bounded density $p$ at all points $\theta_0$ such that $p(\theta_0) \ge c_p > 0$, and that every distribution $Q$ under consideration has a continuous bounded density $q$ at all points $\theta_0$ such that $q(\theta_0) \ge c_q > 0$. Suppose the sets $D_p = \{\theta_0 : p(\theta_0) \ge c_p > 0\}$ and $D_q = \{\theta_0 : q(\theta_0) \ge c_q > 0\}$ are compact. Then, for all $\epsilon > 0$, there exists a value $\eta$ such that, for all balls $B(\theta_0; \delta)$ with $\theta_0 \in D_p \cup D_q$ and $\delta < \eta$, $d_{B(\theta_0; \delta)}(P, Q) < \epsilon$.

Proof. The required uniform continuity is immediate from the compactness of the sets and the continuity and boundedness of $p$ and $q$. □

Thus, for small open sets in a credibility set, with sufficient smoothness assumptions, we can expect all the associated variation distances to be small a priori. We may therefore be able to assert densities which are close and do not wobble too much (see Daneshkhah (2004) for more details).

Sensitivity analysis using credibility metrics
The usefulness of the credibility metrics arises from the following plausible observation.
Theorem 3.1. Suppose $P^*$ and $Q^*$ are the posterior distributions associated with $P$ and $Q$, respectively, after we observe that $\theta \in B \in \mathcal{A}$. Then, if $\mathcal{A}$ is closed under intersection, $d_{\mathcal{A}}(P^*, Q^*) \le d_{\mathcal{A}}(P, Q)$.

Proof. Because $d_{\mathcal{A}}(P^*, Q^*) = d_{\mathcal{A}}(P \mid \{\theta \in B\},\ Q \mid \{\theta \in B\}) = d_{\mathcal{A} \cap B}(P, Q)$, where $\mathcal{A} \cap B = \{A \cap B : A \in \mathcal{A}\} \subseteq \mathcal{A}$, the result is immediate by definition. □

This means that, under this extended variation metric, learning about $\theta$ directly cannot increase neighborhoods: in particular, the Fréchet derivative (used as the local sensitivity measure) never increases as zero-one information about $\theta$ arrives. This is in strong contrast to the use of the ordinary variation metric, for which this is untrue in general (Gustafson and Wasserman 1995). In particular, if our experiment indicates that $\theta \in B(\theta_0; \delta)$ and $\delta \to 0$, then, under the conditions of Corollaries 2.10 and 2.11, the Fréchet derivative does not diverge and remains bounded.
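Theorem 3.1 is easy to check numerically in a small discrete setting. The sketch below is our own illustration (the support size, random weights, and conditioning event are assumptions): it takes $\mathcal{A}$ to be all non-empty subsets of a five-point space, a family closed under non-empty intersection, and verifies that conditioning on an event $B$ cannot increase $d_{\mathcal{A}}$:

```python
import itertools
import random

def tv(p, q):
    """Total variation distance between discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def condition(p, event):
    """p conditioned on `event`, or None if the event has zero mass under p."""
    mass = sum(p.get(k, 0.0) for k in event)
    if mass == 0.0:
        return None
    return {k: p.get(k, 0.0) / mass for k in event}

def d_A(p, q, events):
    """sup over A in `events` of the conditional TV distance d(p|A, q|A)."""
    vals = []
    for A in events:
        pa, qa = condition(p, A), condition(q, A)
        if pa is not None and qa is not None:
            vals.append(tv(pa, qa))
    return max(vals)

support = range(5)
# all non-empty subsets: a family closed under non-empty intersection
events = [frozenset(s) for r in range(1, 6)
          for s in itertools.combinations(support, r)]

random.seed(1)
def random_dist():
    w = [random.random() + 0.05 for _ in support]   # bounded away from zero
    total = sum(w)
    return {i: wi / total for i, wi in zip(support, w)}

P, Q = random_dist(), random_dist()
B = frozenset({0, 1, 2})                 # observe the event theta in B
P_star, Q_star = condition(P, B), condition(Q, B)

before = d_A(P, Q, events)
after = d_A(P_star, Q_star, events)      # never exceeds `before`
```

Conditioning the posteriors on $A$ is the same as conditioning the priors on $A \cap B$, which is itself in the family, so the supremum can only shrink.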
There are further problems when we learn through a sampling distribution. Daneshkhah (2004) reported that small prior credibility closeness gives rise to posterior credibility closeness when the likelihood is continuous at all the relevant $\theta_0$. We next show that the variation distance between posteriors cannot explode if we use priors that are close, in this sense, to smooth priors.
Theorem 3.2. Suppose that for all $c > 0$ there exists a value $\Delta$ such that, for all $\delta < \Delta$ and all $Q$ with $d_{\mathcal{A}}(P, Q) < \eta$, we have $Q(B^c) < c$, where $B = \bigcup_{i=1}^{m} B(\theta_{0i}; \delta)$, and such that for all $\omega > 0$ and all $i$, $1 \le i \le m$, $|\log p(\theta_i) - \log p(\theta_{0i})| < \omega$ and $|\log p(x \mid \theta_i) - \log p(x \mid \theta_{0i})| < \omega$. Then, for all $\epsilon > 0$, $d(P \mid x, Q \mid x) < \epsilon$.

Proof. Decompose $d(P \mid x, Q \mid x)$ into contributions $I_1^i, I_2^i, I_3^i$ from each ball $B(\theta_{0i}; \delta)$, together with the contribution from the complement $B^c$, which is at most $2c$. The hypotheses give $I_1^i \le P(\theta_i \in B(\theta_{0i}; \delta) \mid x)(\exp(2\omega) - 1)$ and, similarly, $I_3^i \le Q(\theta_i \in B(\theta_{0i}; \delta) \mid x)(\exp(2\omega) - 1)$. Finally, since by hypothesis $|1 - \exp\{(\log p(\theta_i) - \log p(\theta_{0i})) - (\log q(\theta_i) - \log q(\theta_{0i}))\}| \le e^{2\omega} - 1$, we obtain $I_2^i(\delta) \le e^{2\omega}(e^{2\omega} - 1)$. As a result,
$$ d(P \mid x, Q \mid x) \le (e^{2\omega} - 1) \sum_{i=1}^{m} \Big[ P(\theta_i \in B(\theta_{0i}; \delta) \mid x) + Q(\theta_i \in B(\theta_{0i}; \delta) \mid x) + e^{2\omega} \Big] + 2c \le (e^{2\omega} - 1)\{2m + m e^{2\omega}\} + 2c = \epsilon. $$
By hypothesis, the function on the right-hand side of the inequality above can be made as small as we like by choosing $\Delta$ small enough. □

Therefore, contrary to the assertion that it is necessary to restrict the class of prior distributions to a parametrized family, we can work with a general class of priors here; it is only necessary to work with an appropriate extended variation metric as presented above.
Example 3.3. Let the prior densities $\pi(\theta)$ and $q(\theta)$ be Beta densities, $\pi(\theta \mid a) \propto \theta^{a_1 - 1}(1 - \theta)^{a_2 - 1}$ and $q(\theta \mid b) \propto \theta^{b_1 - 1}(1 - \theta)^{b_2 - 1}$. The corresponding posterior distributions for a sample drawn from a Binomial distribution of size $n$ are $\pi_n(\theta \mid a, x) \propto \theta^{(a_1 + x) - 1}(1 - \theta)^{(n + a_2 - x) - 1}$ and $q_n(\theta \mid b, x) \propto \theta^{(b_1 + x) - 1}(1 - \theta)^{(n + b_2 - x) - 1}$, where $x$ is the number of successes observed in the sample. The posterior mean and variance under $\pi_n(\theta \mid a, x)$ are, respectively,
$$ \theta_0 = \frac{a_1 + x}{a_1 + a_2 + n}, \qquad \sigma_n^2 = \frac{(a_1 + x)(a_2 + n - x)}{(a_1 + a_2 + n)^2 (a_1 + a_2 + n + 1)}. $$
As discussed above, the conventional local sensitivity measures, as introduced in Basu (2000), Gustafson and Wasserman (1995), and Gomez-Deniz and Calderin-Ojeda (2010), do not converge as the sample size grows. For example, the local sensitivity measure defined in Eq. (1) increases, under mild regularity conditions, at rate $n^{k/2}$ for many classes of prior distributions, where $k$ is the dimension of the parameter space. In this example, we illustrate that the asymptotic behavior of the local sensitivity measure developed in this paper using the credible metric is more promising: it converges as the sample size grows.
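The merging of the two Beta posteriors can be checked directly. The sketch below (our own numerical illustration; the prior parameters and the observed success fraction are assumptions) computes the total variation distance between the two posteriors on a grid as $n$ grows:

```python
import numpy as np
from math import lgamma

theta = np.linspace(1e-6, 1.0 - 1e-6, 100001)
dx = theta[1] - theta[0]

def beta_pdf(t, a1, a2):
    """Beta(a1, a2) density evaluated on the grid, via log-gamma for stability."""
    logc = lgamma(a1 + a2) - lgamma(a1) - lgamma(a2)
    return np.exp(logc + (a1 - 1.0) * np.log(t) + (a2 - 1.0) * np.log(1.0 - t))

def posterior_tv(a, b, n, x):
    """TV distance between the Beta posteriors under priors Beta(a) and Beta(b)."""
    p = beta_pdf(theta, a[0] + x, a[1] + n - x)
    q = beta_pdf(theta, b[0] + x, b[1] + n - x)
    return 0.5 * np.sum(np.abs(p - q)) * dx

a, b = (2.0, 3.0), (5.0, 1.0)            # two quite different Beta priors
tvs = [posterior_tv(a, b, n, x=int(0.4 * n)) for n in (10, 100, 1000)]
# tvs shrinks: the two posteriors merge as the sample grows
```

The shrinking distance reflects that, on small balls around the posterior mode, both log posterior densities are smooth, which is exactly the regime in which the credible metric behaves well.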

Credible metrics between posterior predictive distributions
Theorem 3.2 can be adapted to posterior predictive distributions. Working with these distributions is quite difficult due to the complex computation involved, but by using them we can avoid priors with unstable behavior (those with too much wobble). We now present results, analogous to those of the previous section, for posterior predictive distributions. First, we show that $d(p(z \mid x), q(z \mid x))$ is a lower bound for $d(\pi(\theta \mid x), q(\theta \mid x))$; that is, $d(p(z \mid x), q(z \mid x)) \le d(\pi(\theta \mid x), q(\theta \mid x))$, where $p(z \mid x) = \int_{\theta} p(z \mid \theta)\, \pi(\theta \mid x)\, d\theta$. For this purpose, we use the total variation distance (see also the assumptions of Theorem 4.1):
$$ d(p(z \mid x), q(z \mid x)) = \frac{1}{2} \int_{z} |p(z \mid x) - q(z \mid x)|\, dz \le \frac{1}{2} \int_{z} \int_{\theta} p(z \mid \theta)\, |\pi(\theta \mid x) - q(\theta \mid x)|\, d\theta\, dz = d(\pi(\theta \mid x), q(\theta \mid x)). $$
In the following theorem, we show that, as $n \to \infty$ (or, equivalently, as $\Delta \to 0$), $d(p(z \mid x), q(z \mid x))$ becomes very small (and bounded).
Theorem 4.1. Suppose the likelihood function $p(x \mid \theta)$ is bounded by $M$, and suppose that the prior distribution $P$ has a differentiable log-density with derivative $D \log p(\theta)$ bounded by $N$, i.e., there exists $N > 0$ such that $|D \log p(\theta)| \le N$ for all $\theta$. Suppose further that, for any other arbitrary prior distribution $Q$ and all $c > 0$, there exists $\Delta > 0$ such that for all $\delta \le \Delta$, $Q(B^c(\theta_0(x); \delta)) < c$, where $\theta_0(x)$ denotes an estimate such as the maximum likelihood estimate. Then, for all $\epsilon > 0$, there exists $\Delta > 0$ such that for all $\delta \le \Delta$, $d(p(z \mid x), q(z \mid x)) < \epsilon$.

Proof. We can write
$$ |p(z \mid x) - q(z \mid x)| \le \int_{\theta \in B(\theta_0(x); \delta)} p(z \mid \theta)\, |\pi(\theta \mid x) - q(\theta \mid x)|\, d\theta + \int_{\theta \notin B(\theta_0(x); \delta)} p(z \mid \theta)\, |\pi(\theta \mid x) - q(\theta \mid x)|\, d\theta, $$
where $B(\theta_0(x); \delta)$ is an open ball with center $\theta_0(x)$ and diameter $\delta$. It can easily be concluded that $\int_{\theta \notin B(\theta_0(x); \delta)} p(z \mid \theta)\, |\pi(\theta \mid x) - q(\theta \mid x)|\, d\theta \le 2Mc$. By the hypothesis of Corollary 2.10, for all $\omega > 0$ there exists $\Delta > 0$ such that for all $\delta < \Delta$, $|\log \pi(\theta) - \log q(\theta)| < \omega$, where $\pi$ and $q$ denote the densities associated with $P$ and $Q$, respectively. Therefore, by the results of Theorem 3.2, $\int_{\theta \in B(\theta_0(x); \delta)} p(z \mid \theta)\, |\pi(\theta \mid x) - q(\theta \mid x)|\, d\theta \le M e^{2\omega}(e^{2\omega} - 1)$. Hence $|p(z \mid x) - q(z \mid x)| \le M\{2c + e^{2\omega}(e^{2\omega} - 1)\} = \epsilon$, as required. □
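The lower-bound relation between predictive and posterior distances can be checked in the conjugate Beta-Binomial setting. In the sketch below (our own illustration; the priors and the data are assumptions, not from the source), the next observation is a single Bernoulli trial, so the predictive TV distance reduces to the difference of posterior means:

```python
import numpy as np
from math import lgamma

theta = np.linspace(1e-6, 1.0 - 1e-6, 100001)
dx = theta[1] - theta[0]

def beta_pdf(t, a1, a2):
    """Beta(a1, a2) density on the grid, computed via log-gamma."""
    logc = lgamma(a1 + a2) - lgamma(a1) - lgamma(a2)
    return np.exp(logc + (a1 - 1.0) * np.log(t) + (a2 - 1.0) * np.log(1.0 - t))

# two Beta priors, updated with n = 20 Bernoulli trials and x = 8 successes
n, x = 20, 8
pa = (2.0 + x, 3.0 + n - x)              # posterior parameters under Beta(2, 3)
pb = (5.0 + x, 1.0 + n - x)              # posterior parameters under Beta(5, 1)

post_p = beta_pdf(theta, *pa)
post_q = beta_pdf(theta, *pb)
post_tv = 0.5 * np.sum(np.abs(post_p - post_q)) * dx

# predictive for the next trial z in {0, 1}: P(z = 1 | x) is the posterior mean,
# so the predictive TV distance is just the difference of the posterior means
mean_p = pa[0] / (pa[0] + pa[1])
mean_q = pb[0] / (pb[0] + pb[1])
pred_tv = abs(mean_p - mean_q)
# pred_tv <= post_tv, as the data-processing bound in the text asserts
```

Averaging the posteriors through the likelihood contracts the distance, which is exactly the mechanism behind the stability of predictive inference emphasised in this section.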

Discussion
In this paper we have presented a new local sensitivity measure in terms of credibility metrics and shown that these metrics behave better asymptotically. The corresponding Fréchet derivative, like the derivatives studied by Gustafson et al. (1996), does not tend to zero; however, we do have uniform boundedness under appropriate conditions, meaning that priors close in a credible metric yield posteriors close in that metric. Therefore, we do not get the sort of divergence obtained with the total variation metric, as discussed in Gustafson and Wasserman (1995). It is important to investigate how the local sensitivity measure proposed in this paper applies to Bayesian networks. The usual likelihood (multinomial distributions) and prior distributions (Dirichlet, or products of Dirichlets) for Bayesian networks with discrete variables would satisfy the conditions (especially the continuity condition) of the theorems and lemmas presented in this paper; nevertheless, this still needs to be formally investigated. Smith and Daneshkhah (2010) developed new explicit total variation bounds on the posterior density as a function of the closeness of the base prior to the approximating prior (selected from a class of priors very similar to the one proposed in this paper) and certain summary statistics of the calculated posterior density. They illustrated that the approximating posterior density often converges to the base (or genuine) posterior as the number of sample points increases, and that the proposed bounds allow one to identify when the posterior approximation might fail.
Another inspiring line of work, which requires further practical investigation, is to study the asymptotic behavior of the local sensitivity measures (or closeness distances) and compare it with the closeness distances reported in Smith and Daneshkhah (2010). It should be noted that the local sensitivity measures introduced in this paper can usually be expressed in terms of differences between logarithms of posterior densities. In many cases, this ensures that the Hellinger distance (or the total variation bounds proposed in Wright and Smith (2018)), and thereby the corresponding local sensitivity measure, will at least be bounded for large enough sample sizes, as shown in this paper.
The local sensitivity analysis studied in this paper and related works is also very useful for answering the following questions, which commonly arise when modeling multivariate data with complex dependency using Bayesian hierarchical models (Roos et al. 2015), Bayesian networks, or Bayesian network pair-copula models (Chatrabgoun et al. 2018). It is of great importance to investigate whether the network structure learned from data is robust with respect to changes in the directionality of some specific arrows. Roos et al. (2015) examine local sensitivity of Bayesian hierarchical models by developing a new local sensitivity framework. The next important problem is to study whether the local conditional distribution/probability associated with a specified node is robust with respect to changes in its prior distribution, or to changes in the local conditional distribution of another node. This problem is addressed in Daneshkhah (2004), Smith and Daneshkhah (2010), and Wright and Smith (2018), but there still exist areas for continued development. In particular, it is of great importance to examine whether the posterior distribution associated with the parameters of any node is robust with respect to such changes.