Epistemic confidence in the observed confidence interval

We define confidence to be epistemic if it applies to an observed confidence interval. Epistemic confidence is unavailable—or even denied—in orthodox frequentist inference, as the confidence level is understood to apply to the procedure. Yet there are obvious practical and psychological needs to think about the uncertainty in the observed interval. We extend the Dutch Book argument used in the classical Bayesian justification of subjective probability to a stronger market‐based version, which prevents external agents from exploiting unused information in any relevant subset. We previously showed that confidence is an extended likelihood, and the likelihood principle states that the likelihood contains all the information in the data, hence leaving no relevant subset. Intuitively, this implies that confidence associated with the full likelihood is protected from the Dutch Book, and hence is epistemic. Our goal is to validate this intuitive notion through theoretical backing and practical illustrations.


INTRODUCTION
Given data Y = y of arbitrary size or complexity, generated from a model $p_\theta(y)$ indexed by the scalar parameter of interest $\theta$, a confidence interval CI(y) is computed with coverage probability $P_\theta\{\theta \in \mathrm{CI}(Y)\} = \gamma$. We are interested in the epistemic confidence, defined as the sense of confidence in the observed CI(y); for simplicity, we shall often drop the explicit dependence on y from the CI. Arguably, this is what we want from a CI, but the orthodox frequentist view is emphatic that the probability $\gamma$ applies not to the observed interval CI(y) but to the procedure. In confidence interval theory, the coverage probability is called the confidence level. So, in the frequentist theory, "confidence" has no separate meaning from probability and no epistemic property. Strictly speaking, we do not automatically have 95% confidence in an observed 95% CI. Schweder and Hjort (2016) and Schweder (2018) have been strong proponents of interpreting confidence as "epistemic probability," but their view is not commonly accepted. Traditionally, only the Bayesians have no problem in stating that their subjective probability is epistemic. How do they achieve that? Is there a way to make the non-Bayesian confidence epistemic? Our aim is to show a way to achieve that.
Frequentists interpret probability as either a long-term frequency or a propensity of the generating mechanism. So, for them, unique events, such as the next toss of a coin or the true status of an observed CI, do not have a probability. Bayesians, on the other hand, can attach their subjective probability to such unique events. But what does "attach" mean? One standard interpretation is based on a logical device called the Dutch Book. As classically proposed by Frank Ramsey (1926) and Bruno de Finetti (1931), your subjective probability of an event E is defined as the personal betting price that you put on the event. Though subjective, the price is not arbitrary; it follows a normative rational consideration: it is a price that is protected from the Dutch Book, i.e., one against which no external agent can make a risk-free profit off you. Let us call your prices for a collection of bets a betting strategy. It is then irrational to use a betting strategy that is guaranteed to lose.
Thus we conceptually define confidence to be epistemic if it is protected from the Dutch Book, but crucially we assume that there is a betting market with a crowd of independent and intelligent players. In this market, bets are like a commodity with supply and demand among the players. Assuming a perfect market condition (for instance, full competition, perfect information, and no transaction cost), in accordance with the Arrow-Debreu theorem (Arrow & Debreu, 1954), there is an equilibrium price at which supply and demand balance. "Perfect information" means all players have access to the generic data y and the statistical model $p_\theta(y)$. For the betting market in particular, the fundamental theorem of asset pricing (Ross, 1976) states that, assuming a statistical model, the Dutch Book cannot be made if the price is determined by the objective probability.
It is worth emphasizing the difference between our setup and the classical Dutch Book argument used to establish subjective Bayesian probability. The latter does not presume a betting market: bets are made only between two persons, you and me. To avoid the Dutch Book, you have to make your bets internally consistent by following the probability laws. However, even if your bets are internally consistent, if your prices do not match the market prices, I can make a risk-free profit by playing between you and the market; see Example 1. So the presence of the market imposes a stronger requirement for epistemic probability. We shall avoid the terms "subjective" and "objective"; one might consider "epistemic" to be subjective since it refers to personal decision-making about a unique event, but the market consideration makes it impersonal.
Our question is when, or under what conditions, the confidence, as measured by the coverage probability, applies to the observed interval. One way to judge this is whether you are willing to bet on the true status of the CI using the confidence level as your personal price. Normatively, this should be the case if you know there is no better price. Intuitively, this is when you are sure that you have used all the available information in the data, so nobody can exploit you, that is, construct a Dutch Book against you. Theoretically, to construct a Dutch Book, an external agent must exploit unused information in the form of a relevant subset, conditional on which they can get a different coverage probability. Pawitan and Lee (2021) showed that confidence is an extended likelihood (Lee et al., 2017). The extended likelihood principle (Bjørnstad, 1996) states that the extended likelihood contains all the information in the data. Intuitively, this implies that the extended likelihood leaves no relevant subset, and is thus protected from the Dutch Book. In other words, we can attach the degree of confidence to the observed CI, that is, confidence is epistemic, provided it is associated with the full likelihood. Our aim is to establish the theoretical justification for this intuitive notion and to provide clear illustrative examples.
To summarize briefly and highlight the plan of the paper: we describe three key concepts (relevant subset, confidence, and ancillary statistic) and prove the main theorem that there are no relevant subsets if the confidence is associated with the full likelihood. This condition is easily satisfied if the confidence is based on a sufficient statistic. When there is no sufficient statistic but there exists a maximal ancillary statistic, this ancillary defines relevant subsets; the confidence is then conditional on the ancillary, and there are no further relevant subsets.

Relevant subsets
Intuitively, we could use the coverage probability $\gamma$ as a betting price if there is no better price given the data at hand. So the question is: are there any features of the data that can be used to improve the price? If they exist, such features are said to be relevant. Formally, a statistic R(y) is defined to be relevant (cf. Buehler, 1959) if the conditional coverage probability given R(y) is nontrivially biased in one direction. That is, for a positive bias, there is $\epsilon > 0$ free of $\theta$ and some y, such that
$$P_\theta\{\theta \in \mathrm{CI}(Y) \mid R(y)\} \ \ge\ \gamma + \epsilon \quad \text{for all } \theta. \qquad (1)$$
If it exists, the feature R(y) can be used to construct a Dutch Book: suppose you and I are betting, and I notice that the event R(y) occurs. If you set the price at $\gamma$, I would buy the bet from you and then sell it in the betting market at $\gamma + \epsilon$ to make a risk-free profit of $\epsilon$. Similarly, for a negative bias, the relevant R(y) has the property
$$P_\theta\{\theta \in \mathrm{CI}(Y) \mid R(y)\} \ \le\ \gamma - \epsilon \quad \text{for all } \theta. \qquad (2)$$
Technically, R(y) induces subsets of the sample space, known as "relevant subsets," so the terms "relevant statistic" and "relevant subset" are interchangeable. If there is a relevant subset, the confidence level $\gamma$ is not epistemic. Conversely, if there are no relevant subsets, the betting price determined by the confidence level is protected from the Dutch Book. So, mathematically, we establish epistemic confidence by showing that it corresponds to a coverage probability that is free of relevant subsets.
Example 1. Let $y \equiv (y_1, y_2)$ be an iid sample from the uniform distribution on $\{\theta-1, \theta, \theta+1\}$, where the parameter $\theta$ is an integer. Let $y_{(1)}$ and $y_{(2)}$ be the minimum and maximum of $y_1$ and $y_2$. We can show that the confidence interval $\mathrm{CI}(y) \equiv [y_{(1)}, y_{(2)}]$ has coverage probability
$$P_\theta\{\theta \in \mathrm{CI}(Y)\} = 7/9.$$
For example, on observing $y_{(1)} = 3$ and $y_{(2)} = 5$, the interval [3, 5] is formally a 78% CI for $\theta$. But, if we ponder a bit, in this case we can actually be sure that the true $\theta = 4$. So the probability 7/9 is clearly a wrong price for this interval. This is a typical example justifying the frequentist objection to attaching the coverage probability as a sense of confidence in an observed CI.
Here the range $R \equiv R(y) \equiv y_{(2)} - y_{(1)}$ is relevant. If R = 2, we know for sure that $\theta$ is equal to the midpoint of the interval, so the CI is always correct. But if R = 0, the CI is the single point $y_1$, which falls with equal probability on the integers $\{\theta-1, \theta, \theta+1\}$. So, for all $\theta$,
$$P_\theta\{\theta \in \mathrm{CI}(Y) \mid R = 2\} = 1, \qquad P_\theta\{\theta \in \mathrm{CI}(Y) \mid R = 0\} = 1/3.$$
In the betting market, the range information will be used by the intelligent players to settle prices at these conditional probabilities. For example, if $y_1 = 3$ and $y_2 = 5$, the intelligent players will not use 7/9 as the price, but 1.00. So the information can be used to construct a Dutch Book against anyone who ignores R. How do we know that there is a relevant subset in this case? Moreover, given R, how do we know there is no further relevant subset?
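These conditional coverages are easy to verify by simulation. The following minimal Monte Carlo sketch (Python with NumPy; the seed, the value $\theta = 4$, and the simulation size are our arbitrary choices for illustration, not from the paper) reproduces the marginal 7/9 and the conditional coverages given the range:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 4                      # any integer; the coverages are free of theta
n_sim = 200_000
# Two iid draws from the uniform distribution on {theta-1, theta, theta+1}:
y = rng.integers(theta - 1, theta + 2, size=(n_sim, 2))
lo, hi = y.min(axis=1), y.max(axis=1)
covered = (lo <= theta) & (theta <= hi)   # does CI(y) = [y(1), y(2)] cover theta?
R = hi - lo                               # the range statistic

print(covered.mean())                     # ~ 7/9 = 0.778, the marginal coverage
for r in (0, 1, 2):
    print(r, covered[R == r].mean())      # ~ 1/3, 1.0, 1.0 given R = 0, 1, 2
```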
To contrast this with the classical Ramsey-de Finetti Dutch Book argument, suppose $y_1 = y_2 = 3$. If, for whatever subjective reason, you set the price 7/9 for the bet $[\theta \in \mathrm{CI}]$, you are being internally consistent as long as you set the price 2/9 for $[\theta \notin \mathrm{CI}]$, since the two numbers constitute a valid probability measure. Internal consistency means that I cannot make a risk-free profit from you based on this single realization of y. Even if I know that 1/3 is a better price, I cannot take advantage of you because there is no betting market. So 7/9 is a valid subjective probability.

Confidence distribution
Let $t \equiv T(y)$ be a statistic for $\theta$, and define the right-side p-value function
$$C_m(\theta; t) \equiv P_\theta(T \ge t). \qquad (3)$$
Assuming that, for each t, it behaves formally like a proper cumulative distribution function, $C_m(\theta; t)$ is called the confidence distribution of $\theta$. The subscript m indicates that it is a "marginal" confidence, as it depends on the marginal distribution of T. For continuous T, at the true parameter, the random variable $C_m(\theta; T)$ is standard uniform. For continuous $\theta$, the corresponding confidence density is
$$c_m(\theta; t) \equiv \frac{\partial}{\partial\theta} C_m(\theta; t). \qquad (4)$$
The functions $C_m(\theta; t)$ and $c_m(\theta)$ across $\theta$ are realized statistics, which depend on both the data and the model, but not on the true unknown parameter $\theta_0$. We can view the confidence distribution simply as a collection of p-values or CIs. We define
$$C_m(\theta \in \mathrm{CI}) \equiv \int_{\mathrm{CI}} c_m(\theta)\, d\theta \qquad (5)$$
to convey the "confidence of $\theta$ belonging in the CI." Fisher (1930, 1933) called $C_m(\theta; t)$ the fiducial distribution of $\theta$, but he required T to be sufficient. The recent definition of the confidence distribution (e.g., Schweder & Hjort, 2016, p. 58), however, requires only $C_m(\theta; T)$ to be uniform at the true parameter, thus guaranteeing a correct coverage probability. Lemma 1 below establishes when Fisher's fiducial probability $C_m(\theta; t)$ becomes a frequentist coverage probability, which requires T to be continuous. When T is discrete, the equality is only achieved asymptotically; see Appendix A2 for an example. Assume Condition 1 in Section 2.3, that for any $\alpha \in (0, 1)$ the quantile function $q_\theta(\alpha)$ of T is a strictly increasing function of $\theta$. Then the frequentist procedure based on T gives a $\gamma$-level CI defined by
$$\mathrm{CI}(T) \equiv \{\theta:\ \alpha_1 \le C_m(\theta; T) \le \alpha_2\}, \qquad (6)$$
for some $\alpha_2 > \alpha_1 > 0$ with $\alpha_2 - \alpha_1 = \gamma$, with coverage probability
$$P_\theta\{\theta \in \mathrm{CI}(T)\} = \gamma. \qquad (7)$$
Here the coverage probability is a frequentist probability based on the distribution of unobserved future data T, whereas, given observed data t, the confidence is for the observed interval CI(t) based on the confidence density of $\theta$. The confidence becomes
$$C_m\{\theta \in \mathrm{CI}(t)\} = \int_{\mathrm{CI}(t)} c_m(\theta; t)\, d\theta = \alpha_2 - \alpha_1 = \gamma.$$
Thus, we have the following lemma.
Lemma 1. Under Condition 1 in Section 2.3,
$$P_\theta\{\theta \in \mathrm{CI}(T)\} = C_m\{\theta \in \mathrm{CI}(t)\} = \gamma,$$
where CI(t) is the observed interval of the confidence procedure CI(T) defined in (6).
However, as shown in Example 1, a correct coverage probability does not rule out relevant subsets. This means that the current definition of the confidence distribution does not guarantee epistemic confidence. The key step is to define a confidence distribution that uses the full information. Motivated by the Bayesian formulation and Efron (1993), let us define the implied prior as
$$c_0(\theta; t) \equiv \frac{c_m(\theta; t)}{m(t)\, L(\theta; t)}, \qquad (8)$$
where m(t) cancels all the terms not involving $\theta$ in $c_m(\theta; t)/L(\theta; t)$. Then define the full confidence density as
$$c_f(\theta) \propto c_0(\theta)\, L(\theta; y). \qquad (9)$$
The subscript f indicates that the confidence density is associated with the full likelihood based on the whole data. When necessary for clarity, the dependence of the confidence density and the likelihood on t and on the whole data y will be made explicit. $c_f(\theta)$ is defined only up to a constant term that makes it integrate to one. Obviously, if T is sufficient, then $c_m(\theta) = c_f(\theta)$, but in general they are not equal. In Section 3, we show a more convenient way to construct $c_f(\theta)$.
The confidence function parallel to (5) can be denoted by $C_f(\cdot)$. The full confidence density thus looks like a Bayesian posterior; however, the implied prior is not subjectively selected, and it can be improper or data-dependent.
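To make definitions (3), (4), (8), and (9) concrete, consider a single observation from an exponential distribution with mean $\theta$, a toy model of our own choosing where everything is available in closed form: $C_m(\theta; t) = e^{-t/\theta}$ and the implied prior works out to $c_0(\theta) \propto 1/\theta$. A minimal numerical sketch (Python with NumPy; the observed value t = 2 and the grid are arbitrary illustrative choices) recovers this:

```python
import numpy as np

t = 2.0                                  # observed value of T = Y1
theta = np.linspace(0.1, 20, 2000)       # parameter grid
Cm = np.exp(-t / theta)                  # right-side p-value (3): P_theta(T >= t)
cm = np.gradient(Cm, theta)              # confidence density (4): dC_m/dtheta
lik = np.exp(-t / theta) / theta         # likelihood L(theta; t)
c0 = cm / lik                            # implied prior (8), up to a constant
print((c0 * theta)[100:2000:400])        # ~ t = 2 throughout: c_0(theta) prop. 1/theta
cf = c0 * lik                            # full confidence density (9); with n = 1,
cf /= np.trapz(cf, theta)                # c_f equals c_m after normalization
```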

Main theorem
The full confidence density $c_f(\theta)$ can be used in general to compute the degree of confidence $\gamma$ of any observed CI(y) as
$$C_f\{\theta \in \mathrm{CI}(y)\} = \int_{\mathrm{CI}(y)} c_f(\theta)\, d\theta = \gamma.$$
The CI has a coverage probability, which may or may not be equal to $\gamma$. We say that $c_f(\theta)$ has no relevant subsets if there is no R(y) such that the conditional coverage probability is biased in one direction according to (1) or (2). For our main theorem, we assume the following regularity conditions.
Condition 2. There exists a function $g(\theta) > 0$, free of y, such that for any given Y = y,
$$\frac{1}{\gamma}\, E_{\theta|y}\left\{ I(\theta \in \mathrm{CI}(y))\, \frac{g(\theta)}{c_0(\theta; y)} \right\} \ \le\ E_{\theta|y}\left\{ \frac{g(\theta)}{c_0(\theta; y)} \right\}, \qquad (10)$$
where $E_{\theta|y}(\cdot)$ is the expectation under the confidence density $c(\theta; y)$, and $c_0(\theta; y)$ is the implied prior.
If the implied prior does not depend on the data, $c_0(\theta; y) = c_0(\theta)$, then the choice $g(\theta) = c_0(\theta)$ makes both sides of (10) equal to 1. Thus Condition 2 holds for any data-free implied prior, even an improper one. If the implied prior is data-dependent, $g(\theta)$ is typically a function that diverges near the boundary of $\Theta$; we illustrate this in Example 3 in Section 3.3. For the general single-parameter exponential family
$$\log p_\theta(y) = \theta\, t(y) - A(\theta) + c(y), \qquad (11)$$
where the parameter space $\Theta$ and the sample space of T are identical, a sufficient condition for Condition 2 is
$$P_{\theta=t}\{T \in \mathrm{CI}(t)\} \ \ge\ P_\theta\{\theta \in \mathrm{CI}(T)\}. \qquad (12)$$
The quantity $1 - P_{\theta=t}\{T \in \mathrm{CI}(t)\}$ can be viewed as the significance level for testing the null hypothesis $\theta = t$ with acceptance region CI(t), and $P_\theta\{\theta \in \mathrm{CI}(T)\}$ is the frequentist coverage probability, so this inequality states the usual relationship between hypothesis testing and the confidence interval.
Theorem 1. Consider the full confidence density $c_f(\theta) \propto c_0(\theta) L(\theta; y)$, with $c_0(\theta)$ the implied prior (8). Let $\gamma$ be the degree of confidence for the observed CI(y), such that
$$C_f\{\theta \in \mathrm{CI}(y)\} = \int_{\mathrm{CI}(y)} c_f(\theta)\, d\theta = \gamma.$$
Under Conditions 1 and 2, $c_f(\theta)$ has no relevant subsets.
Proof. We first prove the positively biased case, which presumes that there exists a positively biased relevant subset R. Equation (1) can be expressed as
$$\int_R I\{\theta \in \mathrm{CI}(y)\}\, p_\theta(y)\, dy \ \ge\ (\gamma + \epsilon) \int_R p_\theta(y)\, dy \quad \text{for all } \theta.$$
Consider a function $g(\theta) > 0$ from Condition 2; multiplying both sides by $g(\theta)$ and integrating over $\theta$, we have
$$\int_\Theta g(\theta) \int_R I\{\theta \in \mathrm{CI}(y)\}\, p_\theta(y)\, dy\, d\theta \ \ge\ (\gamma + \epsilon) \int_\Theta g(\theta) \int_R p_\theta(y)\, dy\, d\theta.$$
On both sides the integrands are nonnegative, so the order of integration can be interchanged. Writing $p_\theta(y) = L(\theta; y) = c_f(\theta; y)/\{m(y)\, c_0(\theta; y)\}$, the left-hand side becomes
$$A \equiv \int_R \frac{1}{m(y)}\, E_{\theta|y}\left\{ I(\theta \in \mathrm{CI}(y))\, \frac{g(\theta)}{c_0(\theta; y)} \right\} dy,$$
while the right-hand side becomes
$$B \equiv (\gamma + \epsilon) \int_R \frac{1}{m(y)}\, E_{\theta|y}\left\{ \frac{g(\theta)}{c_0(\theta; y)} \right\} dy.$$
By Condition 2, $A \le \gamma \int_R m(y)^{-1} E_{\theta|y}\{g(\theta)/c_0(\theta; y)\}\, dy$. Since $\gamma + \epsilon > \gamma$, $m(y) > 0$, and $E_{\theta|y}\{g(\theta)/c_0(\theta; y)\} > 0$, we get A < B, which contradicts the inequality $A \ge B$ above. Hence there is no positively biased relevant subset. Now suppose that there exists a negatively biased relevant subset $R^*$, and let $R = (R^*)^C$ be its complement. Then
$$P_\theta\{\theta \in \mathrm{CI}(Y) \mid R^*\} \ \le\ \gamma - \epsilon \quad \text{for all } \theta,$$
which leads to
$$P_\theta\{\theta \in \mathrm{CI}(Y) \mid R\} \ \ge\ \gamma + \epsilon' \quad \text{for all } \theta \text{ and some } \epsilon' > 0.$$
Hence R becomes a positively biased relevant subset, which was shown above to lead to a contradiction. Therefore, overall, there is no relevant subset. ▪

Note that we now have two ways of computing the price of an observed CI: using $C_f(\theta \in \mathrm{CI})$ or using $P_\theta(\theta \in \mathrm{CI})$. The latter has the desired coverage probability but is not guaranteed to be free of relevant subsets; the former is free of relevant subsets but is not guaranteed to match the coverage probability. If the two are equal, we have a confidence that corresponds to a coverage probability free of relevant subsets, hence epistemic. If T is sufficient and satisfies Condition 1, Lemma 1 implies that the frequentist CI satisfies
$$P_\theta\{\theta \in \mathrm{CI}(Y)\} = C_f\{\theta \in \mathrm{CI}(y)\} = \gamma$$
for all $\theta$ and y.
Thus, we can summarize the first key result in the following corollary.

Corollary 1. Under Conditions 1 and 2, if T is sufficient, the confidence based on $c_m(\theta; t)$ has a correct coverage probability and no relevant subsets; hence it is epistemic.
We note that $P_\theta\{\theta \in \mathrm{CI}(Y)\} = C_f\{\theta \in \mathrm{CI}(y)\}$ holds asymptotically, regardless of whether y is continuous or discrete. Corollary 1 specifies the conditions under which it holds exactly in finite samples.
If $c_0(\theta)$ is a proper probability density that does not depend on y, then $c_f(\theta)$ is a Bayesian posterior density, already shown by Robinson (1979, proposition 7.4) to have no relevant subsets. For proper priors, Condition 2 trivially holds, so the theorem extends his result to improper and data-dependent priors. Moreover, there is a significant difference in interpretation. If you use a proper but arbitrary $c_0(\theta)$ that is not the implied prior, and there is a betting market, your price $\gamma$ will differ from the market price, so, as illustrated in Example 1, I can construct a Dutch Book against you. Without a betting market, by contrast, the theorem is meaningful only for two people betting repeatedly against each other, with gains or losses expressed as expected values or long-term averages; this is the setting described by Buehler (1959) and Robinson (1979). Crucially, in that setting the presence of relevant subsets does not guarantee an external agent a risk-free profit from a single bet, so it does not satisfy our original definition of epistemic confidence. Lindley (1958) showed that, assuming T is sufficient, Fisher's fiducial probability (hence the marginal confidence) is equal to the Bayesian posterior if and only if the family $p_\theta(y)$ is transformable to a location family. However, his proof assumed $c_0(\theta)$ to be free of y. Condition 2 of the main theorem allows $c_0(\theta)$ to depend on the data, so our result is not limited to the location family.

Ancillary statistics
The current definition of the confidence distribution (e.g., Schweder & Hjort, 2016, p. 58) only requires $C_m(\theta; T)$ to follow the standard uniform distribution at the true parameter. However, if T is not sufficient, the marginal confidence is not epistemic: it does not use the full likelihood, so it is not guaranteed to be free of relevant subsets. Limiting ourselves to models with sufficient statistics in order to get epistemic confidence is overly restrictive, since sufficient statistics exist at arbitrary sample sizes only in the full exponential family (Pawitan, 2001, section 4.9). Using nonsufficient statistics implies a potential loss of efficiency and of the epistemic property. Further progress relies on the ancillary statistic, a feature of the data whose distribution is free of the unknown parameter (Ghosh et al., 2010). We first have a parallel development for the conditional confidence distribution given the ancillary A(y) = a:
$$C_c(\theta; t|a) \equiv P_\theta(T \ge t \mid a) \quad \text{and} \quad c_c(\theta; t|a) \equiv \frac{\partial}{\partial\theta} C_c(\theta; t|a).$$
We immediately have the following corollary of Lemma 1; Condition 1 needs a small modification, in that it now refers to the conditional statistic T|a for each a.
Corollary 2. Under Condition 1,
$$P_\theta\{\theta \in \mathrm{CI}(T) \mid a\} = C_c\{\theta \in \mathrm{CI}(t)\} = \gamma,$$
where CI is the confidence interval based on the conditional distribution of T|a.
Furthermore, define the implied prior as
$$c_0(\theta; t, a) \equiv \frac{c_c(\theta; t|a)}{m(t, a)\, L(\theta; t|a)}, \qquad (13)$$
where m(t, a) cancels all the terms not involving $\theta$ in $c_c(\theta; t|a)/L(\theta; t|a)$. As before, the full confidence is $c_f(\theta) \propto c_0(\theta) L(\theta; y)$. Suppose T(y) = t is not sufficient but (t, a) is, where a is an ancillary statistic. In this case, a is called an ancillary complement, and in a qualitative sense it is a maximal ancillary, because
$$L(\theta; y) = L(\theta; t, a) \propto p_\theta(t \mid a). \qquad (14)$$
Thus, conditioning a nonsufficient statistic on a maximal ancillary recovers the lost information and restores the full-data likelihood. In particular, the conditional confidence becomes the full confidence: $c_c(\theta; t|a) = c_f(\theta)$. Note that (14) holds for any maximal ancillary, so if a maximal ancillary exists, the full likelihood is automatically equal to the conditional likelihood given any maximal ancillary statistic. In its sampling-theory form, when t is the MLE $\hat\theta$, the full information can be recovered from $p_\theta(\hat\theta|a)$, whose approximation was studied by Barndorff-Nielsen (1983).
In conditional inference (Reid, 1995), we condition on the ancillary to make our inference more "relevant" to the data at hand; in other words, more epistemic. But this is typically stated on an intuitive basis; the following corollary provides a mathematical justification. Since we already condition on A(y), a further relevant subset R(y) is one such that the conditional probability $P_\theta\{\theta \in \mathrm{CI} \mid A(y), R(y)\}$ is nontrivially biased in one direction from $P_\theta\{\theta \in \mathrm{CI} \mid A(y)\}$, in the same manner as (1). Now we can state our second key result.

Corollary 3. If A(y) = a is a maximal ancillary for T(y), and CI is constructed from the conditional confidence density based on T|a, then under Conditions 1 and 2 the conditional confidence $C_c(\theta \in \mathrm{CI}; t|a)$ has a correct coverage probability and no further relevant subsets. Hence the conditional confidence is epistemic.
Because of (14), the confidence is epistemic for any choice of the maximal ancillary. However, the maximal ancillary may not be unique; this nonuniqueness is traditionally considered the most problematic issue in conditional inference. If the maximal ancillary is not unique, the conditional coverage probability may depend on the choice, but this does not affect the guaranteed absence of relevant subsets in the corollary. We discuss this further in Section 4 and illustrate it with an example in Appendix A3.

Computation of epistemic confidence
Our theory indicates that we get epistemic confidence from the full confidence density $c_f(\theta) \propto c_0(\theta) L(\theta; y)$. The corresponding coverage probability is either a marginal probability or a conditional probability given a maximal ancillary. The full likelihood $L(\theta; y)$ is almost always easy to compute. However, in order to get correct coverage, the implied prior $c_0(\theta)$ must be computed using (8) or (13), which can be difficult to evaluate in practice. We illustrate through a series of examples some suitable approximations of $c_0(\theta)$ that are simpler to compute. Suppose, for sample size n = 1, there is a statistic $t_1 \equiv T(y_1)$ that satisfies Condition 1, that is, it allows us to construct a valid confidence density $c_m(\theta; t_1)$. The statistic $t_1$ trivially exists if $y_1$ itself leads to a valid confidence density. Then we can compute $c_0(\theta)$ from $c_m(\theta; t_1)/L(\theta; t_1)$. First consider the case where $c_0(\theta)$ is free of the data. From the updating formula in section 3 of Pawitan and Lee (2021), the confidence density based on the whole data is
$$c_f(\theta; y) \propto c_m(\theta; t_1)\, L(\theta; y_2, \ldots, y_n) = c_0(\theta)\, L(\theta; y). \qquad (15)$$
Once $c_0(\theta)$ is available, (15) is highly convenient, since it is computationally straightforward. More importantly, as shown in some examples below, formula (15) works even when there is no sufficient statistic based on the whole data for n > 1; see the location-family model in Section 3.2 and the curved exponential model in Example 4.
When $c_0(\theta)$ depends on the data, it matters which $y_i$ is used to compute it, and the updating formula is then only an approximation. As long as the contribution of $\log c_0(\theta)$ to $\log c_f(\theta)$ is of order O(1/n), we expect a close approximation, as illustrated in Example 3.
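The updating formula (15) is straightforward to implement on a grid. Below is a minimal sketch (Python with NumPy; the function name and the exponential-mean toy data are our own illustrative choices, not from the paper) that multiplies the confidence density from the first observation by the likelihood of the remaining observations:

```python
import numpy as np

def full_confidence(theta, cm1, loglik_rest):
    """Updating formula (15) on a grid:
    c_f(theta) prop. to c_m(theta; t1) * L(theta; y2,...,yn)."""
    log_cf = np.log(cm1) + loglik_rest
    log_cf -= log_cf.max()              # stabilize before exponentiating
    cf = np.exp(log_cf)
    return cf / np.trapz(cf, theta)     # normalize to integrate to one

# Toy usage: exponential observations with mean theta (assumed model), where
# c_m(theta; y1) = (y1/theta^2) exp(-y1/theta) in closed form and c_0 = 1/theta.
theta = np.linspace(0.05, 30, 4000)
y = np.array([2.0, 1.2, 3.5])
cm1 = (y[0] / theta**2) * np.exp(-y[0] / theta)
loglik_rest = sum(-np.log(theta) - yi / theta for yi in y[1:])
cf = full_confidence(theta, cm1, loglik_rest)
```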

Simple model
Example 1 (continued). Based on $y_1$ alone, we have $c(\theta; y_1) \propto L(\theta; y_1) = 1$ for $\theta \in \{y_1 - 1, y_1, y_1 + 1\}$, so the implied prior is $c_0(\theta) = 1$ for all $\theta$. The full likelihood based on $(y_1, y_2)$ is
$$L(\theta) \propto 1 \quad \text{for } \theta \in \{y_{(2)} - 1, \ldots, y_{(1)} + 1\},$$
and zero otherwise, so the full confidence density is $c_f(\theta) \propto L(\theta)$. For example, if $y_1 = 3$ and $y_2 = 5$, we do have 100% confidence that $\theta = 4$. And if $y_1 = y_2 = 3$, we have only 33.3% confidence in $\theta = 4$, though we have 100% confidence in $\theta \in \{2, 3, 4\}$. The MLE of $\theta$ is not unique, but we can choose $\hat\theta = \bar{y}$ as the MLE. It is not sufficient, but $(\bar{y}, R)$ is, so R is a maximal ancillary. Indeed, the full confidence values match the conditional coverage probabilities given the range R. Furthermore, by Corollary 3, there is no further relevant subset, so the confidence is epistemic.
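A sketch of this computation (Python with NumPy; the helper name is our own):

```python
import numpy as np

def full_confidence_uniform3(y1, y2):
    """Example 1: theta is an integer, y_i uniform on {theta-1, theta, theta+1}.
    The implied prior is flat, so c_f(theta) is proportional to the full
    likelihood, which is constant on {y(2)-1, ..., y(1)+1} and zero elsewhere."""
    support = np.arange(max(y1, y2) - 1, min(y1, y2) + 2)
    conf = np.ones(len(support)) / len(support)
    return dict(zip(support.tolist(), conf.tolist()))

print(full_confidence_uniform3(3, 5))   # {4: 1.0}: 100% confidence that theta = 4
print(full_confidence_uniform3(3, 3))   # {2: 1/3, 3: 1/3, 4: 1/3}
```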

Location family model
Suppose $y_1, \ldots, y_n$ are an iid sample from the location family with density
$$p_\theta(y_i) = f(y_i - \theta),$$
where $f(\cdot)$ is an arbitrary but known density. Based on $y_1$ alone,
$$c_m(\theta; y_1) = \frac{\partial}{\partial\theta} P_\theta(Y_1 \ge y_1) = f(y_1 - \theta) = L(\theta; y_1),$$
so the implied prior is $c_0(\theta) = 1$. So, using formula (15), the full confidence density is
$$c_f(\theta) \propto \prod_{i=1}^n f(y_i - \theta). \qquad (16)$$
This is a remarkably simple way to arrive at the confidence density of $\theta$ and epistemic CIs, without having to find the MLE and its distribution. Without further specification of $f$, the MLE $T \equiv \hat\theta$ is not sufficient, so the marginal p-value $P_\theta(T \ge t)$ will not yield the full confidence. The distributions of the residuals $(y_i - \theta)$ are free of $\theta$, so the set of differences $(y_i - y_j)$ is ancillary. In his classic paper, Fisher (1934) showed that $p_\theta(\hat\theta \mid a) \propto L(\theta; y)$, where a is the set of differences of the order statistics $y_{(1)}, \ldots, y_{(n)}$. This means that the conditional likelihood based on $\hat\theta|a$ matches the full likelihood (16), and the confidence level of CIs based on (16) will match the conditional coverage probability. Indeed, here $(\hat\theta, a)$ is sufficient and a is a maximal ancillary. Note, however, that (16) does not require any explicit knowledge or formula for the ancillary statistic.
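Formula (16) needs only a grid evaluation of the likelihood. Here is a sketch (Python with NumPy and SciPy; the Cauchy choice of f and the data are our own illustration, a case with no finite-dimensional sufficient statistic) that computes $c_f(\theta)$ and an equal-tailed 95% epistemic interval:

```python
import numpy as np
from scipy import stats

def location_confidence(theta, y, logf):
    """Full confidence density (16): c_f(theta) prop. to prod_i f(y_i - theta),
    using the flat implied prior of the location family."""
    loglik = np.array([logf(y - th).sum() for th in theta])
    c = np.exp(loglik - loglik.max())
    return c / np.trapz(c, theta)

y = np.array([-1.1, 0.3, 0.8, 4.2, 0.1])                # toy data
theta = np.linspace(-6, 10, 4000)
cf = location_confidence(theta, y, stats.cauchy.logpdf) # f = standard Cauchy
Cf = np.cumsum(cf) * (theta[1] - theta[0])              # confidence distribution
lo = theta[np.searchsorted(Cf, 0.025)]   # equal-tailed 95% epistemic interval
hi = theta[np.searchsorted(Cf, 0.975)]
```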

Exponential family model
Let $y_1, \ldots, y_n$ be an iid sample from the exponential family with log-density
$$\log p_\theta(y_i) = \sum_{j=1}^J h_j(\theta)\, t_j(y_i) - A(\theta) + c(y_i). \qquad (17)$$
The MLE is sufficient if J = 1, but not if J > 1; in the latter case the family is called a curved exponential family. By Theorem 1, when J = 1, confidence statements based on the MLE are epistemic. Our theory covers the continuous case in order to get exact coverage probabilities. Many important members of the family are discrete, which is more complicated because the definition of the p-value is not unique and the coverage probability function cannot exactly match any chosen confidence level; we discuss an example in Appendix A2. The standard evaluation of the confidence requires the tail probability of the distribution of the MLE, which in general has no closed-form formula. Barndorff-Nielsen's (1983) approximate conditional density of the MLE $\hat\theta$ is given by
$$p_\theta(\hat\theta \mid a) \approx k\, |I(\hat\theta)|^{1/2}\, \frac{L(\theta)}{L(\hat\theta)}, \qquad (18)$$
where the MLE is the solution of $A'(\theta) = \sum_i \sum_j h_j'(\theta)\, t_j(y_i)$, a is the maximal ancillary, and k is a normalizing constant free of $\theta$. For J = 1 and the canonical parameter $h_1(\theta) = \theta$, the ancillary is null, and the approximation leads to the right-side p-value
$$C_m(\theta; \hat\theta) \approx P(Z \ge r^*), \qquad (19)$$
where Z is the standard normal variate and
$$r^* = r + \frac{1}{r}\log\left(\frac{q}{r}\right), \quad r = \mathrm{sign}(\hat\theta - \theta)\sqrt{w}, \quad q = (\hat\theta - \theta)\sqrt{I(\hat\theta)},$$
with $w = 2\log\{L(\hat\theta)/L(\theta)\}$ and $I(\hat\theta)$ the observed Fisher information. From the p-value we can get the corresponding confidence density and the implied prior.
Example 2. Let $y = (y_1, \ldots, y_n)$ be an iid sample from the gamma distribution with mean one and shape parameter $\theta$. The density is
$$p_\theta(y_i) = \frac{\theta^\theta}{\Gamma(\theta)}\, y_i^{\theta-1} e^{-\theta y_i},$$
so we have an exponential family model with $h(\theta) = \theta$ and $t(y_i) = \log y_i - y_i$. To use formula (15), we first find the implied prior density using $t_1 \equiv t(y_1)$ alone:
$$c_0(\theta) \propto \frac{c_1(\theta)}{L(\theta; t_1)}, \qquad (20)$$
where $c_1(\theta) = \partial\{P_\theta(T_1 \ge t_1)\}/\partial\theta$ and $L(\theta; t_1) = p_\theta(y_1)$. The probability $P_\theta(T_1 \ge t_1)$ is an incomplete gamma integral, which is computed numerically. The implied prior is shown in Figure 1a. So, from (15), we get the confidence density
$$c_f(\theta) \propto c_0(\theta)\, \frac{\theta^{n\theta}}{\Gamma(\theta)^n}\, e^{\theta \sum_i t(y_i)}. \qquad (21)$$
For an example with n = 5 and $\sum_i t(y_i) = -5.8791$, which corresponds to the MLE $\hat\theta = 3$, the confidence density is given by the solid line in Figure 1b. The normalized likelihood function is also shown by the dashed line; it is quite distinct from the confidence density.
By comparison, to get the marginal confidence density based on the p-value formula (19), we need the MLE and the observed Fisher information: $\hat\theta$ is the solution of
$$\psi(\theta) - \log\theta - 1 = \frac{1}{n}\sum_i t(y_i),$$
with $\psi(\theta) \equiv \partial \log\Gamma(\theta)/\partial\theta$, and the observed Fisher information is
$$I(\hat\theta) = n\{\psi'(\hat\theta) - 1/\hat\theta\}.$$
For the data example, the MLE $\hat\theta$ has to be computed numerically. The circle points in Figure 1b show the marginal confidence density based on the same sample; as expected, it tracks almost exactly the density given by formula (21). The corresponding implied prior based on $c_m(\theta)/L(\theta)$ is given in Figure 1a, also closely matching the implied prior based on n = 1.
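The implied prior (20) for this gamma example can be computed numerically as follows (Python with NumPy and SciPy; the observed $y_1$ is our arbitrary illustration). The event $\{T_1 \ge t_1\}$ for $T_1 = \log Y_1 - Y_1$ corresponds to $Y_1$ lying in an interval [a, b] around the mode of t(y) at y = 1, so $P_\theta(T_1 \ge t_1)$ is a difference of incomplete gamma integrals:

```python
import numpy as np
from scipy import stats, optimize
from scipy.special import gammaln

def p_right(theta, t1):
    """P_theta(T1 >= t1) for T1 = log(Y1) - Y1 and Y1 ~ gamma(shape=theta,
    rate=theta), i.e., mean one. The set {t(y) >= t1} is an interval [a, b]."""
    g = lambda y: np.log(y) - y - t1
    a = optimize.brentq(g, 1e-12, 1.0)    # root below the mode at y = 1
    b = optimize.brentq(g, 1.0, 1e3)      # root above the mode
    F = stats.gamma(a=theta, scale=1 / theta).cdf
    return F(b) - F(a)

y1 = 0.7                                  # assumed single observation
t1 = np.log(y1) - y1
theta = np.linspace(0.2, 12, 400)
Cm = np.array([p_right(th, t1) for th in theta])
cm = np.gradient(Cm, theta)               # c_1(theta) in (20), numerically
loglik1 = (theta * np.log(theta) - gammaln(theta)
           + (theta - 1) * np.log(y1) - theta * y1)
c0 = cm / np.exp(loglik1 - loglik1.max()) # implied prior (20), up to scale
# Formula (15) then multiplies c0 by the full likelihood of y_1, ..., y_n.
```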
Example 3. This is an example where $c_0(\theta)$ is data dependent. Let $y = (y_1, \ldots, y_n)$ be an iid sample from $N(\theta, \theta)$ for $\theta > 0$. The log-density is
$$\log p_\theta(y_i) = -\frac{y_i^2}{2\theta} + y_i - \frac{\theta}{2} - \frac{1}{2}\log(2\pi\theta),$$
a regular exponential family with sufficient statistic $T(y) = \sum_i y_i^2$; the marginal confidence density $c_m(\theta)$ can be computed from the noncentral $\chi^2$ distribution of T(y). For n = 1, $T(y_1) = t_1 = y_1^2$ is sufficient, and
$$C_m(\theta; t_1) = P_\theta(Y_1^2 \ge t_1) = 1 - \Phi\!\left(\frac{\sqrt{t_1} - \theta}{\sqrt{\theta}}\right) + \Phi\!\left(\frac{-\sqrt{t_1} - \theta}{\sqrt{\theta}}\right).$$
Note that the log-density is not of the form (11); here Condition 2 holds using $g(\theta) = \theta^{-3/2} e^{\theta/2}$. Since the implied prior is data-dependent, the full confidence density depends on which $y_i$ is used to compute the implied prior:
$$c_{fi}(\theta) \propto c_0(\theta; y_i)\, L(\theta; y).$$
In Figure 2, for n = 3, we compare $c_{fi}(\theta)$ using the three versions of $c_0(\theta)$ based on $y_i$, i = 1, 2, 3. Two datasets are shown, the first with a small variance and the second with a large variance. These are also compared with the marginal confidence $c_m(\theta)$. As the figure shows, even for such a small dataset, the effect of the data dependence is negligible.
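The comparison in Figure 2 can be sketched as follows (Python with NumPy and SciPy; the grid is ours, and the dataset mimics panel (a)). Each choice of $y_i$ gives its own data-dependent implied prior and hence its own full confidence density $c_{fi}$:

```python
import numpy as np
from scipy import stats

def c_fi(theta, y, i):
    """Full confidence density for the N(theta, theta) model (theta > 0) with
    the data-dependent implied prior computed from observation y[i]."""
    s = np.sqrt(theta)
    t = abs(y[i])
    # C_m(theta; t1) = P_theta(Y^2 >= t1) with t1 = y_i^2:
    Cm = stats.norm.sf((t - theta) / s) + stats.norm.cdf((-t - theta) / s)
    log_c0 = (np.log(np.clip(np.gradient(Cm, theta), 1e-300, None))
              - stats.norm.logpdf(y[i], loc=theta, scale=s))
    loglik = stats.norm.logpdf(y[:, None], loc=theta, scale=s).sum(axis=0)
    lc = log_c0 + loglik
    c = np.exp(lc - lc.max())
    return c / np.trapz(c, theta)

theta = np.linspace(0.05, 6, 2000)
y = np.array([0.9, 1.0, 1.5])                    # dataset of Figure 2a
curves = [c_fi(theta, y, i) for i in range(3)]   # nearly indistinguishable
```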
Example 4. This example of a curved exponential model illustrates more complex cases, where the MLE is not sufficient. Let $y_1, \ldots, y_n$ be an iid sample from $N(\theta, \theta^2)$ for $\theta > 0$.
Here $(\sum_i y_i^2, \sum_i y_i)$ is minimal sufficient, and the likelihood function is
$$L(\theta; y) \propto \theta^{-n} \exp\left( -\frac{\sum_i y_i^2}{2\theta^2} + \frac{\sum_i y_i}{\theta} \right).$$
FIGURE 2 $c_{fi}(\theta)$ for i = 1, 2, 3 and $c_m(\theta)$ (circles) based on (a) y = (0.9, 1, 1.5) and (b) y = (0.1, 1, 5). In each panel, all three curves $c_{fi}$ are drawn as solid lines, but only one curve is visible because they track each other very closely.
The MLE is given by
$$\hat\theta = \frac{-\sum_i y_i + \sqrt{(\sum_i y_i)^2 + 4n \sum_i y_i^2}}{2n}.$$
First consider the confidence distribution based on $y_1$:
$$C_m(\theta; y_1) = P_\theta(Y_1 \ge y_1) = 1 - \Phi\!\left(\frac{y_1}{\theta} - 1\right).$$
We can see immediately that, if we use $y_1$ as the statistic, the term inside the bracket converges to −1 as $\theta \to \infty$, so the confidence distribution goes to $1 - \Phi(-1) = 0.84$ rather than 1. Hence $y_1$ does not satisfy Condition 1. However, we can show that $t_1 \equiv |y_1|$ does lead to a valid confidence distribution:
$$C_m(\theta; t_1) = P_\theta(|Y_1| \ge t_1) = 1 - \Phi\!\left(\frac{t_1}{\theta} - 1\right) + \Phi\!\left(-\frac{t_1}{\theta} - 1\right).$$
After taking derivatives to get $c_{m1}(\theta; t_1)$ and $L(\theta; t_1)$, the implied prior based on $t_1$ is
$$c_0(\theta) \propto 1/\theta.$$
The updating formula (15) then gives the full confidence density
$$c_f(\theta) \propto \theta^{-(n+1)} \exp\left( -\frac{\sum_i y_i^2}{2\theta^2} + \frac{\sum_i y_i}{\theta} \right).$$
In Appendix A1 we show that: (i) we get the same implied prior if we start with the MLE $\hat\theta_1$ based on $y_1$ alone, or with the conditional $\hat\theta_1|a_1$, so even though $t_1$ and $\hat\theta_1$ are not sufficient, they still lead to a valid implied prior for the full confidence; (ii) the conditional confidence density derived using Barndorff-Nielsen's formula (18) also gives the same implied prior; (iii) the confidence $c_m(\theta; \hat\theta_1, \ldots, \hat\theta_n)$ can be constructed similarly, but it is not epistemic because it does not use the full likelihood, so there is a potential loss of information.
To compare with exact theoretical results, we refer to Hinkley (1977), who derived the conditional density $p(w|a)$ of $w = \theta^{-1}(\sum_i y_i^2)^{1/2}$ given the ancillary $a = \sum_i y_i / (\sum_i y_i^2)^{1/2}$, with distribution function $F_a(w) = \int_0^w p(u|a)\, du$. Thus, the confidence density becomes
$$c_c(\theta; t|a) = -\frac{\partial}{\partial\theta}\, F_a\!\left(\frac{t^{1/2}}{\theta}\right) = p\!\left(\frac{t^{1/2}}{\theta}\,\Big|\,a\right) \frac{t^{1/2}}{\theta^2},$$
so that the implied prior becomes $c_0(\theta; t|a) \propto c_c(\theta; t|a)/L(\theta; t|a) \propto 1/\theta$, which is again the same as the result from $t_1$. Thus, we have $c_f(\theta) = c_c(\theta; t|a)$. As numerical illustrations, we compare the exact conditional p-value $P_\theta(T > t \mid a) = 1 - F_a(t^{1/2}/\theta)$ for testing $H_0: \theta = 1$, the corresponding full confidence $C_f(\theta)$ at $\theta = 1$, and the p-value based on the score test; the latter is computed using the observed Fisher information, suggested by Hinkley (1977) to have good conditional properties. For Figure 3a, we generate 100 datasets with n = 5 from $N(\theta, \theta^2)$ at $\theta = 1.2$. The full confidence $C_f(\theta \le 1)$ is computed using the implied prior $c_0(\theta) \propto 1/\theta$, and also a constant prior $c_0(\theta) \propto 1$. Panel (b) shows the result for n = 10. The full confidence with the implied prior $c_0(\theta) \propto 1/\theta$ agrees with the exact conditional p-value. The use of the non-implied constant prior $c_0(\theta) \propto 1$ generates over-optimistic p-values, particularly for small values. While Hinkley's (1977) score test appears correct on average, in these small samples it has poor conditional properties.
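The failure of Condition 1 for $y_1$ alone, noted above, and its repair by $t_1 = |y_1|$ are easy to check numerically (Python with SciPy; $y_1 = 1.3$ is an arbitrary value of our choosing):

```python
from scipy import stats

# For N(theta, theta^2): C_m(theta; y1) = 1 - Phi(y1/theta - 1) tends to
# 1 - Phi(-1) = 0.8413 as theta -> infinity, so it is not a proper distribution
# function of theta; with t1 = |y1| the limit is 1, as Condition 1 requires.
y1 = 1.3
for th in (1.0, 10.0, 1e3, 1e6):
    Cm_y1 = stats.norm.sf(y1 / th - 1)
    Cm_t1 = stats.norm.sf((y1 - th) / th) + stats.norm.cdf((-y1 - th) / th)
    print(th, round(Cm_y1, 4), round(Cm_t1, 4))   # Cm_y1 -> 0.8413, Cm_t1 -> 1
```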

DISCUSSION
We have described a concept of epistemic confidence for an observed confidence interval. Fisher tried to achieve the same purpose with the fiducial probability, but his use of the word "probability" generated much confusion and controversy, and the concept of fiducial probability has been practically abandoned. The confidence concept, however, is mainstream, although it comes with a frequentist interpretation only, so it applies not to the observed interval but to the procedure. The confidence may not be a probability, but it is an extended likelihood (Pawitan & Lee, 2021), whose ratio is meaningful in hypothesis testing and statistical inference (Lee & Bjørnstad, 2013). The confidence is logically distinct from the classical likelihood. Our results show that we can turn a classical likelihood into a confidence density by multiplying it with an implied prior. Furthermore, we get epistemic confidence by establishing the absence of relevant subsets. Schweder and Hjort (2016) and Schweder (2018) have been strong proponents of interpreting confidence as "epistemic probability." We are in general agreement with their sentiment, but it is unclear which version of probability this is. The only established and accepted epistemic probability is the Bayesian probability, but in their writing the confidence concept is clearly non-Bayesian. Our use of the Dutch Book defines normatively the epistemic value of the confidence while staying within the non-Bayesian framework.
Conditional inference (Reid, 1995) has traditionally been the main area of statistics that tries to address the epistemic content of confidence intervals. However, it goes only half-way to the goal of epistemic confidence that Fisher would have wanted. The general lack of a unique maximal ancillary is a great stumbling block, since it is then possible to come up with distinct relevant subsets with distinct conditional coverage probabilities. This raises an unanswerable question: what is the "proper" confidence for the observed interval? Our logical tool of the betting market overcomes this problem: in such a case the market cannot settle on an unambiguous price, but Corollary 3 still holds in the sense that you are still protected from the Dutch Book. We discuss this further with an example in Appendix A3.
APPENDIX

A1: Further details for Example 4

The implied prior based on $\hat\theta_1$ is
$$c_0(\theta; \hat\theta_1) \propto \frac{c_m(\theta; \hat\theta_1)}{L(\theta; \hat\theta_1)},$$
where $L(\theta; \hat\theta_1)$ is the likelihood function based on $\hat\theta_1$.
The conditional confidence distribution based on $\hat\theta_1|a_1$ is
$$C_c(\theta; \hat\theta_1|a_1) \equiv P_\theta(\hat\theta_1 \ge \hat\theta_{1,\mathrm{obs}} \mid a_1),$$
which is now a valid confidence distribution, with density $c_c(\theta; \hat\theta_1|a_1) \equiv \partial C_c(\theta; \hat\theta_1|a_1)/\partial\theta$.
The implied prior is $c_0(\theta; \hat\theta_1|a_1) \propto c_c(\theta; \hat\theta_1|a_1)/L(\theta; y_1) \propto \theta^{-1}$, the same as the one derived from $\hat\theta_1$. On the other hand, if we construct the full confidence densities by
$$c_{fi}(\theta; \hat\theta_i, y_{(-i)}) \propto c_0(\theta; \hat\theta_i)\, L(\theta; y),$$
then the resulting confidence density depends on the choice of $y_i$. In this case we should consider $c_{fi}(\theta; \hat\theta_i, y_{(-i)})$ as an approximation to $c_f(\theta; y)$. Figure A1 plots the n confidence densities $c_{fi}(\theta; \hat\theta_i, y_{(-i)})$ (solid) and $c_f(\theta; y)$ (circles) with $y_i$ from N(1, 1). As shown in panel (b), when n becomes large the difference becomes negligible and $c_{fi}(\theta; \hat\theta_i, y_{(-i)})$ gets closer to $c_f(\theta; y)$. So there is a loss of information caused by using $c_{mi}(\theta)$, due to the sign of $y_i$ as captured by $L(\theta; a_i)$; this loss is negligible even in small samples (Figure A1). However, the marginal confidence has a larger loss of information, as shown in both Figure A1a,b. Figure A2 plots the logarithms of the implied prior $c_{0m1}(\theta; \hat\theta_1) \propto 1/\theta$ (dotted) and $q_1(\theta; y_1) \propto c_{m1}(\theta; \hat\theta_1)/L(\theta; y_1)$ (solid), properly scaled. Here the function $q_1$ is not bounded in $y_1$, because the information on $\theta$ in $L(\theta; y_1) \propto L(\theta; \hat\theta_1|a_1)$ and in $L(\theta; \hat\theta_1)$ differ.

FIGURE A2 $\log c_{0m1}(\theta; \hat\theta_1)$ (dotted) and $\log q_1(\theta; y_1)$ (solid) as a function of $y_1$.
It is also possible to compute the conditional confidence density by using Barndorff-Nielsen's formula (18), as given in the main text, and to show that we end up with the same implied prior $c_0(\theta) \propto 1/\theta$.

A2: Discrete models

For the binomial and negative binomial models, the relevant tail probabilities of $\hat\theta$ can be written in terms of $I_x(\cdot, \cdot)$, the regularized incomplete beta function, and $B(\cdot, \cdot)$, the beta function.
Figure A3 shows the coverage probabilities of the 95% two-sided confidence procedure based on the mid p-value of $\hat\theta$ for the binomial and negative binomial models. The coverage probabilities fluctuate around 0.95, but they are not consistently biased in one direction. Moreover, as n or y becomes larger, the difference between the coverage probability and the confidence becomes smaller. In the continuous case, we define the CI to satisfy $P_\theta(\theta \in \mathrm{CI}) = \gamma$ for all $\theta$; in the discrete case, it is often not possible for the coverage probability to be the same for all $\theta$, which violates the condition of Theorem 1, and the exact objective coverage probability of the CI procedure is not attainable. Here the confidence is a consistent estimate of the coverage probability.

A3: Nonunique maximal ancillaries

In this example, the full likelihood matches the likelihood function under the conditional model given $Y_1 = 1$ or given $Y_2 = 1$, so conditioning on each ancillary recovers the full likelihood and each ancillary is maximal. Now consider using the MLE itself as a "CI." Conditional on the ancillaries, the probabilities that the MLE is correct are indeed distinct from each other. However, comparing the conditional coverage probabilities given $Y_1$ to those given $Y_2$, there is no consistent nontrivial bias in one direction across $\theta$. So if you use $Y_1$ as the ancillary, you cannot construct further relevant subsets based on $Y_2$. This is the essence of our remark after Corollary 3 that the lack of a unique maximal ancillary does not affect the validity of the corollary. Unfortunately, in this example the p-value is not defined, because the parameter $\theta$ can be an unordered label, so it is not possible to compute any version of the confidence function or any implied prior. Fisher (1973, chapter III) suggested that, for problems such as this, the structure is not sufficient to allow an unambiguous probability-based inference, so only the likelihood is available.
FIGURE 1 (a) Implied priors of $\theta$ computed from (20) (solid) and from (19) (circles), both normalized to equal one at the MLE. (b) The normalized likelihood function (dashed) and the confidence densities based on a sample with n = 5, using (21) (solid) and (19) (circles).