Bounds on the probability of radically different opinions

We establish bounds on the probability that two agents, who share an initial opinion expressed as a probability distribution on an abstract probability space, but who receive two different sources of information, come to radically different opinions about the conditional probability of the same event.


Introduction
Let A ∈ F be an event in some probability space (Ω, F, P), and let

X := P(A | G) and Y := P(A | H)   (1.1)

for two sub-σ-fields G and H of F, with P(A) = p for some p ∈ [0, 1]. More generally, (X, Y) may be any pair of random variables whose joint distribution is that of such a pair of conditional probabilities:

(X, Y) =_d (P(A | G), P(A | H)).   (1.2)

Following [DDM95], we interpret X and Y as the opinions of two experts about the probability of A given different sources of information G and H, assuming the experts agree on some initial assignment of probability P to events in F. We use the term coherent, as in [DDM95], for (X, Y) as in (1.1) or (1.2), or for the joint distribution of such (X, Y) on [0, 1]². This term has been used with several other meanings in the theory of subjective probability, risk assessment, and reliability, but we use it here only in the sense above, for two or more conditional probabilities of some common event in a probability space. Note the obvious reflection symmetry: if (X, Y) is coherent, then so are

(Y, X), (1 − X, 1 − Y), and (1 − Y, 1 − X).   (1.3)

Coherent opinions (X, Y, . . .) based on information represented by an increasing sequence of σ-fields form a martingale. The notion of a coherent family of random variables also includes reversed martingales, and martingales relative to a directed index set [DP80a, Kho02]. As remarked in [DDM95, p. 284], with just a change of notation (X, Y) ↔ (Π_1, Π_2), if X and Y are both produced by "experts", then one should not expect them to be wildly different. For example, it would seem paradoxical if, with X say uniform on [0, 1], one always had Y = 1 − X. This suggests that not all joint distributions on [0, 1]² for (X, Y) are coherent.
Indeed, it follows easily from Proposition 2.1 below that the joint distribution of (X, 1 − X), for X uniform on [0, 1], is not coherent.

(Research of KB was supported in part by Simons Foundation Grant 506732.)
This suggests the rough idea that coherent opinions cannot be too negatively dependent. However, elementary examples in [DDM95, §4.1] show that for any prescribed value of EX = EY = P(A) ∈ (0, 1), the correlation between coherent opinions X and Y about A can take any value in (−1, 1]. Consider for instance, for δ ∈ (0, 1), the distribution of (X, Y) concentrated on the three points (1 − δ, 1 − δ), (0, 1 − δ) and (1 − δ, 0), with

P(X = Y = 1 − δ) = (1 − δ)/(1 + δ) and P(X = 0, Y = 1 − δ) = P(X = 1 − δ, Y = 0) = δ/(1 + δ).   (1.4)

This example from [DP80a] gives a pair of coherent opinions (X, Y) about the event A = (X = Y), with correlation ρ(X, Y) = −δ, which can be any value in (−1, 0). The idea expressed above, that coherent opinions X and Y should not be too radically different, leads to the following precise problem, posed in [Bur09] and [Pit14]: for 0 ≤ δ ≤ 1, evaluate

ε(δ) := sup {P(|X − Y| ≥ 1 − δ) : (X, Y) coherent}.   (1.5)

For m, n = 1, 2, 3, . . . consider also ε_{m×n}(δ) = ε_{n×m}(δ), defined by restricting the above supremum to m × n coherent (X, Y), meaning that X takes at most m and Y at most n possible values. Let ε_finite(δ) := sup_{m,n} ε_{m×n}(δ), which is the supremum in (1.5) restricted to (X, Y) with a finite number of possible values. Each of these functions of δ is evidently non-decreasing and bounded above by 1. Then for all δ ∈ [0, 1]

2δ/(1 + δ) ≤ ε_{2×2}(δ) ≤ ε_finite(δ) ≤ ε(δ) ≤ lim_{γ↓δ} ε_finite(γ).   (1.6)

The first inequality is due to the example (1.4). The second and third are obvious, and the last is by elementary construction of n × n coherent (X_n, Y_n) with |X_n − X| + |Y_n − Y| ≤ 2/n for any coherent (X, Y). We use the notation x ∧ y := min(x, y) and x ∨ y := max(x, y), and either 1_A or 1(A) for an indicator function whose value is 1 if A and 0 else.
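The claimed correlation and tail probability of this three-point example can be checked with exact rational arithmetic. Below is a minimal sketch in Python (assuming, as in the daisy discussion of Section 2, that the three points carry masses (1 − δ)/(1 + δ), δ/(1 + δ) and δ/(1 + δ) respectively):

```python
from fractions import Fraction as F

def three_point_law(d):
    # law of (X, Y) on (1-d, 1-d), (0, 1-d), (1-d, 0), with the center
    # mass p = (1 - d)/(1 + d) of the (2, p)-daisy construction
    p = (1 - d) / (1 + d)
    v = 1 - d
    return {(v, v): p, (F(0), v): (1 - p) / 2, (v, F(0)): (1 - p) / 2}

d = F(1, 4)
law = three_point_law(d)
EX  = sum(w * x     for (x, y), w in law.items())
EXY = sum(w * x * y for (x, y), w in law.items())
EX2 = sum(w * x * x for (x, y), w in law.items())
rho = (EXY - EX * EX) / (EX2 - EX * EX)   # EX = EY and Var X = Var Y here
tail = sum(w for (x, y), w in law.items() if abs(x - y) >= 1 - d)
print(rho, tail)   # -1/4 and 2d/(1 + d) = 2/5
```

Here Cov(X, Y) = E(XY) − (EX)² because EX = EY, and Var X = Var Y by the symmetry of the law, so the correlation is computed exactly as Cov/Var X.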
Proposition 1.1. There are the following evaluations and bounds: for δ ∈ [0, 1] and n ≥ 2,

ε_{1×n}(δ) = δ 1(0 ≤ δ < 1/2) + 1(1/2 ≤ δ ≤ 1);   (1.7)

ε_{2×2}(δ) = (2δ/(1 + δ)) 1(0 ≤ δ < 1/2) + 1(1/2 ≤ δ ≤ 1);   (1.8)

2δ/(1 + δ) ≤ ε(δ) ≤ 2δ for 0 ≤ δ < 1/2.   (1.9)

The bounds (1.6) and (1.9) were given in [Bur09]. The values ε_{1×n}(δ) = ε_{2×2}(δ) = 1 for δ ∈ [1/2, 1] come from the coherent 1 × 2 distribution of (X, Y) with equal probability 1/2 at the points (1/2, 0) and (1/2, 1) ∈ [0, 1]². That is,

(X, Y) = (1/2, B_{1/2}),   (1.10)

where B_p for 0 ≤ p ≤ 1 denotes a random variable with the Bernoulli(p) distribution

P(B_p = 1) = p and P(B_p = 0) = 1 − p.   (1.11)

For δ ∈ (0, 1/2), Claim 1.2 is that equality holds in all the easy inequalities (1.6). The first of these equalities is proved here as (1.8). Equality in the second inequality of (1.6) for δ ∈ (0, 1/2) is much less obvious. The proof of this in [BP19] is at present quite long and difficult, by recursive reduction of m and n for m × n coherent (X, Y), until the problem is reduced to the 2 × 2 case treated here by (1.8). We hope this exposition of the easier evaluations in Proposition 1.1 might provoke someone to find a simpler proof of Claim 1.2. Note from (1.7) and (1.8) that each of the functions ε_{1×n} and ε_{2×2} is continuous on each of the intervals [0, 1/2) and [1/2, 1], but has an upward jump to 1 at δ = 1/2, as shown in Figure 1. If Claim 1.2 is accepted for δ ∈ (0, 1/2), then ε(δ) too jumps up to 1 at 1/2. Some further interpretations of these maximal probability functions, without assuming Claim 1.2, are presented in the following proposition, which is a specialization of Corollary 4.3 below. This involves the usual notion of stochastic ordering of real random variables V and W: V ≤_d W means P(V > x) ≤ P(W > x) for all real x. This is well known to be equivalent to the existence of a coupling of V and W on a common probability space with P(V ≤ W) = 1, and again to Ef(V) ≤ Ef(W) for all bounded increasing f.

Proposition 1.3. The definition (1.5) of ε(δ) implies that for all coherent (X, Y),

P(|X − Y| ≥ 1 − δ) ≤ ε(δ) = P(∆ ≤ δ) for all δ ∈ [0, 1],   (1.12)

for a random variable ∆ with 0 ≤ ∆ ≤ 1/2 whose cumulative distribution function is ε. Moreover:

• For each δ ∈ [0, 1] there is a coherent (X, Y) which attains equality in (1.12).
• For all coherent (X, Y), |X − Y| ≤_d 1 − ∆. In particular, for r > 0,

E|X − Y|^r ≤ E(1 − ∆)^r ≤ (2/(r + 1)) (1 − 2^{−(r+1)}).   (1.13)

For m × n coherent (X, Y), the same conclusions hold, with the distribution of ∆_{m×n} on [0, 1/2] defined by (1.12) with ε_{m×n}(δ) in place of ε(δ). The second inequality in (1.13) uses the upper bound in (1.9), followed by exact evaluation of the integral. Accepting Claim 1.2 gives a slightly smaller integral involving an incomplete beta function. For instance, for r = 1 these upper bounds on E|X − Y| are 0.75 = 3/4 > 3/2 + log 4 − log 9 ≈ 0.68907. Corollary 2.5 shows that the supremum of E|X − Y| over all coherent (X, Y) is actually 1/2.

The rest of this article is organized as follows. Section 2 recalls some background related to Proposition 1.1, which is proved in Section 3. Section 4 recalls some known characterizations of coherent distributions of (X, Y). For reasons we do not understand well, these general characterizations seem to be of little help in establishing the evaluations of ε(δ) discussed above, or in settling a number of related problems about coherent distributions, which we present in Section 5. So much is left to be understood about the limitations on coherent opinions.
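The two numerical values quoted above for r = 1 can be reproduced by integrating the candidate distribution functions of ∆: the upper bound 2δ from (1.9), and the value 2δ/(1 + δ) of Claim 1.2, each capped at 1 on [1/2, 1]. A sketch (the function name and midpoint-rule discretization are ours):

```python
import math

def mean_one_minus_delta(cdf, n=20000):
    # E(1 - D) = 1 - E(D), with E(D) = integral over [0, 1/2] of
    # (1 - cdf(t)) dt, for a random variable D on [0, 1/2] with
    # distribution function cdf
    h = 0.5 / n
    ED = sum((1 - cdf((j + 0.5) * h)) * h for j in range(n))  # midpoint rule
    return 1 - ED

bound = mean_one_minus_delta(lambda t: 2 * t)            # upper bound in (1.9)
claim = mean_one_minus_delta(lambda t: 2 * t / (1 + t))  # Claim 1.2
print(round(bound, 5), round(claim, 5))   # 0.75 0.68907
```

The second value matches the closed form 3/2 + log 4 − log 9 = 3/2 − 2 log(3/2).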

Background
Let (X_i, i ∈ I) be a finite collection of random variables defined on some common probability space (Ω, F, P), and suppose that each X_i is the conditional expectation of some integrable random variable X* given some sub-σ-field F_i of F:

X_i = E(X* | F_i), i ∈ I.   (2.1)

Doob's well known bounds for tail probabilities and moments of the distributions of max_{i∈I} X_i and max_{i∈I} |X_i|, for either an increasing or decreasing family of σ-fields, and extensions of these inequalities to families of σ-fields indexed by a directed set I, with suitable conditional independence conditions, play a central role in the theory of martingale convergence. See for instance [Kho02, HLOST16] and [Osȩ17] for recent refinements of Doob's inequalities, and further references. For the diameter of a martingale, max_{i,j∈I} |X_i − X_j|, there is no difficulty in bounding tail probabilities and moments, with an additional factor of 2 to a suitable power. But finer results with best constants for the diameter have also been obtained in [DGM09, Osȩ15].
Much less is known about limitations on the distributions of such maximal variables for finite collections of σ-fields (F_i, i ∈ I) without conditions of nesting or conditional independence. We focus here on the joint distribution of the random vector (X_i, i ∈ I) of values of this martingale on singleton subsets of I. Assuming the basic probability space is sufficiently rich, there is a random variable U with uniform distribution on [0, 1], with U independent of X* and of the σ-fields F_i. Then X* can be replaced by the indicator random variable 1(U ≤ X*). So there is no loss of generality in supposing X* = 1(A) is the indicator of some event A with P(A) = p ∈ [0, 1]. It follows that each X_i is the conditional probability of A given F_i:

X_i = P(A | F_i), i ∈ I.   (2.2)

Then either (X_i, i ∈ I) or its joint distribution on [0, 1]^I will be called coherent. Besides EX = EY, another necessary condition for a pair (X, Y) to be coherent is provided by the following simplification and extension of [DDM95, Theorem 5.2]. See also Proposition 4.1 for some conditions that are both necessary and sufficient for (X, Y) to be coherent.
Proposition 2.1. Consider a pair of real-valued random variables (X, Y), and assume that there exist disjoint intervals Ḡ and H̄, and Borel sets G ⊆ Ḡ and H ⊆ H̄, such that the events (X ∈ G) and (Y ∈ H) are almost surely equal. If (X, Y) is coherent, then P(X ∈ G) = P(Y ∈ H) = 0.

For disjoint intervals G = Ḡ = [0, a) and H = H̄ = (b, 1] with a ≤ b, Proposition 2.1 yields: if (X, Y) is coherent and

(X < a) = (Y > b) almost surely,   (2.5)

then P(X < a) = P(Y > b) = 0. This corrects the claim above [DDM95, Theorem 5.2] that (2.5) alone makes (X, Y) not coherent. (That claim is false if P(Y > b) = 0: take a = 1/4, b = 3/4 and X = Y = 1/2.)

The following construction of a coherent distribution of n variables (X_1, . . . , X_n) was used in [DP80a] to build counterexamples in the theory of almost sure convergence of martingales relative to directed sets.
Example 2.3. (The (n, p)-daisy, with n petals and a Bernoulli(p) center) [DP80a]. Let A, A_1, . . . , A_n be a measurable partition of Ω with

P(A) = p and P(A_i) = (1 − p)/n for 1 ≤ i ≤ n.

For 1 ≤ i ≤ n let F_i be the σ-field generated by A ∪ A_i. Then set

X_i := P(A | F_i) = p_n 1(A ∪ A_i), where p_n := p/(p + (1 − p)/n) = np/(1 + (n − 1)p).   (2.6)

To explain the daisy mnemonic, imagine Ω is the union of n + 1 parts of a daisy flower, with center A of area p, surrounded by n petals A_i of equal areas, with total petal area 1 − p. For each petal A_i, an ith petal observer learns whether or not a point picked at random from the daisy area has fallen in (the center A or their petal A_i), or in some other petal. Each petal observer's conditional probability X_i of A is then as in (2.6). The sequence of n variables (X_1, . . . , X_n) is both coherent and exchangeable, with constant expectation p:
• given A, the sequence (X_1, . . . , X_n) is identically equal to the constant p_n;
• given the complement A^c, the sequence (X_1, . . . , X_n) is p_n times an indicator sequence with a single 1 at a uniformly distributed index in {1, . . . , n}.
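The daisy is simple enough to enumerate exactly. The following sketch (our own illustration, with the hypothetical function name daisy, and with p_n = p/(p + (1 − p)/n) the conditional probability of the center given center-or-petal) lists the n + 1 outcomes and checks that each X_i has expectation p, that each X_i really is a conditional probability of the center A, and that max_i X_i is the constant p_n:

```python
from fractions import Fraction as F

def daisy(n, p):
    # (n, p)-daisy: rows of (probability, (X_1, ..., X_n), indicator of A),
    # where X_i equals p_n on A ∪ A_i and 0 elsewhere
    pn = p / (p + (1 - p) / n)
    rows = [(p, (pn,) * n, 1)]                        # the center A
    for i in range(n):                                # the petal A_i
        rows.append(((1 - p) / n,
                     tuple(pn if j == i else F(0) for j in range(n)), 0))
    return rows, pn

rows, pn = daisy(3, F(2, 5))
for i in range(3):
    # E X_i = p: each opinion has the right expectation
    assert sum(w * xs[i] for w, xs, _ in rows) == F(2, 5)
    # X_i = P(A | F_i): E[1_A; X_i = p_n] = p_n P(X_i = p_n)
    assert (sum(w * a for w, xs, a in rows if xs[i] == pn)
            == pn * sum(w for w, xs, _ in rows if xs[i] == pn))
# max_i X_i is the constant p_n, as designed
assert all(max(xs) == pn for _, xs, _ in rows)
print(pn)   # np/(1 + (n - 1)p) with n = 3, p = 2/5, that is 2/3
```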
The (n, p)-daisy example was designed to make max_{1≤i≤n} X_i = p_n, a constant, as large as possible with EX_i ≡ p. As observed in [DP80b, p. 224], this p_n is the largest possible essential infimum of max_i X_i for any coherent distribution of (X_1, . . . , X_n) with EX_i ≡ p. This special property involves the n-petal daisy in the solution of various extremal problems for coherent opinions. For instance, (X, Y) = (X_1, X_2) derived from the (2, p)-daisy with p = (1 − δ)/(1 + δ), so p_2 = 1 − δ, is the coherent pair in (1.4). This provides the lower bound for ε_{2×2}(δ) in (1.6), which according to (1.8) is attained with equality for δ ∈ [0, 1/2). Also:

Proposition 2.4. [DP80b] For every coherent distribution of (X_1, . . . , X_n) with EX_i ≡ p,

E(X_1 ∨ · · · ∨ X_n) ≤ p + (1 − p) (n − 1)p/(1 + (n − 2)p).   (2.7)

Moreover, this bound is attained by taking (X_1, . . . , X_{n−1}) to be the (n − 1, p)-daisy sequence, and X_n = 1_A, the Bernoulli(p) indicator of the daisy center.
For example, if (X_1, . . . , X_n) is the (n, p)-daisy sequence, the left hand side of (2.7) is p_n as in (2.6), which is strictly less than the right side of (2.7). Proposition 2.4 implies:

Corollary 2.5. For every coherent (X, Y) with EX = EY = p,

E|X − Y| ≤ 2p(1 − p) ≤ 1/2,   (2.8)

with equality in the first inequality if X = p and Y =_d B_p as in (1.11).
Proof. Take n = 2 in (2.7) and use |X − Y| = 2(X ∨ Y) − X − Y, so that E|X − Y| ≤ 2(2p − p²) − 2p = 2p(1 − p).

As noted below Proposition 1.3, the bound (2.8) is better than what is obtained by integration of the least upper bounds (1.5) on tail probabilities of |X − Y|. Combine (2.8) with Markov's inequality to see that

P(|X − Y| ≥ 1 − δ) ≤ 2p(1 − p)/(1 − δ).   (2.9)

But without restricting p to be close to 0 or 1, this does not reduce the upper bound of (1.12). See also Problems 5.3 and 5.4.
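For the attaining pair X ≡ p, Y = B_p, both the elementary identity |X − Y| = 2(X ∨ Y) − X − Y and the resulting value E|X − Y| = 2p(1 − p) are easy to check exactly; a short sketch:

```python
from fractions import Fraction as F

def check(p):
    # law of (X, Y) with X ≡ p and Y = B_p: mass p at (p, 1), 1 - p at (p, 0)
    law = {(p, F(1)): p, (p, F(0)): 1 - p}
    Emax = sum(w * max(x, y) for (x, y), w in law.items())
    Ediff = sum(w * abs(x - y) for (x, y), w in law.items())
    assert Ediff == 2 * Emax - 2 * p      # |X - Y| = 2(X ∨ Y) - X - Y
    assert Emax == 2 * p - p * p          # the case n = 2 of (2.7), attained
    return Ediff                          # equals 2p(1 - p)

vals = [check(F(k, 10)) for k in range(11)]
print(max(vals))   # 2p(1 - p) is largest at p = 1/2, giving 1/2
```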
Proof of Proposition 1.1

For (1.7), the case δ ∈ [1/2, 1] is covered by the example (1.10), and for δ ∈ [0, 1/2) the lower bound comes from X ≡ δ with Y =_d B_δ. The upper bound for δ ∈ [0, 1/2) is given by the following lemma.

Lemma 3.1. Let Y ∈ [0, 1] with E(Y | X) = X for some random variable X with values in [0, 1]. Then for 0 ≤ δ < 1/2, P(|Y − X| ≥ 1 − δ | X) ≤ δ.

Proof. Suppose first that X ≡ p. For δ < 1/2 at most one of the events (Y ≥ p + 1 − δ) and (Y ≤ p − (1 − δ)) has positive probability; by the symmetry (1.3) we may assume the former, which requires p ≤ δ, so Markov's inequality gives

P(Y ≥ p + 1 − δ) ≤ p/(p + 1 − δ) ≤ δ.   (3.1)

The more general assertion of the lemma follows by conditioning on X.
Turning to consideration of (1.8), we start with a lemma of independent interest, which controls the variability of P(A | G) as a function of G with P(G) > 0, by a bound that does not depend on A. We work here with the elementary conditional probability, which is the number P(A | G) := P(AG)/P(G) rather than a random variable. Let G △ H := GH^c ∪ G^c H denote the symmetric difference of G and H.

Lemma 3.2. For any events A, G, H with P(G) > 0 and P(H) > 0,

|P(A | G) − P(A | H)| ≤ (P(GH^c) ∨ P(G^c H)) / (P(GH) + P(GH^c) ∨ P(G^c H)).   (3.2)

Consequently, for each 0 ≤ δ ≤ 1, if |P(A | G) − P(A | H)| ≥ 1 − δ, then

P(GH) ≤ (δ/(1 + δ)) (P(G) + P(H)).   (3.3)

Proof. Let p := P(GH^c), q := P(GH), r := P(G^c H), and let a, b, c be the conditional probabilities of A given GH^c, GH, G^c H respectively. Then

P(A | G) − P(A | H) = (ap + bq)/(p + q) − (bq + cr)/(q + r) ≤ (p ∨ r)/(q + p ∨ r),   (3.4)

from which (3.2) and (3.3) follow easily. To check the inequality in (3.4), observe that for fixed p, q, r the difference of fractions in the middle is obviously maximized by taking a = 1, c = 0. That done, the difference is a linear function of b, whose maximum over 0 ≤ b ≤ 1 is attained either at b = 0 or at b = 1, when the inequality is obvious.
It is easily checked that for p, q, r as above, with p + q > 0 and q + r > 0, there is equality in (3.4) iff one of the following three conditions holds, where in each case the condition on G, H, and A should be understood modulo events of probability 0: • either p > 0, q = 0, r > 0, a = 1, b = c = 0, meaning G ∩ H = ∅ and A = G; • or p = 0, q > 0, r > 0, a = 0, b = 1, c = 0, meaning G ⊆ H and A = G; • or p > 0, q > 0, r = 0, a = 1, b = c = 0, meaning H ⊆ G and A = GH c .
Consequently, there is equality in (3.2) iff one of these three conditions holds, either exactly as above or with G and H switched.
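Here is a quick randomized check of the inequality (3.4) (a sketch; we read (3.4), in the notation p, q, r, a, b, c above, as bounding (ap + bq)/(p + q) − (bq + cr)/(q + r) by (p ∨ r)/(q + p ∨ r)):

```python
import random

def gap(p, q, r, a, b, c):
    # (right side) - (difference of fractions) in (3.4); should be >= 0
    diff = (a * p + b * q) / (p + q) - (b * q + c * r) / (q + r)
    m = max(p, r)
    return m / (q + m) - diff

random.seed(0)
worst = min(gap(*(random.random() for _ in range(6))) for _ in range(200000))
print(worst >= -1e-12)   # True: no violation in 200000 random trials
```

The search only probes the interior p, q, r, a, b, c ∈ (0, 1); the boundary equality cases are exactly the three configurations listed above.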
Lemma 3.3. Suppose that X = P(A | X) and Y = P(A | Y) have discrete distributions. Fix 0 < δ < 1/2, and suppose that for each pair (x, y) of possible values of (X, Y) with |y − x| ≥ 1 − δ there is no other such pair (x′, y′) with either x′ = x or y′ = y. Then

P(|X − Y| ≥ 1 − δ) ≤ 2δ/(1 + δ).   (3.5)

Proof. Application of (3.3) to G = (X = x) and H = (Y = y) gives, for each pair (x, y) with |y − x| ≥ 1 − δ,

P(X = x, Y = y) ≤ (δ/(1 + δ)) (P(X = x) + P(Y = y)).   (3.6)

The assumption is that as (x, y) ranges over pairs with |y − x| ≥ 1 − δ, the events (X = x) are disjoint, and so are the events (Y = y). So (3.5) follows by summation of (3.6) over such (x, y).
Proof of (1.8). In view of (1.7), and the examples (1.10) and (1.4), it is enough to establish (3.5) for 2 × 2 coherent (X, Y) whose possible values are contained in the 4 corners of a rectangle R := [x_1, x_2] × [y_1, y_2] ⊆ [0, 1]² with x_1 < x_2 and y_1 < y_2. Fix 0 < δ < 1/2. Then {(x, y) : |y − x| ≥ 1 − δ} = T ∪ T′ for right triangles T and T′ in the upper left and lower right corners of [0, 1]². If neither T nor T′ contains two corners on the same side of R, then (3.5) holds by the above lemma. Otherwise, by the reflection symmetries (1.3), it is enough to discuss the case when T contains the two left corners of R. Then T contains no more corners of R; for that would make Y − X ≥ 1 − δ > 0 almost surely, contradicting EX = EY. Finally, for R with two left corners in T and two right corners not in T ∪ T′, replacing (X, Y) by (X, EY) gives a 2 × 1 example with the same P(|X − Y| ≥ 1 − δ), which is at most δ by (1.7).
Proof of (1.9). This argument from [Pit14] was presented in [Bur16, Theorem 18.1], but is included here for the reader's convenience. The lower bound in (1.9) is obvious from (1.6). For the upper bound, it is enough to discuss the case δ ∈ [0, 1/2). Observe that since X and Y take values in [0, 1],

(|X − Y| ≥ 1 − δ) ⊆ (X ≤ δ, Y ≥ 1 − δ) ∪ (Y ≤ δ, X ≥ 1 − δ).   (3.7)

Writing X = P(A | G) and Y = P(A | H), and using the fact that (X ≤ δ) ∈ G and (Y ≥ 1 − δ) ∈ H,

P(X ≤ δ, Y ≥ 1 − δ) ≤ P(A, X ≤ δ) + P(A^c, Y ≥ 1 − δ) = E[X 1(X ≤ δ)] + E[(1 − Y) 1(Y ≥ 1 − δ)] ≤ δ P(X ≤ δ) + δ P(Y ≥ 1 − δ).

Adding this to the corresponding bound with X and Y switched, and noting that the events (X ≤ δ) and (X ≥ 1 − δ) are disjoint for δ < 1/2, as are (Y ≤ δ) and (Y ≥ 1 − δ), gives

P(|X − Y| ≥ 1 − δ) ≤ δ [P(X ≤ δ) + P(X ≥ 1 − δ) + P(Y ≤ δ) + P(Y ≥ 1 − δ)] ≤ 2δ.

Coherent distributions
The following proposition summarizes a number of known characterizations of the set of coherent distributions of (X, Y ), due to [DP80b], [GKRS91] and [DDM95].
Proposition 4.1. Let (X, Y ) be a pair of random variables defined on a probability space (Ω, F, P ), on which there is also defined a random variable U with uniform distribution, independent of (X, Y ). Then the following conditions are equivalent: (i) The joint law of (X, Y ) is coherent. either for all bounded measurable g, or for all bounded continuous g.
(iv) EX = EY = p for some 0 ≤ p ≤ 1, and for all Borel sets B, C ⊆ [0, 1],

E[X 1(X ∈ B)] + E[(1 − Y) 1(Y ∈ C)] ≥ P(X ∈ B, Y ∈ C).   (4.3)

Proof. Condition (i) is just (ii) for Z an indicator variable, while (ii) for 0 ≤ Z ≤ 1 implies (iii) for φ(X, Y) = E(Z | X, Y). Assuming (iii), (ii) holds with Z = 1(U ≤ φ(X, Y)) for the uniform [0, 1] variable U independent of (X, Y). So (i), (ii) and (iii) are equivalent. These conditions imply (iv), since for Z as in (ii), E[X 1(X ∈ B)] = E[Z 1(X ∈ B)] ≥ E[Z 1(X ∈ B, Y ∈ C)] and E[(1 − Y) 1(Y ∈ C)] = E[(1 − Z) 1(Y ∈ C)] ≥ E[(1 − Z) 1(X ∈ B, Y ∈ C)], and these two inequalities may be added; the converse implication is harder, and is covered by the sources cited above.

Proposition 4.2. For each finite index set I, the set of coherent distributions of (X_i, i ∈ I) on [0, 1]^I is convex and compact in the topology of weak convergence.

Proof. To check convexity, suppose that (X_i, i ∈ I) is subject to the extension of (4.1). That is, for some additional index * ∉ I and X_* = Z ∈ [0, 1], for all bounded continuous g and i ∈ I,

E[X_i g(X_i)] = E[Z g(X_i)],   (4.4)

and the same for Y = (Y_i, i ∈ I_*) instead of X, with I_* := I ∪ {*}. Construct these random vectors X and Y on a common probability space with a Bernoulli(p) variable B_p, with X, Y and B_p independent. Let W := B_p X + (1 − B_p) Y, so the law of W is the mixture of the laws of X and Y with weights p and 1 − p. Then (4.4) for X and Y implies (4.4) for W. The proof of sequential compactness is similar. Define X to be a subsequential limit in distribution of some sequence of random vectors X_n := (X_{n,i}, i ∈ I_*) subject to (4.4) for each n, to deduce (4.4) for X by bounded convergence.
Corollary 4.3. Let C be a non-empty set of distributions of X = (X_i, i ∈ I) on R^I that is compact in the topology of weak convergence, such as the set of coherent distributions of X on [0, 1]^I. For a particular continuous function g, let G(x) := sup_C P(g(X) ≤ x) for x ∈ R, where the sup_C is over X with a distribution in C. Then:
(i) for each x the supremum defining G(x) is attained; and
(ii) G is the cumulative distribution function of a random variable γ which is stochastically smaller than g(X) for every distribution of X in C: γ ≤_d g(X).
Proof. By definition of G(x), for each fixed x there exists a sequence of random vectors X_n with distributions in C such that F_n(x) := P(g(X_n) ≤ x) ↑ G(x). By compactness of C, it may be supposed that X_n →_d X, meaning that the distribution of X_n converges weakly to that of some X with distribution in C. That implies g(X_n) →_d g(X). Let F(x) := P(g(X) ≤ x). Since F_n(x) and F(x) are the probabilities assigned by the laws of g(X_n) and g(X) to the closed set (−∞, x], [Bil95, Theorem 29.1] gives

lim sup_n F_n(x) ≤ F(x) ≤ G(x),

so G(x) = F(x) and the supremum defining G(x) is attained, proving (i). For (ii), the only property of a cumulative distribution function that is not an obvious property of G is right continuity. To see this, take x_n ↓ x and X_n with F_n(y) := P(g(X_n) ≤ y) such that F_n(x_n) = G(x_n), and X_n →_d X with distribution in C. Let F(x) := P(g(X) ≤ x). Then for each fixed m, since F_n(x_n) ≤ F_n(x_m) for n ≥ m, the same result of [Bil95] gives

lim sup_n G(x_n) = lim sup_n F_n(x_n) ≤ lim sup_n F_n(x_m) ≤ F(x_m).

Letting m → ∞ and using the right continuity of F gives lim sup_n G(x_n) ≤ F(x) ≤ G(x), while G(x) ≤ G(x_n) for all n because G is non-decreasing. So G(x_n) → G(x), as required.

Returning to the discussion of just a pair of random variables (X, Y) with values in [0, 1]², as in Proposition 4.1, suppose further that X and Y are independent, with EX = EY = p. Then the inequality (4.3) becomes

E[X 1(X ∈ B)] + E[(1 − Y) 1(Y ∈ C)] ≥ P(X ∈ B) P(Y ∈ C).   (4.5)

Problems and conjectures

A check on Claim 1.2 is to try to confirm it first with additional assumptions, such as independence of X and Y, using (4.5). But this does not seem easy. It leads rather to:

Conjecture 5.2. If (X, Y) is coherent, and X and Y are independent, then (5.1) holds.

Equality is attained in (5.1) for suitable independent X and Y. The method of proof of (1.8) establishes (5.1) for 2 × 2 laws of (X, Y). But like Claim 1.2, the extension of (5.1) to general distributions of X and Y seems quite challenging. The problems solved by (1.8) for t(X, Y) = 1(|X − Y| ≥ 1 − δ), and by the case n = 2 of (2.7) for t(X, Y) = X ∨ Y, are instances of the following more general problem, with further variants as above, assuming X and Y are independent.
Problem 5.3. [DP80b, p.224] Given some target function t(X, Y ) defined on [0, 1] 2 , evaluate sup C Et(X, Y ), the supremum of Et(X, Y ) as the law of (X, Y ) ranges over the set C of coherent laws on [0, 1] 2 . Or the same for C(p), coherent laws of (X, Y ) with EX = EY = p.
This problem seems to be open even for t(X, Y) = XY, or for |X − Y|^r with r ≠ 1, for which (1.13) gives only a crude upper bound. Another instance of this problem is to evaluate

ε(δ, p) := sup_{C(p)} P(|X − Y| ≥ 1 − δ).   (5.3)

For each δ ∈ (0, 1), examples of coherent (X, Y) with

P(|X − Y| ≥ 1 − δ) = 2δ/(1 + δ) =: p(δ)   (5.4)

are the 2 × 2 example (1.4), say (X_δ, Y_δ), with EX_δ = 1 − p(δ), its reflection (1 − X_δ, 1 − Y_δ), and any mixture of these two laws, which is a 4 × 4 law in C(p) for p between p(δ) and 1 − p(δ). So

p(δ) ≤ ε(δ, p) ≤ ε(δ) for p between p(δ) and 1 − p(δ).   (5.5)

If Claim 1.2 is accepted, both inequalities are equalities for δ ∈ (0, 1/2]. But that leaves open:

Problem 5.4. Find ε(δ, p) for δ ∈ (0, 1/2], and p not covered by (5.5).

For a bounded upper semicontinuous t, such as the indicator of a closed set, the supremum sup_C Et(X, Y) will be attained at a distribution of (X, Y) in ext(C), the set of extreme points of the compact, convex set C of coherent distributions [BS91]. This leads to:

Problem 5.5. Describe ext(C), the set of extreme coherent laws of (X, Y).

For the particular target functions t involved in (2.7) and in Claim 1.2, the sup_C Et(X, Y) is attained by 2 × 2 distributions of (X, Y). Hence the following:

Conjecture 5.6. Every extreme coherent law of (X, Y) is a 2 × 2 law.
Let M be the convex, compact subset of C comprising laws of two-term martingales (X, Y), with X = E(Y | X) and Y ∈ [0, 1]. It is elementary and well known that ext(M) is the set of 1 × 2 laws of (p, Y_p) for two-valued Y_p ∈ [0, 1] with E(Y_p) = p. But the extension of this result conjectured above does not seem obvious. It may be relatively easy to settle whether or not every extreme m × n coherent (X, Y) is actually 2 × 2, for some small m and n. In view of (4.3), for any particular t, the evaluation of sup_C Et(X, Y), with restriction to a fixed set of m values for X and n values for Y, is a linear programming problem, with a finite number of constraints depending on the given values. This problem may be solved by modern programming techniques, at least for small m and n. By solving such 2 × 3 problems, a solution might be found which is not attained by any 2 × 2 coherent law. Then Conjecture 5.6 would be false. On the other hand, if Conjecture 5.6 is true, that would increase interest in the structure of 2 × 2 extreme laws. The following proposition is easily proved using (4.3):

Proposition 5.7. For each rectangle R = [x_1, x_2] × [y_1, y_2] ⊆ [0, 1]², let C_{2×2}(R) denote the set of coherent laws of (X, Y) supported on the corners of R. Then:
• C_{2×2}(R) is non-empty iff R intersects the diagonal {(p, p) : 0 ≤ p ≤ 1}, that is, iff x_1 ∨ y_1 ≤ x_2 ∧ y_2.
• If x_1 ∨ y_1 = x_2 ∧ y_2 = p, then (p, p) is a corner of R, and the unique law in C_{2×2}(R) is degenerate, with X = Y = p.
• If x_1 ∨ y_1 < x_2 ∧ y_2, the set ext C_{2×2}(R) of extreme points of the convex set C_{2×2}(R) is identical to the set of all extreme coherent laws supported by the set of corners of R. This set C_{2×2}(R) forms a convex polygon in a 2-dimensional affine subspace of the set of probability distributions on those corners, with at least 2 and at most 8 vertices.
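The linear programming computation suggested above can be sketched as follows (an illustration, not code from the paper: coherence of a law on a fixed m × n grid is encoded through auxiliary unknowns u[i, j] = E[Z; X = x_i, Y = y_j] for a variable Z ∈ [0, 1] with E(Z | X) = X and E(Z | Y) = Y, as in Proposition 4.1; SciPy's linprog does the optimization):

```python
import numpy as np
from scipy.optimize import linprog

def max_Et(xs, ys, t):
    # Maximize E t(X, Y) over coherent laws of (X, Y) supported on xs × ys.
    # Unknowns: q[i,j] = P(X = x_i, Y = y_j) and u[i,j] = E[Z; X=x_i, Y=y_j]
    # for some Z in [0, 1] with E(Z | X) = X and E(Z | Y) = Y.
    m, n = len(xs), len(ys)
    N = m * n
    k = lambda i, j: i * n + j              # flat index of cell (i, j)
    c = np.zeros(2 * N)                     # variable order: (q, u)
    for i in range(m):
        for j in range(n):
            c[k(i, j)] = -t(xs[i], ys[j])   # maximize => minimize the negative
    A_eq, b_eq = [], []
    row = np.zeros(2 * N); row[:N] = 1.0    # total mass 1
    A_eq.append(row); b_eq.append(1.0)
    for i in range(m):                      # E(Z | X = x_i) = x_i
        row = np.zeros(2 * N)
        for j in range(n):
            row[k(i, j)] = -xs[i]
            row[N + k(i, j)] = 1.0
        A_eq.append(row); b_eq.append(0.0)
    for j in range(n):                      # E(Z | Y = y_j) = y_j
        row = np.zeros(2 * N)
        for i in range(m):
            row[k(i, j)] = -ys[j]
            row[N + k(i, j)] = 1.0
        A_eq.append(row); b_eq.append(0.0)
    A_ub = np.zeros((N, 2 * N))             # 0 <= u[i,j] <= q[i,j]
    for s in range(N):
        A_ub[s, N + s] = 1.0
        A_ub[s, s] = -1.0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(N), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (2 * N))
    return -res.fun

delta = 0.3
t = lambda x, y: float(abs(x - y) >= 1 - delta - 1e-12)
val = max_Et([0.0, 1 - delta], [0.0, 1 - delta], t)
print(val, 2 * delta / (1 + delta))   # both are 6/13, matching (1.8)
```

On the 2 × 2 supports of the example (1.4) with δ = 0.3, the optimal value agrees with ε_{2×2}(0.3) = 6/13 from (1.8); replacing the supports and t, for instance by 2 × 3 grids, gives exactly the experiment proposed above for testing Conjecture 5.6.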
Examples show that the number of vertices of this polygon varies as a function of the rectangle R, from 2 if R is pushed into a corner of [0, 1] 2 , to at least 6 for some more central locations.
Regardless of the status of Conjecture 5.6, this leads to:

Problem 5.8. Determine the number of vertices of the polygon of Proposition 5.7 as a function of the rectangle R.