A Coupling Proof of Convex Ordering for Compound Distributions

In this paper, we give an alternative proof of the fact that, when compounding a nonnegative probability distribution, convex ordering between the distributions of the number of summands implies convex ordering between the resulting compound distributions. Although this is a classical textbook result in risk theory, our proof exhibits a concrete coupling between the compound distributions being compared, using the representation of one-period discrete martingale laws as a mixture of the corresponding extremal measures.


Introduction
Consider an i.i.d. sequence of nonnegative random variables $X = (X_i)_{i \geq 1}$, and two integer-valued random variables $M$, $N$, independent from the sequence $X$. Assume that $E(X_1) < +\infty$, $E(M) < +\infty$, $E(N) < +\infty$, and that a comparison between $M$ and $N$ holds with respect to the convex¹ ordering: $M \prec_{cx} N$. We then have the following comparison between the compound variables $X_1 + \cdots + X_M$ and $X_1 + \cdots + X_N$ with respect to the convex ordering:
$$X_1 + \cdots + X_M \prec_{cx} X_1 + \cdots + X_N. \tag{1}$$
This is a classical result (see e.g. Theorem 4.A.9 in [17] or Theorem 4.3.6 in [13])², useful in the context of risk theory (see e.g. [10], chap. 7), where its interpretation is that compounding with a riskier frequency distribution leads to a riskier aggregated loss distribution. The proof given in [17] is analytical in nature, and consists in showing that, given a non-decreasing convex function $f : [0, +\infty[ \to \mathbb{R}$, the sequence $(u_n)_{n \geq 0}$ defined by $u_n = E f(X_1 + \cdots + X_n)$ satisfies, for all $n \geq 0$, the condition $u_{n+2} - u_{n+1} \geq u_{n+1} - u_n$ (provided that $u_{n+2}$, $u_{n+1}$, $u_n$ have a finite value)³. Here, we give an alternative proof of (1), based on a coupling between the two random variables being compared, which provides an explicit realization of Strassen's condition (see [18] or e.g. [13], Section 1.5.2) for convex ordering: we construct a pair of random variables $(A, B)$ such that
$$A \overset{d}{=} X_1 + \cdots + X_M, \qquad B \overset{d}{=} X_1 + \cdots + X_N, \qquad \text{and} \quad A = E(B \mid \sigma(A)) \ \text{a.s.} \tag{2}$$
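The second-difference condition on the sequence (u_n) used in the analytical proof can be checked exactly in a simple case. The following Python sketch is only an illustration: the Bernoulli summands, the parameter p = 1/3, the test function f and the helper name `u` are all assumptions made for this example, not part of the cited proof.

```python
from fractions import Fraction
from math import comb


def u(n, p, f):
    """Exact value of E f(X_1 + ... + X_n) for i.i.d. Bernoulli(p) summands.

    The sum X_1 + ... + X_n is Binomial(n, p), so the expectation is a finite sum.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * f(k) for k in range(n + 1))


def f(s):
    # Non-decreasing convex test function (illustrative choice).
    return max(s - 1, 0)


p = Fraction(1, 3)
values = [u(n, p, f) for n in range(8)]
# Successive increments u_{n+1} - u_n; convexity of (u_n) means they are non-decreasing.
increments = [b - a for a, b in zip(values, values[1:])]
is_convex_sequence = all(x <= y for x, y in zip(increments, increments[1:]))
print(is_convex_sequence)
```

Exact rational arithmetic is used so that the comparison of increments is not affected by rounding.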
Our proof relies on a representation result, which we call a diatomic representation of convex ordering, stated as Theorem 1 in Section 2, where we review several approaches for proving the existence of this representation, including an explicit algorithm in the case of discrete distributions. Section 3 contains the coupling construction leading to (2) and the proof of (1). Finally, in Section 4, we discuss various extensions of these results.

Diatomic representation
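Given two real numbers $x < y$, we define, for all $z$, the normalized barycentric coordinates:
$$\alpha_{x,y}(z) = \frac{y - z}{y - x}, \qquad \beta_{x,y}(z) = \frac{z - x}{y - x}.$$
In the case $x = y$, we extend the above definition by setting $\alpha_{x,y}(z) = 1$ and $\beta_{x,y}(z) = 0$. Whenever $x \leq z \leq y$, both $\alpha_{x,y}(z)$ and $\beta_{x,y}(z)$ lie in the interval $[0, 1]$, and $z$ can be written as the convex combination of $x$ and $y$: $z = \alpha_{x,y}(z) \cdot x + \beta_{x,y}(z) \cdot y$.

Theorem 1. Given two probability distributions $\mu, \nu$ on $\mathbb{R}$ possessing a finite expectation, the comparison $\mu \prec_{cx} \nu$ holds if and only if there exists a triple of random variables $(V_-, U, V_+)$ defined on the same probability space and such that:
$$V_- \leq U \leq V_+ \ \text{a.s.}, \qquad \mathrm{Law}(U) = \mu,$$
$$E\left[\alpha_{V_-,V_+}(U)\,\delta_{V_-} + \beta_{V_-,V_+}(U)\,\delta_{V_+}\right] = \nu. \tag{5}$$
We call such a triple $(V_-, U, V_+)$ a diatomic representation of the stochastic ordering $\mu \prec_{cx} \nu$.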
A more concrete statement of (5) is that $\mathrm{Law}(V) = \nu$, where $V$ is a random variable whose conditional distribution with respect to $V_-, U, V_+$ is given by:
$$\alpha_{V_-,V_+}(U)\,\delta_{V_-} + \beta_{V_-,V_+}(U)\,\delta_{V_+}.$$

Theorem 1 may not have been stated in this specific form in the mathematical literature, but its content is certainly not new. In the following subsections we review several ways of proving this result.
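Condition (5) can be made concrete on a finite example. The Python sketch below computes the law of V from the joint law of a triple (V−, U, V+). It is an illustration only: the encoding of the joint law as a list of tuples `(probability, v_minus, u, v_plus)` and the function names `barycentric` and `law_of_v` are assumptions of this example.

```python
from collections import defaultdict
from fractions import Fraction


def barycentric(x, y, z):
    """Return (alpha, beta) such that z = alpha * x + beta * y and alpha + beta = 1."""
    if x == y:
        # Convention of the text for the degenerate case x = y.
        return Fraction(1), Fraction(0)
    return Fraction(y - z) / Fraction(y - x), Fraction(z - x) / Fraction(y - x)


def law_of_v(triple_law):
    """Mix the diatomic measures alpha * delta_{v-} + beta * delta_{v+} over the triple law."""
    nu = defaultdict(Fraction)
    for prob, v_minus, u, v_plus in triple_law:
        a, b = barycentric(v_minus, v_plus, u)
        nu[v_minus] += prob * a
        nu[v_plus] += prob * b
    # Drop atoms that received zero mass.
    return {v: w for v, w in nu.items() if w > 0}


# Deterministic triple (0, 1, 2): U is constant equal to 1, and V is uniform on {0, 2}.
triple_law = [(Fraction(1), 0, 1, 2)]
nu = law_of_v(triple_law)
print(nu)
```

In this example µ is the Dirac mass at 1 and the resulting ν has the same mean, as expected for a pair of measures in convex order.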
2.1. Proof of Theorem 1 via Choquet's and Douglas' theorems (for compactly supported measures ν). We prove here the result only in the case of measures µ ≺ cx ν concentrated on some closed interval K.
Consider the following space of measures on $\mathbb{R}^2$:
Here $M_+(K^2)$ is the space of finite positive Borel measures on $\mathbb{R}^2$ with support contained in $K^2$, and $C_b(\mathbb{R})$ the space of real-valued continuous and bounded functions on $\mathbb{R}$. In other words, denoting by $F$ the space of functions $f_{a,b} : (u, v) \in \mathbb{R}^2 \mapsto a + b(u)(v - u)$, the above definition reads:
It turns out that $S_K$ is a non-empty space of probability measures, known as martingale measures in the literature since, for $(X_1, X_2) \sim \pi \in S_K$, the process $(X_i)_{i=1,2}$ is a martingale with respect to its natural filtration.
Our task amounts to proving that a measure $\pi \in S_K$ is represented as a mixture of 'triplet' measures of the form $\delta_z \otimes [\alpha\delta_x + \beta\delta_y]$, where $z = \alpha x + \beta y$ and $\alpha, \beta$ are nonnegative and satisfy $\alpha + \beta = 1$.
We admit without proof that $S_K$ is convex and compact for the weak topology. Choquet's Theorem (see [3] or e.g. [15]) then implies that every measure in $S_K$ can be represented as a mixture of extremal measures in $S_K$. So we shall be done as soon as we can prove that the extremal measures in $S_K$ are triplet measures concentrated on $S_K$. Let $\eta$ be such an extremal measure and $\mu, \nu$ its marginals. We aim at proving that $\mu$ is supported on a single point and that $\nu$ is supported on at most two points.

Striving for a contradiction, suppose that $\mu$ is not a Dirac measure. Hence, there exists a Borel set $E$ such that $\mu(E) \notin \{0, 1\}$. We consider the $L^1(\eta)$ distance between $f : (u, v) \mapsto 1_E(u)$ and the functions $f_{a,b} \in F$. Letting $(U, V)$ be distributed according to $\eta$ we find
Note that this lower bound depends only on $f$. We have thus proved that $F$ is not dense in $L^1(\eta)$. According to Douglas's theorem [4] (see also [16, Chapter V]), this is in contradiction with the fact that $\eta$ is extremal. Thus $\eta$ is of type $\delta_{u_0} \otimes \nu$, where $u_0$ is the barycenter of $\nu$, i.e. $\int v \, d\nu(v) = u_0$.

Next, again striving for a contradiction, suppose that there exists a partition $(A_i)_{i \in \{1,2,3\}}$ of $\mathbb{R}$ such that $\nu(A_i) > 0$ holds for every $i$. From Douglas's theorem again, the set $F$ of functions is dense in $L^1(\eta)$. In particular any function $g_{c_1,c_2,c_3}$ :
is clearly linear and onto. It follows that the linear map
is onto as well, a contradiction. Therefore, extremal measures of $S_K$ are of type $\delta_z \otimes [\alpha\delta_x + \beta\delta_y]$ where $z = \alpha x + \beta y$, as required.

2.2. Proof of Theorem 1 via Strassen's theorem (general case). Another approach to proving Theorem 1 is to use Strassen's theorem instead of Choquet's and Douglas's theorems: there exists a kernel $k : \mathbb{R} \to \mathcal{P}(\mathbb{R})$ such that, $\mu$-almost surely, $k_u = k(u)$ has mean $u$, and such that $\mu \cdot k = \nu$.
(Such kernels are known as dilations or martingale kernels in the literature.) Hence the mixture with weight $\mu$ of the measures $\delta_u \otimes k_u$ defines a probability measure $\pi$ on $\mathbb{R}^2$ whose marginals are $\mu$ and $\nu$. To complete the proof, it remains to check that each measure $k_u$ can be represented as a mixture of diatomic measures with mean $u$. This last fact is a classical step in the proof of Skorokhod's representation theorem (see e.g. [

Remark. The search for martingale kernels $k : u \mapsto k_u$ is a key question in the field of martingale optimal transport. The first completely canonical method seems to be the left-curtain coupling by Beiglböck and Juillet [1], which is also of particular interest to us. Not only is $k_u$ canonical, but when $\mu$ is diffuse its kernels $k_u$ are automatically diatomic (this also holds for the earlier coupling by Hobson and Neuberger [6] under more general assumptions). This is not the case if $\mu$ possesses atoms. However, a quantile version of the left-curtain coupling is described in a second paper by the same authors [2], where the martingale measure $\pi$ directly appears as a mixture, over the set $[0, 1]$ of quantile levels $\omega$, of diatomic kernels $\delta_{z_\omega} \otimes (\alpha_{x_\omega,y_\omega}(z_\omega)\,\delta_{x_\omega} + \beta_{x_\omega,y_\omega}(z_\omega)\,\delta_{y_\omega})$, where $z_\omega$ is the $\omega$-quantile of $\mu$. Note that the same can be said of the recent coupling by Jourdain and Margheriti [8].
2.3. Algorithmic proof of Theorem 1 (for finitely supported µ, ν). We now describe an explicit algorithmic construction, inspired by [1,2], leading to a diatomic decomposition in the case where both µ and ν are finitely supported probability measures. This algorithm is used to produce the simulation shown in Fig. 1.
The reason why the above algorithm stops lies in the fact that the transformation performed on $\mu^*$ and $\nu^*$ keeps the comparison $\mu^* \prec_{cx} \nu^*$ valid throughout the execution of the algorithm (see the proof of Lemma 2.8 in [1]), with $\mu^*$ and $\nu^*$ having equal total mass. As a consequence, as long as $\mu^*$ and $\nu^*$ do not have zero total mass, the comparison $\mu^* \prec_{cx} \nu^*$ ensures that a triple $(v_{j_-}, u_i, v_{j_+})$ satisfying conditions a.-b.-c. exists. Finally, since at each step at least one of the three numbers $\nu^*(v_{j_-})$, $\mu^*(u_i)$, $\nu^*(v_{j_+})$ is set to zero, the total mass of both $\mu^*$ and $\nu^*$ must reach zero after a finite number of steps.
Remark. If, in the loop part of the algorithm, one systematically chooses the unique triple $(v_-, u, v_+)$ such that $u$ is the leftmost point in the support of $\mu^*$ (such a choice is always possible, see the proof of Lemma 2.8 in [1]), the end result of the algorithm is the so-called left-curtain coupling.
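As a rough illustration of this kind of construction, here is a minimal Python sketch of a greedy diatomic decomposition for finitely supported measures. It is not a transcription of the algorithm of this subsection. The selection rule (the leftmost remaining atom u of µ*, paired with the nearest remaining atoms of ν* on either side), the dictionary encoding of measures and the function names are all assumptions made for this example, and the sketch presumes that the input measures are in convex order with equal total mass.

```python
from fractions import Fraction


def barycentric(x, y, z):
    """Return (alpha, beta) such that z = alpha * x + beta * y and alpha + beta = 1."""
    if x == y:
        return Fraction(1), Fraction(0)
    return Fraction(y - z) / Fraction(y - x), Fraction(z - x) / Fraction(y - x)


def greedy_diatomic(mu, nu):
    """Return a list of pieces (mass, v_minus, u, v_plus).

    mu and nu are dicts {atom: mass}. Assumed rule: take the leftmost atom u of
    the remaining mu, and the closest remaining atoms of nu on each side of u.
    """
    mu = {x: Fraction(w) for x, w in mu.items() if w > 0}
    nu = {x: Fraction(w) for x, w in nu.items() if w > 0}
    pieces = []
    while mu:
        u = min(mu)
        below = [v for v in nu if v <= u]
        above = [v for v in nu if v >= u]
        if not below or not above:
            raise ValueError("no admissible triple; are the inputs in convex order?")
        v_minus, v_plus = max(below), min(above)
        a, b = barycentric(v_minus, v_plus, u)
        # Largest mass t that can be moved without making any atom negative.
        t = mu[u]
        if a > 0:
            t = min(t, nu[v_minus] / a)
        if b > 0:
            t = min(t, nu[v_plus] / b)
        pieces.append((t, v_minus, u, v_plus))
        # Remove the transported mass; at least one atom is exhausted at each step.
        for measure, point, amount in ((mu, u, t), (nu, v_minus, a * t), (nu, v_plus, b * t)):
            if amount > 0:
                measure[point] -= amount
                if measure[point] == 0:
                    del measure[point]
    return pieces


mu = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {-2: Fraction(1, 3), 0: Fraction(1, 3), 2: Fraction(1, 3)}
pieces = greedy_diatomic(mu, nu)
for piece in pieces:
    print(piece)
```

Each returned piece carries a mass t and a triple (v−, u, v+) with v− ≤ u ≤ v+. Summing t·(α δ_{v−} + β δ_{v+}) over the pieces should give back ν, which is one way to sanity-check the output on small examples.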

Coupling construction
We now describe the coupling construction leading to our proof of (1). Consider a triple $(N_-, M, N_+)$ as in Theorem 1, with $\mu = \mathrm{Law}(M)$ and $\nu = \mathrm{Law}(N)$, and an i.i.d. sequence $X = (X_i)_{i \geq 1}$ independent from $(N_-, M, N_+)$. For every integer $k \geq 1$, we let $S_k = \sum_{i=1}^{k} X_i$, with the convention that $S_0 = 0$. Finally, we let $\mathcal{F} = \sigma(M, N_-, N_+, S_{N_-}, S_{N_+})$ and $\mathcal{G} = \sigma(M, N_-, N_+, S_M, S_{N_-}, S_{N_+})$. Note that $\mathcal{G} = \mathcal{F} \vee \sigma(S_M)$.
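We start our construction by setting:
$$A = S_M. \tag{6}$$
Next, we specify $B$ by the requirement that, conditional upon $\mathcal{G}$, the distribution of $B$ is:
$$\alpha_{S_{N_-},S_{N_+}}(A)\,\delta_{S_{N_-}} + \beta_{S_{N_-},S_{N_+}}(A)\,\delta_{S_{N_+}}.$$
Note that $\alpha_{S_{N_-},S_{N_+}}(A)$ and $\beta_{S_{N_-},S_{N_+}}(A)$ do indeed lie in the interval $[0, 1]$ thanks to the assumption that the random variables $X_i$ are nonnegative, so that
$$S_{N_-} \leq S_M \leq S_{N_+} \quad \text{a.s.}$$
We now proceed to checking that all three properties listed in (2) are satisfied by the random variables $A$ and $B$. From (6), it is immediate that $A$ has the required distribution. Moreover, from the definition of $\alpha$ and $\beta$, one has the identity
$$A = \alpha_{S_{N_-},S_{N_+}}(A) \cdot S_{N_-} + \beta_{S_{N_-},S_{N_+}}(A) \cdot S_{N_+},$$
which rewrites as:
$$A = E(B \mid \mathcal{G}) \quad \text{a.s.},$$
whence, taking the conditional expectation $E(\cdot \mid \sigma(A))$ on both sides, and using the fact that $\sigma(A) \subset \mathcal{G}$, we deduce that $A = E(B \mid \sigma(A))$ a.s., as required by (2).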
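To conclude the proof, it remains to check that $B \overset{d}{=} S_N$. By construction, the conditional distribution of $B$ given $\mathcal{F}$ can be written as:
$$E\left(\alpha_{S_{N_-},S_{N_+}}(S_M) \mid \mathcal{F}\right)\delta_{S_{N_-}} + E\left(\beta_{S_{N_-},S_{N_+}}(S_M) \mid \mathcal{F}\right)\delta_{S_{N_+}}.$$
Now observe that, by symmetry, given integers $n_- \leq m \leq n_+$ such that $n_- < n_+$, we have:
$$E\left(S_m \mid \sigma(S_{n_-}, S_{n_+})\right) = \alpha_{n_-,n_+}(m) \cdot S_{n_-} + \beta_{n_-,n_+}(m) \cdot S_{n_+} \quad \text{a.s.},$$
from which we deduce that⁴
$$E\left(\alpha_{S_{n_-},S_{n_+}}(S_m) \mid \sigma(S_{n_-}, S_{n_+})\right) = \alpha_{n_-,n_+}(m) \ \text{a.s.} \quad \text{and} \quad E\left(\beta_{S_{n_-},S_{n_+}}(S_m) \mid \sigma(S_{n_-}, S_{n_+})\right) = \beta_{n_-,n_+}(m) \ \text{a.s.}$$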
As a consequence, the conditional distribution of $B$ given $\mathcal{F}$ is none other than:
$$\alpha_{N_-,N_+}(M)\,\delta_{S_{N_-}} + \beta_{N_-,N_+}(M)\,\delta_{S_{N_+}}.$$
From the fact that the sequence $X = (X_i)_{i \geq 1}$ is independent from $(N_-, M, N_+)$, and from condition (5), we deduce that
$$B \overset{d}{=} S_N,$$
which concludes the proof.

⁴ In the case where $n_- = m = n_+$, these identities are still (obviously) valid.
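The three properties in (2) can be verified exactly on a small discrete example by enumerating all outcomes. The Python sketch below is an illustration under stated assumptions: the summands take finitely many nonnegative integer values, the law of the triple (N−, M, N+) is given as a finite list of tuples `(probability, n_minus, m, n_plus)`, and all function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def barycentric(x, y, z):
    """Return (alpha, beta) such that z = alpha * x + beta * y and alpha + beta = 1."""
    if x == y:
        return Fraction(1), Fraction(0)
    return Fraction(y - z) / Fraction(y - x), Fraction(z - x) / Fraction(y - x)


def coupling_joint_law(triple_law, x_law):
    """Exact joint law {(a, b): prob} of A = S_M and of B, which equals S_{N-} or
    S_{N+} with conditional probabilities alpha and beta evaluated at A."""
    n_max = max(n_plus for _, _, _, n_plus in triple_law)
    joint = defaultdict(Fraction)
    for w, n_minus, m, n_plus in triple_law:
        for xs in product(x_law, repeat=n_max):
            p = Fraction(w)
            for x in xs:
                p *= x_law[x]
            s_minus, a, s_plus = sum(xs[:n_minus]), sum(xs[:m]), sum(xs[:n_plus])
            al, be = barycentric(s_minus, s_plus, a)
            joint[(a, s_minus)] += p * al
            joint[(a, s_plus)] += p * be
    return joint


def compound_law(n_law, x_law):
    """Exact law of S_N when N has law n_law and is independent of the X_i."""
    law = defaultdict(Fraction)
    for n, w in n_law.items():
        for xs in product(x_law, repeat=n):
            p = Fraction(w)
            for x in xs:
                p *= x_law[x]
            law[sum(xs)] += p
    return dict(law)


x_law = {0: Fraction(2, 3), 1: Fraction(1, 3)}       # Bernoulli(1/3) summands
triple_law = [(Fraction(1), 0, 1, 2)]                # M = 1, (N-, N+) = (0, 2)
n_law = {0: Fraction(1, 2), 2: Fraction(1, 2)}       # law of N implied by (5)

joint = coupling_joint_law(triple_law, x_law)
law_b = defaultdict(Fraction)
for (a, b), p in joint.items():
    law_b[b] += p
law_b = {b: p for b, p in law_b.items() if p > 0}

same_law = law_b == compound_law(n_law, x_law)
# E[(B - A) 1_{A = a}] = 0 for every value a is equivalent to A = E(B | sigma(A)).
martingale = all(
    sum((b - a0) * p for (a, b), p in joint.items() if a == a0) == 0
    for a0 in {a for a, _ in joint}
)
print(same_law, martingale)
```

The enumeration is exponential in the largest value of N+, so this is only meant as a check on toy examples, not as a simulation method.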

Final remarks and extensions
4.1. Continuous time. We note that the coupling construction described in Section 3 can be extended to continuous time. For instance, let $N = (N_t)_{t \geq 0}$ be a standard Poisson process and $S \prec_{cx} T$ nonnegative integrable random variables independent from $N$. Then one has $N_S \prec_{cx} N_T$, and it is straightforward to extend our approach to define a pair of random variables $(A, B)$ such that
$$A \overset{d}{=} N_S, \qquad B \overset{d}{=} N_T, \qquad \text{and} \quad A = E(B \mid \sigma(A)) \ \text{a.s.}$$
The same approach still works in exactly the same way if we consider an integrable subordinator instead of a Poisson process.

4.2. Exchangeable random variables. If the sequence of random variables $(X_i)_{i \geq 1}$ is assumed to be exchangeable instead of i.i.d., the coupling construction described in Section 3 works in exactly the same way. (The classical proof found in [17,13] also works in this case.) Note that, in the case of an infinite exchangeable sequence of random variables, one can directly deduce (1) from the i.i.d. case, using the de Finetti representation of such a sequence as a mixture of (distributions of) i.i.d. sequences, and the characterization of (1) through the inequality
$$E f(X_1 + \cdots + X_M) \leq E f(X_1 + \cdots + X_N) \tag{9}$$
for all convex functions $f$. On the other hand, if $M$ and $N$ are assumed to have finite support, say $\{0, 1, \ldots, q\}$, and one considers a finite exchangeable sequence of random variables $X_1, \ldots, X_q$, the extension of de Finetti's theorem to this case (see [12,7]) leads in general to a signed mixture of i.i.d. sequences, so one cannot integrate the inequality (9) with respect to the mixing measure in order to directly deduce (1).

4.3. Increasing convex ordering. Assume that a comparison between $M$ and $N$ holds with respect to the increasing convex ordering: $M \prec_{icx} N$. We then have the following modified version of (1):
$$X_1 + \cdots + X_M \prec_{icx} X_1 + \cdots + X_N. \tag{10}$$
To deduce (10) from (1), we note that the comparison $M \prec_{icx} N$ implies that there exists an integer-valued⁵ random variable $N_0$ such that $M \prec_{st} N_0$ and $N_0 \prec_{cx} N$, where $\prec_{st}$ denotes the usual stochastic ordering. Given an i.i.d. sequence $X = (X_i)_{i \geq 1}$ independent from $M, N_0, N$, the fact that the $X_i$ are nonnegative random variables, combined with $M \prec_{st} N_0$, yields the comparison $X_1 + \cdots + X_M \prec_{st} X_1 + \cdots + X_{N_0}$. Then, using (1), we deduce that $X_1 + \cdots + X_{N_0} \prec_{cx} X_1 + \cdots + X_N$, and (10) follows.

⁵ The existence of a random variable $N_0$ such that $M \prec_{st} N_0$ and $N_0 \prec_{cx} N$ is a classical and easily proved decomposition result for the increasing convex order. That $N_0$ can, in addition, be chosen to be integer-valued is less standard. One possible proof of this fact is that, when $\mu \prec_{icx} \nu$, there exists a kernel $k : \mathbb{R} \to \mathcal{P}(\mathbb{R})$ such that, $\mu$-almost surely, $k_u = k(u)$ has mean $\geq u$, and such that $\mu \cdot k = \nu$. In turn, $k_u$ can be written as
$\prec_{cx} \nu$ and a coupling of the corresponding random variables can also be easily deduced. Alternatively, for another proof, one can consider Kellerer's kernels defined in [11, §2.1] for connecting $\mu \prec_{icx} \nu$.

nonnegative random variables, and, in addition to $M \prec_{icx} N$, assume that $X_i \prec_{icx} Y_i$. We then have the following extension of (10):
which is a straightforward consequence of (10) and of the fact that the increasing convex ordering is preserved by convolution.

(1) when the $X_i$ are not positive random variables. We let $M \equiv 1$ and $N \sim 1$