A universal approach to matching marginals and sums

For a given set of random variables X_1, . . . , X_d we seek as large a family as possible of random variables Y_1, . . . , Y_d such that the marginal laws and the laws of the sums match: Y_i =_d X_i and Y_1 + · · · + Y_d =_d X_1 + · · · + X_d, where =_d denotes equality in distribution. Under the assumption that X_1, . . . , X_d are identically distributed but not necessarily independent, using a symmetry-balancing approach we provide a universal construction with sufficient symmetry to satisfy the more stringent requirement that, for any symmetric function g, g(Y) =_d g(X). The same ideas are shown to extend to the non-identically but "similarly" distributed case.


Introduction
While the multivariate normal distribution is widely used for its tractability and general applicability, there are many situations of great significance where a departure from normality is necessary. Such a departure may occur at the univariate level, or it may rest solely with the dependence structure at hand: the marginal distributions are all normal without the multivariate distribution being normal. Modelling such multivariate distributions is often achieved with the use of copulas. The flexibility afforded by such a construction is limitless, allowing, in the bivariate case, dependence structures that cover the entire spectrum from co-monotonicity (perfect positive dependence) to counter-monotonicity (perfect negative dependence). See Sklar (1959).
Beyond the normal setting, modelling dependencies between random variables is of interest in many areas of application, notably insurance and finance. While the assumption of independence is technically convenient, it usually does not hold in practice, and one often resorts to copulas to generate more realistic dependence structures.
In a recent paper, GH (2020), the authors produced a characterisation, by means of mean square expansions, of all multivariate distributions whose marginals and sums coincide with those of a set of independent random variables that belong to the same Meixner class. This characterisation is shown to enable specific constructions via finite (truncated) expansions and appropriate compensations. The Meixner class was identified in Meixner (1934) as the family of distributions for which the generating function of the associated orthogonal polynomials takes a specific form. It is made up of five types of distributions: normal, generalised gamma, generalised Poisson, generalised (negative) binomial and generalised hypergeometric. See Eagleson (1964); GH (2020) for more details.
In this paper, we propose a universal construction that goes beyond the Meixner class of distributions, beyond the independent setting and beyond the limitations of the finite expansion method.
Given random variables X_1, . . . , X_d, we seek random variables Y_1, . . . , Y_d such that Y_i =_d X_i for every i and Y_1 + · · · + Y_d =_d X_1 + · · · + X_d. The construction we use is universal in the sense that it produces a large family of copulas that meet the above requirements irrespective of the marginals under consideration, albeit under the assumption that the random variables X_1, . . . , X_d are identically distributed.
Section 2 contains the main (copula) construction, Section 3 provides the sought answer in the case of identically distributed random variables, and a construction in the non-identically distributed case is given in Section 4.
To the best of our knowledge, the only known construction, other than those given in our recent paper GH (2020), is due to Stoyanov (2013).
While the above example provides one specific construction in the two-dimensional Gaussian and independent case, it does not shed any light on whether other solutions exist and how to construct them, nor does it fulfil the aspiration to extend the problem beyond the Gaussian and independent case.
In this paper, we propose to answer the question of matching marginals and sums in considerable generality. After listing a few basic facts that guide our discovery, we develop in Section 2 a symmetry-balancing approach that delivers sufficient symmetry to satisfy the more stringent requirement that, for any symmetric function g, g(Y_1, . . . , Y_d) =_d g(X_1, . . . , X_d). The construction is universal in that it applies to any (X_1, . . . , X_d) as long as X_1, . . . , X_d admit a joint density f.

Matching marginals and sums
While ϕ defines a pair that matches the law of g(X_1, X_2) for all symmetric functions g, it does not necessarily preserve the marginal laws of f. To do so, we also need to "compensate" in the x_1 and x_2 directions. We call this a symmetry-balancing approach.
The proposed construction is universal in that it is described through the use of a "copula perturbation" that can then be applied to any distribution. It offers a continuum of settings that can be used to model a wide range of dependence structures, through perturbations of varying types and sizes.
Let c be the copula density of f. For any measurable γ : [0, 1] × [0, 1] → [0, +∞), the function θ_γ described in the figure below, assumed to be non-negative, is a copula density, which we call the octal copula.
Furthermore, the matching extends to any bounded and symmetric g. The proof is given in Theorem 2.3 in the more general setting of the d-dimensional case, d ≥ 2.
We note that the copula in Stoyanov's example is of the octal form.
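To make the two-dimensional picture concrete, the following sketch is our own numerical illustration, not the paper's exact construction: the choice of reference triangle {0 ≤ v_1 ≤ v_2 ≤ 1/2}, the region boundaries and the constant perturbation size γ = 1/2 are assumptions. It perturbs the independence copula density c = 1 with alternating signs over the eight reflection regions of the unit square and checks by quadrature that both uniform marginals and the expectation of a symmetric function are preserved.

```python
def octal_sign_and_rep(u1, u2):
    """Sign (-1)^{|s|} of the reflection region containing (u1, u2) and the
    representative point in the reference triangle {0 <= v1 <= v2 <= 1/2}."""
    s, v1, v2 = 1, u1, u2
    if v1 > 0.5:                 # reflect in the vertical midline (s1)
        v1, s = 1.0 - v1, -s
    if v2 > 0.5:                 # reflect in the horizontal midline (s2)
        v2, s = 1.0 - v2, -s
    if v1 > v2:                  # reflect in the main diagonal (s12)
        v1, v2, s = v2, v1, -s
    return s, v1, v2

def theta(u1, u2, gamma0=0.5):
    """Independence copula density c = 1 perturbed by a constant gamma,
    transported with alternating signs over the eight regions."""
    s, _, _ = octal_sign_and_rep(u1, u2)
    return 1.0 + s * gamma0

n = 400
h = 1.0 / n
mids = [(i + 0.5) * h for i in range(n)]

# marginal in u1 stays uniform: integrating theta over u2 gives 1
marginals = [sum(theta(a, b) for b in mids) * h for a in mids]

# expectations of a symmetric function match the unperturbed copula
def g(a, b):
    return (a + b) ** 2

e_theta = sum(theta(a, b) * g(a, b) for a in mids for b in mids) * h * h
e_indep = sum(g(a, b) for a in mids for b in mids) * h * h
```

The cancellation works because reflections in the two midlines compensate the marginals, while the swap-antisymmetry of the signs compensates any symmetric integrand; this is exactly the symmetry-balancing idea.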
• id denotes the identity function.
• G(R^d) and G([0, 1]^d), or simply G_d, denote the sets of symmetric real-valued functions g on R^d and [0, 1]^d, respectively; that is, for any β ∈ S_d, g ∘ σ_β = g.
The next lemma shows that we can essentially partition the hypercube into 2^d d! regions that all map onto the reference region ∆(0, id).
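As a quick illustration of this partition, the sketch below reduces a point of the hypercube to a reference region by at most d reflections followed by a sort, and checks that exactly 2^d d! labelled regions arise. It is a sketch under the assumption that the reference region is the simplex {v : v_1 ≤ · · · ≤ v_d ≤ 1/2}; the labels (alpha, beta) play the role of the paper's (α, β), but the exact conventions of ∆(α, β) are not reproduced here.

```python
import random

def reduce_to_reference(u):
    """Map u in [0,1]^d to the (assumed) reference region
    {v : v_1 <= ... <= v_d <= 1/2}, recording the reflections alpha
    and the sorting permutation beta that identify u's region."""
    alpha = tuple(1 if ui > 0.5 else 0 for ui in u)   # which coordinates were reflected
    w = [min(ui, 1.0 - ui) for ui in u]               # apply the reflections
    beta = tuple(sorted(range(len(u)), key=w.__getitem__))
    v = sorted(w)                                     # apply the permutation
    return v, alpha, beta

random.seed(1)
d = 3
labels = set()
for _ in range(100_000):
    u = [random.random() for _ in range(d)]
    v, alpha, beta = reduce_to_reference(u)
    assert v[0] <= v[1] <= v[2] <= 0.5                # lands in the reference region
    labels.add((alpha, beta))
```

For d = 3 the sampled labels cover all 2^3 · 3! = 48 regions, each of volume 1/(2^d d!), consistent with the counting in the lemma.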
A necessary step in the construction is the embedding of the (d − 1)-dimensional hypercube as a hyperplane in the d-dimensional hypercube, and how the corresponding partitions carry across. To that end, we need the following notation and results.
with the obvious adjustments for the cases k = 1 and k = d.
Since Ξ_d has Lebesgue measure zero, any integral over [0, 1]^d can be taken to mean an integral over [0, 1]^d \ Ξ_d.

Theorem 2.3. Let c be a density on [0, 1]^d, let U be a multivariate random variable with density c, let γ be an integrable non-negative function on ∆(0, id) and let ε : … . Assume (D) and let V be a multivariate random variable with density θ_{ε,γ}.
where α and β are defined as in Lemma 2.2. It follows that … , which concludes the proof.
Theorem 2.5. Further to the setting of Theorem 2.3, we assume that γ is bounded from above and away from 0; i.e. we assume that inf(γ) = inf_{u∈∆(0,id)} γ(u) > 0 and sup(γ) = sup_{u∈∆(0,id)} γ(u) < +∞.

Proof. The sufficiency of both statements was shown in Theorem 2.3. Let … , q_0 = 0 and q_d = 1/2. We reason by contradiction and assume that {j ∈ [d] : λ_j ≠ 0} ≠ ∅. Then

∑_j λ_j ∫ γ(ω_j(q, r)) dr ≥ ∑_{j : λ_j < 0} λ_j sup(γ) (q_j − q_{j−1}) + ∑_{j : λ_j > 0} λ_j inf(γ) (q_j − q_{j−1}),

and shrinking q_j − q_{j−1} whenever λ_j < 0 (and therefore expanding it whenever λ_j > 0) shows that the right-hand side can be made strictly positive, thus contradicting the assumption that the left-hand side is nil. Of course, if there is no j such that λ_j > 0, then the inequality can be reversed and the left-hand side shown to be strictly negative, again leading to a contradiction.
Since lim_{r→0} Leb(∆_j(0, id, r)) = 0 for j ∈ {2, . . . , d}, and γ is bounded, by making r approach 0 the second term on the right-hand side can be made as small as we want, while the first term is strictly positive. It follows that the left-hand side can be made strictly positive, thus contradicting the fact that it must be nil for all r. We deduce that λ_1 must be nil and (2.5) becomes

∀ r ∈ (0, 1), ∑_{j=2}^{d} λ_j (1/r) ∫_{∆_j(0,id,r)} γ(ω_j(v, r)) dv = 0.
Again, we assume (wlog) that λ_2 > 0. Then … , and the second term on the right-hand side can be made as small as we want, while the first term is strictly positive. It follows that the left-hand side can be made strictly positive, thus showing that λ_2 must be nil. We continue this way, adjusting (2.5) by increasing powers of r, to prove that λ_1 = . . . = λ_{d−1} = 0, and finally that λ_d = 0 since … .

Proof. Sufficiency is immediate. We prove necessity by induction on d. First we observe that, for any β ∈ S_d and any k ∈ [d], … , and we obtain the required identity. We shall therefore prove that, for any fixed β ∈ S_d, the condition

∀ k ∈ [d], ∀ a ∈ {0, 1}^{d−1}, ε(ω_k(a, 0), β) + ε(ω_k(a, 1), β) = 0

implies the desired statement. As β is fixed throughout, we write ζ(α) for ε(α, β).
The case d = 2 can easily be checked. Suppose the necessity is true for d − 1. Then setting the first component in α and a to 0 reduces the dimensionality of the problem by 1 and leads to … . Similarly, setting the first component in α and a to 1 again reduces the dimensionality of the problem by 1 and leads to … ζ(ω_1(0, 1)). Now taking k = 1 and a = 0 leads to ζ(ω_1(0, 1)) = −ζ(0) and concludes the proof.

The case of identically distributed random variables
We are now ready to deal with the case of d identically distributed arbitrary random variables. We stress here that we do not assume that the random variables are independent.
Proposition 3.1. Suppose that X_1, . . . , X_d are identically distributed and that X = (X_1, . . . , X_d) has copula density c, marginal distribution function Φ and marginal density φ, so that its density is … . For any (ε, γ) satisfying conditions (G) and (C_0) of Theorem 2.3, … generates a random variable Y = (Y_1, . . . , Y_d) that satisfies the requirements that, for … .

Proof. … Then U has density c and h ∈ G([0, 1]^d). Letting V be a random variable with density θ_{ε,γ}, … .

Example 3.2. Let Φ be the distribution function and φ be the density of the standard normal distribution. Then, for any γ ≤ 1, … is the density of a d-dimensional random variable Y for which all (d − 1)-dimensional marginals are independent and identically distributed standard normal random variables, Y_1 + . . . + Y_d is normal with mean 0 and variance d, and Y is non-Gaussian.
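A numerical sanity check of this example in dimension d = 2 can be run with our own illustrative choices (none forced by the example): the independence copula as the base c, a constant perturbation of size γ = 1/2 with alternating signs over an assumed eight-region reflection layout of the square. The sketch confirms that Y_1 + Y_2 has the second moment of an N(0, 2) variable, while an orthant-type probability deviates from the value it would take for a bivariate Gaussian with the same (zero) correlation, so Y is non-Gaussian.

```python
import statistics

nd = statistics.NormalDist()

def octal_sign(u1, u2):
    # (-1)^{|s|} over an assumed eight-region reflection layout of [0, 1]^2
    s = 1
    if u1 > 0.5: u1, s = 1.0 - u1, -s
    if u2 > 0.5: u2, s = 1.0 - u2, -s
    if u1 > u2: s = -s
    return s

def copula_density(u1, u2, gamma0=0.5):
    return 1.0 + gamma0 * octal_sign(u1, u2)   # perturbed independence copula

n = 800
h = 1.0 / n
mids = [(i + 0.5) * h for i in range(n)]
z = [nd.inv_cdf(u) for u in mids]              # standard normal quantiles

# second moment of Y1 + Y2: should match Var(N(0, 2)) = 2
m2 = sum(copula_density(a, b) * (za + zb) ** 2
         for a, za in zip(mids, z) for b, zb in zip(mids, z)) * h * h

# P(Y1 < Y2 < 0) = (1 + gamma0)/8 = 0.1875 here, instead of the value 1/8
# a zero-correlation bivariate Gaussian would give: Y is dependent, hence
# non-Gaussian even though its marginals and its sum are Gaussian
p = sum(copula_density(a, b)
        for a in mids for b in mids if a < b < 0.5) * h * h
```

The marginals are standard normal for the same reason as in the copula computation: the perturbation integrates to zero in each coordinate direction.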

The case of non-identically distributed random variables
Can the above construction extend to the case of non-identically distributed (and non-independent) random variables? To answer this question, we return to the two-dimensional case. Let s_1, s_2 and s_12 be the reflections s_1(u_1, u_2) = (1 − u_1, u_2), s_2(u_1, u_2) = (u_1, 1 − u_2) and s_12(u_1, u_2) = (u_2, u_1).
These three involutions satisfy s_1 s_2 = s_2 s_1, s_1 s_12 = s_12 s_2 and s_2 s_12 = s_12 s_1. It follows that they generate a finite group R = {id, s_1, s_2, s_12, s_1 s_2, s_1 s_12, s_2 s_12, s_1 s_2 s_12}, the dihedral group of order 8.
Each element of R corresponds to one of the eight regions ∆(α, β), and the ε(α, β) of Section 2 is simply (−1)^{|s|}, where |s| is the word length of s, that is, the number of generators in a decomposition of s (modulo 2).
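These facts can be checked mechanically. The sketch below verifies the three relations pointwise, enumerates the group through its action on a generic point (dyadic coordinates are chosen so that the reflections are exact in floating point), and confirms that (−1)^{|s|} is well defined, i.e. independent of the word chosen for s.

```python
def s1(p):  u1, u2 = p; return (1.0 - u1, u2)
def s2(p):  u1, u2 = p; return (u1, 1.0 - u2)
def s12(p): u1, u2 = p; return (u2, u1)

# the three defining relations, checked pointwise on a dyadic grid
pts = [(i / 16, j / 16) for i in range(17) for j in range(17)]
for p in pts:
    assert s1(s2(p)) == s2(s1(p))
    assert s1(s12(p)) == s12(s2(p))
    assert s2(s12(p)) == s12(s1(p))

# enumerate the group via its action on a generic point; each generator
# flips the sign, so track epsilon = (-1)^{word length} as we go and
# check that it never conflicts: the sign is a well-defined character of R
p0 = (0.25, 0.375)
sign = {p0: 1}
frontier = [p0]
while frontier:
    q = frontier.pop()
    for g in (s1, s2, s12):
        r = g(q)
        if r in sign:
            assert sign[r] == -sign[q]     # (-1)^{|s|} is consistent
        else:
            sign[r] = -sign[q]
            frontier.append(r)
```

The orbit has exactly 8 points, four with sign +1 and four with sign −1, matching the dihedral group of order 8 and the alternating ε of Section 2.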
for an appropriate reference region ∆, one for which {s(∆) : s ∈ R} forms a measurable partition of [0, 1]^2. The next example illustrates the difficulty we face in general.
In the above example, the identity s_1 s_12 = s_12 s_2 fails, resulting in R being infinite. In the language of the previous sections, this identity translates to σ_β(τ_α(u)) = τ_{σ_β(α)}(σ_β(u)), which was crucial in our construction.
In order to retain the identity s_1 s_12 = s_12 s_2, we introduce the following notion.
Definition 4.2. Two random variables are said to be similarly distributed if their distribution functions Φ_1 and Φ_2, assumed to be continuous and strictly increasing (on some interval), satisfy the identity … .

Note that if a random variable X has a strictly increasing and continuous (on some interval) distribution function Φ, then Ψ(x) = Φ^{−1}(1 − Φ(x)) is the only strictly decreasing and continuous measure-preserving map of X: Ψ(X) =_d X.

Proposition 4.3. Two identically distributed random variables are necessarily similarly distributed, and two distributions that are symmetrical around the same median are similarly distributed.
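The measure-preserving property of Ψ can be checked directly: by construction Φ(Ψ(x)) = 1 − Φ(x), so Φ(Ψ(X)) is again uniform and Ψ(X) =_d X. The sketch below verifies this identity numerically for an exponential marginal (our choice of example), and also checks the familiar special case that Ψ is the map x ↦ −x for the standard normal.

```python
import math
import statistics

# example marginal: Exp(1), continuous and strictly increasing on (0, +inf)
def Phi(x):      return 1.0 - math.exp(-x)
def Phi_inv(p):  return -math.log(1.0 - p)
def Psi(x):      return Phi_inv(1.0 - Phi(x))   # the decreasing measure-preserving map

# Phi(Psi(x)) + Phi(x) = 1 for all x, hence Psi(X) has the same law as X
for k in range(1, 200):
    x = 0.05 * k
    assert abs(Phi(Psi(x)) + Phi(x) - 1.0) < 1e-9

# for a distribution symmetric about 0, such as the standard normal, Psi(x) = -x
nd = statistics.NormalDist()
normal_psi = [nd.inv_cdf(1.0 - nd.cdf(x)) for x in (-2.0, -0.5, 0.3, 1.7)]
```

For the exponential marginal, Ψ(x) = −ln(1 − e^{−x}), a genuinely nonlinear decreasing map, which shows that outside the symmetric case Ψ need not be a reflection.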
While it is possible to approach this situation via copulas, other than in the identically distributed case the resulting θ turns out to depend on Φ_1 and Φ_2, making it non-universal and therefore less desirable. Instead, we apply the symmetry-balancing approach directly to the density.