Dependence uncertainty bounds for the energy score and the multivariate Gini mean difference

The energy distance and energy scores became important tools in multivariate statistics and multivariate probabilistic forecasting in recent years. They are both based on the expected distance of two independent samples. In this paper we study dependence uncertainty bounds for these quantities under the assumption that we know the marginals but do not know the dependence structure. We find some interesting sharp analytic bounds, where one of them is obtained for an unusual spherically symmetric copula. These results should help to better understand the sensitivity of these measures to misspecifications in the copula.


Introduction
In recent years the so-called energy distance became a famous tool in multivariate statistics used e.g., for goodness-of-fit tests and many other things. For a good overview over this topic we refer to Szekely and Rizzo (2017). Similar concepts have been suggested in the theory of multivariate probabilistic forecasting, where the so-called energy score has been suggested as a strictly proper scoring rule for multivariate distributions in the fundamental paper of Gneiting and Raftery (2007). Both concepts rely on functionals that are based on expected distances of independent copies of random vectors. This is related to the multivariate Gini mean difference, which has been studied in detail in Koshevoy and Mosler (1997). In the univariate case the Gini mean difference is a well-known measure of spread of distributions or inequality in case of income distributions, see e.g., Yitzhaki et al. (2003) for an overview.
In goodness-of-fit testing as well as in probabilistic forecasting one is interested in detecting misspecifications of stochastic models. Therefore it is an important question how sensitive the used functionals react to which kind of misspecification. Pinson and Tastu (2013) studied the discrimination ability of the energy score for the case of multivariate normal distributions. Based on simulation studies they conclude that the discrimination ability of the energy score may be limited when focusing on the dependence structure of multivariate probabilistic forecasts, but to the best of our knowledge there has been no general study of this problem so far for general distributions.
In this paper we want to study this problem of so-called dependence uncertainty bounds for such quantities like the energy score and the Gini mean difference. By dependence uncertainty bounds we mean here bounds for a functional of a multivariate distribution under the assumption that we only know the marginal distributions but do not know the dependence structure, i.e., we do not know the copula. The study of such uncertainty bounds has a long history going back to Höffding (1940) and Fréchet (1951). They considered this problem for correlation coefficients and for the value of cumulative distribution functions. In the meanwhile there is a vast literature on this topic for many kinds of functionals. For an overview see Puccetti and Wang (2015). Very often the extremal positive dependence is given by the comonotone copula, in particular if the functional is an expectation of a supermodular function, as has been shown in Tchen (1980) and Rüschendorf (1983). It is typically more complicated to find the extremal negative dependence, even in the case of expectations of functions and thus linear functionals of the distributions, which is the case for most problems considered in the literature. An example of a non-linear problem is the case of finding the solution of an optimal stopping problem that was considered in Müller and Rüschendorf (2001). In such a case of a non-linear problem the characterization of the extremal dependence can be very different from the case of a linear problem.
In this paper we also deal with a non-linear problem, but it will turn out that still the comonotone copula will typically lead to the extremal positive dependence. But for the extremal negative dependence we find in some cases a very interesting solution based on a spherical symmetric copula. This is an interesting copula, which does not seem to be well-known in the dependence modelling community.
The paper is organized as follows. In Section 2, we recall the definitions of the various concepts. We also introduce some important notation that will be used throughout the manuscript and present the problem that is considered in this paper. In Section 3, we focus on the expected distance between two multivariate distributions and its sensitivity to dependence uncertainty. Finally in Section 4, we provide a number of results on the dependence uncertainty bounds on the energy score. Section 5 concludes with a number of open questions that are left for future research.
2 Energy score and Gini mean difference Let X,X be independent copies of a d-dimensional random vector with cumulative distribution function (cdf) and Y ,Ỹ be independent copies of a random vector with cdf G. We define the expected distance between two independent d-dimensional samples of F and G as where we identify the cdfs F and G with the corresponding probability measures and denote as usual by x 2 i the Euclidian distance. For an observation y, we similarly define by identifying y with the one-point measure in y The energy distance between two distributions F and G is defined as This is a distance between probability distributions, as it can be shown that E(F, G) ≥ 0 for all F, G and that E(F, G) = 0 if and only if F = G. For details of this concept and applications we refer to the overview article of Szekely and Rizzo (2017). A strongly related concept is the so-called energy score for a distributional forecast F and an observation y, which is given by This can be generalized by introducing a parameter β ∈ (0, 2) as already considered in the fundamental paper of Gneiting and Raftery (2007).
Definition 1. For β ∈ (0, 2), the generalized expected distance between two independent d-dimensional samples of F and G is defined as Notice that the Gini mean difference M (F ) = S(F, F ) is the mean of F ♦F and more general M β (F, G) = E(Z β ) is the corresponding moment of order β.
Example 4. In case of standard uniform distributions U, V on (0, 1) for F = G we get for Z the density f Z (z) = 2 − 2z on [0, 1] and thus with the special cases S(F, F ) = M (F ) = EZ = 1 3 and in the limiting case when β = 2 we get EZ 2 = 1 6 = 2var(U ).

Dependence Uncertainty Bounds
We want to study how sensitive these quantities are with respect to the dependence information in the joint distribution. Therefore we investigate bounds for such expressions given that we only know the marginals of F and G. As usual we denote the marginals by By Sklar's theorem we can write the joint cumulative distribution F of X in the form for come copula C, see e.g., Nelsen (2007). We denote by C the set of all possible copulas and by F = F(F 1 , . . . , F d ) the so-called Fréchet class of all multivariate distributions with given marginals F 1 , . . . , F d . The well-known Fréchet bounds are denoted by and C + and C − will be the corresponding copulas, typically called the comonotonic and countermonotonic copula, where one has to take into account that C − is only a copula for d = 2.
Similarly, for a joint cumulative distribution function G of X, we denote by G, the Fréchet class of all multivariate distributions with given marginals G 1 , . . . , G d .
We first study the corresponding dependence uncertainty bounds on the generalized expected distance between two independent d-dimensional samples from F and G. These bounds can then be written as when both samples come from the same distribution F , and by when the multivariate distributions F and G are not identical.
For a given observation y, we then study dependence uncertainty bounds for its generalized energy score by considering The optimizations in (3), (4) and (5) are over the Fréchet class F and G. In fact, given that the marginal distributions are given, the uncertainty bounds can also been considered as solutions of optimization problems over the class C of all copulas.

Dependence uncertainty bounds for S β
In this section, we provide analytic bounds for the expressions (3) and (4). To do so, we first look at some fundamental properties of S β and ES β .
It is well-known (see for instance Gneiting and Raftery (2007)) that the generalized energy distance E β is a distance for β ∈ (0, 2), i.e., E β (F, G) ≥ 0 for all F, G and therefore also ES β (F, y) ≥ 0 for all F and y. Moreover, this implies by definition (1) also that It is also well-known that ES β is a proper scoring rule, meaning that Note also that ES β (F, F ) = 1 2 S β (F, F ). We can derive the following lemma on the concavity of S β (F, F ).
Proof. Indeed, S β is linear in F and G and therefore we get for α ∈ (0, 1) from (6) that

Lower bound on S β (F, G)
For finding a minimum value of S β (F, G) the following representation is going to be helpful. where where we know the marginals of (Z 2 1 , . . . , Z 2 d ) and the function f has the following properties: as a concave function of the sum it is submodular, i.e., −f is supermodular. For the definition and properties of supermodular functions and their relevance for inequalities of expectations in case of distributions with given marginals we refer to Chapter 3 in Müller and Stoyan (2002).
From this we can derive the following lower bound.
Theorem 6. For any random vector X and Y with cumulative cdfs F and G we get the following lower bound: for some standard uniform random variable U .
In case of identical marginals F 1 = . . . = F d and G 1 = . . . = G d this bound is sharp and is obtained for the upper Fréchet bounds F = F + and G = G + . Furthermore, in this case it reduces to Proof. According to (8) we can write S(F, G) = Ef (Z 2 1 , . . . , Z 2 d ) for a submodular function f . It follows from Tchen (1980) that a lower bound for Ef (Z 2 1 , . . . , Z 2 d ) is obtained by assuming that the copula of (Z 2 1 , . . . , Z 2 d ) is given by the upper Fréchet bound C + , or equivalently the copula of (Z 1 , . . . , Z d ). This means that we can assume that Z 2 i = ((F i ♦G i ) −1 (U )) 2 for some fixed uniform U and from this the first assertion immediately follows.
If all marginals of F and G are the same, then we have for F + and G + that X = (X 1 , . . . , X 1 ) and Y = (Y 1 , . . . , Y 1 ) and thus we get in (8) also the equality Z 1 = . . . = Z d and therefore this lower bound is attained and reduces to Note that S β (F + , G + ) may not be a lower bound when the marginals of F and G are not identical. Consider X = (4U 1 , U 1 ) and Y = (U 2 , 4U 2 ) for two independent uniform U 1 , U 2 . Then Z 1 is large if U 1 is large and Z 2 is large if U 2 is large and they are far away from being comonotone. The support of (Z 1 , Z 2 ) is given in the left panel of Figure 1 and does not correspond of the support of the upper Fréchet bound. The lower bound is obtained for something different from F + and G + . In fact, if one takes for F the lower Fréchet bound F = F − and G = G + instead, i.e., X = (4U 1 , 1 − U 1 ) and Y = (U 2 , 4U 2 ), then we get more positively correlated Z 1 and Z 2 as depicted in the right panel of Figure 1. Indeed we get S(F − , G + ) ≈ 2.48 < S(F + , G + ) ≈ 2.55.

Upper bound on S β (F, G)
It seems to be much more difficult to find a sharp upper bound for S β (F, G), as it is a notoriously difficult problem to find a strongest possible negative dependence in the sense of maximizing the expression in (8). But we can easily derive an upper bound via Jensen's inequality.
Theorem 7. For any random vector X and Y with cumulative cdfs F and G we get the following upper bound: Proof. Applying Jensen's inequality to (8) we get that for any multivariate distributions F and G,

Upper and lower bounds on S β (F, G) for copulas
In the special case of uniform marginal distributions, the class F is simply the class of all copulas and we are able to derive explicit expressions of the lower and upper bounds obtained in Theorems 6 and 7. Specifically, from Example 4, we have expression (2) and we get that EZ 2 i = 1/6 . We can then immediately derive the following consequence.
Corollary 8. For copulas C 1 , C 2 we get the following bounds: and in particular for β = 1, Example 9. One could conjecture that a sharp upper bound for copulas is obtained by S β (C − , C + ). We will now give an explicit counterexample for the important case of d = 2 and β = 1. Using the invariance under rotation we can easily derive S(C − , C + ) in this case from the expected distance of two points on the axes.

S(C
Now let us consider the copula C , defined as the distribution of the following random vector (U 1 , U 2 ) with U 1 ∼ U (0, 1) and Thus the support of the copula C consists of two parallel line segments as displayed in Figure 2.
Then we get by a similar computation Thus S(C + , C ) > S(C − , C + ). We notice, however, that this inequality is reversed for the energy distance. We get E(C − , C + ) ≈ 0.069 > 0.064 ≈ E(C − , C ) even though S(C − , C + ) < S(C + , C ), as S(C + , C + ) ≈ 0.471 is significantly smaller than S(C , C ) ≈ 0.4985. Therefore it is still an open problem whether for the energy distance E(C − , C + ) maximizes E(C 1 , C 2 ) among all copulas.

Upper bound on S(F, F ) for copulas
Under the assumption of equal copulas C = C 1 = C 2 , we can improve the upper bounds in Corollary 8 and even find sharp bounds for S(C, C) in the case of dimension d = 2 and d = 3 for some interesting copulas, which can be called spherical symmetric copulas. These do not seem to be very well-known in the community working on dependence modelling and copulas, but they have been considered from time to time in the statistics literature, e.g., in Eaton (1981), Schwarz (1985) and Perlman and Wellner (2011). Perlman and Wellner (2011) show that in dimensions d = 2 and d = 3 there are spherical symmetric random vectors X whose marginals are uniformly distributed on [−1, 1]. In dimension d = 2, this distribution has the density and in dimension d = 3 this is given by the uniform distribution on the sphere of a unit ball. Notice that the bivariate case can be obtained as the twodimensional marginals of the three-dimensional case. Transforming the marginals to uniform distributions on [0, 1] via the transformation X i → (X i + 1)/2 we get copulas called spherical symmetric copulas, which we will denote by C • . In Figure 3 we show a discrete approximation of the bivariate spherical symmetric copula. Notice that the density is unbounded, going to infinity at the boundary of the support, and therefore in the discrete approximation there are many points there as one expects for a bivariate projection of points uniformly scattered on the sphere of a ball.
Moreover, we will use results that were obtained independently in Mattner (1993), Theorem 2 and as main result in Buja, Logan, Reeds, and Shepp (1994). In our notation their results can be stated as follows.
, where X * ∼ F * is given as follows. If d ≥ 3 then X * is uniformly distributed on the sphere of a unit ball. If d = 2 then 2/3 · X * has the density given in equation (10).
From this result we can derive the following improved bounds for copulas.
Theorem 11. In dimension d = 2 it holds for any copula C that In dimension d = 3 it holds for any copula C that Proof. We define the shift T (X 1 , . . . , X d ) = (X 1 −1/2, . . . , X d −1/2) that transforms marginals from uniform on (0, 1) to uniform on (−1/2, 1/2). As obviously this shift does not affect the functional S, and therefore we can replace the copulas by distributions F with uniform marginals on (−1/2, 1/2). For any such random vector X ∼ F we get For any copula C, and X ∼ C and Y ∼ C independent, we thus obtain that E 12 d T X 2 = 1 and thus from Theorem 10, with S(F * , F * ) as described in Theorem 10. In case d = 2 and d = 3 we get equality for C = C • and the corresponding values for S(F * , F * ) are computed in Buja et al. (1994), there denoted as M d . They are given by and M 3 = 4 3 .
Thus we derive for d = 2 that and for d = 3 that We also get an improved bound for d ≥ 4 from Theorem 10 but then it is not sharp, as the uniform distribution on the sphere no longer has uniform marginals. Indeed, Perlman and Wellner (2011) show that there cannot exist any spherical symmetric copula in dimension d ≥ 4 and therefore we do not know, how the copula C that maximizes C → S(C, C) looks like. We conjecture that it in some sense will be close to spherical symmetry. The improved bound that we can derive from (11) and Theorem 10 in case d = 4 is whereas the bound from Jensen's inequality for d = 4 given in Corollary 8 is 2/3 ≈ 0.816. For higher dimensions d the difference between the bounds derived from Jensen's inequality in Corollary 8 and the better ones derived from (11) and Theorem 10 become smaller and smaller as the latter ones are also approximately d/6 for large d.
Remark 12. We also notice that S(C • , C • ) is not an upper bound for the case that we allow the copulas to be different, as we have an explicit counterexample in Example 9 with

Bounds on the Energy Score
Let F be a distribution and y an observation, we now first study bounds on S β (F, y) in order to obtain bounds on the energy score ES β (F, y).

Bounds on S β (F, y)
First, note that we cannot expect a general upper bound for y → S β (F, y) as S β (F, y) → ∞ for y → ∞. We thus concentrate on studying the lower bound in what follows.
It is clear that as a function of y, the expression of S β (F, y) is small if y is in some sense near the center of the distribution. In the univariate case it is a well-known simple result that y → E|X − y| is minimized for y * := F −1 X (1/2) being the median of the distribution of X.
We can prove a similar lower bound in the multivariate case for the upper Fréchet bound F + , if we assume that the marginals are symmetric and unimodal. Recall that a univariate distribution with cdf F is called symmetric and unimodal with respect to some µ ∈ R, if F (µ + t) + F (µ − t) = 1 for all t > 0 and F is convex on (−∞, µ) and concave on (µ, ∞). We will need the following simple Lemma for such distributions that we state with a proof here, as we could not find it in the literature.
Lemma 13. Assume that the random variable X has a continuous, unimodal and symmetric distribution with respect to µ. Then it holds for all µ ≤ x < y and all µ ≥ x > y that |X − x| ≤ st |X − y|.
Proof. Denote by F x the cdf of |X − x|, and assume µ ≤ x < y. We have to show that F x (t) ≥ F y (t) for all t ≥ 0. A simple calculation shows F x (t) = F (x + t) − F (x − t). As F is continuous, unimodal and symmetric, it has a density f which is symmetric around µ and decreasing on [µ, ∞). Therefore This implies the assertion for µ ≤ x < y and the case µ ≥ x > y follows then by symmetry.
Theorem 14. Assume that the random vector X has a cdf F with marginals F i that are continuous, unimodal and symmetric with respect to µ i , i = 1, . . . , d.
Then we have for all y ∈ R d S β (F, y) ≥ S β (F + , µ).
Proof. We have where Z 2 i,yi = (X i − y i ) 2 . It follows from Lemma 13 that for all i. Let us denote by (Z + i,yi ) 2 , i = 1, . . . , d comonotone random variables with the same distributions as Z 2 i,yi . Notice that for general y the vector (Z 2 1,y1 , . . . , Z 2 d,y d ) is typically not comonotone, even if F is the comonotonic upper Fréchet bound F = F + . This is the case, however, if y i = µ i is the median for all i = 1, . . . , d. Therefore we get The first inequality follows as in the proof of Theorem 6. The second inequality follows from the fact that for random vectors Z and Z with the same copula C + stochastic ordering Z h ≤ st Z h of the marginals implies multivariate stochastic ordering and thus Ef (Z) ≤ Ef (Z ) for all increasing functions f : R d → R, see e.g., Müller and Stoyan (2002), Theorem 3.3.8.
For copulas we get the following corollary. We denote here by 1 = (1, . . . , 1) a vector with all components being equal to one.
Corollary 15. For all copulas C and all y ∈ [0, 1] d it holds Proof. This follows from Theorem 14, as uniform distributions U on (0, 1) are symmetric and unimodal with respect to 1/2 and In the case of copulas we can also have a look at the case that the observation y is an extreme point, which can be assumed to be without loss of generality y = 0.
Theorem 16. For y = 0 the function C → S β (C, 0) attains its minimum for the upper Fréchet bound C + .
Proof. The proof for the minimum of C → S β (C, 0) follows the same lines as in Theorem 6. If U is a random vector with copula C, then . . , d, and f is a submodular function. As Z = (Z 1 , . . . , Z d ) has the same copula as U , we can conclude that S β (C, 0) ≥ S β (C + , 0).
It is easy to see that it is not true in general that the upper Fréchet bound C + minimizes C → S β (C, y). Due to invariance under rotations the minimum is obtained for the lower Fréchet bound C − if y = (0, 1).

Bounds on ES β (F, y)
Similarly to the above study of S β (F, y), one cannot expect a general upper bound of the energy score as the quantity tends to +∞ for y → ∞. As we have to deal with a difference of two quantities, it is also in general more difficult to find sharp bounds, whereas one easily gets some bounds by bounding each of the two quantities using our previous results.
We now first consider a bounded domain for y. We then characterize the copula that achieves the lower bound.
As y → x − y β 2 is convex for β ≥ 1, we also get that y → S β (F, y) is convex in this case. From Lemma 5, we thus can easily derive the following result.
Considering a copula C and an observation y ∈ [0, 1] d we immediately get the following consequence. With a similar argument we can show that the function C → ES β (C, 0) attains a minimum for a copula that is invariant under permutations, but we are not able to derive an explicit solution for the moment.

Conclusions
We investigated dependence uncertainty bounds on the energy score and for related functionals. The obtained results indicate that indeed these functionals seem not to be very sensitive to the dependence structure as one can see e.g., from the inequality 1 3 √ d ≤ S(C 1 , C 2 ) ≤ d 6 in Corollary 8, which holds for all copulas C 1 , C 2 . Notice that we get the even closer sharp bounds √ 2 3 ≤ S(C, C) ≤ π 6 in Theorem 11 for the case d = 2 if we restrict to the case of equal copulas. Therefore our results support the corresponding claim in Pinson and Tastu (2013) which was based on a simulation study using multivariate normal distributions. However, many questions remain open and we hope to stimulate research on this topic that we consider as important. For example, we are not able to find explicitly the copula that achieves the lower bound of the energy score. We are only able to provide a partial characterization of it in Theorem 19. We are also working on the problem of finding the numerical solution of this optimization problem by using a variant of the swapping algorithm that was used in Puccetti (2017) for a related problem. First results indicate that the solution seems to be a copula with a very unusual shape, but these results will be reported in a forthcoming paper.