Bounds on Integrals with Respect to Multivariate Copulas

Finding upper and lower bounds to integrals with respect to copulas is a quite prominent problem in applied probability. In their 2014 paper, Hofer and Iaco showed how particular two dimensional copulas are related to optimal solutions of the two dimensional assignment problem. Using this, they managed to approximate integrals with respect to two dimensional copulas. In this paper, we will further illuminate this connection, extend it to d-dimensional copulas and therefore generalize the method from Hofer and Iaco to arbitrary dimensions. We also provide convergence statements. As an example, we consider three dimensional dependence measures.


Introduction
A multidimensional distribution function with uniform marginals is called a copula. In contrast to the simplied approach of quantifying risk and dependence by single numbers like Spearman's ρ or Kendall's τ, modeling with copulas makes it possible to describe and encapsulate the entire dependence structure between random variables. On the other hand, an obvious downside of copulas is that, unlike simple concordance measures, they are often hard to treat analytically, especially in dimensions higher than two. Hence, instead of modeling the actual dependence structure, it is naturally interesting to ask for a "worst case" respectively a "best case" behaviour. In this paper, we propose a exible method to approximate those extremal cases.
Assume that we are given a d-dimensional random vector (X , . . . , X d ) and a function f : R d → R that describes the quantity associated with (X , . . . , X d ) which we wish to optimize. We further assume that the marginal distributions of X , . . . , X d are known and the dependence structure (i.e. the common distribution function) is completely unknown. This assumption is called dependence uncertainty and it is widely used in applications, mainly because compared to nding the dependence structure, information about the marginal laws can be relatively easily obtained. By Sklar's Theorem it is always possible to reduce this maximization respectively minimization to a similar problem involving uniformly distributed random variables X , . . . , X d [14]. Therefore, we are justi ed in restricting our focus to nding copulas C min and Cmax such that [ , ] (1) and (2) are special cases of the Monge-Kantorovich problem. This problem is very well studied in the case d = , however, due to its complexity, most analytic approaches to the Monge-Kantorovich problem in higher dimensions are restricted to particular situations. For example, one of Rüschendorf's many contributions to this eld considered the case where f is a ∆-monotone function [18]. A more exible, numerical take on this optimization problem that had a signi cant impact in recent years is the rearrangement algorithm, introduced by Puccetti and Rüschendorf [15]. This algorithm is impressively e cient in approximating the desired bounds even in high dimensions and thus su ces for most real world applications. The price for this is that it only works when f is a supermodular function and that convergence is not guaranteed. However, the cases where the algorithm does not converge are quite pathological and can be circumvented in practice. For a detailed description of how the rearrangement algorithm can be used to tackle optimization problems and also examples of non-convergence, see [16]. In two dimensions, Hofer and Iacó [7] combined the spirit of optimization theory with rigorous structural considerations and developed an algorithm that converges to the minimal respectively maximal values of equations (1) and (2) for any continuous function f . Their method connects optimality with a particular class of copulas, the Shu es of Min. We state their main results in Theorem 2.2 in section .

Mathematical Foundations
As stated previously, a d-copula is a d-dimensional distribution function on [ , ] d with uniform marginals. )) de nes a copula. We write C d for the set of all d-copulas. We already mentioned that there is a class of copulas, which is closely related to assignment problems, the Shu es of Min (or Shu es of M). As the name suggests, these are obtained by a suitable rearrangement of the probability mass of the upper Fréchet-Hoe ding bound, or Min copula, M(x , . . . , x d ) := min(x , . . . , x d ).
In two dimensions, C is a Shu e of Min parametrized by n ∈ N, a permutation π on { , . . . , n} and a function ω : { , . . . , n} → {− , } if C distributes the mass n uniformly spread along the diagonal respectively antidiagonal of [ i− n , i n ] × [ π(i)− n , π(i) n ] whenever ω(i) = respectively ω(i) = − . The original, two dimensional de nition is slightly more general and goes back to [12]. In higher dimensions, there are several versions of Shu es of Min (see e.g. [5] for a discussion). A basic property of these Shu es is that they are dense in the set of all copulas with respect to weak convergence. For more details and a survey of di erent metrics also see [5].
A concept which has proven very useful when solving two dimensional Monge-Kantorovich problems is that of c-cyclical monotonicity. It is a famous result in optimal transport theory that, under mild assumptions on c, a probability measure µ is optimal for the two dimensional Monge-Kantorovich problem if and only if it is concentrated on a c-cyclically monotone set. This optimality result follows from a dual formulation of the problem, for which we refer to the book of Villani [20].
Finding a similar statement for dimensions higher than two has been an open problem for many years. Griessler .
(ii) Any nite measure α concentrated on nitely many points in Γ is a (with respect to c) cost-minimizing transport plan between its marginals; i.e. if α has the same marginals as α, then cdα ≤ cdα .
They were also able to show that for any measurable cost function c, a measure µ which is optimal for the multidimensional Monge-Kantorovich problem is always concentrated on some c-cyclically monotone set. Griessler [6] recently showed the converse statement under more assumptions on c: If the cost function c is continuous and bounded by a sum of integrable functions, any measure which is concentrated on a ccyclically monotone set is an optimal solution to the multidimensional Monge-Kantorovich problem.
Next we give a short overview of assignment problems. The mathematical formulation of a Linear Sum Assignment Problem is the following: min x∈R n×n n i= n j= a ij x ij (4) subject to x ij ∈ { , }.
The matrix (a ij ) is also called the objective function and the set of all x ∈ R n×n which ful ll all the constraints is called the feasible region. It is not hard to see that a Linear Sum Assignment Problem can equivalently be written in the form where Sn denotes the set of all permutations of { , . . . , n}. Although the feasible region of this problem is actually n -dimensional, with n being the number of objects, we will refer to this version of the assignment problem as the "two dimensional assignment problem (2-AP)" since one can interpret this as matching two di erent kinds of objects. The assignment problem is, at least for the two dimensional case, very well-studied.
We are now ready to state the main result from [7] that connects Shu es of Min and assignment problems to integrals with respect to copulas.

Theorem 2.2. Let f be a continuous function on [ , ]
and let the partition I n for any n be given as Then de ne f max Now a copula C max n which ful lls [ , ] f max is given as a shu e of Min with parameters (n, π * , ) where π * is the permutation which solves the assignment problem max π∈Sn n i= a iπ(i) .
Moreover, the maximal value of (8) is given as Furthermore, Iacó, Thonhauser and Tichy [8] showed that the sequence of maximizers C max n converges, at least along some subsequence, to a maximizer Cmax of the problem

Main Results
As we will see, some structural analogies are destroyed in the d-dimensional case, which is why a direct application of the method from [7] is not possible. For our main result, an extension of Theorem 2.2 to arbitrary dimensions along with a similar convergence result as in [8], we start by introducing the concept of a multidimensional assignment problem. De Again, "d-dimensional" is meant with respect to the number of di erent object types. The feasible region in this case would actually be n d -dimensional.
Then a copula C min which ful lls distributes uniformly on each square of type I n i the probability mass equal to x * i /n, where (x * i ) i∈I is an optimal solution to the relaxed d-AP with respect to the objective function (a i ) i∈I . By this construction, C min is a so-called checkerboard copula ( [4], [13]).
Here "relaxed" means that we are considering the continuous relaxation of the axial d-AP, i.e., we replace the integer constraint (11) by x i ≥ for all i. Even though a seemingly subtle di erence, this change yields an entirely di erent problem from the perspective of optimization theory. Theorem 3.1 holds for any dimension d and indeed for the case d = we get precisely the method proposed in [7]. This follows from Birkho 's theorem, which states that the two dimensional assignment problem is identical to its continuous relaxation. However, Birkho 's theorem does not hold for any dimension greater than two, which is why it is not possible to restrict the solution space to Shu es of Min in higher dimensions. As a result, the optimizing copula that comes from the assignment problem will, in general, not be given as a Shu e of Min. We refer to [3] for further details about assignment problems.
That means, for dimensions greater than two, the maximizer we nd via this procedure will not have the nicely parametrized form that made Shu es of Min so appealing. On the other hand, working with the relaxed assignment problem instead of the integer problem brings great advantages concerning computability. While the classical integer assignment problems are NP hard, their continuous relaxations lie in P. Here the computational complexity is with respect to the number of objects that should be assigned, or in the context of copulas, the coarseness of the partition of [ , ] d . Proof of Theorem 3.1. By de nition, the value of (12) is given as Notice that the value of (12) does not depend on how the copula C distributes mass inside of each cube I n i , but only on how much mass is placed on each I n i . Hence, we can write x i := µ C min (I n i ), with i ∈ I and are left with the following optimization problem: min i∈I a i x i .
However, we still must encode constraints that ensure that the mass distribution x i actually yields a copula. We recall that there is a one to one correspondence between d-copulas and d-fold stochastic measures, so it su ces to ensure that the measure µ C min is d-fold stochastic. Since we already noted that the value of (12) is independent of the distribution inside the cubes I i , we can assume that µ C min distributes the mass inside of each cube I i uniformly. The d-fold stochastic measures which distribute the mass x i uniformly inside of the cube I i for each i ∈ I are given by the equations This can be seen as follows: It is elementary that each d-fold stochastic measure satis es the conditions (13). So let C ful ll (13) and let ≤ a < b ≤ . Now look for . Without loss of generality let us consider the rst coordinate, it holds ([a, b]).
By standard arguments of measure theory, we can extend this result from intervals to arbitrary measurable sets A.
Hence the measure C min is d-fold stochastic if and only if the constraints (13) are ful lled. But those are exactly the constraints (10) from the d-AP with the right hand side n instead of . Since scaling the right hand side of a linear optimization problem results in a similar scaling of the optimal solution, the optimal probability mass distribution (x i ) is given as (x i ) = n (x * i ) with (x * i ) being the optimal solution to the general d-AP with objective function (a i ).
With the necessary adjustments, Theorem 3.1 is equally valid for a maximization instead of a minimization.
Also in the multidimensional case, it is possible to approximate integrals over continuous functions by a sequence of integrals over functions that are piecewise constant. Now denote by C max n and C min n copulas which ful ll and [ For the proof that the sequence of optimizers converges to an optimizer for the continuous function, we start like in [8] by using Theorem . from [9] to deduce that C max n converges weakly along some subsequence to a copula C min . Now according to Theorem . from [1], any measure which is an optimal solution to a transportation problem is necessarily concentrated on a c-cyclical monotone set. So since C max n is optimal for the transportation problem (14) it must be concentrated on a c-cyclical monotone set. Hence, for any N ∈ N, the N-fold product measure C max,⊗N n is concentrated on the set Sn(N) of points (x ( ) , .
Since f is continuous, we can choose n large enough such that C max,⊗N n is concentrated on the set Sε(N) of points with Since f is continuous, the set Sε(N) is closed. Therefore the limiting measure C ⊗N min is concentrated on Sε(N) for all ε > . Now let ε → and we have that C ⊗N min is concentrated on a set of points with which means that C min is concentrated on a c-cyclically monotone set. Since f is continuous and bounded by a sum of integrable functions, we can apply Theorem . from [6] to deduce that C min is optimal. The proof for the sequence C min n is identical.
In [7], the authors also give a convergence rate under the assumption of Lipschitz continuity. For completeness, we want to mention that this holds in very much the same way for the multidimensional setting.

Corollary 3.3. Let the assumptions of Theorem 3.2 hold and, in addition, assume that f is Lipschitz continuous on [ , ] d with Lipschitz constant L > . Then
Proof Remark 3.4. The method presented generalizes the approach from [7] and furthermore facilitates the computation since the relaxed assignment problem is much easier to solve than the integer one. However, the solution vector is still n d -dimensional. In practice this method can be applied on a standard laptop for n d ≤ and since good approximations are already possible for n ≈ , dimensions up to d = can certainly be handled. So for most practical applications, the rearrangement algorithm by Puccetti and Rüschendorf [16] will, whenever applicable, still be the method of choice. The merit of our approach lies in the generality of the statement. It is not limited to supermodular functions but requires merely continuity and a notion of boundedness. Both of these assumptions might possibly be relaxed as research in the eld of optimal transport progresses. (1) and (2) as Hofer and Iacó did in [7] is in some sense also the notion behind the rearrangement algorithm. In [17], Puccetti and Wang consider rearrangements, show how they are linked to copulas and illustrate that Shu es of Min can be seen as a construction of particular rearrangements. As the authors of [17] mention, the fact that Shu es of Min are dense in the set of copulas could then be used to approximate solutions to copula optimization problems arbitrarily well with Shu es of Min. This would be a direct extension of the method from [7] to general dimensions, however, due to the complexity of the integer assignment problem, this approach is of very limited practical relevance. The rearrangement algorithm, on the other hand, cuts the solution space down to oppositely ordered rearrangements, resulting in the restriction to supermodular functions but also in an enormous increase of e ciency. For more details, we refer to [16]. Remark 3.6. Following the spirit of Bignozzi, Puccetti and Rüschendorf [2] or Lux and Papapantoleon [10], one might also consider including partial information about the distribution by simply adding suitable constraints to the linear program.

Applications . Dependence Measures
A natural application for this technique is the approximation of upper and lower bounds on dependence measures. In the bivariate case, there are well-known and widely used measures such as Spearman's ρ, Kendall's τ, Blomqvists β and Gini's γ. See e.g. [11] for de nitions.
We now focus on a multidimensional version of Spearman's ρ. De ne Here, Π denotes the independence copula, i.e. Π(x , . . . , x d ) = Π d i= x i . It is well-known that ρ(C) is maximal when C = M d , i.e. C is the Min-copula. It is also well-known that plugging in the lower Fréchet-Hoe ding bound (usually denoted by W d ) yields a lower bound on ρ(C): However, since W d is not a copula for d ≥ , it is not a priori clear whether this lower bound is attained or not. Indeed, this has been stated as an open problem in [19]. In 2011, Wang and Wang [21] found an analytical solution to this long unresolved question. They give a formula to explicitly compute for any d ∈ N. Since the formula yields Λ = . × − , it is straightforward that ρ(W ) = − is not attained. We now want to give an alternative, numerical proof for this result. We chose this example because the fact that we know the exact analytical value will help us to validate the convergence of our method. We use Theorem 3.2 to compute strict upper and lower bounds for Λ . In this case the monotonicity of Π facilitates nding the maximum respectively minimum functions in the algorithm as we simply have to evaluate Π at the vertices of the grid cubes that maximize respectively minimize the arguments. An approximated value is also obtained by evaluating Π in the center of each grid cube. Of course, this has to be adapted for other cost functions. Table 1 illustrates the results obtained using the method proposed in this paper with a grid of n ∈ { , , , } sections in each dimension¹ as well as the range the rearrangement algorithm (which uses the same discretization method as our algorithm) computes for a grid of sections in each dimension [16]. .
Note that the range here is not a con dence interval but actually consists of deterministic upper and lower bounds on the true value. Already the lower bound for n = su ces to prove that ρ(C) > − = ρ(W ) for all copulas C. As can be seen, the rearrangement algorithm yields a more precise approximation for the same problem and that in considerably less time, even for higher dimensions d (more details can be found in [16]). However, since the lower bound computed by the rearrangement algorithm does not always have to be satis ed, we see the merit of our method in providing rigorous numerics for the fact that the lower bound − for ρ(C) is not best-possible.
An interesting extension to the minimization of Λ d is considering non-uniform marginal distributions. While the result of Wang and Wang [21] is restricted to identical marginal distributions having an increasing density, Sklar's Theorem allows us to treat any marginal laws by inserting the quantile functions of the desired distributions. Table 2 contains the approximated value as well as upper and lower bounds of for di erent choices of µ , µ , µ . Here M(µ , . . . , µ d ) denotes the set of probability measures on [ , ] d which have marginal distributions µ , . . . , µ d . Distributions with unbounded support require some minor adjustments, however the method is still applicable. Again, the rearrangement algorithm as proposed by Puccetti and Rüschendorf [16] will produce more accurate results in shorter calculation time and is thus probably the preferable choice in applications.

. First-to-default Swaps
In the examples so far, we always minimized the expectation of the product of random variables. The product function is supermodular in the following sense.
Here, x ∧ y resp. x ∨ y means the componentwise minimum resp. maximum of x and y.
Since the rearrangement algorithm can only be applied to approximate the expectation of supermodular functions, it is interesting to look at an example involving a non-supermodular function. We want to consider so called First-to-default Swaps. They can be thought of as an insurance contract for portfolios of risky assets. The protection seller (PS) compensates the losses if one of the assets in the portfolio of the protection buyer (PB) defaults. In turn, the PB pays xed premiums at xed points in time (e.g. quarterly or annually) until the rst default occurs or the maturity of the contract is reached. There are no payments for any event after the rst default or after maturity. We consider a portfolio consisting of three risky assets and use the following assumptions and notations.
The default times τ , τ , and τ of the assets follow an exponential distribution with parameters λ , λ and λ respectively. The notionals of all three assets are and the recovery rates R , R and R describe the amount of money that can be liquidized if the corresponding asset defaults. So the total loss for a default of asset i is ( − R i ). The times of premium payments are denoted by = t < t < · · · < t k = T with T denoting the time of maturity. Note in particular that the rst payment is due at time t = . Finally, we assume there is a constant, risk free interest rate r ≥ . Now the premiums p are given as with C denoting the copula of the distribution function of (τ , τ , τ ) and F − i being the quantile function corresponding to the distribution of τ i . Since our assumptions and the valuation method we want to use are basically the same, we refer to [7] for the precise deduction of the premium heights. We calculate bounds for the minimal as well as for the maximal premium for three payment times t = , t = and t = T = . The results are listed in Table 3.

Remark 4.2.
Note that the integrand in the last example is not continuous. Our method is not restricted to continuous functions but can be applied as long as the integrand f can, with respect to the L norm, be approximated by a (subsequence of a) sequence of functions (fn)n that are constant on the cubes I n i with i ∈ I (as de ned in Theorem 3.1). Hence, by a simple denseness argument, the algorithm actually works for any function f in L ([ , ] d ), which is why we still have valid bounds in our last example. However, the speed of convergence can be very slow for functions with many discontinuities. For example, it can happen that for n < n , the bounds for a discretization of n sections are worse than those for a discretization with n sections. Also the convergence of the sequence of optimizers is not guaranteed if we choose an integrand f which is not continuous.