Discrepancy distances and scenario reduction in two-stage stochastic mixed-integer programming



(Communicated by Xiaojun Chen)
Abstract. Polyhedral discrepancies are relevant for the quantitative stability of mixed-integer two-stage and chance constrained stochastic programs. We study the problem of optimal scenario reduction for a discrete probability distribution with respect to certain polyhedral discrepancies and develop algorithms for determining the optimally reduced distribution approximately. Encouraging numerical experience for optimal scenario reduction is provided.
1. Introduction. Two-stage (linear) stochastic programs arise if, for given realizations $\xi$ of a random vector and first-stage decisions $x$, possible violations of a constraint $Tx = h(\xi)$ are compensated by a second-stage decision $y(\xi)$ that is nonnegative, satisfies $W y(\xi) = h(\xi) - Tx$ with a (fixed) recourse matrix $W$, and induces costs $\langle q(\xi), y(\xi)\rangle$. The idea is then to minimize the sum of the expected recourse cost $\mathbb{E}(\langle q(\xi), y(\xi)\rangle)$ and of the first-stage cost $\langle c, x\rangle$ with $x$ varying in a constraint set $X$. If some of the components of the second-stage decision $y(\xi)$ are integer variables, one arrives at two-stage stochastic programs with mixed-integer linear recourse. Such optimization models may be rewritten in the form

$$v(P) := \min\Big\{ \langle c, x\rangle + \int_\Xi \Phi(q(\xi), h(\xi) - Tx)\, P(d\xi) : x \in X \Big\}, \qquad (1)$$

where $\Phi$ is a mapping from $\mathbb{R}^{m_1+m_2} \times \mathbb{R}^d$ to the extended reals given by

$$\Phi(u, t) := \min\big\{ \langle u_1, y_1\rangle + \langle u_2, y_2\rangle : W y_1 + \bar{W} y_2 = t,\; y_1 \in \mathbb{R}^{m_1}_+,\; y_2 \in \mathbb{Z}^{m_2}_+ \big\}. \qquad (2)$$

Thereby, $c \in \mathbb{R}^m$, $X$ is a closed subset of $\mathbb{R}^m$, $\Xi$ is a (convex) polyhedral subset of $\mathbb{R}^s$, $P$ is a Borel probability measure on $\Xi$, $W$ and $\bar{W}$ are $(d, m_1)$- and $(d, m_2)$-matrices, respectively, $T$ is a $(d, m)$-matrix, and $q(\xi) \in \mathbb{R}^{m_1+m_2}$ and $h(\xi) \in \mathbb{R}^d$ depend affinely on $\xi \in \mathbb{R}^s$.

RENÉ HENRION, CHRISTIAN KÜCHLER AND WERNER RÖMISCH
The standard approach for solving mixed-integer two-stage stochastic programs of the form (1) consists in approximating the original probability distribution $P$ by a discrete probability measure with $N$ atoms $\xi^i$ and probabilities $p_i$, $i = 1, \ldots, N$. The resulting stochastic program is equivalent to the mixed-integer program

$$\min\Big\{ \langle c, x\rangle + \sum_{i=1}^N p_i \big( \langle q_1(\xi^i), y_{1i}\rangle + \langle q_2(\xi^i), y_{2i}\rangle \big) : x \in X,\; W y_{1i} + \bar{W} y_{2i} = h(\xi^i) - Tx,\; y_{1i} \in \mathbb{R}^{m_1}_+,\; y_{2i} \in \mathbb{Z}^{m_2}_+ \Big\}, \qquad (3)$$

where the number of integer variables $y_{2i}$ increases with $N$. Since for applied stochastic optimization models (see, e.g., [18,19]) the dimension $m_2$ of each $y_{2i}$ often gets large, the mixed-integer program (3) may become huge even for small numbers $N$ of scenarios and practically unsolvable for large $N$. Thus, in applications, it may be desirable or even inevitable to reduce the number of scenarios such that reasonable solution times for (3) are achieved. Previous work [3,7,8] on scenario reduction for two-stage stochastic programs without integrality requirements suggests basing the reduction process on suitable distances of probability distributions. Roughly speaking, a probability metric $D$ is suitable if the optimal value $v(P)$ and the solution set of the underlying stochastic program behave continuously in $P$ in terms of the distance $D$, i.e., whenever $P$ is approximated by some measure $Q$, the estimate $|v(P) - v(Q)| \le L \cdot D(P, Q)$ holds for some constant $L \ge 0$. If such a continuity condition holds true, it seems reasonable to approximate $P$ by a probability measure $Q$ that is close in terms of $D$, i.e., to determine a measure $Q$ such that $D(P, Q)$ is small.
As argued in [23], (semi-)metrics with ζ-structure (cf. [21,28]) of the form

$$d_{\mathcal{F}}(P, Q) := \sup_{f \in \mathcal{F}} \Big| \int_\Xi f \, dP - \int_\Xi f \, dQ \Big|, \qquad (4)$$

with $\mathcal{F}$ denoting a certain class of Borel measurable functions from $\Xi$ to $\mathbb{R}$, are suitable probability distances for many stochastic programs. By extending the results in [26,23], it is shown in [24] that the distance

$$\zeta_{r,\mathcal{B}}(P, Q) := \sup\Big\{ \Big| \int_\Xi f\, \mathbb{1}_B \, dP - \int_\Xi f\, \mathbb{1}_B \, dQ \Big| : f \in F_r(\Xi),\; B \in \mathcal{B} \Big\}$$

between probability measures $P$ and $Q$ in $\mathcal{P}_r(\mathbb{R}^s)$ is suitable for the mixed-integer program (1) with $r = 2$ and $\mathcal{B}$ being a certain class of (convex) polyhedra in $\Xi$ with a uniformly bounded number of faces. Here, $\mathbb{1}_B$ denotes the characteristic function of the set $B$, the class $F_r(\Xi)$ consists of all continuous functions $f : \Xi \to \mathbb{R}$ that fulfill the estimate

$$|f(\xi) - f(\tilde\xi)| \le \max\{1, \|\xi\|, \|\tilde\xi\|\}^{r-1}\, \|\xi - \tilde\xi\|$$

for all $\xi, \tilde\xi \in \Xi$, and $\mathcal{P}_r(\Xi)$ is the set of all Borel probability measures on $\Xi$ having finite absolute moments of order $r \ge 1$. However, as pointed out in the Appendix, it seems to be difficult to employ the metrics $\zeta_{r,\mathcal{B}}$ for scenario reduction directly. An alternative to $\zeta_{r,\mathcal{B}}$ is the so-called $\mathcal{B}$-discrepancy (cf. [15])

$$\alpha_{\mathcal{B}}(P, Q) := \sup_{B \in \mathcal{B}} |P(B) - Q(B)| \qquad (6)$$

of probability measures $P$ and $Q$ on $\Xi$ for a given system $\mathcal{B}$ of Borel subsets of $\Xi$. Indeed, let us consider the estimate

$$\zeta_{r,\mathcal{B}}(P, Q) \le C(R)\, \alpha_{\mathcal{B}}(P, Q), \qquad (7)$$

which holds for all probability measures $P$ and $Q$ on $\Xi$ with supports contained in the ball $B(0, R)$ and which may be derived as in [26, Corollary 3.2]. Combining (7) with the stability result of [24] we obtain that, under certain conditions,

$$|v(P) - v(Q)| \le L \cdot C \cdot \alpha_{\mathcal{B}}(P, Q)$$

holds for some constants $L$ and $C$. Observe that the constant $C = C(R)$ depends only on the problem (1) and the radius $R$. Since we are interested in measuring the distance of two discrete probability measures $P$ and $Q$ with $Q$'s support being a subset of the support of $P$, both supports are contained in some ball around zero and, thus, the estimate (7) indeed applies.
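For small discrete measures, the discrepancy (6) can be evaluated by brute force. The following sketch (the function name and interface are ours) does this for the rectangular case: it restricts attention to closed rectangles whose bounds pass through support coordinates, which is where the supremum over closed rectangles is attained; open and half-open boundary cases are deliberately ignored in this simplification.

```python
from itertools import product

def rect_discrepancy(xi, p, eta, q):
    """Brute-force rectangular discrepancy between two discrete measures.

    xi, eta: support points as tuples of equal dimension s; p, q: their
    probability weights. Only closed rectangles with bounds drawn from
    the support coordinates are examined."""
    s = len(xi[0])
    coords = [sorted({z[d] for z in xi + eta}) for d in range(s)]
    best = 0.0
    for a in product(*coords):          # candidate lower corners
        for b in product(*coords):      # candidate upper corners
            if any(a[d] > b[d] for d in range(s)):
                continue
            mass_p = sum(w for z, w in zip(xi, p)
                         if all(a[d] <= z[d] <= b[d] for d in range(s)))
            mass_q = sum(w for z, w in zip(eta, q)
                         if all(a[d] <= z[d] <= b[d] for d in range(s)))
            best = max(best, abs(mass_p - mass_q))
    return best
```

The enumeration is exponential in the dimension and quadratic in the grid per coordinate, so it is only a correctness reference, not a substitute for the algorithms developed below.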
As shown in [26, Proposition 3.1] (see also [24, Proposition 1]), the class $\mathcal{B}$ of polyhedra has to contain all sets of the form

$$\{ \xi \in \Xi : h(\xi) - Tx \in B \}, \qquad (9)$$

where $h$ is an affine mapping from $\mathbb{R}^s$ to $\mathbb{R}^d$, $x \in X$, and $B$ is a polyhedron each of whose facets, i.e., $(d-1)$-dimensional faces, is parallel to a facet of the cone $\mathrm{pos}\, W := \{W y_1 : y_1 \in \mathbb{R}^{m_1}_+\}$ or of the unit cube $[0,1]^d$. In this paper, we consider the particular instance of the sets (9) with $d = s$ and $h(\xi) = \xi$, which corresponds to the situation of mixed-integer two-stage stochastic programs with random right-hand side. The corresponding class of polyhedra in $\Xi$ is denoted by $\mathcal{B}_{\mathrm{poly}(W)}$ and the polyhedral discrepancy by $\alpha_{\mathcal{B}_{\mathrm{poly}(W)}}$. If every facet of $\mathrm{pos}\, W$ is parallel to a facet of the unit cube, $\alpha_{\mathcal{B}_{\mathrm{poly}(W)}}$ coincides with the rectangular discrepancy $\alpha_{\mathcal{B}_{\mathrm{rect}}}$ that is defined by (6) and the set $\mathcal{B}_{\mathrm{rect}}$ of all rectangles in $\mathbb{R}^s$. The latter discrepancy is the suitable one in the case of pure integer recourse, i.e., $\Phi(u, t) := \min\{\langle u, y\rangle : \bar{W} y = t,\; y \in \mathbb{Z}^{m_2}_+\}$. Thus, when studying the polyhedral discrepancy for arbitrary matrices $W$ in the following, the rectangular case need not be considered separately.
Assuming that $P$ is the uniform distribution on the line segment $\{(z, z) : z \in (0.1, 0.4)\}$, we define for $\varepsilon \in (0, 0.1)$ the shifted measure $P_\varepsilon$ via $P_\varepsilon(A) := P(A + (\varepsilon, -\varepsilon)^\top)$ for every Borel set $A \subset \Xi$. It follows that $v(P_\varepsilon) - v(P) = 0.5$ for every $\varepsilon \in (0, 0.1)$. On the other hand, as $\varepsilon \searrow 0$, the measures $P_\varepsilon$ converge to $P$ in the weak sense, as well as with respect to the rectangular discrepancy $\alpha_{\mathcal{B}_{\mathrm{rect}}}$. Writing problem (10) in the standard form (2), the non-integer part of $y$ is assigned to the recourse matrix $W$. With regard to the aforementioned continuity of $v(\cdot)$ w.r.t. $\alpha_{\mathcal{B}_{\mathrm{poly}(W)}}$, it is not surprising that $\alpha_{\mathcal{B}_{\mathrm{poly}(W)}}(P, P_\varepsilon) = 1$ for every $\varepsilon > 0$. Note that it has been shown in [17] that, under certain conditions, $\alpha_{\mathcal{B}_{\mathrm{poly}(W)}}$ can be estimated against $\alpha_{\mathcal{B}_{\mathrm{rect}}}$. However, in our case $P$ is not absolutely continuous w.r.t. the Lebesgue measure on $\Xi \subset \mathbb{R}^2$ and, hence, the result in [17] does not apply.
It is worth mentioning that polyhedral discrepancies also become suitable for chance constrained stochastic programs of the form

$$\min\big\{ \langle c, x\rangle : x \in X,\; P(\{\xi \in \Xi : Tx \ge h(\xi)\}) \ge p \big\}, \qquad (11)$$

where $p \in [0, 1]$ and $c$, $X$, $T$ and $h(\cdot)$ are defined as above. This model can be considered as a multivariate generalization of Value-at-Risk optimization (see, e.g., [13]). The chance constraint may be rewritten as $P(\{\xi \in \Xi : h(\xi) \le Tx\}) \ge p$ and, hence, under certain regularity conditions (see, e.g., [23, Section 3.3]), optimal values and solution sets of (11) behave continuously with respect to the discrepancy $\alpha_{\mathcal{B}}$ with $\mathcal{B} := \{\{\xi \in \Xi : h(\xi) \le Tx\} : x \in X\}$.

The present paper is organized as follows. In Section 2 we state the problem of optimal scenario reduction with respect to a discrepancy distance $\alpha_{\mathcal{B}}$ and decompose it into a combinatorial and a linear optimization problem. Extending our earlier work in [9], we discuss in Section 3 how the coefficients of the linear program may be computed in the case of the polyhedral discrepancy $\alpha_{\mathcal{B}_{\mathrm{poly}(W)}}$. Algorithms for determining the optimally reduced probability distribution (with respect to $\alpha_{\mathcal{B}_{\mathrm{poly}(W)}}$) are developed in Sections 4 and 5. In Section 6 we provide and discuss numerical results.
2. Scenario reduction. We consider a probability measure $P$ with finite support $\{\xi^1, \ldots, \xi^N\}$ and set $p_i := P(\{\xi^i\}) > 0$ for $i = 1, \ldots, N$. Denoting by $\delta_\xi$ the Dirac measure placing mass one at the point $\xi$, the measure $P$ has the form

$$P = \sum_{i=1}^N p_i\, \delta_{\xi^i}.$$

The problem of optimal scenario reduction consists in determining a discrete probability measure $Q$ on $\mathbb{R}^s$ that is supported by a subset of $\{\xi^1, \ldots, \xi^N\}$ and deviates from $P$ as little as possible with respect to some distance, in this case a certain discrepancy $\alpha_{\mathcal{B}}$. It can be written as

$$\min\ \alpha_{\mathcal{B}}\Big(P, \sum_{j=1}^n q_j\, \delta_{\eta_j}\Big) \quad \text{subject to} \quad \{\eta_1, \ldots, \eta_n\} \subset \{\xi^1, \ldots, \xi^N\},\; q_j \ge 0\ (j = 1, \ldots, n),\; \sum_{j=1}^n q_j = 1. \qquad (13)$$

The variables to be optimally adjusted here are the support $\eta = \{\eta_1, \ldots, \eta_n\}$ and the probability weights $q = (q_1, \ldots, q_n)$ of the reduced measure $Q$; altogether they determine $Q = \sum_{j=1}^n q_j\, \delta_{\eta_j}$. The optimization problem (13) may be decomposed into an outer problem for determining $\mathrm{supp}\, Q = \eta$ and an inner problem for choosing the probabilities $q_j$. To this end, we denote by $\alpha_{\mathcal{B}}(P, (\eta, q))$ the $\mathcal{B}$-discrepancy between $P$ and $Q$, and by $S_n$ the standard simplex in $\mathbb{R}^n$: $S_n := \{q \in \mathbb{R}^n : q_j \ge 0,\; j = 1, \ldots, n,\; \sum_{j=1}^n q_j = 1\}$.
Using this notation, the scenario reduction problem (13) can be written as

$$\inf_{\eta} \Big\{ \inf_{q \in S_n} \alpha_{\mathcal{B}}(P, (\eta, q)) : \eta \subset \{\xi^1, \ldots, \xi^N\},\; \#\eta = n \Big\}, \qquad (15)$$

with the inner problem

$$\inf\big\{ \alpha_{\mathcal{B}}(P, (\eta, q)) : q \in S_n \big\} \qquad (16)$$

for fixed support $\eta$. The combinatorial optimization problem (15) is addressed in Section 5. In the remaining part of this section, we introduce some notation and recall [9, Section 3.1] to show how the inner problem (16) can be formulated as a linear optimization problem and how the dimensionality of the latter can be reduced. When addressing the inner problem (16) for given $\eta$, we may assume for the sake of notational simplicity that $\eta = \{\xi^1, \ldots, \xi^n\}$. Then, (16) is of the form

$$\inf_{q \in S_n} \sup_{B \in \mathcal{B}} \Big| P(B) - \sum_{j=1}^n q_j\, \delta_{\xi^j}(B) \Big|. \qquad (17)$$

The finiteness of $P$'s support allows us to define for $B \in \mathcal{B}$ the critical index set $I(B)$ through the relation

$$I(B) := \{ i \in \{1, \ldots, N\} : \xi^i \in B \}.$$

Furthermore, we define the system of critical index sets of a system of Borel sets $\mathcal{B}$ as

$$\mathcal{I}_{\mathcal{B}} := \{ I(B) : B \in \mathcal{B} \}.$$
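For the rectangular case, the system of critical index sets can be enumerated naively; the sketch below (hypothetical helper, closed rectangles only, so open and half-open boundary variants of the class $\mathcal{B}_{\mathrm{rect}}$ are omitted) illustrates how each set $B$ collapses to the index set $I(B)$ of atoms it contains:

```python
from itertools import product

def critical_index_sets(xi):
    """Enumerate the critical index sets I(B) for closed rectangles B,
    given atoms xi (tuples in R^s). Each I(B) = {i : xi[i] in B} is
    collected once as a frozenset."""
    s = len(xi[0])
    coords = [sorted({z[d] for z in xi}) for d in range(s)]
    sets = set()
    for a in product(*coords):
        for b in product(*coords):
            if any(a[d] > b[d] for d in range(s)):
                continue
            sets.add(frozenset(i for i, z in enumerate(xi)
                               if all(a[d] <= z[d] <= b[d] for d in range(s))))
    sets.add(frozenset())  # a rectangle containing no atom
    return sets
```

Note that already in one dimension only "interval-shaped" index sets occur, which is why the reduced system introduced below is so much smaller than the power set.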

Thus, the $\mathcal{B}$-discrepancy between $P$ and $Q$ can be reformulated as follows:

$$\alpha_{\mathcal{B}}(P, (\eta, q)) = \max_{I \in \mathcal{I}_{\mathcal{B}}} \Big| \sum_{i \in I} p_i - \sum_{j \in I \cap \{1, \ldots, n\}} q_j \Big|.$$

This allows us to solve (17) by means of the following linear optimization problem:

$$\min\Big\{ t : q \in S_n,\; \sum_{i \in I} p_i - \sum_{j \in I \cap \{1, \ldots, n\}} q_j \le t,\; \sum_{j \in I \cap \{1, \ldots, n\}} q_j - \sum_{i \in I} p_i \le t \ \text{ for all } I \in \mathcal{I}_{\mathcal{B}} \Big\}. \qquad (20)$$

The number of inequalities may be too large to solve (20) numerically, but whenever two critical index sets share the same intersection with the set $\{1, \ldots, n\}$, only the right-hand sides of the related inequalities differ. Thus, it is possible to pass to the minimum of all right-hand sides corresponding to the same left-hand side. To this end, we introduce the reduced system of critical index sets

$$\mathcal{I}^*_{\mathcal{B}} := \{ I \cap \{1, \ldots, n\} : I \in \mathcal{I}_{\mathcal{B}} \}, \qquad \varphi(J) := \{ I \in \mathcal{I}_{\mathcal{B}} : I \cap \{1, \ldots, n\} = J \}, \qquad (21)$$

together with the coefficients

$$\gamma_J := \min_{I \in \varphi(J)} \sum_{i \in I} p_i, \qquad \bar{\gamma}_J := \max_{I \in \varphi(J)} \sum_{i \in I} p_i \qquad \text{for } J \in \mathcal{I}^*_{\mathcal{B}},$$

to write problem (20) as

$$\min\Big\{ t : q \in S_n,\; \bar{\gamma}_J - \sum_{j \in J} q_j \le t,\; \sum_{j \in J} q_j - \gamma_J \le t \ \text{ for all } J \in \mathcal{I}^*_{\mathcal{B}} \Big\}. \qquad (23)$$

Since $|\mathcal{I}_{\mathcal{B}}| \le 2^N$ and $|\mathcal{I}^*_{\mathcal{B}}| \le 2^n$, passing from (20) to (23) indeed drastically reduces the maximum number of inequalities and can make problem (16) tractable for numerical solution. However, while the linear program (23) can be solved efficiently by available optimization software, at least for moderate values of $n$, it turns out that the determination of the coefficients $\mathcal{I}^*_{\mathcal{B}}$, $\gamma_J$, $\bar{\gamma}_J$ is more intricate.
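Once the reduced system and its coefficients are known, (23) is a small LP in $(q, t)$. A sketch using `scipy.optimize.linprog` follows; the function name and the dictionary layout for the coefficients are our own conventions, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_redistribution(n, coeffs):
    """Solve the reduced LP (23): minimise t over q in the simplex s.t.
        gamma_bar_J - sum_{j in J} q_j <= t,
        sum_{j in J} q_j - gamma_J   <= t   for all J.

    coeffs maps each J (frozenset of indices 0..n-1) to (gamma_J, gamma_bar_J)."""
    c = np.zeros(n + 1)
    c[-1] = 1.0                               # objective: minimise t
    A_ub, b_ub = [], []
    for J, (lo, hi) in coeffs.items():
        row = np.zeros(n + 1)
        for j in J:
            row[j] = 1.0
        r1 = -row.copy(); r1[-1] = -1.0       # gamma_bar_J - sum_J q <= t
        A_ub.append(r1); b_ub.append(-hi)
        r2 = row.copy(); r2[-1] = -1.0        # sum_J q - gamma_J <= t
        A_ub.append(r2); b_ub.append(lo)
    A_eq = np.ones((1, n + 1)); A_eq[0, -1] = 0.0   # sum q_j = 1
    res = linprog(c, A_ub=np.vstack(A_ub), b_ub=b_ub,
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    return res.x[:n], res.x[-1]
```

For instance, with three atoms of masses $(0.3, 0.3, 0.4)$ on a line and the third atom dropped, the optimal value $t = 0.4$ is forced by the rectangle containing only the dropped atom.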
3. Determining the coefficients. In our earlier work [9, Section 3.2], it is described how the coefficients $\mathcal{I}^*_{\mathcal{B}}$, $\gamma_J$, $\bar{\gamma}_J$ can be determined computationally in the case of the cell discrepancy. In this section, this approach is extended to the more general polyhedral discrepancies. Given the recourse matrix $W$, we consider $k$ pairwise linearly independent vectors $m_1, \ldots, m_k$ in $\mathbb{R}^s$ such that every facet of $\mathrm{pos}\, W$ and of the unit cube $[0,1]^s$ has its normal among the $m_i$, $i \in \{1, \ldots, k\}$. Let us denote by $M$ the $(k \times s)$-matrix whose rows are given by $m_1, \ldots, m_k$, respectively. Then, due to its special form, every polyhedron $B$ in $\mathcal{B}_{\mathrm{poly}(W)}$ can be written as

$$B = \{ \xi \in \mathbb{R}^s : a^B \le M\xi \le \bar{a}^B \}$$

for suitable $k$-dimensional vectors $a^B$ and $\bar{a}^B$ with $a^B_i \le \bar{a}^B_i$, $a^B_i, \bar{a}^B_i \in \mathbb{R} \cup \{+\infty, -\infty\}$, for $i = 1, \ldots, k$. Since $B$ is determined by the vectors $a^B$ and $\bar{a}^B$, we will use the notation $[a^B, \bar{a}^B]$ for $B$ in the following.
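The $[a, \bar{a}]$ representation makes membership tests trivial; a minimal sketch (our own helper, with $\pm\infty$ entries encoding absent facets):

```python
import numpy as np

def in_polyhedron(xi, M, a, a_bar):
    """Test xi in B = {x : a <= M x <= a_bar}, the [a, a_bar]
    representation of a polyhedron in B_poly(W). Rows of M are the
    facet normals m_1, ..., m_k; +-inf bounds mean the facet is absent."""
    v = M @ xi
    return bool(np.all(a <= v) and np.all(v <= a_bar))
```

A rectangle corresponds to $M$ containing only coordinate directions; an additional row such as $(1, 1)$ cuts the rectangle with a diagonal facet, as happens when $\mathrm{pos}\, W$ has non-axis-parallel facets.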
Analogous to the concept of supporting cells in [9], we will show that it suffices to consider the following supporting polyhedra. Loosely speaking, a polyhedron $B \in \mathcal{B}_{\mathrm{poly}(W)}$ is called supporting if each of its facets contains an element of $\{\xi^1, \ldots, \xi^n\}$, in such a way that $B$ cannot be enlarged without changing the intersection of $B$'s interior with $\{\xi^1, \ldots, \xi^n\}$, cf. Figure 2. This is formalized by the following definitions. We introduce the sets

$$R_j := \{ \langle m_j, \xi^i \rangle : i = 1, \ldots, n \} \cup \{ -\infty, +\infty \}, \quad j = 1, \ldots, k, \qquad (25)$$

so that every polyhedron $[a, \bar{a}]$ with $a_j, \bar{a}_j \in R_j$ admits a further representation by two index vectors $i, \bar{i}$:

$$a_j = \langle m_j, \xi^{i_j} \rangle, \qquad \bar{a}_j = \langle m_j, \xi^{\bar{i}_j} \rangle, \qquad j = 1, \ldots, k, \qquad (26)$$

where we set for notational convenience $\langle m_j, \xi^{\pm\infty} \rangle := \pm\infty$, respectively. Note that condition (26) means that every facet of the polyhedron $B$ is contained in a hyperplane in $\mathbb{R}^s$ that also contains an element of $\{\xi^1, \ldots, \xi^n\}$. A polyhedron $[a, \bar{a}]$ satisfying (26) is called supporting if it admits a representation $i, \bar{i}$ such that for every $j, l \in \{1, \ldots, k\}$, $j \ne l$, the following relations hold:

$$a_l \le \langle m_l, \xi^{i_j} \rangle \le \bar{a}_l \quad \text{and} \quad a_l \le \langle m_l, \xi^{\bar{i}_j} \rangle \le \bar{a}_l, \qquad (27)$$

i.e., the points realizing the bounds actually lie on the corresponding facets of $B$. The set of all supporting polyhedra is denoted by $\mathcal{P}$. The following proposition parallels [9, Prop. 3.1] and shows that every critical index set $J$ corresponds to a maximal supporting polyhedron $B$ whose interior does not contain any $\xi^i$ with $i \in \{1, \ldots, n\} \setminus J$.

Proposition 1. For any $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ there exists a supporting polyhedron $B$ such that $\bar{\gamma}_J = P(\mathrm{int}\, B)$ and

$$\mathrm{int}\, B \cap \{\xi^1, \ldots, \xi^n\} = \{\xi^j : j \in J\}. \qquad (28)$$

Proof. We consider an arbitrary $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$. From the definition of $\varphi(J)$ in (21) it follows that for any $I \in \varphi(J)$ there exists some $C \in \mathcal{B}_{\mathrm{poly}(W)}$ such that $I = I(C)$ and $J = I(C) \cap \{1, \ldots, n\}$. By definition of $I(C)$ we have $\{\xi^i : i \in I\} = C \cap \{\xi^1, \ldots, \xi^N\}$ and, hence,

$$\bar{\gamma}_J = \max\big\{ P(C) : C \in \mathcal{B}_{\mathrm{poly}(W)},\; C \cap \{\xi^1, \ldots, \xi^n\} = \{\xi^j : j \in J\} \big\}. \qquad (29)$$

Let $[a^{(0)}, \bar{a}^{(0)}]$ be a polyhedron attaining the maximum, i.e., $\bar{\gamma}_J = P([a^{(0)}, \bar{a}^{(0)}])$ and $[a^{(0)}, \bar{a}^{(0)}] \cap \{\xi^1, \ldots, \xi^n\} = \{\xi^j : j \in J\}$. In addition, due to the finiteness of the set $\{\xi^1, \ldots, \xi^N\}$, we can assume that these identities are also valid for $\mathrm{int}[a^{(0)}, \bar{a}^{(0)}]$, i.e.,

$$\bar{\gamma}_J = P(\mathrm{int}[a^{(0)}, \bar{a}^{(0)}]) \quad \text{and} \quad \mathrm{int}[a^{(0)}, \bar{a}^{(0)}] \cap \{\xi^1, \ldots, \xi^n\} = \{\xi^j : j \in J\}. \qquad (30)$$

In the following, we will enlarge $[a^{(0)}, \bar{a}^{(0)}]$ by successively shifting its facets until it becomes supporting. To this end, we put $a(t) := a^{(0)} - t\, e_1$, $t \ge 0$, and consider the polyhedral enlargement $[a(t), \bar{a}^{(0)}]$ of the polyhedron $[a^{(0)}, \bar{a}^{(0)}]$.
We set

$$\tau := \sup\big\{ t \ge 0 : \mathrm{int}[a(t), \bar{a}^{(0)}] \cap \{\xi^1, \ldots, \xi^n\} = \mathrm{int}[a^{(0)}, \bar{a}^{(0)}] \cap \{\xi^1, \ldots, \xi^n\} \big\}.$$
We repeat the above construction for the upper bound $\bar{a}^{(0)}$ in the first coordinate. Again, if $\tau < \infty$, there exists $\bar{i}_1 \in \{1, \ldots, n\}$ such that $\langle m_1, \xi^{\bar{i}_1} \rangle = \bar{a}^{(1)}_1$. Continuing the construction in this way for the coordinates $2, \ldots, k$, we arrive at the polyhedron $B := [a^{(k)}, \bar{a}^{(k)}]$ satisfying (26) and (28), together with the indices $i_l, \bar{i}_l$ for $l = 1, \ldots, k$. These inequalities remain valid for the final vectors $a^{(k)}, \bar{a}^{(k)}$. Thus, (27) holds and $B$ is supporting. Since both identities of (30) remain valid during the construction of $B$, it follows that $B$ possesses the asserted properties.

Corollary 1. The following identities hold:

$$\mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}} = \big\{ J \subseteq \{1, \ldots, n\} : \text{there exists } B \in \mathcal{P} \text{ satisfying (28)} \big\},$$

$$\bar{\gamma}_J = \max\big\{ P(\mathrm{int}\, B) : B \in \mathcal{P},\; B \text{ satisfies (28)} \big\}.$$

Proof. The inclusion '⊆' in the first identity and the inequality '≤' in the second identity follow directly from Proposition 1. For the reverse direction of the first identity, let $B = [a^B, \bar{a}^B] \in \mathcal{P}$ be given such that (28) holds true for some $J \subseteq \{1, \ldots, n\}$. Due to the finiteness of $\{\xi^1, \ldots, \xi^n\}$ there exists an $\varepsilon > 0$ such that

$$\mathrm{int}\, B \cap \{\xi^1, \ldots, \xi^n\} = [a^B + \vec{\varepsilon}, \bar{a}^B - \vec{\varepsilon}] \cap \{\xi^1, \ldots, \xi^n\},$$

where each entry of the vector $\vec{\varepsilon} \in \mathbb{R}^k$ is equal to $\varepsilon$. Since $[a^B + \vec{\varepsilon}, \bar{a}^B - \vec{\varepsilon}] \in \mathcal{B}_{\mathrm{poly}(W)}$, we observe

$$I\big([a^B + \vec{\varepsilon}, \bar{a}^B - \vec{\varepsilon}]\big) \cap \{1, \ldots, n\} = J. \qquad (32)$$

Therefore, $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ via the definition of $\mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$. Consequently, the inclusion '⊇' in the first identity also holds true. To verify the relation '≥' in the second identity, let $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ and $B = [a^B, \bar{a}^B] \in \mathcal{P}$ be arbitrary such that (28) holds true. We choose an $\varepsilon > 0$ with

$$P(\mathrm{int}\, B) = P\big([a^B + \vec{\varepsilon}, \bar{a}^B - \vec{\varepsilon}]\big) \qquad (33)$$

and conclude (32). Therefore, $I([a^B + \vec{\varepsilon}, \bar{a}^B - \vec{\varepsilon}]) \in \varphi(J)$ (see (21)) and

$$\bar{\gamma}_J \ge P\big([a^B + \vec{\varepsilon}, \bar{a}^B - \vec{\varepsilon}]\big) = P(\mathrm{int}\, B),$$

where the last identity follows from (33). This proves the inequality '≥' in the second identity, since $B = [a^B, \bar{a}^B] \in \mathcal{P}$ was chosen arbitrarily such that (28) holds true.
From Corollary 1 it follows that the set $\mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ and the upper coefficients $\bar{\gamma}_J$ for $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ can be determined whenever one knows the system $\mathcal{P}$ of supporting polyhedra. The following proposition shows how the lower coefficients $\gamma_J$ for $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ may be computed.
Proposition 2. For all $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$, one has $\gamma_J = \sum_{i \in I} p_i$, where $I := I([a^*, \bar{a}^*])$ with $a^*, \bar{a}^* \in \mathbb{R}^k$ given by $a^*_j := \min_{i \in J} \langle m_j, \xi^i \rangle$ and $\bar{a}^*_j := \max_{i \in J} \langle m_j, \xi^i \rangle$, $j = 1, \ldots, k$.

Proof. We consider an arbitrary $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$. Completely analogous to (29) in the proof of Proposition 1, it follows that

$$\gamma_J = \min\big\{ P(C) : C \in \mathcal{B}_{\mathrm{poly}(W)},\; C \cap \{\xi^1, \ldots, \xi^n\} = \{\xi^j : j \in J\} \big\}.$$

Every feasible $C$ in this minimization contains the polyhedron $[a^*, \bar{a}^*]$, which is itself feasible, so the minimum is attained there and equals $\sum_{i \in I} p_i$.

With Corollary 1 and Proposition 2 at hand, we propose in the next section an algorithm to calculate the coefficients of (23) and to solve the inner problem (16).

4. Optimal redistribution algorithm. For the numerical use of the concept of supporting polyhedra, one has to determine the sets $R_j$ defined by (25). Thus, given the matrix $W$, one has to identify a normal vector for each facet of the convex cone $\mathrm{pos}\, W$. This transformation of a vertex-based representation of $\mathrm{pos}\, W$ into one based on halfspaces is a well-studied problem for which efficient algorithms are available, e.g., the implementation lrs ([1]) of the reverse search algorithm of [2], and the implementation cdd+ ([4]) of the double description method of [14].
Corollary 1 and Proposition 2 show that the coefficients of the linear inner problem (16), i.e., $\mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ and $\gamma_J, \bar{\gamma}_J$ for $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$, can be determined by iterating through the set $\mathcal{P}$ of supporting polyhedra. In view of the huge number of potential supporting polyhedra, we propose a recursive approach for this iteration. More precisely, the supporting polyhedra $[a, \bar{a}] = [(a_j)_{j=1}^k, (\bar{a}_j)_{j=1}^k]$ are constructed recursively for $j = 1, \ldots, k$, ensuring at every step $j$ that condition (27) is still fulfilled when the $j$-th coordinate is added. This is done within FUNCTION Iterate of the following algorithm.
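The recursive coordinate-by-coordinate construction behind FUNCTION Iterate can be illustrated by the following sketch for the cell/rectangular case ($M$ = identity). It enumerates all candidate boxes with bounds drawn from the support values and $\pm\infty$; the pruning of partial boxes that violate the supporting condition (27), which is the essential efficiency ingredient of Algorithm 1, is deliberately omitted here.

```python
import math

def candidate_boxes(points):
    """Recursively build candidate boxes [a, a_bar] coordinate by
    coordinate (cf. FUNCTION Iterate). In each coordinate j the bounds
    are drawn from the support values {xi^i_j} together with +-infinity."""
    s = len(points[0])
    boxes = []

    def iterate(j, a, a_bar):
        if j == s:                       # all coordinates fixed
            boxes.append((tuple(a), tuple(a_bar)))
            return
        vals = sorted({p[j] for p in points})
        for lo in [-math.inf] + vals:
            for hi in vals + [math.inf]:
                if lo <= hi:
                    iterate(j + 1, a + [lo], a_bar + [hi])

    iterate(0, [], [])
    return boxes
```

Because pruning happens coordinate-wise, a violated condition at level $j$ cuts off an entire subtree of candidates at once, which is what keeps the actual algorithm practical despite the combinatorial candidate count.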

Algorithm 1. Optimal redistribution.
Remark 1. When using Algorithm 1 repeatedly with varying support, e.g., within one of the algorithms mentioned in the next section, it would be desirable to decrease the numerical complexity by reusing some of the data $\mathcal{I}^*_{\mathcal{B}_{\mathrm{poly}(W)}}$ and $\gamma_J, \bar{\gamma}_J$ computed for another support. While this is possible for $\gamma_J$ and $\bar{\gamma}_J$ within Algorithm 4.1 of [9], it is infeasible inside our Algorithm 1, since $\gamma_J$ and $\bar{\gamma}_J$ are already determined by the set $\bar{I}$ that is calculated simultaneously with the construction of $J$.

5. Finding an optimal support. In this section, we address the outer problem (15) of choosing an optimal support. This combinatorial problem represents a specific $k$-median problem and is hence NP-complete [6]. Nevertheless, when considering not a discrepancy but probability metrics $d_{F_c}$ with ζ-structure defined by (4) and $c(\omega, \tilde\omega) = \|\omega - \tilde\omega\|$, (15) can be formulated as a mixed-integer linear program that can be solved numerically for moderate values of $N$ and $n$ by available optimization software.
Furthermore, heuristic approaches have been developed for probability metrics with ζ-structure, cf., e.g., [7]. We briefly recall their forward and backward algorithms, which have been shown to be fast and to often provide nearly optimal solutions. They determine index subsets $J_{[n]}$ and $J_{[N-n]}$, respectively, of $\{1, \ldots, N\}$. These sets of cardinality $n$ represent the support of the reduced measure $Q$. Adapted to our framework of discrepancy distances, the algorithms read as follows.
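The forward heuristic can be sketched generically: it grows the support greedily, and the quality criterion is the optimal value of the inner problem for the tentative support. The function below is our own schematic rendering (the paper's Algorithm 2 is stated for the discrepancy setting); `reduction_distance` stands for any oracle returning the best distance achievable on a given index set, e.g. the optimal value of (16).

```python
def forward_selection(N, n, reduction_distance):
    """Greedy forward selection: starting from the empty index set, add in
    each of n rounds the index whose inclusion yields the smallest
    distance after optimal redistribution.

    reduction_distance(J) -> float: best achievable distance for a measure
    supported on the atoms indexed by the set J."""
    J = set()
    for _ in range(n):
        best = min((i for i in range(N) if i not in J),
                   key=lambda i: reduction_distance(J | {i}))
        J.add(best)
    return J
```

The backward variant works symmetrically, removing one index per round from the full set; both require one inner-problem solve per candidate and per round, which explains the running-time growth reported in Section 6.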

Algorithm 2. Forward selection.

Algorithm 3. Backward reduction.
Dealing with discrepancy distances, we meet different problems in solving the combinatorial problem (15). On the one hand, discrepancies do not allow, to the best of our understanding, a reformulation of (15) as a mixed-integer linear program that would be accessible to available solvers. On the other hand, the above-mentioned heuristics no longer produce nearly optimal results, cf. Figure 5. This is a consequence of the fact that the maximum in (6) is attained in many different regions, and thus the removal or addition of a single support point is often an unfavorable precondition for further reduction or expansion. Furthermore, the backward reduction algorithm becomes significantly slower with increasing $N$, since Algorithm 1 has to determine the optimal redistribution for large values of $n$, cf. Table 1.
Finally, even for moderate scenario numbers a complete enumeration is very expensive, since the solution of the inner problem (16) requires higher computational effort and is not a simple nearest-neighbour projection as in the case of $d_{F_c}$ with $c(\omega, \tilde\omega) = \|\omega - \tilde\omega\|$. Running times of complete enumeration can be found in Table 4.
Since one may sometimes be interested in knowing the achievable minimal discrepancy, we suggest the following approach to solve (15) for moderate values of $N$ and to reduce the time needed by complete enumeration. However, the complexity of (15) quickly increases with the dimension of the problem, and real-world stochastic optimization problems are, in general, of higher dimension. Consequently, practitioners mostly have to abandon an optimal approximation, and the following approach is rather of academic interest. For an approach that may be numerically more tractable for larger problems, we refer to Example 2, where we adopt a Quasi-Monte Carlo method to tackle the outer problem (15).
The following approach is applicable both for the cell discrepancy studied in [9] and for the polyhedral discrepancy. The starting point is the computation of an upper bound $\bar{\alpha}_n$ for the achievable minimal discrepancy by using heuristics, e.g., the forward selection algorithm. Since the optimal discrepancy achievable by $m$ points decreases with increasing $m$, an optimal tuple of $n < m$ elements cannot be contained in any $m$-tuple with a discrepancy exceeding $\bar{\alpha}_n$. Hence, we can pass through choices $u$ of $m \in \{n+1, \ldots, N-1\}$ out of $N$ points to determine those $u$ with a discrepancy exceeding $\bar{\alpha}_n$. Afterwards, to determine the optimal $n$-tuple, it suffices to evaluate all $n$-tuples that are not a subset of any of these $u$. As soon as we find an $n$-tuple whose discrepancy falls below the upper bound $\bar{\alpha}_n$, we can update $\bar{\alpha}_n$ and suspend the enumeration of $n$-tuples in order to repeat the iteration over $m$-tuples for $m \in \{n+1, \ldots, N-1\}$ and to exclude further $m$- and $n$-tuples.
Since $m$-tuples can be seen as admissible solutions of a relaxation of (15), this procedure is close to a branch and bound approach with alternating breadth-first search (Step [2]) and depth-first search (Step [3]). However, a standard branch and bound approach does not perform well, since solving the redistribution problem along the branch and bound tree is too expensive. We propose the following:

Algorithm 4. Branch and bound.
Step [1]: Determine an upper bound $\bar{\alpha}_n$ on the minimal discrepancy achievable by a measure supported by $n$ points. Set $U = \emptyset$.
Step [2]: Iterate through some $m$-tuples $w$ with $m \in \{n+1, \ldots, N-1\}$: If $w \not\subset u$ for all $u \in U$: calculate the optimal redistribution given the support $w$ and the resulting discrepancy. If the latter exceeds $\bar{\alpha}_n$: add $w$ to $U$.
Step [3]: Iterate through all (remaining) $n$-tuples $w$. If $w \not\subset u$ for all $u \in U$: calculate the optimal redistribution on $w$ and the resulting discrepancy. If the latter falls below $\bar{\alpha}_n$, update $\bar{\alpha}_n$ and go to Step [2].
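The subset test in Steps [2] and [3] is the cheap part of the procedure; a minimal sketch (hypothetical helper, indices taken from $\{0, \ldots, N-1\}$) of the pre-filter used before each expensive redistribution solve:

```python
from itertools import combinations

def surviving_tuples(N, n, U):
    """Return the n-tuples w (subsets of {0,...,N-1}) contained in no
    excluding superset u in U; only these need a redistribution solve."""
    exclude = [set(u) for u in U]
    return [w for w in combinations(range(N), n)
            if not any(set(w) <= u for u in exclude)]
```

Every avoided tuple saves a full solve of the inner problem (16), which is why building a good exclusion list $U$ pays off despite its own cost.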
The choice of the tested m-tuples within Step [2] is crucial for the performance of the algorithm. In the remaining part of this section we address this question and suggest a heuristic for this breadth-first search.
The following considerations should be taken into account. On the one hand, the number of $n$-tuples excluded by an $m$-tuple with a discrepancy exceeding $\bar{\alpha}_n$ increases with increasing $m$. On the other hand, with decreasing $m$, it becomes more likely to find such $m$-tuples. However, the evaluation of all $m$-tuples quickly becomes too time-consuming when $m$ approaches $N/2$.
Thus, we suggest the following approach for Step [2]. Once the set $U$ has been determined from the evaluation of some $(N-1)$-, $\ldots$, $(i+1)$-tuples, we can calculate the number of remaining $n$- and $i$-tuples, respectively; this is shown by Lemma 5.1 below. The time needed for the evaluation of a single $n$- or $i$-tuple, i.e., the cost of the inner problem (16), can be (roughly) estimated to be proportional to the number of potential supporting polyhedra or cells, i.e., in the case of the cell discrepancy $\binom{n+s}{s}$ or $\binom{i+s}{s}$, respectively. We denote the time needed for the evaluation of all remaining $n$- and $i$-tuples by $\tau^U_n$ and $\tau^U_i$. Suppose we invest a certain part $\lambda \in (0, 1)$ of the time $\tau^U_n$ in the evaluation of some $i$-tuples, i.e., we evaluate a fraction $\kappa$ of all remaining $i$-tuples such that $\kappa \cdot \tau^U_i = \lambda \cdot \tau^U_n$. This evaluation entails a set $U_\kappa \supset U$. We decide to evaluate all remaining $i$-tuples if and only if

$$\frac{1}{\kappa}\big(\tau^U_n - \tau^{U_\kappa}_n\big) \ge \tau^U_i. \qquad (36)$$

The right-hand side represents the cost of testing all remaining $i$-tuples; the left-hand side can be interpreted as an extrapolation of the benefit of such a test. Using (35) and the definition of $\kappa$, (36) can be written as

$$\tau^{U_\kappa}_n \le (1 - \lambda)\, \tau^U_n. \qquad (37)$$

To this end, we have to calculate the number of remaining $n$- and $i$-tuples, given a set $U$ of excluding supersets. This can be done by the following formula, which is based on the inclusion-exclusion principle and can easily be proven by induction over $m$.

Lemma 5.1. Let $u_1, \ldots, u_m \subseteq \{1, \ldots, N\}$ and $n \le N$. Then the number of $n$-element subsets of $\{1, \ldots, N\}$ that are contained in at least one of the sets $u_1, \ldots, u_m$ equals

$$\sum_{k=1}^{m} (-1)^{k+1} \sum_{I \subseteq \{1, \ldots, m\},\, \#I = k} \mathbb{1}_{\{|\bigcap_{i \in I} u_i| \ge n\}} \binom{|\bigcap_{i \in I} u_i|}{n}. \qquad (38)$$
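The inclusion-exclusion formula (38) can be implemented directly; the sketch below (function name and interface ours) returns the number of $n$-tuples *not* covered by any excluding superset, i.e., $\binom{N}{n}$ minus the count given by (38):

```python
from itertools import combinations
from math import comb

def remaining_tuples(N, n, U):
    """Count n-subsets of {0,...,N-1} contained in no u in U, using
    inclusion-exclusion over all nonempty index sets I of excluding
    supersets (cf. (38))."""
    m = len(U)
    covered = 0
    for k in range(1, m + 1):
        for I in combinations(range(m), k):
            inter = set.intersection(*(set(U[i]) for i in I))
            if len(inter) >= n:          # the indicator in (38)
                covered += (-1) ** (k + 1) * comb(len(inter), n)
    return comb(N, n) - covered
```

As noted in Remark 2, the indicator lets many of the $2^m - 1$ intersection terms be skipped, since small intersections contribute nothing.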
Remark 2. Since the evaluation of (38) requires the determination of $2^m - 1$ intersections, one could think about using an estimate of (38) that is cheaper to evaluate. Indeed, given an even $\tilde{m} < m$, (38) can be bounded from below and above by taking the first sum only over $k \le \tilde{m}$ and $k \le \tilde{m} + 1$, respectively. However, these so-called Bonferroni inequalities, cf., e.g., [5], do not entail a useful estimate of (38), since the partial sums fluctuate strongly in $\tilde{m}$. Furthermore, such estimates do not lead to a significant speed-up in general, because the condition $\mathbb{1}_{\{|\cap_{i \in I} u_i| \ge n\}}$ allows the computation to be aborted in many cases anyway.
On the other hand, numerical experience with substituting the term $\mathbb{1}_{\{|\cap_{i \in I} u_i| \ge n\}}$ by $\mathbb{1}_{\{|\cap_{i \in I} u_i| \ge n + j\}}$ was encouraging for small values of $j$. However, we do not pursue this approach further and evaluate (38) only if $m$ is smaller than a certain threshold $\vartheta$.
We propose the following detailed form of Step [2] of Algorithm 4: Algorithm 5. Breadth-first search heuristic.

Step [2c]:
Go through all already evaluated $i$-tuples and compare their saved discrepancies with $\bar{\alpha}_n$. Add tuples with a discrepancy exceeding $\bar{\alpha}_n$ to $U$.
Step [2d]: If all $i$-tuples have already been evaluated, go to [2b].

Step [2f]:
Evaluate a fraction $\kappa = \lambda \tau^U_n / \tau^U_i$ of all $i$-tuples, save their discrepancies, and determine $U_\kappa$. Update $U := U_\kappa$.

Step [2j]:
Evaluate all i-tuples, save their discrepancies, and update U .
Go to [2b].

6. Numerical results. All algorithms have been implemented in C++, the halfspace representation of $\mathrm{pos}\, W$ has been determined by Fukuda's cdd+ ([4]), and the linear program (20) has been solved with CPLEX 10.0 ([12]). The following numerical results have been obtained on a Pentium 4 with 3 GHz CPU and 1 GB RAM. Table 1 shows running times of Algorithm 1 for the optimal redistribution given a fixed support, for different problem parameters. In $\mathbb{R}^s$, the case $k = s$ stands for the rectangular discrepancy. The running time quickly increases with increasing support size $n$ and number of facets $k$, due to the fact that the number of potential supporting polyhedra is equal to $\binom{n+1}{2}^k$. For $n = 5$, $k = 3$ and $n = 20$, $k = 12$ the latter equals $3375$ and $7.36 \times 10^{27}$, respectively. The dependence of the running time on the initial number of scenarios $N$ appears to be linear. Table 2 shows the resulting polyhedral and rectangular discrepancies. As one would expect, these increase with $k$ and decrease with $n$. Furthermore, it turned out that the majority of Algorithm 1's running time is spent on the determination of the supporting polyhedra, while the time needed to solve the linear program (20) is insignificant. Table 3 illustrates the increase of the Forward Selection Algorithm 2's running time when reducing initial measures supported by $N = 100$ atoms in $\mathbb{R}^3$ and $\mathbb{R}^4$ w.r.t. the rectangular discrepancy. The growth of the running time is due to the fact that the inner problem, which becomes more complex with increasing dimension and number of remaining scenarios, has to be solved very often in the course of a forward selection. Consequently, this heuristic seems not to be appropriate for higher-dimensional problems. We refer again to Example 2, where another heuristic is used to tackle the outer problem (15).
Table 4 compares the results of a complete enumeration, the branch and bound Algorithm 4, and the Forward Selection Algorithm 2 in the case of the cell discrepancy. The initial measures were supported by $N = 20$ atoms in $\mathbb{R}^2$ and $\mathbb{R}^4$, respectively. The percentage values in brackets specify the relative excess of the minimal discrepancy achieved by forward selection over the optimal value. The parameters used for the breadth-first search are $\bar{L} = \min\{N-1, n+4\}$, $\lambda = 0.01$, $\sigma = 0.3$, and $\vartheta = 50$; the time needed by enumeration is significantly reduced, by up to 95% ($\mathbb{R}^2$, $n = 11$).
With regard to the complexity of the inner problem (16), we adopted a heuristic for the outer problem (15) within which (16) has to be solved only once. More precisely, to approximate the initial measure of Example 2, we used a Quasi-Monte Carlo (QMC) approach based on the first $n$ points of the Halton sequence with bases 2 and 3, cf. [16]. It is not completely clear, to the best of our knowledge, how such a low discrepancy sequence, initially designed to be close to the uniform distribution on the unit cube, should be transformed to approximate an arbitrary probability distribution. However, there exist approaches for special classes of distributions. For Example 2, we applied the method of Hlawka and Mück [11]. The resulting (uniformly weighted) points are shown for $n = 50$ on the left side of Figure 4. The right side of Figure 4 shows the probabilities optimally readjusted w.r.t. the rectangular discrepancy by Algorithm 1. The first plot of Figure 5 shows the rectangular discrepancies between the initial distribution of Example 2 and the approximations obtained by QMC and its readjustment, respectively, as well as the corresponding running times (in seconds) of Algorithm 1. The discrepancy can be reduced by Algorithm 1 by up to 50%. (Consequently, the reduction of the gap to the minimal discrepancy is still larger.) The following example illustrates that such an optimal adjustment w.r.t. the right discrepancy may significantly improve the approximation quality.
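A Halton point set is cheap to generate; the sketch below produces the first $n$ points with bases 2 and 3 and pushes them coordinate-wise through the inverse normal c.d.f. This transform is a simple stand-in for the Hlawka-Mück construction used in the paper, adequate only because the target in Example 2 has independent standard normal coordinates; the function names are ours.

```python
from statistics import NormalDist

def halton(i, base):
    """i-th element (i >= 1) of the van der Corput sequence in `base`
    (radical-inverse of i)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton_normal_2d(n):
    """First n Halton points (bases 2 and 3), mapped coordinate-wise to the
    standard normal distribution via the inverse c.d.f."""
    inv = NormalDist().inv_cdf
    return [(inv(halton(i, 2)), inv(halton(i, 3))) for i in range(1, n + 1)]
```

The resulting uniformly weighted point set plays the role of the left plot of Figure 4; the readjustment of its weights is then a single call of the optimal redistribution of Section 4.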
Example 2. Consider a random variable $\xi = (\xi_1, \xi_2)$ taking values in $\mathbb{R}^2$, some $p \in [0, 1]$, $L \ge 0$, and the following chance-constrained optimization problem of type (11): This is a prototype model for so-called reservoir constraints with upper and lower level restrictions, as they appear, for instance, in water management [20] or chemical engineering [10].
We assume that $\xi$'s distribution consists of 1,000 uniformly weighted points, sampled from the standard normal distribution in $\mathbb{R}^2$. Due to its simple form and low dimensionality, problem (39) can be solved by enumeration. We compared the optimal value with those obtained by the above-mentioned Quasi-Monte Carlo approach and its readjusted modification, supported by up to $n = 50$ atoms.
The plots of Figure 5 show the relative deviation of the optimal values obtained by the approximations from the initial optimal value for different parameters $p$ and $L$. It can be seen that a QMC approximation that has been adjusted by a call of Algorithm 1 performs in most cases significantly better than its unadjusted counterpart.
Appendix. In this section, we point out why it seems to be difficult to directly employ the extended $\mathcal{B}$-discrepancy $\zeta_{r,\mathcal{B}}(P, Q)$, introduced in Section 1, for scenario reduction.
Consequently, when dealing with extended discrepancies, the inner problem (16) has the form

$$\min\Big\{ t : q \in S_n,\; \sup_{u \in U_r} \Big( \sum_{i \in I} p_i u_i - \sum_{j \in I \cap \{1, \ldots, n\}} q_j u_j \Big) \le t \ \text{ for all } I \in \mathcal{I}_{\mathcal{B}} \Big\}. \qquad (41)$$
In contrast to (20), the supremum on the left-hand side does not depend monotonically on $|I|$ when $I \cap \{1, \ldots, n\}$ and $q$ are fixed, cf. Example A.1. Thus, to the best of our understanding, passing from (41) to a reduced system of critical index sets $\mathcal{I}^*_{\mathcal{B}}$ and to an analogue of problem (23) in Section 2 seems to be impossible.