Two-stage stochastic standard quadratic optimization

Two-stage stochastic decision problems are characterized by the property that in stage one, part of the decision has to be made before the relevant data are observed, while the rest of the decision can be made in stage two, after the data are available. It is generally assumed that, while the data missing in stage one are not known, their probability distribution is known. In this paper, we consider a special form of such problems where the objective function is quadratic in the first- and second-stage decisions and the goal is to minimize the expected quadratic cost function. We assume that the decisions must lie in a standard simplex, but do not require convexity of the uncertain objective. It is well known that such quadratic optimization programs are NP-hard and thus belong to the most complex optimization problems. A viable method to attack such problems is to find bounds for the original complicated problem by solving easier approximate problems. We describe such approximations and show that by exploiting their properties, good but not necessarily optimal solutions can be found. We also present possible applications, such as selecting invited speakers for conferences with the aim of generating a community with high coherence. Finally, systematic numerical results are provided. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

where Q is a symmetric n × n matrix, and Δ^n := { z ∈ R^n_+ : e^⊤z = 1 } with e = [1, …, 1]^⊤ ∈ R^n and ^⊤ denoting transposition. Note that a linear term 2q^⊤z in the objective q(z) with q ∈ R^n can easily be incorporated in homogeneous form by replacing Q with the symmetric matrix Q + eq^⊤ + qe^⊤. Thus, a DStQP consists of finding (global) minimizers of a quadratic form over the standard simplex, so it is a global optimization problem which is in fact NP-hard. Even the detection of local solutions is NP-hard (Bomze, 1998). Nevertheless, there are several exact procedures which try to exploit favourable data constellations in a systematic way, and to avoid the worst-case behaviour whenever possible; see, among many others, Bomze & Palagi (2005); Gondzio & Yıldırım (2021); Liuzzi, Locatelli, & Piccialli (2019); Scozzari & Tardella (2008).
On the other hand, the simplicity of the feasible set (the simplest polytope) and of the objective function (the simplest non-convex function) justifies viewing the DStQP as the simplest of the hard problems (Bomze, Schachinger, & Ullrich, 2018). From the application point of view, this problem class is rich and versatile enough to allow for modeling a huge variety of problems, ranging from biology through economics to classical discrete problems like the maximum-clique problem (Bomze & de Klerk, 2002; Bomze et al., 2000).

Notation
Throughout the text, matrices are denoted by sans-serif capital letters (e.g. O will be the zero matrix and E the matrix of all ones), vectors by boldface lower-case letters (for example, o will denote the zero vector, and e the vector of all ones), and scalars (real numbers) by lower-case letters. Note that for the zero matrix O, the matrix of ones E, the vector of ones e, and the i-th column vector e_i of the identity matrix I, the dimension will always be clear from the context. For two m × n matrices A, B we use A • B := trace(A^⊤B) = Σ_{i=1}^m Σ_{j=1}^n A_{ij} B_{ij} as a shorthand notation for their Frobenius product. For a vector x, the i-th entry will be denoted by x_i. For a matrix X, the entry in the i-th row and j-th column will be denoted by X_{ij}. We denote by S^n the set of symmetric matrices of order n. We also use the shorthand notation [1:S] := {1, …, S} ⊂ N.
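As a quick sanity check on this notation, the trace and elementwise forms of the Frobenius product coincide; the sketch below verifies this in pure Python on small, arbitrary illustrative matrices:

```python
# Numerical check of the Frobenius product identity
# A • B = trace(A^T B) = sum_ij A_ij * B_ij, on small illustrative matrices.
def matmul_T(A, B):
    """Return C = A^T B for m x n lists-of-lists A, B (C is n x n)."""
    m, n = len(A), len(A[0])
    return [[sum(A[i][r] * B[i][c] for i in range(m)) for c in range(n)]
            for r in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def frobenius(A, B):
    return sum(A[i][j] * B[i][j] for i in range(len(A)) for j in range(len(A[0])))

A = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
B = [[2.0, 0.0, 1.0], [4.0, 1.0, -2.0]]
assert abs(trace(matmul_T(A, B)) - frobenius(A, B)) < 1e-12
```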
The data of a DStQP are fully described by the matrix Q. However, we do not always have full information on these parameters. Recent attempts to deal with uncertainty in Q from a robust perspective can be found in Bomze, Kahr, & Leitner (2021a). Here we want to follow a different approach, but first we motivate this study by several examples.
Example 1. The organizer of a conference has to decide whom to classify as an invited speaker and which contributed papers will be accepted for oral presentation. It is assumed that the frictions between speaker i and speaker j are given by a real number Q_ij. The (relative) importance given to speaker i is denoted by z_i ≥ 0. The deterministic problem of minimizing the frictions is given by the quadratic problem (1). The set of contributed speakers, as well as their frictions among themselves and between them and the invited speakers, is not known at the beginning. This requires replacing the matrix Q by an uncertain matrix Q̃ (see Section 3.1 for details) which is dissected into the parts A (frictions among the invited speakers), B̃ (frictions between contributed and invited speakers) and C̃ (frictions among the contributed speakers). The pertaining decision problem for the organisation of a maximally frictionless conference is as described before.
Example 2. A car dealer has to decide about the portfolio of car types in his sales program. He wants to include car types as different as possible in order to cover the market. Denote by Q_ij the similarity between type i and type j. As before, the relative size of type i is denoted by z_i ≥ 0. The problem under full information is again given by problem (1). If, however, some types are not yet available and their similarities are modeled as random, we arrive at the decision problem described in Section 3.1.
Example 3. A social network contains n members. For each pair (i, j) of members, the value Q_ij indicates the degree of dissimilarity. In order to find some representatives, the network owner wants to select a number of persons who are maximally different from each other. In a relaxed form of this problem, he wants to find coefficients z_i ≥ 0 such that z^⊤Qz is maximal. Notice that maximizing z^⊤Qz is equivalent to minimizing z^⊤(−Q)z, which is contained in model (1), since no assumptions about Q are made.
The stochasticity enters the model if we assume that the similarity of some members is known quite well, while the similarity of others is not known but follows some probability distribution, which will be revealed later.

Example 4. The minimal-variance portfolio has to be found by investing in two classes of assets. The first group of assets is well known and their covariance matrix is well estimated. A second group of assets is new and their interrelation is not well known. However, part of the investment has to be made immediately, while the rest can be invested later. Notice that, due to the lack of information about the second-stage assets, it is not guaranteed that the random covariance matrices are almost surely positive semidefinite. See Section 5.5 for a numerical example.
In this paper, we consider a special form of two-stage stochastic programs which we propose to call two-stage stochastic standard quadratic optimization problems (2(St)³QP), where the objective function is quadratic in the first- and second-stage decisions and the goal is to minimize the expected quadratic cost function. We assume that the decisions must lie in a standard simplex, but do not require convexity of the uncertain objective, resulting in a quadratic optimization program which is NP-hard. In order to solve such a difficult decision problem we propose different types of approximations: by bounding the recourse function; by providing an upper bound based on a Pairwise Frank-Wolfe algorithm, an efficient first-order procedure similar to the Infection-Immunization-Dynamics (IID) algorithm (see Rota Bulò & Bomze, 2011), applied to the scenario-tree formulation; by providing scalable copositive lower and upper bounds applied to the scenario-tree formulation; and by providing lower bounds based on dissecting the probability measure (see Maggioni & Pflug, 2016).
Finally, a computational study shows in a systematic way how the different types of bounds perform in terms of computation time and solution quality.
The paper is organized as follows. In Section 2, basic facts on two-stage stochastic optimization problems are recalled and a literature review on their bounds is provided. In Section 3 we introduce the class of two-stage stochastic standard quadratic optimization problems and bounds on their recourse function. Bounds on the scenario problem are provided in Section 4 by means of upper bounds based on the Infection-Immunization-Dynamics (IID) algorithm, scalable copositive lower and upper bounds, and lower bounds based on dissecting the probability measure. In Section 5 numerical results are provided, corroborating our approach. Conclusions follow.

Two-stage stochastic optimization: basic facts
The following mathematical model represents a general formulation of a two-stage stochastic optimization problem in which a decision maker needs to determine x in order to minimize expected total costs (Birge & Louveaux, 2011; King & Wallace, 2012; Pflug & Pichler, 2014; Ruszczynski & Shapiro, 2003):

min { f_1(x) + E_ξ[ h_2(x, ξ) ] : x ∈ X },   (2)

where x is a first-stage decision variable restricted to the set X ⊆ R^{n_1}_+, and E_ξ stands for the expectation with respect to a random vector ξ, defined on some probability space (Ω, A, P) with support Ω and given probability distribution P on the σ-algebra A. The function h_2 is the value function of the second-stage optimization problem,

h_2(x, ξ) := min { q(y; x, ξ) : y ∈ Y(x, ξ) },

defined for a specific realization ξ of the random vector ξ; it models the minimization of the cost q(y; x, ξ) associated with adapting to information revealed through the realization ξ, which also determines the feasible set Y(x, ξ). The term r(x) := E_ξ[ h_2(x, ξ) ] in (2) is referred to as the recourse function. Introducing the function z(x, y, ξ) := f_1(x) + q(y; x, ξ), problem (2) can be rewritten as

min { E_ξ[ min { z(x, y, ξ) : y ∈ Y(x, ξ) } ] : x ∈ X },

and its solution x* is called the here-and-now solution. This class of problems brings a computational complexity which increases exponentially with the size of the scenario tree representing a discretization of the underlying random process ξ ∈ Ω.
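The two-stage structure can be illustrated on a toy instance with a finite scenario set, where the here-and-now solution is found by brute-force enumeration over a grid. All data and names below are illustrative assumptions, not part of the paper's model:

```python
# Toy two-stage problem with finitely many scenarios: the here-and-now
# decision x is found by enumerating a grid; for each scenario the best
# recourse y is chosen afterwards. Purely illustrative data.
scenarios = [(0.3, 1.0), (0.7, -0.5)]        # (probability p_s, parameter c_s)

def second_stage(x, c):
    # h2(x, xi): best recourse over a small finite set Y
    return min((y - c * x) ** 2 for y in (0.0, 0.5, 1.0))

def expected_cost(x):
    # f1(x) + E[h2(x, xi)] with first-stage cost f1(x) = (x - 1)^2
    return (x - 1.0) ** 2 + sum(p * second_stage(x, c) for p, c in scenarios)

grid = [i / 100 for i in range(101)]
x_star = min(grid, key=expected_cost)        # "here-and-now" solution
```

The first-stage cost pulls x toward 1 while the scenario with a negative parameter penalizes large x, so the optimum lies strictly inside the interval.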
Even if a large discrete tree model is constructed, the problem might be intractable due to the curse of dimensionality. Several bounds and approximations have been proposed in the literature to cope with this problem.
Among them, Frauendorfer (1988), Hausch & Ziemba (1983), Huang, Vertinsky, & Ziemba (1977a) and Huang, Ziemba, & Ben-Tal (1977b) generalize Jensen's inequality (Jensen, 1906) for lower bounding and the Edmundson-Madansky inequality (Edmundson, 1956; Madansky, 1960) for upper bounding. An alternative method is to aggregate constraints and variables in the extensive form and solve the resulting problem (Birge, 1985; Rosa & Takriti, 1999). Easy-to-compute bounds have also been proposed in the literature by solving small-size sub-problems instead of the big one associated with the large discrete scenario-tree model (see for instance Kuhn, 2008; Maggioni, Allevi, & Bertocchi, 2016; Maggioni & Pflug, 2016; Sandıkçı, Kong, & Schaefer, 2013). An alternative approach is to construct two approximating trees, a lower tree and an upper tree, the solutions of which lead to upper and lower bounds for the optimal value of the original continuous problem. The advantage of this approach is that it generates intervals in which the optimal value is guaranteed to lie. Results in this direction were first obtained by Frauendorfer & Schürle (2009), followed by Frauendorfer, Kuhn, & Schürle (2011) and Kuhn (2005). In Kuhn (2005), barycentric discretizations are adopted in a more general setting, investigating convex multistage stochastic programs with a generalized nonconvex dependence on the random variables. In Maggioni & Pflug (2019), the authors generalize the bounding ideas of Frauendorfer & Schürle (2009), Frauendorfer et al. (2011) and Kuhn (2005) to not necessarily Markovian scenario processes and derive valid lower and upper bounds for the convex case. They construct new discrete probability measures directly from the simulated data of the whole scenario process. In the following we will present, for the new class 2(St)³QP, new scalable lower and upper bounds on the objective function value.
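The Jensen idea underlying the classical lower bound can be illustrated numerically: for a recourse value h that is convex in ξ, evaluating the problem at the mean scenario underestimates the expected value. The instance below is a made-up toy example (h is convex in ξ as a partial minimum of a jointly convex function):

```python
# Jensen-type lower bound: for a recourse value h convex in xi,
# h(E[xi]) <= E[h(xi)]. Here h(xi) = min_{0<=y<=1} (y - xi)^2 + y,
# approximated by a fine grid over y. Toy data, for illustration only.
xi_vals = [(0.25, -1.0), (0.5, 0.0), (0.25, 2.0)]   # (probability, realization)

def h(xi):
    return min((y - xi) ** 2 + y for y in (k / 1000 for k in range(1001)))

mean_xi = sum(p * v for p, v in xi_vals)
jensen_lb = h(mean_xi)                   # value of the mean-scenario problem
true_val = sum(p * h(v) for p, v in xi_vals)
assert jensen_lb <= true_val + 1e-9      # the lower bound holds
```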

Problem formulation and recourse function
What happens if the problem data of a DStQP are uncertain? While the constraints in (DStQP) are out of dispute, as probabilities or portfolio shares always are non-negative and sum up to one, the only variable problem data are to be found in the matrix Q̃. Now suppose a (possibly small) n_1 × n_1 principal submatrix A of Q̃ is known (more or less) exactly, whereas the rest of the problem data are subject to uncertainty with known probability distribution:

Q̃ = [ A  B̃ ; B̃^⊤  C̃ ],

where ξ := (B̃, C̃) are the uncertain data. Decomposing z := [x^⊤, y^⊤]^⊤ with x ∈ R^{n_1}_+ and y ∈ R^{n_2}_+, we arrive at the following problem reformulation under data uncertainty:

z*_stoch := min { x^⊤Ax + r(x) : x ∈ R^{n_1}_+, e^⊤x ≤ 1 },   (5)

with the recourse function given by

r(x) := E_ξ[ min { 2x^⊤B̃y + y^⊤C̃y : y ∈ P_x } ],

with P_x := { y ∈ R^{n_2}_+ : e^⊤y = 1 − e^⊤x }. As announced, we refer to problem (5) as the two-stage Stochastic StQP (2(St)³QP). If we denote by y(x; ξ) a minimizer of the random (via ξ) quadratic optimization problem inside the expectation, parametrized by x, the recourse function can also be written as

r(x) = E_ξ[ 2x^⊤B̃y(x; ξ) + y(x; ξ)^⊤C̃y(x; ξ) ].

Typically, the expression inside the expectation brackets is neither linear in ξ nor quadratic in x because of the dependence of y(x; ξ) on both x and ξ. Before we proceed to attack this problem, let us note some bounds for the recourse function.
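Under this formulation, r(x) can be estimated by sampling scenarios and solving the inner problem over P_x; for n₂ = 2 the inner feasible set is a segment, so a one-dimensional grid suffices. The following sketch uses illustrative random data and a fixed first-stage decision x (all names and values are assumptions):

```python
# Monte-Carlo sketch of the recourse
# r(x) = E[ min_{y in P_x} 2 x^T B y + y^T C y ]
# for a fixed first-stage x. With n2 = 2, P_x is a segment parametrized by t.
import random
random.seed(0)

n1, n2, S = 2, 2, 50
x = [0.3, 0.2]                       # first-stage decision, e^T x = 0.5
budget = 1.0 - sum(x)                # remaining mass: e^T y = budget

def inner_value(B, C):
    best = float("inf")
    for k in range(201):
        t = budget * k / 200
        y = [t, budget - t]          # a point of P_x
        lin = 2 * sum(x[i] * B[i][j] * y[j]
                      for i in range(n1) for j in range(n2))
        quad = sum(y[i] * C[i][j] * y[j]
                   for i in range(n2) for j in range(n2))
        best = min(best, lin + quad)
    return best

r_x = 0.0
for _ in range(S):                   # sample scenarios (B_s, C_s) in [0, 1]
    B = [[random.random() for _ in range(n2)] for _ in range(n1)]
    C = [[0.0] * n2 for _ in range(n2)]
    for i in range(n2):
        for j in range(i, n2):
            C[i][j] = C[j][i] = random.random()
    r_x += inner_value(B, C) / S     # Monte-Carlo estimate of r(x)
```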

Bounding the recourse function
Simple closed-form upper bounds for r(x) are obtained by looking at their counterparts for (DStQP). Defining ȳ := y/(1 − e^⊤x), the recourse function r(x) can be rewritten as the minimum of n_2 (possibly non-convex) QPs which can be reduced to n_2 DStQPs or to an equivalent copositive optimization problem. Likewise we can use the edge-minimization upper bound, using the argument of tightness with high probability (Chen & Peng, 2015; Chen, Peng, & Zhang, 2013), which by a similar method can be used for an upper bound on z*_stoch, with a slightly more involved argument leading to fractional QPs with linear denominators over Δ^{n_1}. The fractions appear since the minimal values along a convex edge conv{e_i, e_j} of a DStQP are given by fractions with linear denominators. We reserve dealing with this type of problem, in a way similar to Amaral, Bomze, & Júdice (2014), for a future research project, but note beforehand that the common factor (1 − e^⊤x) occurring in the estimates of r(x) does not depend on i, j, which may simplify the analysis considerably. Turning to lower bounds, we note three variants: the trivial one (the minimal entry of the data matrix), Nesterov's bound, and a stronger one than both from Bomze, Locatelli, & Tardella (2008). Despite being of closed form, it is not obvious how to proceed using these for a tractable procedure solving 2(St)³QP.
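The two most elementary closed-form bounds for a deterministic StQP, the minimal entry of Q from below (since z^⊤Qz is a convex combination of the entries of Q, with weights z_i z_j) and the minimal diagonal entry from above (the value at a vertex e_i), can be checked on a tiny instance:

```python
# Elementary closed-form bounds for min z^T Q z over the simplex:
# min_ij Q_ij <= min_z z^T Q z <= min_i Q_ii, verified against a fine grid
# for n = 2. Illustrative data only.
Q = [[1.0, 0.2], [0.2, 3.0]]
lb = min(min(row) for row in Q)            # min_ij Q_ij
ub = min(Q[i][i] for i in range(len(Q)))   # min_i Q_ii, value at a vertex

def f(t):
    z = [t, 1.0 - t]                       # parametrize the simplex for n = 2
    return sum(z[i] * Q[i][j] * z[j] for i in range(2) for j in range(2))

grid_min = min(f(k / 1000) for k in range(1001))
assert lb - 1e-9 <= grid_min <= ub + 1e-9
```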

Bounds on 2(St)³QP via scenario-tree discretization
In stochastic optimization it is customary to approximate the stochastic optimization problem by discretizing the underlying random process ξ ∈ Ω and subsequently solving the so-called scenario problem, i.e. the stochastic optimization problem where the true distribution is replaced by its discretization. Several methods are adopted in the literature for discretizing distributions and generating scenarios. Among the most common we list: conditional sampling, sampling from specified marginals and correlations, moment matching, path-based methods and optimal discretization. For an overview of scenario-generation methods, see Pflug & Pichler (2011).
In this section we will discuss upper and lower bounds on the scenario problem, based on stochastic and quadratic optimization theory. But first let us derive the actual form of the scenario problem in the case of 2(St)³QP.
Suppose now that the distribution of ξ has finite support. This means that ξ has a finite number of possible realizations, called scenarios, ξ_s := (B̃_s, C̃_s) with probabilities p_s, s ∈ [1:S]. The whole two-stage stochastic problem can then be approximated by the following large-scale deterministic equivalent quadratic problem:

z*_stoch := min { x^⊤Ax + Σ_{s=1}^S p_s ( 2x^⊤B̃_s y_s + y_s^⊤C̃_s y_s ) : z ∈ P },   (9)

where z = [x^⊤, y_1^⊤, …, y_S^⊤]^⊤ collects all decision variables and P := { z : x ∈ R^{n_1}_+, y_s ∈ R^{n_2}_+, e^⊤x + e^⊤y_s = 1 for all s ∈ [1:S] }. Since (9) describes a class of non-convex QPs which contains the StQP as a special case, it is NP-hard. We will now proceed to present some bounds which can be computed with reasonable effort.
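The block structure of the deterministic equivalent can be made concrete by evaluating its objective at a feasible point; the data below are illustrative, and the coupling constraint e^⊤x + e^⊤y_s = 1 is checked explicitly:

```python
# Objective of the deterministic equivalent:
# f(z) = x^T A x + sum_s p_s (2 x^T B_s y_s + y_s^T C_s y_s),
# with e^T x + e^T y_s = 1 for every scenario s. Illustrative data only.
n1, n2, S = 2, 2, 3
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[[0.5, 0.2], [0.1, 0.4]] for _ in range(S)]    # B_s, n1 x n2
C = [[[0.0, 0.3], [0.3, 0.0]] for _ in range(S)]    # C_s, n2 x n2 symmetric
p = [1.0 / S] * S

def objective(x, ys):
    val = sum(x[i] * A[i][j] * x[j] for i in range(n1) for j in range(n1))
    for s in range(S):
        cross = sum(x[i] * B[s][i][j] * ys[s][j]
                    for i in range(n1) for j in range(n2))
        quad = sum(ys[s][i] * C[s][i][j] * ys[s][j]
                   for i in range(n2) for j in range(n2))
        val += p[s] * (2 * cross + quad)
    return val

x = [0.25, 0.25]
ys = [[0.25, 0.25] for _ in range(S)]
assert all(abs(sum(x) + sum(y) - 1.0) < 1e-12 for y in ys)   # feasibility
```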

Upper bounds by Frank-Wolfe variants
Frank-Wolfe variants are fast and efficient first-order methods for smooth, linearly constrained optimization problems of the form min { f(z) : z ∈ P }, with P a polytope in R^d whose vertex set V(P) is known or can be generated quickly on the fly if the vertices are too many. A recent survey is offered in Bomze, Rinaldi, & Zeffiro (2021). These methods enjoy increasing popularity, in particular for large-scale problems as occurring in Data Science, also because of their good properties like global convergence and support identification in finite time (Bomze, Rinaldi, & Rota Bulò, 2019; Bomze, Rinaldi, & Zeffiro, 2020; Lacoste-Julien, 2016), even for non-convex problems. They are iterative primal methods which employ, at an iterate z_k, the knowledge of the gradient g_k := ∇f(z_k) to define directions d_k along which a line search is then performed. Frequently employed line-search strategies include predetermined step sizes (constant, or diminishing in k, governed by a parameter γ > 0) as well as approximately optimal ones (à la Armijo) or even exact line searches over the feasible interval [0, α_max]. Fortunately, for our problem (9) with d = n_1 + n_2 S, both the structure of the feasible polytope P and the objective f enable a quick implementation of the algorithm even for the exact line search. Furthermore, since f is quadratic, a closed-form solution for the exact step size α_k is given by means of the quadratic form of the Hessian H = D²f(z) (the only second-order aspect of our approach; as f is quadratic, H is constant across all iterations):

α_k = min { α_max, −g_k^⊤d_k / (d_k^⊤H d_k) } if d_k^⊤H d_k > 0, and α_k = α_max otherwise.   (10)
Convergence and active-set identification with this choice of step size were recently discussed in Bomze et al. (2019, 2020). For better numerical stability, we resorted to a damped variant of the exact step size governed by a tuning parameter 0 < β < 1. By thorough numerical experimentation we found that β = 1/2 is most efficient. We still need to define the search direction d_k. To this end, we employ the so-called Pairwise Frank-Wolfe (PFW) approach, which combines the classical Frank-Wolfe direction with the Away-step Frank-Wolfe direction. More precisely, an update on an iterate z_k is performed by selecting a vertex v_k ∈ P towards which we wish to move, and a vertex w_k ∈ S_k ⊆ P from which we wish to move away. The vertex v_k is chosen as the vertex that minimizes a linear model of f based on ∇f(z_k) over P, and w_k is chosen as the maximizer of said model over S_k. The latter set is critical in this calculation. Since z_k ∈ P, it is a convex combination of vertices of P, and S_k is the set of currently used vertices. The idea behind moving away from vertices in S_k is to eliminate so-called bad atoms in the description of the current iterate in every iteration. Of course, the set S_k depends on z_k, and it is imperative for algorithmic performance to update S_k in an efficient manner. We will discuss our solution to this obstacle in more detail below.
First, we will show that determining v_k and w_k in every iteration amounts to extracting the extremes of a list of at most n_1 numbers plus S such extractions from lists of at most n_2 numbers. The argument requires the following characterization of the vertices of P (recall that x ∈ P is a vertex or extremal point of P if x = λx_1 + (1 − λ)x_2 with x_1, x_2 ∈ P and 0 < λ < 1 implies x_1 = x_2 = x). We need the notation e_i for the i-th column of an identity matrix I_d of suitable order, and abbreviate J := [1:n_2]^S.

Theorem 1. The vertices of the set P are given by

V(P) = { e_i : i ∈ [1:n_1] } ∪ { Σ_{s=1}^S e_{j_s + n_1 + (s−1)n_2} : j = (j_1, …, j_S) ∈ J }.

Proof. The proof proceeds in two steps. First we show that the alleged points are indeed vertices of P; second we show that any other point in P is a convex combination of our alleged points, so that there cannot be any other extremal point of P outside V(P). So consider x_1, x_2 ∈ P and 0 < λ < 1 such that e_i = λx_1 + (1 − λ)x_2 for some i ∈ [1:n_1]. Then for any j ≠ i we get from λ(x_1)_j + (1 − λ)(x_2)_j = 0 and non-negativity that (x_1)_j = (x_2)_j = 0, whence x_1 = x_2 = e_i.

The analogous argument can be made for Σ_{s=1}^S e_{j_s + n_1 + (s−1)n_2} and any j ∈ J by applying the same analysis to each of the S segments of length n_2 individually. Hence the points in V(P) are vertices of P, and we will now show that these are the only extremal points of P. So let z = (x, y) ∈ P with 0 < e^⊤x < 1. Then

z = e^⊤x [ x/e^⊤x ; o ] + (1 − e^⊤x) [ o ; y/(1 − e^⊤x) ].

Evidently the two vectors on the right-hand side are members of P, so that z is a convex combination of these two vectors. Further, since e^⊤(x/e^⊤x) = 1, we see that x/e^⊤x is a convex combination of the e_i, i ∈ [1:n_1]. Thus our claim will follow if we can show that the second vector is a convex combination of members of { Σ_{s=1}^S e_{j_s + n_1 + (s−1)n_2} : j ∈ J }. To this end, we shift our attention to the vector ŷ := (ŷ_1, …, ŷ_S) := y/(1 − e^⊤x) ∈ R^{Sn_2}. For every j ∈ J we define e_j := (e_{j_1}, …, e_{j_S}) ∈ R^{Sn_2} and then need to show that the system Σ_{j∈J} λ_j e_j = ŷ, Σ_{j∈J} λ_j = 1, λ ≥ o has a solution λ. By Farkas' lemma we can equivalently show that there is no ẑ = (ẑ_1, …, ẑ_S) ∈ R^{Sn_2} with e_j^⊤ẑ > ŷ^⊤ẑ for all j ∈ J. Note that for every s ∈ [1:S] we have e_{j_s}^⊤ẑ_s ≤ ŷ_s^⊤ẑ_s for some j_s ∈ [1:n_2], because the left-hand side equals a single component of ẑ_s while the right-hand side is a convex combination of all the components of ẑ_s, due to e^⊤ŷ_s = 1 and ŷ_s ≥ 0. Thus, for any ẑ we can always find j ∈ J such that Σ_{s=1}^S e_{j_s}^⊤ẑ_s ≤ Σ_{s=1}^S ŷ_s^⊤ẑ_s, which proves the inconsistency of the alternative system. This concludes the proof, leaving the trivial cases e^⊤x ∈ {0, 1} to the reader.
Let us note here a particular distinction contrasting our approach with usual Away-Step strategies in Frank-Wolfe variants.
First note that the number of vertices of P is exponential in S, so that in general enumerating all of them to identify bad atoms would be prohibitive. We circumvent this issue by shifting our attention from vertices active in the k-th iterate z_k to the nonzero coordinates of that iterate. More specifically, we define S_k for any iterate z_k via the support supp(z_k). Hence, once a coordinate of z_k is zero, all associated vertices are dropped from the description of S_k. This reduces the computational burden, as merely the indices of the dropped coordinates have to be stored, and optimizing a linear function over S_k becomes a matter of identifying the respective extreme objective-function coefficients of the nonzero coordinates. This is due to the particular structure of our optimization problem. We detail the procedure in the following theorem where, for simplicity of exposition, we briefly suppress the iteration counter in the notation: for an iterate z with gradient g, the minimum of g_i over i ∈ [1:n_1] is considered, and similarly for the max operator, but restricted to supp(z). Let e_j denote the j-th column of I_d. Then the PFW search direction d is given by the corresponding vertex difference; the conditions on (j_s)_{s∈[1:S]} are given in (13). Proof. At an iterate z, PFW selects the search direction among all vertex differences v − w such that g^⊤v = min { g^⊤v' : v' ∈ V(P) } and g^⊤w = max { g^⊤w' : w' ∈ V_z(P) }, where V_z(P) ≠ ∅ denotes those vertices which have positive weight in at least one barycentric-coordinate representation of z with respect to V(P). Taking convex combinations, it is always possible to have all vertices in V_z(P) with positive weights in a barycentric representation (call it the least sparse one). The resulting d is always a feasible first-order descent direction, g^⊤d ≤ 0. In our case, we get from Theorem 1 that the vertices of P are given by (12), and we consider a least sparse representation of z. If v = e_i with i ∈ [1:n_1], we obviously get g^⊤v = g_i.
Further, since any selection j_s ∈ [n_1 + (s−1)n_2 + 1 : n_1 + sn_2] across s ∈ [1:S] gives a vertex v, and since extremizing the summands g_j across all j ∈ [n_1 + (s−1)n_2 + 1 : n_1 + sn_2] (with j ∈ supp(z) in case of maximization) and summing over all s ∈ [1:S] extremizes the sum in (15) across these vertices, the claim follows.
To summarize the PFW algorithm, we also need a stopping criterion, for which we use the Frank-Wolfe gap (Lacoste-Julien, 2016) at an iterate z ∈ P with gradient g = ∇f(z).

Algorithm 1. Pairwise Frank-Wolfe method for (9).

Input: problem data, bound ε > 0 for the Frank-Wolfe gap, initial guess z_0 ∈ P.
Output: upper bound f(z*).
While the Frank-Wolfe gap exceeds ε: generate the search direction d_k as in Theorem 2; calculate the step size α_k as in (10); update z_{k+1} := z_k + α_k d_k.
Multistarts may be necessary to improve upon the local quality of the bound.
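A minimal sketch of the pairwise update rule, applied for simplicity to a plain StQP over a single simplex rather than to the product-structured polytope of (9): the toward-vertex minimizes the gradient, the away-vertex maximizes it over the support, and the exact quadratic line search is clipped at the mass available at the away-vertex. Illustrative data; this is not the authors' implementation:

```python
# Pairwise Frank-Wolfe sketch for min z^T Q z over the simplex.
def pfw_stqp(Q, n, max_iter=500, eps=1e-8):
    z = [1.0 / n] * n                                   # start at barycenter
    for _ in range(max_iter):
        g = [2 * sum(Q[i][j] * z[j] for j in range(n)) for i in range(n)]
        v = min(range(n), key=lambda i: g[i])           # toward-vertex
        support = [i for i in range(n) if z[i] > 0]
        w = max(support, key=lambda i: g[i])            # away-vertex
        gap = g[w] - g[v]                               # pairwise FW gap
        if gap < eps:
            break
        a_max = z[w]                                    # mass movable from w
        # d = e_v - e_w; exact line search for the quadratic objective:
        # alpha* = -g^T d / (d^T H d) with H = 2 Q, clipped to [0, a_max]
        dQd = Q[v][v] - 2 * Q[v][w] + Q[w][w]
        alpha = a_max if dQd <= 0 else min(a_max, gap / (2 * dQd))
        z[v] += alpha
        z[w] -= alpha
    return z

def quad(Q, z):
    n = len(z)
    return sum(z[i] * Q[i][j] * z[j] for i in range(n) for j in range(n))

Q = [[1.0, 0.1, 0.3],
     [0.1, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
z = pfw_stqp(Q, 3)
```

Each update touches only two coordinates, which is what makes the support-based vertex bookkeeping cheap.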

Scalable copositive bounds
In this section we will derive lower and upper bounds on (9) based on a well-known reformulation strategy described in the following theorem, originally from Burer & Anstreicher (2013). However, the special structure of our problem (9) allows for an essential reduction in size, which enables us to solve much larger problems than would have been possible with the standard reformulation. For better understanding, we first briefly recall the general case, which also includes quadratic constraints (these are not present in (9)). Theorem 3. Let K ⊆ R^n be a cone, let F be the feasible set of a quadratically constrained quadratic optimization problem over K, and denote G(F) := clconv { (x, xx^⊤) : x ∈ F }, where clconv(C) stands for the closure of the convex hull of a set C. Then for any Q ∈ S^n, q ∈ R^n and ω ∈ R we have

min { x^⊤Qx + 2q^⊤x + ω : x ∈ F } = min { Q • X + 2q^⊤x + ω : (x, X) ∈ G(F) }.

The theorem rests on a few basic ideas. First note x^⊤Qx = Q • xx^⊤ by elementary properties of the trace function. The first step to obtain the right-hand side problem is lifting, i.e., replacing the term xx^⊤ ∈ S^n by a matrix variable X ∈ S^n. In this way we obtain a linearization of the objective where X_{ij} replaces the quadratic term x_i x_j. The second, and much more challenging, ingredient is G(F).
By construction it is convex, and all its extreme points are of the form (x, xx^⊤) where x is a member of the original feasible set F. The linearized problem attains its optimal value at an extreme point of its feasible set, which by construction corresponds to a feasible solution of the original problem. So, on the one hand, the right-hand side problem gives a lower bound, since any x ∈ F gives a solution (x, xx^⊤) ∈ G(F). On the other hand, it has an optimal solution of the form (x, xx^⊤) with x ∈ F, thus yielding an upper bound on the original problem. Hence the problems are equivalent.
The cone of completely positive matrices CPP_n is intensively researched (see Berman & Shaked-Monderer, 2003). It is well known that certifying membership in CPP_n is NP-hard, so that the above convex reformulation is intractable. However, there are powerful outer approximations of this cone, the simplest of which is the doubly nonnegative cone DNN_n := S^n_+ ∩ N^n, i.e. the intersection of the cone of positive-semidefinite matrices and the nonnegative orthant. For applications in optimization see, e.g., Bomze (2012).
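Membership in DNN_n combines two easily checkable conditions, entrywise nonnegativity and positive semidefiniteness. The sketch below tests both on small matrices; the pivot-based PSD test is a simplification adequate only for small, well-behaved examples (a robust implementation would use an eigenvalue solver):

```python
# Check membership in the doubly nonnegative cone DNN_n = S^n_+ ∩ N^n:
# entrywise nonnegativity plus positive semidefiniteness, the latter
# via an unpivoted LDL^T sweep (sufficient for these small examples).
def is_dnn(M, tol=1e-10):
    n = len(M)
    if any(M[i][j] < -tol for i in range(n) for j in range(n)):
        return False
    A = [row[:] for row in M]                # work on a copy
    for k in range(n):
        if A[k][k] < -tol:
            return False                     # negative pivot: not PSD
        if A[k][k] <= tol:
            # (near-)zero pivot: remaining row must vanish for PSD
            if any(abs(A[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= f * A[k][j]
    return True

# xx^T for x >= 0 is completely positive, hence doubly nonnegative:
x = [0.2, 0.3, 0.5]
X = [[xi * xj for xj in x] for xi in x]
assert is_dnn(X)
# an entrywise nonnegative matrix need not be PSD:
assert not is_dnn([[0.0, 1.0], [1.0, 0.0]])
```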
In the spirit of Theorem 3 we will provide a convex reformulation of (9) described in the theorem below.
Theorem 5. Consider the problem

(P): min_{x, y_1, …, y_S} x^⊤Ax + 2a^⊤x + Σ_{s=1}^S ( 2x^⊤B_s y_s + y_s^⊤C_s y_s + 2c_s^⊤y_s ).

The following conic optimization problem (R) gives a lower bound. Further, if for all s ∈ [1:S] we have c_s = o and B_s = b_s e^⊤ with b_s ∈ R^{n_1}, then the bound is actually tight. Proof. Take a feasible solution (x, y_1, …, y_S) to (P) and define X = xx^⊤, Z_s = y_s x^⊤, Y_s = y_s y_s^⊤, s ∈ [1:S]. Using again the identity A • xx^⊤ = x^⊤Ax, we obtain identical objective function values. Clearly, the first S linear constraints are fulfilled, and from squaring both sides we get 1² = (e^⊤x + e^⊤y_s)² = (e^⊤x)² + 2(e^⊤x)(e^⊤y_s) + (e^⊤y_s)², and hence, denoting by val(·) the optimal objective function value of a problem, we have val(P) ≥ val(R).
For the converse, under the additional assumptions we can always transform the objective function as in (19).

Note that by Burer's reformulation (see Burer, 2009) the set X is the projection of G(F) onto the (x, X) coordinates, where F = { (x, y) ∈ R^{n_1+n_2}_+ : e^⊤x + e^⊤y = 1 }.
Since the extreme points of a projection of a convex set correspond to extreme points of the projected set, it follows that the extreme points of X are of the form (x, xx^⊤). The feasible set in the description of φ(x, X) (we suppress the index for the moment to avoid double indices in the following paragraphs) is again a projection of G(F), but this time onto the Y coordinates. Further, by Burer's reformulation we have a representation as a convex combination of dyadic products of points (x_i, y_i) ∈ R^{n_1}_+ × R^{n_2}_+ with e^⊤x_i + e^⊤y_i = 1 and weights λ_i ≥ 0 for all i ∈ [1:k], as well as Σ_{i=1}^k λ_i = 1. It is well known (see Bomze et al., 2000) that Ȳ(x, X) is a compact convex set whose extreme points are rank-one matrices, so that e^⊤ȳ = 1. Since the objective is linear and the extreme points of X are rank-one matrices, we have x ∈ R^{n_1}_+ and (1 − e^⊤x)ȳ_s ∈ R^{n_2}_+, s ∈ [1:S], with e^⊤x + (1 − e^⊤x)e^⊤ȳ_s = e^⊤x + 1 − e^⊤x = 1, so that we have a feasible solution for (P) that gives the same objective function value. Note that in case we implemented the changes in the objective function mentioned in (19), we can undo these changes, so that we truly arrive at the representation stated in the theorem. We can rearrange the transformed objective function and consider the S terms in the sum individually, so that we can again consider the decomposition in (20). As before, we suppress the index s of y_s, Z_s, Y_s, b_s and C_s in order to avoid double indices. The components of the decomposition again fulfil e^⊤x_i + e^⊤y_i = 1 for all i ∈ [1:k], which allows us to rearrange the terms so that we recover the form stated in problem (R).
Regardless of whether the exactness condition is met, the optimal value of (R) is always a lower bound on that of (P). This remains true if we replace the intractable conic problems by tractable ones. Moreover, taking the part z = [x^⊤, y_1^⊤, …, y_S^⊤]^⊤ of its solution, we see that z is (P)-feasible, so we also get an upper bound by the (P)-feasible value f(z), which is generally good (the only problematic term is indeed the bilinear one). Note that (17) can be reformulated using Theorem 4, where the size of the completely positive matrix block would be n = n_1 + Sn_2 + 1, so that the number of variables would increase quadratically with S. On the other hand, in Theorem 5 we have S completely positive matrix blocks of order n_1 + n_2 + 1, so that the number of variables is linear in S. Obviously this yields substantial advantages in computation time and memory consumption over the generic CP representation formulated in Theorem 4; numerical experience confirms this expectation.
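The size advantage can be quantified by counting the free entries of the symmetric matrix variables in both formulations (a symmetric matrix of order n has n(n+1)/2 free entries):

```python
# Variable counts: the generic completely positive reformulation uses one
# block of order n1 + S*n2 + 1 (Theorem 4), the scenario-wise one uses S
# blocks of order n1 + n2 + 1 (Theorem 5).
def sym_vars(n):
    return n * (n + 1) // 2               # free entries of a symmetric matrix

def generic_count(n1, n2, S):
    return sym_vars(n1 + S * n2 + 1)      # grows quadratically in S

def blockwise_count(n1, n2, S):
    return S * sym_vars(n1 + n2 + 1)      # grows linearly in S

n1, n2 = 10, 5
for S in (10, 100, 1000):
    assert blockwise_count(n1, n2, S) < generic_count(n1, n2, S)
```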

Lower bounds based on dissecting the probability measure
In order to have a good description of the uncertain parameters, the number of scenarios S considered in z*_stoch is typically very large, making the problem computationally intractable. Nevertheless, chains of lower bounds can be constructed by solving sets of group sub-problems less complex than the original one, and recalculating the probabilities of each scenario in the group accordingly (see Maggioni & Pflug, 2016). We briefly recall the construction now.
Suppose again that the distribution of ξ has finite support, i.e. ξ has a finite number of possible scenarios ξ_s = (B̃_s, C̃_s) with probabilities p_s, s ∈ [1:S] (cf. (8)), and therefore the same is true for z^{1*}.
Better lower bounds can be obtained by dissecting the probability measure into convex combinations of probabilities defined on two or more scenarios. We consider the following refinement chain: …
where each row is a collection of subsets of the probability space with the property that their union covers the whole space for all j and that each set i is the union of sets from the next more refined collection i p s , we get to a chain of lower bounds expressed as follows z 1 * ≤ z 2 * ≤ · · · ≤ z j * ≤ · · · ≤ z * stoch .
Notice that the higher the index j, the fewer problems have to be solved, but each with an increasing number of scenarios. We refer to Maggioni & Pflug (2016) for examples of constructions of refinement chains. Notice that for all these problems z^{j*} we can use the lower bounds defined in the previous section.

Numerical experiments
We conduct a series of experiments that investigate the performance of the upper and lower bounds proposed in the previous sections. Among the questions we address: what is the trade-off between solution quality and the size of the subsets of the probability space? (e) How consistent are the results for the approximations described in Section 4.3 for a fixed size of subsets, when the grouping of the scenarios is performed randomly? (f) How do our procedures perform when confronted with a real-world problem such as the portfolio optimization problem?
In our experiments we generated the data matrix Q̃ by two different processes. For the first one, the elements of Q̃ represent the mutual distances of n = n_1 + n_2 points sampled under a uniform distribution from the unit square [0,1] × [0,1]; the n_1 points are fixed and generate the principal n_1 × n_1 submatrix A, while for the other n_2 points it is only known that each lies in a square with side length 2ε. The actual locations of the n_2 unknown points follow a uniform distribution in these squares and generate the matrices B̃_s and C̃_s, s ∈ [1:S], of dimensions n_1 × n_2 and n_2 × n_2, respectively. An example is given in Fig. 1, where the asterisks represent the n_1 = 3 preset points and the squares the uncertainty regions of the other n_2 = 5 points.
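The distance-matrix sampling process just described can be sketched as follows. This is a minimal illustration under the stated assumptions; the function and variable names are ours, not the authors'.

```python
import numpy as np

def generate_distance_instance(n1, n2, S, eps, rng=None):
    """Sample one instance of the distance-matrix strategy: n1 fixed points in
    the unit square and, per scenario, n2 uncertain points drawn uniformly
    from squares of side length 2*eps around sampled centres."""
    rng = np.random.default_rng(rng)
    fixed = rng.uniform(0.0, 1.0, size=(n1, 2))    # the n1 preset points
    centres = rng.uniform(0.0, 1.0, size=(n2, 2))  # centres of the uncertainty squares
    # principal n1 x n1 block A of mutual distances between the fixed points
    A = np.linalg.norm(fixed[:, None, :] - fixed[None, :, :], axis=2)
    B, C = [], []
    for _ in range(S):
        # actual locations of the uncertain points in this scenario
        pts = centres + rng.uniform(-eps, eps, size=(n2, 2))
        B.append(np.linalg.norm(fixed[:, None, :] - pts[None, :, :], axis=2))  # n1 x n2
        C.append(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2))    # n2 x n2
    return A, B, C
```

The blocks A, B̃_s and C̃_s then assemble the scenario matrices Q̃_s in the obvious block pattern.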

Comparing the copositive bounds
In the following we assess the value and shortcomings of the compact copositive bound proposed in Section 4.2 relative to the traditional copositive bound from Burer (2009). To this end we picked different configurations of (n_1, n_2, S) and, for each of them, generated 20 instances from P_1 and P_2, respectively. Then we solved the instances using the DNN relaxations (where the cone of completely positive matrices CPP_n is replaced by the outer approximation given by the doubly nonnegative cone DNN_n = S^n_+ ∩ N_n) of the two copositive reformulations. The experiments of this section were conducted on an AMD Ryzen 7 1700X Eight-Core 3.4 GHz CPU with 16 GB RAM.
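The relaxation replaces the completely positive cone by the doubly nonnegative cone DNN_n = S^n_+ ∩ N_n; membership in that outer cone is easy to test numerically, as the following sketch (our own helper, with an illustrative tolerance) shows.

```python
import numpy as np

def is_doubly_nonnegative(M, tol=1e-9):
    """Check membership in DNN_n = S^n_+ ∩ N_n:
    symmetric, positive semidefinite, and entrywise nonnegative."""
    M = np.asarray(M, dtype=float)
    if not np.allclose(M, M.T, atol=tol):
        return False
    if (M < -tol).any():                         # entrywise nonnegativity (N_n)
        return False
    return np.linalg.eigvalsh(M).min() >= -tol   # positive semidefiniteness (S^n_+)
```

Every completely positive matrix passes this test, but not conversely; that one-sided inclusion is exactly why the DNN relaxation yields lower bounds.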
Note that from each of these relaxations we do not only get a lower bound, i.e. the optimal value of the relaxation, but also an upper bound, obtained by considering the feasible values for the considered decision variables. If both bounds coincide, we can actually certify optimality. Table 1 summarizes the results obtained under the first data generation strategy, while Table 2 summarizes the results under the second one. In each of them, the column Time gives the computation time in seconds. In the column Gap we give the average percentage gap between the upper bound and the lower bound produced by the relaxations, and under maxGap we give the largest such gap encountered across all 20 instances. Column DNN1 refers to the results for the traditional copositive bound proposed in Burer (2009), and column DNN2 to the results for our newly proposed bound given in Section 4.2. Finally, column Vs. reports the average percentage gap between the two copositive relaxations (avgG) and the largest such gap across the 20 instances (maxG), respectively. Two objective function values have been considered numerically identical if their percentage gap is smaller than 10^{-3} percent, which seems reasonable given the accuracy one can expect from semidefinite optimization software. Table 1 shows that both bounds are tight. It is a known phenomenon that the bounds obtained from the DNN relaxations of copositive reformulations are often exact on random instances, and gaps start to appear only after introducing special structures. It is noteworthy that there is no gap between the two copositive bounds: only the copositive bound from Burer (2009) is provably tight in general, while the bound we proposed is provably tight only in special cases. In conclusion, our experiments show that if there is a gap between the two copositive bounds, then it must be vanishingly small for instances generated by the distance matrix strategy.
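The percentage-gap bookkeeping and the 10^{-3} percent identity threshold used above amount to the following small helper (our own formulation; the guard against a zero denominator is an implementation choice).

```python
def percentage_gap(upper, lower):
    """Relative gap between an upper and a lower bound, in percent."""
    return 100.0 * abs(upper - lower) / max(abs(lower), 1e-12)

def numerically_identical(a, b, threshold=1e-3):
    """Treat two objective values as equal if their percentage gap is below
    the threshold of 10^-3 percent used in the experiments above."""
    return percentage_gap(a, b) < threshold
```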
This is especially encouraging if we take into account the performance with regard to computation time (column Time). We see that our new bound scales much better with the number of scenarios S, a desirable property for stochastic optimization problems, since the quality of the solution depends on the number of scenarios considered. Even for instance types where the traditional DNN lower bound could not be computed due to memory capacity limits (see instances (5,40,10) and (5,5,60)), our scalable lower bound was obtained in a few seconds.
In Table 2 we see that under the uniformly random strategy we were able to create instances where the relaxations actually exhibited a gap between the upper and lower bounds they produced (see columns Gap and maxGap). We found that the scalable copositive bound produced slightly bigger gaps on average. Also, the percentage gap between the lower bounds seems to widen, but remains very small. It is questionable whether these are real gaps or whether our threshold for numerical identity (i.e. 10^{-3}) is too large. We have so far not been able to produce instances where the gaps between lower bounds are comfortably outside of what could be explained by numerical inaccuracies.

Optimality gaps without warm-starting
In Section 5.1 we have seen that we can create instances where the upper bounds generated by the copositive relaxation do not close the optimality gap. However, we can produce upper bounds using the PFW algorithm that may improve upon the upper bound generated by the copositive approach. This can be done either by starting the algorithm at an arbitrary point, perhaps restarting it multiple times, or by warm-starting the algorithm at the point previously calculated via the copositive procedure. In this section we focus on the results we obtain from cold-starting the procedure at the point x = (n_1 + n_2)^{-1} e and likewise y_s = (n_1 + n_2)^{-1} e for all s ∈ [1:S], or else from multi-starting it from points chosen at random. As a benchmark we also ran Gurobi and fmincon without providing an initial solution. We chose fmincon as a benchmark since it employs an easily available interior point solver that is able to provide local solutions to non-convex problems. Hence, it is conceptually different from our Frank-Wolfe type approach but aims at a similar goal, which makes the comparison worthwhile. Gurobi is a global solver, so that a direct comparison is not really meaningful. However, it is interesting to see whether our approach may aid a global procedure by narrowing the optimality gap faster, so that the results of such pre-processing can then be further utilized by the global solver. The latter is of course only attractive if good upper bounds are obtained sufficiently fast, which is what we tried to demonstrate in these experiments. For the experiments of this and the succeeding section we used an Intel(R) Core(TM) i5-9300H CPU at 2.40 GHz with 16 GB of RAM. In order to increase the gap in the copositive relaxation, we slightly modified the uniformly random strategy. Our reasoning was to find instances where the copositive bounds alone are not enough to satisfactorily narrow the optimality gaps, so that more elaborate machinery is necessary. The results of our experiments are summarized in Table 3. For each configuration of (n_1, n_2, S), indicated in the first column, we generated 20 instances in this manner.

Table 4 Results of cold-starting the upper-bound procedures: best bounds and solved instances.
We take it that the notation in the table is self-explanatory, except for the mnemonic PFWM, which reports the result of multistarting the Pairwise Frank-Wolfe algorithm at 100 random points and taking the best solution found as output. The optimality gaps are calculated with respect to the dual bound produced by the copositive relaxation. We see that Gurobi always exhausted its time limit, which we set to 300 seconds, but produced quite good gaps on average. However, the multistarted PFW procedure produced tighter bounds much more cheaply, albeit not as cheaply as fmincon, which on the other hand performed similarly to the PFW-algorithm in terms of optimality gap but much worse in terms of computation time.
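A pairwise Frank-Wolfe step for a standard quadratic objective can be sketched as below. This is a minimal illustration of the algorithmic idea, not the implementation used in the experiments; the exact line search exploits that the objective z⊤Qz is quadratic, and all names are ours.

```python
import numpy as np

def pairwise_frank_wolfe(Q, z0, max_iter=1000, tol=1e-10):
    """Minimise z^T Q z over the standard simplex via pairwise Frank-Wolfe:
    each step shifts mass from an 'away' vertex to a 'toward' vertex."""
    z = np.array(z0, dtype=float)
    Q = np.asarray(Q, dtype=float)
    for _ in range(max_iter):
        g = 2.0 * Q @ z                              # gradient of z^T Q z
        i = int(np.argmin(g))                        # Frank-Wolfe (toward) vertex
        support = np.flatnonzero(z > 0)
        j = int(support[np.argmax(g[support])])      # away vertex within the support
        d = np.zeros_like(z)
        d[i] += 1.0
        d[j] -= 1.0                                  # pairwise direction e_i - e_j
        gap = -g @ d                                 # pairwise duality gap
        if gap <= tol:
            break
        gamma_max = z[j]                             # cannot remove more mass than e_j carries
        curv = d @ Q @ d                             # curvature along d
        gamma = gamma_max if curv <= 0 else min(gamma_max, gap / (2.0 * curv))
        z += gamma * d                               # exact line-search step
    return z
```

Multistarting, as in PFWM, simply repeats this from random simplex points and keeps the best objective value found.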
It is also interesting to see how often a given procedure performed best among all the primal bounds, and how often each of them was able to solve a given instance (see Table 4). We considered an instance solved if the reported gap (with respect to the dual bound from the copositive relaxation) was smaller than 0.01%. We see again that PFWM fares well with respect to both features. Interestingly, the instances where either n_1 or n_2 were large turned out to be troublesome for all procedures. Note that the numbers in the first block do not all add up to 20, since if the best and second-best solutions were within a margin of 10^{-5}, both were awarded the 'best' label.
These experiments also offered an opportunity to evaluate the potential use of the dual bounds from the copositive relaxation in a global optimizer like Gurobi . One key challenge is to provide bounds tight enough so that optimality can be proven quickly. In Table 5 we see that in our experiments, the dual bounds provided by the conic relaxation were on average better than the bounds produced by Gurobi itself, within the set time limit. Also, we never certified a solution to be optimal based on our criterion when using Gurobi 's dual bound, while, when using the copositive bound, we were able to do so at least in some cases.

Narrowing the optimality gaps by warm-starting the upper bound procedures
We now investigate the usefulness of the copositive primal solution as a point for warm-starting Gurobi and the other primal procedures. To this end we used the same instances generated for the previous section, took the respective copositive solution, and warm-started Gurobi, fmincon and the PFW-algorithm at that solution in order to further narrow the optimality gap. We also performed a two-step procedure, where we used the improved solution generated by the PFW-algorithm as a starting point for Gurobi. We summarize the results in Table 6.

Table 7 Warmstarting the upper bound procedures: best bounds and solved instances.

We see that on average DNN+FW+Gurobi produced the best bounds. Of course, in theory, Gurobi would eventually solve the instances given enough time, but that time might be prohibitive. We again operated Gurobi under a 300-second time limit, which was always exhausted. Similar results were obtained via DNN+FW in a fraction of the time, which in our opinion is a desirable feature of our techniques. Using fmincon, the bounds were weaker on average and also took longer to compute. The picture changes slightly when we look at the instances solved and best bounds provided, as depicted in Table 7.
We see that fmincon , more often than the other methods, provides the best bound and is able to find an optimal solution (based on our criteria). This seems to indicate that fmincon has better potential to find good bounds, but is less reliable than the other methods and also slower than the PFW.

Experiments on dissecting the probability measure
This section presents some computational tests for problems that are intractable because they involve a large number of scenarios. For this section we again used an AMD Ryzen 7 1700X Eight-Core 3.4 GHz CPU with 16 GB RAM. We generate data by the distance matrix strategy with n_1 = n_2 = 15, ε = 0.1, considering S = 200 equiprobable scenarios. We constructed lower bounds as described in Section 4.3 by partitioning the scenario tree into disjoint subgroups.
We now describe the way we constructed the groups of each layer j of the dissection chain. We fix a certain group size n_g with 1 < n_g < S. The first layer always corresponds to the wait-and-see (WS) approach, i.e. we consider each scenario independently, so that the number of "groups" is S_1 := S. In the second layer, we randomly group the scenarios into groups of size n_g. If the total number of groups S_1 is not evenly divisible by n_g, a remainder group is constructed with cardinality S_1 mod n_g. This results in S_2 groups, with S_2 = S_1 div n_g if the division has no remainder, or S_2 = (S_1 div n_g) + 1 if a remainder group is added. These S_2 groups are again randomly grouped into super-groups of n_g groups of the previous layer. Thus, for the jth layer the number of super-groups is given by S_j = ⌈S_{j-1}/n_g⌉. Fig. 2 provides a boxplot of the percentage gaps obtained from a chain of lower bounds composed of 5 layers over 20 random dissections, constructed as described below. The first bound in the chain (z^{1*}) refers to the wait-and-see (WS) solution and has been obtained by solving the S_1 = 200 scenarios independently. The next one (z^{2*}) was obtained by partitioning the scenario tree into S_2 = 40 subproblems of n_g = 5 scenarios each, solving them independently and taking their mean. These sets were again grouped into S_3 = 8 subproblems of 25 scenarios each to obtain z^{3*}. Finally, z^{4*} has been obtained by grouping 5 groups of 25 scenarios into one subproblem of 125 scenarios and the remaining 3 groups of 25 scenarios into one of 75, for a total of S_4 = 2. The last level in the chain, z^{5*}, refers to the original problem z*_stoch. The decision which scenarios or sets of scenarios were grouped together was made at random, and we created 20 random dissections in this manner.
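The layer-by-layer grouping just described can be sketched as follows; this is our own illustrative reconstruction of the construction, with the remainder-group rule implemented via chunking a shuffled index list.

```python
import random

def dissect(groups, n_g, rng):
    """Randomly partition a list of groups into bundles of n_g members;
    the last bundle collects the len(groups) mod n_g leftover members."""
    idx = list(range(len(groups)))
    rng.shuffle(idx)
    return [[groups[i] for i in idx[k:k + n_g]] for k in range(0, len(idx), n_g)]

def refinement_chain(S, n_g, seed=0):
    """Build the layers of a refinement chain: layer 1 is the wait-and-see
    dissection (every scenario its own group); each further layer merges
    bundles of n_g groups, so S_j = ceil(S_{j-1} / n_g)."""
    rng = random.Random(seed)
    layer = [[s] for s in range(S)]          # S_1 = S singleton groups
    layers = [layer]
    while len(layer) > 1:
        # merge each bundle of n_g groups into one flat super-group
        layer = [[s for grp in bundle for s in grp]
                 for bundle in dissect(layer, n_g, rng)]
        layers.append(layer)
    return layers
```

With S = 200 and n_g = 5 this reproduces the layer sizes 200, 40, 8, 2, 1 of the chain above, including the 125/75 split in the fourth layer.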
For each of the dissections we calculated z^{j*} for each layer j ∈ [1:5] using the scalable copositive bounds (which we have empirically found to be tight for the distance matrix strategy) and subsequently calculated the percentage gaps between z^{j*} and z*_stoch ≡ z^{5*}. The results show that even in the second layer, where groupings of five scenarios are considered, the values for z^{2*} substantially outperform z^{1*} across all 20 dissections, passing from a percentage gap of -0.5658 to -0.1236. For the deeper layers z^{3*} and z^{4*}, the quality of the bounds as well as the consistency between dissections improves further, but the marginal improvement diminishes.
We repeated the experiment, this time considering a new tree with again 200 scenarios but with n_g = 2. Fig. 3 summarizes the results of this process, confirming that as the layer index j increases, the quality of the bound improves while the marginal improvement diminishes. In particular we notice that the seventh layer seems to outperform the eighth and even the "true value" z*_stoch. The reason for this phenomenon is that deeper into the chain we have to solve exponentially larger semidefinite optimization problems, which are known to scale poorly. Specifically, the accuracy of the solution suffers from the scale of the problem, so that the final few layers are in fact numerically indistinguishable once the noise added by the solution method is taken into account. This shortcoming, at least in our experiment, is partly redeemed by the fact that already for the shallow layers the quality of the bounds is satisfactory, since the gap is close to zero.

Fig. 2. Percentage deviation from the optimal objective value z*_stoch for level j of a refinement chain composed of 5 layers.

Fig. 3. Percentage deviation from the optimal objective value z*_stoch for level j of a refinement chain composed of 9 layers.

Application to mean-variance-portfolio optimization
In this section we apply our framework to the classical mean-variance-portfolio optimization problem. To this end we consider the historical mean return and variance of returns of ten assets based on ten time series. For assets i ∈ [1:5] a longer time series of 48 days is available, while for assets i ∈ [6:10] only a short time series of 12 days is available. Hence, we consider the portion of the covariance matrix that involves the latter assets to be uncertain, and we consider the model

min_{x ∈ T_5} μ_x⊤x + x⊤Σ_xx x + E[ min_{y ∈ P_x} μ_y⊤y + 2 x⊤Σ_xy y + y⊤Σ_yy y ].
In this model, x, y ∈ R^5 are the respective shares of assets i ∈ [1:5] and i ∈ [6:10] in the portfolio, μ_x, μ_y are the historical mean returns, and

Σ = [ Σ_xx  Σ_xy ; Σ_xy⊤  Σ_yy ]

is the historical covariance matrix of returns, where we assume Σ_xy and Σ_yy to be uncertain. Despite the presence of a linear term, the above optimization problem can be rewritten as

min_{x ∈ T_5} x⊤Σ̄_xx x + E[ min_{y ∈ P_x} 2 x⊤Σ̄_xy y + y⊤Σ̄_yy y ],

where Σ̄_xx := Σ_xx + (1/2)(μ_x e⊤ + e μ_x⊤), Σ̄_xy := Σ_xy + (1/2)(μ_x e⊤ + e μ_y⊤), Σ̄_yy := Σ_yy + (1/2)(μ_y e⊤ + e μ_y⊤).
We approximated the problem via the scenario approach, where the uncertain portions of the covariance matrix were generated from an elementwise normal distribution, i.e. (Σ)_{ij} ∼ N((Σ̂)_{ij}, σ) with Σ̂ the empirical covariance matrix and σ = 1. Scenarios with S ∈ {100, 500, 1000, 5000, 10000} were generated. The following table reports the results obtained when solving instances with these different and large numbers of scenarios.
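Scenario generation for an uncertain covariance block can be sketched as follows. The symmetrisation step is our own modelling choice (the text only specifies elementwise normal noise), and it halves the noise variance on the off-diagonal entries.

```python
import numpy as np

def sample_covariance_scenarios(Sigma_hat, S, sigma=1.0, rng=None):
    """Draw S scenarios for an uncertain covariance block by perturbing every
    entry of the empirical matrix with independent N(0, sigma^2) noise,
    then symmetrising the result (an illustrative choice, not from the paper)."""
    rng = np.random.default_rng(rng)
    n = Sigma_hat.shape[0]
    scenarios = []
    for _ in range(S):
        noise = rng.normal(0.0, sigma, size=(n, n))
        # symmetrise so each scenario is a valid (symmetric) covariance candidate;
        # note this halves the off-diagonal noise variance
        M = Sigma_hat + 0.5 * (noise + noise.T)
        scenarios.append(M)
    return scenarios
```

The cross-block Σ_xy, being rectangular, would be perturbed without the symmetrisation step.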
In our experiments, all instances were solved using merely the reduced conic lower bound, which shows that, despite the hard instances we tackled in previous sections, there are real-world applications where gaps can be closed confidently using the basic but novel strategies we propose. Note that the large numbers of scenarios we worked with would have been highly prohibitive for the traditional copositive approach, while our reduced conic approach allowed Mosek to solve the respective doubly-nonnegative relaxation within seconds.

Conclusions
In this paper, we have defined a new class of two-stage stochastic optimization problems, namely two-stage stochastic standard quadratic optimization problems, where the objective function is quadratic in the first- and second-stage decisions, the feasible region is a standard simplex, and the goal is to minimize the expected quadratic cost function, which is not required to be convex. Possible applications of this class of problems have been introduced to motivate the investigation. In order to address this difficult decision problem, different types of upper and lower bounds have been considered and compared with existing methods available in the literature. We proposed a scalable alternative to the copositive bound from Burer (2009), and numerical experiments show that it provides a substantial benefit in terms of computational resources without sacrificing solution quality. Further, we introduced a procedure for generating and improving upper bounds based on the Immunization-Infection-Dynamics algorithm.
However, when compared to the fmincon function of Matlab, we find that the results are mixed: while improving solution quality slightly, these procedures incur a substantial cost in terms of computation time. Future research might reveal improvements which remedy these shortcomings. For problems involving a large number of scenarios, bounds based on partitioning the scenario tree into sub-groups have also been considered, where the individual sub-problems were solved using our scalable copositive bound. A systematic numerical study over different instances, along with an application example based on real-world data, showed the efficiency of the proposed bounds.