1 Introduction

Given a symmetric positive semidefinite matrix \(Q\in \mathbb {R}^{n\times n}\), vectors \(a,b\in \mathbb {R}^n\) and set \(Z\subseteq \{0,1\}^n\), consider the mixed-integer quadratic optimization (MIQO) problem with indicator variables

$$\begin{aligned} \min \;&a^\top x+b^\top z+\tfrac{1}{2}t \end{aligned}$$
(1a)
$$\begin{aligned} (\text {MIQO})\qquad \text {s.t.}\;&x^\top Q x\le t \end{aligned}$$
(1b)
$$\begin{aligned}&x_i(1-z_i)=0, \ i=1,\dots ,n \end{aligned}$$
(1c)
$$\begin{aligned}&x\in \mathbb {R}^n,\; z\in Z,\; t\in \mathbb {R}, \end{aligned}$$
(1d)

and the associated mixed-integer nonlinear set

$$\begin{aligned} X=\left\{ (x,z,t)\in \mathbb {R}^n\times Z\times \mathbb {R}: t\ge x^\top Qx,\; x\circ (\varvec{e}-z)=0\right\} , \end{aligned}$$

where \(\varvec{e}\) denotes a vector of ones, and \(x\circ (\varvec{e}-z)\) is the Hadamard product of vectors x and \(\varvec{e}-z\). There has recently been an increasing interest in problem (1) due to its statistical applications: the nonlinear term (1b) is used to model a quadratic loss function, as in regression, while Z represents logical conditions on the support of the variables x. For example, given model matrix \(F \in \mathbb {R}^{m\times n}\) and responses \(\beta \in \mathbb {R}^m\), setting \(a=-\beta ^\top F\), \(Q=F^\top F\), \(b=0\) and \(Z=\left\{ z\in \{0,1\}^n:\sum _{i=1}^nz_i\le r\right\} \) in (1) is equivalent to the best subset selection problem with a given cardinality r [10, 16]:

$$\begin{aligned} \min _{x, z} \; \Vert \beta - F x\Vert _2^{2} \quad \text {s.t.} \quad x\circ (\varvec{e}-z)=0, \; \sum _{i=1}^{n} z_i \le r. \end{aligned}$$
(2)
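As a small numerical illustration of this correspondence (a sketch on random data, not part of the original development), the snippet below enumerates all supports of size at most r and checks that the least-squares objective of (2) and the objective of (1) differ only by the constant \(\beta ^\top \beta \) and a factor of two. The sizes `m`, `n`, `r`, the random seed, and the helper `fit_on_support` are assumptions made for the example; the linear cost is taken as the column vector \(a=-F^\top \beta \), and indices are 0-based.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 4, 2                    # illustrative sizes (assumed)
F = rng.standard_normal((m, n))      # model matrix
beta = rng.standard_normal(m)        # responses

a = -F.T @ beta                      # linear cost of (1), as a column vector
Q = F.T @ F                          # quadratic cost of (1)

def fit_on_support(S):
    """Least-squares coefficients with x_i = 0 for every i outside S."""
    x = np.zeros(n)
    idx = list(S)
    if idx:
        x[idx] = np.linalg.lstsq(F[:, idx], beta, rcond=None)[0]
    return x

# Z: all supports with at most r elements.
supports = [S for k in range(r + 1) for S in itertools.combinations(range(n), k)]
fits = [fit_on_support(S) for S in supports]

# Objective of (2) and objective of (1) with t = x'Qx, per support.
subset_obj = np.array([np.sum((beta - F @ x) ** 2) for x in fits])
miqo_obj = np.array([a @ x + 0.5 * (x @ Q @ x) for x in fits])
```

Since \(\Vert \beta -Fx\Vert _2^2=\beta ^\top \beta +2\left( a^\top x+\tfrac{1}{2}x^\top Qx\right) \), both objectives rank the supports identically.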

Other constraints defining Z that have been considered in statistical learning applications include multicollinearity [10], cycle prevention [28, 30], and hierarchy [12]. Set X arises as a substructure in many other applications, including portfolio optimization [13], optimal control [21], image segmentation [26], and signal denoising [9].

A critical step toward solving MIQO effectively is to convexify the set X. Indeed, the mixed-integer optimization problem (1) is equivalent to the convex optimization problem

$$\begin{aligned} \min _{x,z,t} \bigg \{ a^\top x+b^\top z+\tfrac{1}{2}t \,:\, (x,z,t)\in \text {cl conv}(X) \bigg \}, \end{aligned}$$

where \({\text {conv}}(X)\) denotes the convex hull of X and \(\text {cl conv}(X)\) is the closure of \({\text {conv}}(X)\). However, problem MIQO is \(\mathcal{N}\mathcal{P}\)-hard even if \(Z=\{0,1\}^n\) [15]. Thus, a simple description of \(\text {cl conv}(X)\) is, in general, not possible unless \(\mathcal{N}\mathcal{P}\)= Co-\(\mathcal{N}\mathcal{P}\).

In practice, one aims to obtain a good convex relaxation of X, which can then be used either as a standalone method (as is pervasively done in the machine learning literature), to obtain high-quality solutions via rounding, or in a branch-and-bound framework. Nonetheless, it is unclear how to determine whether a given relaxation is good or not. In mixed-integer linear optimization, it is well-understood that facet-defining inequalities give strong relaxations. However, in MIQO (and, more generally, in mixed-integer nonlinear optimization problems), \(\text {cl conv}(X)\) is not a polyhedron and there is no consensus on how to design good convex relaxations, or even what a good relaxation should be.

An important class of convex relaxations of X that has received attention in the literature is obtained by decomposing matrix \(Q=\sum _{i=1}^\ell \Gamma _i+R\), where \(\Gamma _i\succeq 0\), \(i=1,\dots ,\ell \), are assumed to be “simple" and \(R\succeq 0\). Then

$$\begin{aligned} t\ge x^\top Qx\Longleftrightarrow t \ge \sum _{i=1}^\ell \tau _i+x^\top Rx\text {, and } \tau _i\ge x^\top \Gamma _ix,\; \forall i\in \{1,\dots ,\ell \}, \end{aligned}$$
(3)

and each constraint \(\tau _i\ge x^\top \Gamma _i x\) is replaced with a system of inequalities describing the convex hull of the associated “simple" mixed-integer set. This idea was originally used in [19], where \(\ell =n\), \((\Gamma _i)_{ii}=d_i>0\) and \((\Gamma _i)_{jk}=0\) otherwise, and constraints \(\tau _i\ge d_i x_i^2\) are strengthened using the perspective relaxation [1, 18, 22], i.e., reformulated as \(z_i\tau _i\ge d_i x_i^2\). Similar relaxations based on separable quadratic terms were considered in [17, 35]. A generalization of the above approach is rank-one decomposition, which lets \(\Gamma _i=h_ih_i^\top \) be a rank-one matrix [5, 6, 33, 34]; in this case, letting \(S_i=\left\{ j\in [n]:(h_i)_j\ne 0\right\} \) denote the support of \(h_i\), constraints \(\left( \sum _{j\in S_i}z_j\right) \tau _i\ge (h_i^\top x)^2\) can be added to the formulation. Alternative generalizations of the perspective relaxation that have been considered in the literature exploit substructures based on matrices \(\Gamma _i\) whose nonzero entries form \(2\times 2\) submatrices [4, 7, 8, 20, 24, 27] or a tridiagonal pattern [29].

Convexifications based on decomposition (3) have proven to be strong computationally, and are attractive from a theoretical perspective. The fact that a given formulation is ideal for the substructure \(\tau _i \ge x^\top \Gamma _ix\) lends some theoretical weight to the strength of the convexification. However, approaches based on decomposition (3) have fundamental limitations as well. First, they require computing the convex hull description of a nonlinear mixed-integer set to establish (theoretically) the strength of the relaxation, a highly non-trivial task that restricts the classes of matrices \(\Gamma _i\) that can be used. Second, even if the ideal formulation for the substructure \(\tau _i \ge x^\top \Gamma _i x\) is available, the convexification based on such decomposition can still be a poor relaxation of X—and there is currently no approach to establish the strength of the relaxation without numerical computations. Third, it is unclear whether the structure of the relaxations induced by (3) matches the structure of \(\text {cl conv}(X)\), or if they are overly simple or complex.

1.1 Contributions and outline

In this paper, we close the aforementioned gaps in the literature by characterizing the structure of \(\text {cl conv}(X)\). First, in Sect. 2, we review relevant background for the paper. In Sect. 3, we show that \(\text {cl conv}(X)\) can be described in a compact extended formulation with \({\mathcal {O}}(n^2)\) additional variables with linear constraints and an \((n+1) \times (n+1)\) positive semidefiniteness constraint. In particular, convexification of X in this extended formulation reduces to describing a base polytope. We use the vertex description of this base polytope, which is exponential in general. However, we show that the set of vertices can be represented as the feasible points of a compact mixed-integer linear formulation (Sect. 5). In Sect. 4, we characterize \(\text {cl conv}(X)\) in the original space of variables. While the resulting description has an infinite number of conic quadratic constraints, we show that \(\text {cl conv}(X)\) is finitely generated, and thus we establish which inequalities are necessary to describe \(\text {cl conv}(X)\)—in precisely the same manner that facet-defining inequalities are required to describe a polyhedron. We also establish a relationship between \(\text {cl conv}(X)\) and relaxations obtained from decompositions (3). In Sect. 5, we present a mixed-integer linear formulation of the MIQO problem using the theoretical results in Sect. 3. Finally, in Sect. 6 we conclude the paper with a few remarks.

We point out that, using standard disjunctive programming techniques [14, 20], it is possible to obtain a conic quadratic extended formulation of (1), although such a representation typically requires adding \({\mathcal {O}}(|Z|n)\) variables and \({\mathcal {O}}(|Z|)\) nonlinear constraints. Since |Z| is often exponential in n, these formulations are in general impractical, and therefore their use has been restricted to small instances with \(n\le 2\) [4, 7, 20, 22, 24] or problems with special structures that admit a compact representation [23]. We argue that the convexifications in this paper are significantly more tractable: regardless of Z, we require only \({\mathcal {O}}(n^2)\) variables instead of \({\mathcal {O}}(|Z|n)\), and only one nonlinear conic constraint instead of \({\mathcal {O}}(|Z|)\). The main source of complexity in the proposed formulations is the exponential number of linear inequalities, which can be generated, as needed, using mature mixed-integer linear optimization techniques.

2 Notation and preliminaries

In this section, we first review the relevant background and introduce the notation used in the paper.

Definition 1

([31]) Given a matrix \(W\in \mathbb {R}^{p\times q}\), its pseudoinverse \(W^\dagger \in \mathbb {R}^{q\times p}\) is the unique matrix satisfying the four properties:

$$\begin{aligned} WW^\dagger W =W, \ \ W^\dagger WW^\dagger =W^\dagger ,\ \ (WW^\dagger )^\top =(WW^\dagger ),\ \ (W^\dagger W)^\top =W^\dagger W. \end{aligned}$$

Clearly, if W is invertible, then \(W^{-1}=W^\dagger \). It also readily follows from the definition that \((W^\dagger )^\dagger =W\).
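The four properties of Definition 1 are easy to check numerically. The following sketch uses NumPy's `numpy.linalg.pinv` on a small rank-one matrix chosen for illustration; the helper name `satisfies_penrose` and the cutoff `rcond=1e-10` are assumptions of this example rather than part of the paper.

```python
import numpy as np

def satisfies_penrose(W, W_dag, tol=1e-10):
    """Check the four defining properties of the pseudoinverse."""
    return bool(
        np.allclose(W @ W_dag @ W, W, atol=tol)
        and np.allclose(W_dag @ W @ W_dag, W_dag, atol=tol)
        and np.allclose((W @ W_dag).T, W @ W_dag, atol=tol)
        and np.allclose((W_dag @ W).T, W_dag @ W, atol=tol)
    )

# Rank-one 3 x 2 example (p = 3, q = 2), so W is far from invertible.
W = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]])
W_dag = np.linalg.pinv(W, rcond=1e-10)

# Invertible example: the pseudoinverse coincides with the inverse.
V = np.array([[2.0, 1.0], [1.0, 3.0]])
V_dag = np.linalg.pinv(V, rcond=1e-10)
```

Applying `pinv` twice to `W` recovers `W` up to rounding, in line with \((W^\dagger )^\dagger =W\).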

We recall the generalized Schur complement, relating pseudoinverses and positive semidefinite matrices.

Lemma 1

([3]) Let \(W= \begin{pmatrix}W_{11}&{}W_{12}\\ W_{12}^\top &{} W_{22}\end{pmatrix}\), with symmetric \(W_{11}\in \mathbb {R}^{p\times p}\), symmetric \(W_{22}\in \mathbb {R}^{q\times q}\), and \(W_{12}\in \mathbb {R}^{p\times q}\). Then \(W\succeq 0\) if and only if \(W_{11}\succeq 0\), \(W_{11}W_{11}^\dagger W_{12}=W_{12}\) and \(W_{22}-W_{12}^\top W_{11}^\dagger W_{12}\succeq 0\).

Note that if \(W_{11}\succ 0\), then the second condition of Lemma 1 is automatically satisfied. Otherwise, this condition is equivalent to the system of equalities \(W_{11}U=W_{12}\) having a solution \(U\in \mathbb {R}^{p\times q}\).
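To make the role of the range condition concrete, the sketch below evaluates the three conditions of Lemma 1 for a singular block \(W_{11}\) and compares the outcome with a direct eigenvalue test of the assembled matrix. The matrices and the helper names `is_psd`, `schur_conditions`, and `assemble` are illustrative assumptions.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """Eigenvalue test for positive semidefiniteness of a symmetric matrix."""
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)

def schur_conditions(W11, W12, W22, tol=1e-9):
    """The three conditions of the generalized Schur complement lemma."""
    W11_dag = np.linalg.pinv(W11, rcond=1e-10)
    range_ok = np.allclose(W11 @ W11_dag @ W12, W12, atol=tol)
    return bool(is_psd(W11, tol) and range_ok
                and is_psd(W22 - W12.T @ W11_dag @ W12, tol))

W11 = np.diag([2.0, 0.0])             # singular block, p = 2
W22 = np.array([[1.0]])               # q = 1
inside = np.array([[1.0], [0.0]])     # lies in col(W11)
outside = np.array([[1.0], [0.5]])    # does not lie in col(W11)

def assemble(W12):
    """Full (p + q) x (p + q) matrix with off-diagonal block W12."""
    return np.block([[W11, W12], [W12.T, W22]])
```

With `outside`, the Schur complement alone would not reveal the failure; it is the range condition \(W_{11}W_{11}^\dagger W_{12}=W_{12}\) that is violated.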

Let \([n]=\{1,\dots ,n\}\). Throughout, we use the convention that \(x_i^2/z_i=0\) if \(x_i=z_i=0\) and \(x_i^2/z_i=+\infty \) if \(z_i=0\) and \(x_i\ne 0\), for \(i\in [n]\). For a vector \(a\in \mathbb {R}^{n}\), \(\Vert a\Vert _2\) and \(\Vert a\Vert _{\infty }\) denote the vector \(\ell _2\)-norm and the maximum absolute value among \(a_i\)’s, respectively. Given two matrices V and W of matching dimensions, let \(\langle V,W\rangle =\sum _{i}\sum _jV_{ij}W_{ij}\) denote the usual inner product. Given a matrix \(W\in \mathbb {R}^{n \times n}\), let \(\text {Tr}(W)=\sum _{i=1}^nW_{ii}\) denote its trace, and let \(W^{-1}\) denote its inverse, if it exists. Let \(\Vert W\Vert _{F}\) and \(\Vert W\Vert _{\max }\) denote the Frobenius norm and the maximum absolute value of the entries of W, respectively, and let \(\lambda _{\max }(W)\) denote the maximum eigenvalue of W. We let \(\textrm{col}(W)\) denote the column space of matrix W. Given a matrix \(W\in \mathbb {R}^{n\times n}\) and \(S\subseteq [n]\), let \(W_S\in \mathbb {R}^{S\times S}\) be the submatrix of W induced by S, and let \({\hat{W}}_S\in \mathbb {R}^{n\times n}\) be the \(n\times n\) matrix obtained from \(W_S\) by filling the missing entries with zeros, i.e., matrices subscripted by S without “hat" refer to the lower-dimensional submatrices. For any two sets \(S, T \subset [n]\), let \(W_{S,T}\) denote the submatrix of W with rows in S and columns in T. Note that if matrix \(W\succ 0\), then it can easily be verified from Definition 1 that the submatrix of \({\hat{W}}_S^{\dagger }\) indexed by S coincides with \(W_S^{-1}\), and \({\hat{W}}_S^{\dagger }\) is zero elsewhere; in this case, we abuse notation and write \({\hat{W}}_S^{-1}\) instead of \({\hat{W}}_S^{\dagger }\). Given \(S\subseteq [n]\), let \({\hat{\varvec{e}}}_S\in \{0,1\}^n\) be the indicator vector of S. We define \(\pi _S\) as the projection onto the subspace indexed by S and \(\pi _{S}^{-1}(x)\) as the preimage of x under \(\pi _S\).

Example 1

Let \(Q=\begin{pmatrix} d_1&{} b\\ b &{}d_2 \end{pmatrix}\) with \(d_1,d_2>0\) and \(d_1d_2>b^2\). Then

$$\begin{aligned}&{\hat{Q}}_\emptyset ^{-1}=\begin{pmatrix} 0&{} 0\\ 0 &{}0 \end{pmatrix},\;{\hat{Q}}_{\{1\}}^{-1}=\begin{pmatrix} 1/d_1&{} 0\\ 0 &{}0 \end{pmatrix},\; {\hat{Q}}_{\{2\}}^{-1}=\begin{pmatrix} 0&{} 0\\ 0 &{}1/d_2 \end{pmatrix},\text { and }\\&Q_{\{1,2\}}^{-1}=\frac{1}{d_1d_2-b^2}\begin{pmatrix} d_2&{} -b\\ -b &{}d_1 \end{pmatrix}. \end{aligned}$$
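Example 1 can be reproduced numerically. The sketch below builds \({\hat{Q}}_S^{-1}\) by inverting the submatrix \(Q_S\) and padding with zeros, and compares it with the pseudoinverse of \({\hat{Q}}_S\). The values \(d_1=2\), \(d_2=3\), \(b=1\) and the helper names `hat` and `hat_inverse` are assumptions for illustration; indices are 0-based.

```python
import numpy as np

def hat(Q, S):
    """n x n matrix equal to Q on the block indexed by S and zero elsewhere."""
    out = np.zeros_like(Q)
    idx = list(S)
    if idx:
        out[np.ix_(idx, idx)] = Q[np.ix_(idx, idx)]
    return out

def hat_inverse(Q, S):
    """n x n matrix equal to inv(Q_S) on the block indexed by S and zero elsewhere."""
    out = np.zeros_like(Q)
    idx = list(S)
    if idx:
        out[np.ix_(idx, idx)] = np.linalg.inv(Q[np.ix_(idx, idx)])
    return out

d1, d2, b = 2.0, 3.0, 1.0                     # satisfies d1 * d2 > b ** 2
Q = np.array([[d1, b], [b, d2]])
all_supports = [[], [0], [1], [0, 1]]         # the four subsets of {1, 2}
padded_inverses = {tuple(S): hat_inverse(Q, S) for S in all_supports}
```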

3 Convexification in an extended space

In this section, we describe \(\text {cl conv}(X)\) in an extended space. In Sect. 3.1, we provide a “canonical" representation of \(\text {cl conv}(X)\) under the assumption that \(Q\succ 0\). In Sect. 3.2, we provide alternative representations of \(\text {cl conv}(X)\), which can handle non-invertible matrices Q and may also lead to sparser formulations.

3.1 Canonical representation

Given \(Q\succ 0\), define the polytope \(P\subseteq \mathbb {R}^{n+n^2}\) as

$$\begin{aligned} P{\mathop {=}\limits ^{\text {def}}}{\text {conv}}\left( \left\{ ({\hat{\varvec{e}}}_S,{\hat{Q}}_S^{-1})\right\} _{{\hat{\varvec{e}}}_{S} \in Z}\right) . \end{aligned}$$

Proposition 1 below shows how to construct mixed-integer conic formulations of MIQO using polytope P.

Proposition 1

If \(Q\succ 0\), then the mixed-integer optimization model

$$\begin{aligned} \min _{x,z,W,t}\;&a^\top x+b^\top z+\tfrac{1}{2}t \end{aligned}$$
(4a)
$$\begin{aligned} \text {s.t.}\;&\begin{pmatrix}W &{} x\\ x^\top &{}t\end{pmatrix}\succeq 0 \end{aligned}$$
(4b)
$$\begin{aligned}&(z,W)\in P \end{aligned}$$
(4c)
$$\begin{aligned}&z\in \{0,1\}^n \end{aligned}$$
(4d)
$$\begin{aligned}&x\in \mathbb {R}^n, t\in \mathbb {R} \end{aligned}$$
(4e)

is a valid formulation of problem (1).

Proof

Consider a point \((x,z,t,W)\) satisfying constraints (4b), (4c) with \(z={\hat{\varvec{e}}}_S\) for some \({\hat{\varvec{e}}}_S\in Z\). Constraint (4c) is satisfied if and only if \(W={\hat{Q}}_S^{-1}\). Therefore, constraint (4b) reduces to

$$\begin{aligned} \begin{pmatrix} Q_S^{-1} &{} \textbf{0} &{} x_S\\ \textbf{0} &{} \textbf{0} &{} x_{[n]\setminus S}\\ x_{S}^\top &{} x_{[n]\setminus S}^\top &{} t \end{pmatrix}\succeq 0. \end{aligned}$$

Since the pseudoinverse of matrix \(W=\begin{pmatrix} Q_S^{-1} &{} \textbf{0}\\ \textbf{0} &{} \textbf{0}\end{pmatrix}\) is \(W^\dagger =\begin{pmatrix} Q_S &{} \textbf{0}\\ \textbf{0} &{} \textbf{0}\end{pmatrix}\), we find from Lemma 1 that constraint (4b) is satisfied if and only if:

  • \(W\succeq 0\), which is automatically satisfied.

  • \( WW^\dagger x=x\Leftrightarrow \begin{pmatrix} I &{} \textbf{0}\\ \textbf{0} &{} \textbf{0}\end{pmatrix}\begin{pmatrix}x_S\\ x_{[n]{\setminus } S}\end{pmatrix}=\begin{pmatrix}x_S\\ x_{[n]{\setminus } S}\end{pmatrix}\Leftrightarrow x_{[n]{\setminus } S}=0.\) Thus, condition \(WW^\dagger x=x\) simply enforces the complementarity constraints \(x\circ (\varvec{e}-z)=0\).

  • \(t\ge x^\top W^\dagger x\Leftrightarrow t\ge x_S^\top Q_Sx_S\), which is precisely the nonlinear constraint defining set X.

Now, it is clear that for any \((x,z,t,W)\) satisfying constraints (4b), (4c), (4d), it holds that \((x, z, t) \in X\). On the other hand, for any \((x, z, t) \in X\) with \(z = {\hat{\varvec{e}}}_S\) for some \(S \subset [n]\), we can always let \(W = {\hat{Q}}_{S}^{-1}\), and then \((x,z,W,t)\) satisfies constraints (4b), (4c), (4d). \(\square \)

Note that condition \(WW^\dagger x=x\) is used to enforce the complementarity constraints. We point out that a similar idea was recently used in the context of low-rank optimization [11].
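The mechanism in the proof of Proposition 1 can be observed on a small instance. In the sketch below, W is fixed to \({\hat{Q}}_S^{-1}\) and the conic constraint (4b) is tested by an eigenvalue computation: it holds when x vanishes outside S and \(t\ge x^\top Qx\), and fails either when t is too small or when x has a nonzero entry outside S, however large t is. The matrix Q, the support S (0-based), and the helper `conic_ok` are illustrative assumptions.

```python
import numpy as np

Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # positive definite
S = [0, 2]                           # support encoded by z = e_S

# W equals inv(Q_S) on the block indexed by S and zero elsewhere.
W = np.zeros_like(Q)
W[np.ix_(S, S)] = np.linalg.inv(Q[np.ix_(S, S)])

def conic_ok(x, t, tol=1e-9):
    """Eigenvalue test of the (n + 1) x (n + 1) constraint (4b)."""
    M = np.block([[W, x[:, None]], [x[None, :], np.array([[t]])]])
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

x_on = np.array([1.0, 0.0, -2.0])    # vanishes outside S
x_off = np.array([1.0, 0.1, -2.0])   # violates complementarity at index 1
```

Here `x_on @ Q @ x_on` equals 10, so the threshold on t for `x_on` is exactly the quadratic form, as in the third bullet of the proof.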

Now consider the convex relaxation of (4), obtained by dropping the integrality constraints \(z\in \{0,1\}^n\):

$$\begin{aligned} \min _{x,z,W,t}\;&a^\top x+b^\top z+\tfrac{1}{2}t \end{aligned}$$
(5a)
$$\begin{aligned} \text {s.t.}\;&\text {(4b), (4c), (4e)}. \end{aligned}$$
(5b)

Theorem 1

Let Q be a positive definite matrix. Then

$$\begin{aligned} \text {cl conv}(X) \; = \; \{(z, x, t) \in [0,1]^{n} \times \mathbb {R}^{n+1} \; | \; \exists W \in \mathbb {R}^{n \times n} \text { s.t. } \text {(4b), (4c)}\}. \end{aligned}$$

Consequently, problem (5) has an optimal solution that is integral in z.

Proof

First observe that constraints (4b),(4c) define a closed convex set. Projecting out variable t, we find that problem (5) reduces to

$$\begin{aligned} \min _{x,z,W}\;&a^\top x+b^\top z+\tfrac{1}{2}x^\top W^\dagger x \end{aligned}$$
(6a)
$$\begin{aligned} \text {s.t.}\;&WW^\dagger x=x \end{aligned}$$
(6b)
$$\begin{aligned}&(z,W)\in P,\; x\in \mathbb {R}^n. \end{aligned}$$
(6c)

Note that this formulation uses the pseudoinverse of a matrix of variables. Observe that we omit the constraint \(W\succeq 0\): since every extreme point \(({\bar{z}}, {\bar{W}})\) of P satisfies \({\bar{W}}\succeq 0\), it follows that \((z,W)\in P\) already implies \(W\succeq 0\).

We argue that for any fixed \((z, W)\in P\), setting \(x=-Wa\) is optimal for (6). Using equality (6b), we replace the term \(a^\top x\) in the objective with \(a^\top WW^\dagger x\). Since the problem is convex in x, from KKT conditions we find that any point x satisfying

$$\begin{aligned}&WW^\dagger x=x \end{aligned}$$
(7a)
$$\begin{aligned}&\exists \lambda \in \mathbb {R}^n \text { s.t. } W^\dagger Wa+ W^\dagger x+\lambda ^\top ( W W^\dagger -I)=0 \end{aligned}$$
(7b)

is optimal. In particular, setting \(x=-Wa\), we find that (7b) is satisfied with \(\lambda =0\), and (7a) is satisfied since \(W W^\dagger x=- W W^\dagger Wa=- Wa=x\).

Substituting \(x=-Wa\) in the relaxed problem, we obtain

$$\begin{aligned} \min _{z,W}\;&-\tfrac{1}{2} a^\top Wa+b^\top z \end{aligned}$$
(8a)
$$\begin{aligned} \text {s.t.}\;&(z,W)\in P. \end{aligned}$$
(8b)

Since the objective \(-\frac{1}{2} \langle aa^\top ,W\rangle +b^\top z\) is linear in \((z,W)\) and P is a polytope, there exists an optimal solution \((z^*,W^*)\) that is an extreme point of P, and in particular there exists \({\hat{\varvec{e}}}_{S} \in Z\) such that \(z^*={\hat{\varvec{e}}}_S\) and \(W^*={\hat{Q}}_S^{-1}\). \(\square \)
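The argument above can be checked by brute force on a small instance. The sketch below evaluates the linear objective (8a) at each extreme point \(({\hat{\varvec{e}}}_S,{\hat{Q}}_S^{-1})\) of P and compares it with the optimal value of (1) when the support of x is restricted to S, obtained independently by solving the restricted linear system. The random data, the choice of Z as all supports of size at most r, and all helper names are assumptions made for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, r = 4, 2
G = rng.standard_normal((n, n))
Q = G @ G.T + n * np.eye(n)          # positive definite by construction
a = rng.standard_normal(n)
b = rng.standard_normal(n)

def hat_inverse(Q, S):
    """inv(Q_S) on the block indexed by S, zero elsewhere."""
    out = np.zeros_like(Q)
    idx = list(S)
    if idx:
        out[np.ix_(idx, idx)] = np.linalg.inv(Q[np.ix_(idx, idx)])
    return out

def indicator(S):
    z = np.zeros(n)
    for i in S:
        z[i] = 1.0
    return z

def vertex_value(S):
    """Objective (8a) at the extreme point of P associated with S."""
    return b @ indicator(S) - 0.5 * (a @ hat_inverse(Q, S) @ a)

def miqo_value(S):
    """Optimal value of (1) with the support of x restricted to S."""
    x = np.zeros(n)
    idx = list(S)
    if idx:
        x[idx] = np.linalg.solve(Q[np.ix_(idx, idx)], -a[idx])
    return a @ x + b @ indicator(S) + 0.5 * (x @ Q @ x)

supports = [S for k in range(r + 1) for S in itertools.combinations(range(n), k)]
vertex_vals = np.array([vertex_value(S) for S in supports])
miqo_vals = np.array([miqo_value(S) for S in supports])
```

The two arrays agree entry by entry, so minimizing the linear function (8a) over the vertices of P returns the optimal value of the mixed-integer problem on this instance.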

Remark 1

The convexification for the case where Q is tridiagonal [29] is precisely in the form given in Theorem 1, where the polyhedron P is described with a compact extended formulation. \(\square \)

3.1.1 Bivariate quadratic functions

Consider set

$$\begin{aligned} X_{2\times 2}\!=\!\left\{ (x,z,t)\in \mathbb {R}^2\!\times \!\{0,1\}^2\!\times \!\mathbb {R}: t\ge d_1x_1^2-2x_1x_2+d_2x_2^2,\; x\!\circ \! (\varvec{e}-z)\!=\!0\right\} , \end{aligned}$$

where \(d_1d_2>1, d_1,d_2 > 0\). Set \(X_{2\times 2}\) corresponds (after scaling) to a generic strictly convex quadratic function of two variables. We now illustrate Theorem 1 by computing an extended formulation of \(\text {cl conv}(X_{2\times 2})\), that is, for \(Q= \begin{pmatrix}d_1&{}-1\\ -1&{}d_2\end{pmatrix}\). Let \(\Delta :=d_1d_2-1 >0\) be the determinant of Q.

Proposition 2

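The closure of the convex hull of \(X_{2\times 2}\) is

$$\begin{aligned} \text {cl conv}(X_{2\times 2})=\Bigg \{ (x,z,t)\in \mathbb {R}^2\times [0,1]^2\times \mathbb {R}:\;&\exists W\in \mathbb {R}^{2\times 2} \text { s.t. } \begin{pmatrix}W &{} x\\ x^\top &{}t\end{pmatrix}\succeq 0,\\&W_{11}=\tfrac{1}{d_1}(z_1+W_{12}),\; W_{22}=\tfrac{1}{d_2}(z_2+W_{12}),\; W_{21}=W_{12},\\&\max \{0,z_1+z_2-1\}\le \Delta W_{12}\le \min \{z_1,z_2\}\Bigg \} . \end{aligned}$$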

Proof

Polyhedron P is the convex hull of the four points given in Table 1.

Note that equalities \(W_{11}=\frac{1}{d_1}(z_1+W_{12})\) and \(W_{22}=\frac{1}{d_2}(z_2+W_{12})\) are valid. Letting \(w=W_{12}\) and projecting out variables \(W_{11}\) and \(W_{22}\), we find that

$$\begin{aligned} W=\begin{pmatrix}\frac{1}{d_1}z_1 &{}0\\ 0&{}\frac{1}{d_2}z_2\end{pmatrix}+\begin{pmatrix}1/d_1&{}1\\ 1&{}1/d_2\end{pmatrix}w. \end{aligned}$$
(9)

Also note that \(w = \frac{1}{\Delta } \min \{z_1, z_2\}\), and the convex hull of \(\big \{(z_1, z_2, w) \in \{0,1\}^2 \times \mathbb {R}\; | \; w = \frac{1}{\Delta } \min \{z_1, z_2\} \big \}\) is described by the following inequalities:

$$\begin{aligned} \max \{0,\,z_1+z_2-1\}\le \Delta w\le \min \{z_1,z_2\}. \end{aligned}$$
(10)

Then, (9) and (10) describe the polyhedron P. \(\square \)

Conic quadratic disjunctive programming representations of \(\text {cl conv}(X_{2\times 2})\) have been used in the literature [4]; explicit representations of \(\text {cl conv}\left( X_{2\times 2}\cap \{(x,z,t):x\ge 0\}\right) \) in the original space of variables have been given [8, 24], and descriptions of the rank-one case \(d_1d_2=1\) were given in [5]. A description of \(\text {cl conv}\left( X_{2 \times 2} \cap \{(x,z,t):\ell \le x \le u\}\right) \) in a conic quadratic extended formulation is given in [20] via disjunctive programming. This formulation can be easily adapted to the case with no bounds (considered here), and requires three additional variables and three conic quadratic constraints. In Proposition 2, we give an alternative description of \(\text {cl conv}(X_{2 \times 2})\) using three additional variables, a compact \(3 \times 3\) positive semidefinite constraint, and linear inequalities.

Remark 2

Since P is not full-dimensional, we require only one additional variable w (instead of three) for conic representation of \(\text {cl conv}(X_{2\times 2})\) via the constraints \(0\le z\le 1\), (10), and

$$\begin{aligned} \begin{pmatrix}(1/d_1)(z_1+w)&{}w&{}x_1\\ w&{}(1/d_2)(z_2+w)&{}x_2\\ x_1&{}x_2&{}t\end{pmatrix}\succeq 0. \end{aligned}$$

\(\square \)

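Table 1 Extreme points of P corresponding to set \(X_{2\times 2}\)

$$\begin{array}{c|cc|ccc} S &{} z_1 &{} z_2 &{} W_{11} &{} W_{12} &{} W_{22}\\ \hline \emptyset &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ \{1\} &{} 1 &{} 0 &{} 1/d_1 &{} 0 &{} 0\\ \{2\} &{} 0 &{} 1 &{} 0 &{} 0 &{} 1/d_2\\ \{1,2\} &{} 1 &{} 1 &{} d_2/\Delta &{} 1/\Delta &{} d_1/\Delta \end{array}$$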

Remark 3

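The matrix representation (9) suggests an interesting connection between \(\text {cl conv}(X_{2\times 2})\) and McCormick envelopes. Indeed, from Table 1, we see that every extreme point of P satisfies

$$\begin{aligned} w=W_{12}=\frac{1}{\Delta }z_1z_2. \end{aligned}$$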

Moreover, the usual McCormick envelopes of the bilinear term \(z_1z_2\), given by \(\max \{0,-1+z_1+z_2\}\le z_1z_2\le \min \{z_1,z_2\}\), are sufficient to characterize the convex hull. \(\square \)
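The bivariate case lends itself to a direct numerical check. The sketch below evaluates the affine map (9) at the four binary points with \(w=z_1z_2/\Delta \), compares the result with \({\hat{Q}}_S^{-1}\), and verifies that the McCormick inequalities of Remark 3, scaled by \(1/\Delta \), hold at these points. The values \(d_1=2\), \(d_2=3\) and the helper names are assumptions for illustration.

```python
import itertools
import numpy as np

d1, d2 = 2.0, 3.0
Delta = d1 * d2 - 1.0                        # determinant of Q, positive
Q = np.array([[d1, -1.0], [-1.0, d2]])

def W_of(z1, z2, w):
    """Matrix W given by the affine representation (9)."""
    return np.diag([z1 / d1, z2 / d2]) + w * np.array([[1 / d1, 1.0], [1.0, 1 / d2]])

def hat_inverse(Q, S):
    """inv(Q_S) on the block indexed by S, zero elsewhere."""
    out = np.zeros_like(Q)
    idx = list(S)
    if idx:
        out[np.ix_(idx, idx)] = np.linalg.inv(Q[np.ix_(idx, idx)])
    return out

def mccormick_ok(z1, z2, w, tol=1e-12):
    """McCormick envelope of z1 * z2 applied to Delta * w."""
    return bool(max(0.0, z1 + z2 - 1.0) - tol <= Delta * w <= min(z1, z2) + tol)

vertices = []
for z1, z2 in itertools.product([0.0, 1.0], repeat=2):
    w = z1 * z2 / Delta
    S = [i for i, zi in enumerate((z1, z2)) if zi == 1.0]
    vertices.append((z1, z2, w, S))
```

For instance, at \(z=(1,1)\) the map (9) returns \(Q^{-1}\), which for these values is the matrix with rows (0.6, 0.2) and (0.2, 0.4).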

3.1.2 Quadratic functions with “choose-one" constraints

Given \(Q\succ 0\), consider set

$$\begin{aligned} X_{C1}\!=\!\left\{ \!(x,z,t)\in \mathbb {R}^n\!\times \!\{0,1\}^n\!\times \mathbb {R}: t\ge x^\top Qx,\; x\circ (\varvec{e}-z)=0,\;\sum _{i=1}^nz_i\le 1\right\} . \end{aligned}$$

Set \(X_{C1}\) arises, for example, in regression problems with multicollinearity constraints [10]: given a set J of features that are collinear, constraints \(\sum _{i\in J}z_i\le 1\) are used to ensure that at most one such feature is chosen.

The closure of the convex hull of \(X_{C1}\) is [see, e.g., 20, 33]

$$\begin{aligned} \text {cl conv}(X_{C1})=&\Bigg \{(x,z,t)\in \mathbb {R}^{n}\times \mathbb {R}_+^n\times \mathbb {R}:t\ge \sum _{i=1}^n Q_{ii}x_i^2/z_i,\; \sum _{i=1}^nz_i\le 1 \Bigg \} . \end{aligned}$$

We now give an alternative derivation of this result using our technique. Polyhedron P is the convex hull of \(n+1\) points: point (0, 0) and points \(\{({\hat{\varvec{e}}}_{\{i\}},{\hat{Q}}_{\{i\}}^{-1})\}_{i=1}^n\). It can easily be seen that P is described by constraints \(W_{ij}=0\) whenever \(i\ne j\), \(W_{ii}=z_i/Q_{ii}\) for \(i\in [n]\), and constraints \(z\ge 0\), \(\sum _{i=1}^n z_i\le 1\). In particular, constraint (4b) reduces to

$$\begin{aligned}&\begin{pmatrix}z_1/Q_{11}&{}0&{}\dots &{}0&{}x_1\\ 0&{}z_2/Q_{22}&{}\dots &{}0&{}x_2\\ \vdots &{}\vdots &{}\ddots &{}\vdots &{}\vdots \\ 0&{}0&{}0&{}z_n/Q_{nn}&{}x_n\\ x_1&{}x_2&{}\dots &{}x_n&{}t \end{pmatrix}\succeq 0,\end{aligned}$$

which by Lemma 1 is equivalent to

$$\begin{aligned} t\ge \sum _{i=1}^n Q_{ii}x_i^2/z_i,\; z_i/Q_{ii}\ge 0, \end{aligned}$$

and \(x_i = 0\) if \(z_i/Q_{ii}=0, \; \forall i \in [n]\). Note that the second condition is the complementarity constraint, which is already included in the constraint \(t\ge \sum _{i=1}^n Q_{ii}x_i^2/z_i\) (since \(z_i = 0\) and \(x_i \ne 0\) imply \(\frac{x_i^2}{z_i} = + \infty \)).
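The equivalence between the arrow-shaped semidefinite constraint and the perspective inequality can be observed numerically, including at fractional z. The sketch below compares an eigenvalue test of the matrix with the value of \(\sum _{i}Q_{ii}x_i^2/z_i\), computed using the conventions for \(x_i^2/z_i\) stated in Sect. 2. The data and the helper names `perspective_sum`, `arrow_matrix`, and `is_psd` are assumptions for illustration.

```python
import numpy as np

def perspective_sum(x, z, q_diag):
    """Sum of Q_ii * x_i^2 / z_i with 0/0 = 0 and (nonzero)/0 = +inf."""
    total = 0.0
    for xi, zi, qi in zip(x, z, q_diag):
        if zi > 0:
            total += qi * xi ** 2 / zi
        elif xi != 0:
            return np.inf
    return total

def arrow_matrix(x, z, q_diag, t):
    """Matrix of constraint (4b) when W is diagonal with entries z_i / Q_ii."""
    n = len(x)
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = np.diag(np.asarray(z) / np.asarray(q_diag))
    M[:n, n] = x
    M[n, :n] = x
    M[n, n] = t
    return M

def is_psd(M, tol=1e-9):
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

q_diag = [2.0, 3.0, 4.0]
z = [0.5, 0.25, 0.0]                 # fractional point with sum(z) <= 1
x = [1.0, -1.0, 0.0]                 # consistent with z_3 = 0
x_bad = [1.0, -1.0, 0.1]             # nonzero entry where z_3 = 0
threshold = perspective_sum(x, z, q_diag)
```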

3.2 Factorable representation

A (possibly low-rank) matrix \(Q\in \mathbb {R}^{n\times n}\) is positive semidefinite if and only if there exists some \(F\in \mathbb {R}^{n\times k}\) such that \(Q=FF^\top \). Then, letting \(u=F^\top x\), one can rewrite \(x^\top Qx\) as \(x^\top FF^\top x=u^\top u\). Matrix F may be immediately available when formulating the problem, or may be obtained through a Cholesky decomposition or eigendecomposition of Q. Such a factorization is often employed by solvers, since it results in simpler (separable) nonlinear terms, and in many situations matrix F is sparse as well. In this section, we discuss representations of \(\text {cl conv}(X)\) amenable to such factorizations of Q. While the proofs of the propositions of this section are similar to those in Sect. 3.1, additional care is required to handle unbounded problems (1) arising from a rank-deficient Q.

Given \(F\in \mathbb {R}^{n\times k}\), define \(F_S\in \mathbb {R}^{S\times k}\) as the submatrix of F corresponding to the rows indexed by S, and let \({\hat{F}}_S\in \mathbb {R}^{n\times k}\) be the matrix obtained by filling the missing entries with zeros. Define the polytope \(P_F\subseteq \mathbb {R}^{n+k^2}\) as

$$\begin{aligned} P_F={\text {conv}}\left( \left\{ ({\hat{\varvec{e}}}_S,{\hat{F}}_S^\dagger {\hat{F}}_S)\right\} _{{\hat{\varvec{e}}}_{S} \in Z}\right) . \end{aligned}$$

Remark 4

For any \(S\subseteq [n]\), matrix \({\hat{F}}_S^\dagger {\hat{F}}_S\) is an orthogonal projection matrix (symmetric and idempotent), and in particular \(({\hat{F}}_S^\dagger {\hat{F}}_S)^\dagger ={\hat{F}}_S^\dagger {\hat{F}}_S\). These properties can be easily verified from Definition 1. Since all eigenvalues of an orthogonal projection matrix are either 0 or 1, it also follows that \({\hat{F}}_S^\dagger {\hat{F}}_S\succeq 0\). \(\square \)
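The properties listed in Remark 4 can be confirmed on a random instance. The sketch below forms \({\hat{F}}_S^\dagger {\hat{F}}_S\) with NumPy and records the quantities needed to check symmetry, idempotency, self-pseudoinversion, and the 0/1 spectrum. The dimensions, the seed, the support S (0-based), and the cutoff `rcond=1e-10` are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 3
F = rng.standard_normal((n, k))
S = [0, 2]

# F_hat keeps the rows of F indexed by S and zeros out the others.
F_hat = np.zeros_like(F)
F_hat[S, :] = F[S, :]

P = np.linalg.pinv(F_hat, rcond=1e-10) @ F_hat     # k x k matrix
P_dag = np.linalg.pinv(P, rcond=1e-10)
eigs = np.linalg.eigvalsh((P + P.T) / 2)
dist_to_01 = np.minimum(np.abs(eigs), np.abs(eigs - 1.0))
```

With two generic rows in \(\mathbb {R}^3\), the projection has rank two, so its trace is 2 up to rounding.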

Proposition 3

If \(Q=FF^\top \), then the mixed-integer optimization model

$$\begin{aligned} \min _{x,z,W,t}\;&a^\top x+b^\top z+\tfrac{1}{2}t \end{aligned}$$
(11a)
$$\begin{aligned} \text {s.t.}\;&\begin{pmatrix}W &{} F^\top x\\ x^\top F&{}t\end{pmatrix}\succeq 0 \end{aligned}$$
(11b)
$$\begin{aligned}&(z,W)\in P_F \end{aligned}$$
(11c)
$$\begin{aligned}&z\in \{0,1\}^n,\; x\circ (\varvec{e}-z)=0 \end{aligned}$$
(11d)
$$\begin{aligned}&x\in \mathbb {R}^n, t\in \mathbb {R} \end{aligned}$$
(11e)

is a valid formulation of problem (1).

Proof

Consider a point \((x,z,t)\in X\) with \(z={\hat{\varvec{e}}}_S\) for some \({\hat{\varvec{e}}}_{S} \in Z\). Constraint (11d) is trivially satisfied. Constraint (11c) is satisfied if and only if \(W={\hat{F}}_S^{\dagger }{\hat{F}}_S\). Note that in any feasible solution, \(x_i=0\) whenever \(i\not \in S\), and in particular \(F^\top x={\hat{F}}_{S}^\top x\). From Lemma 1, we find that constraint (11b) is satisfied if and only if (recall properties in Remark 4):

  • \({\hat{F}}_S^{\dagger }{\hat{F}}_S \succeq 0\), which is automatically satisfied.

  • \( {\hat{F}}_S^{\dagger }{\hat{F}}_S({\hat{F}}_S^{\dagger }{\hat{F}}_S)^\dagger F^\top x=F^\top x.\) We find that

    $$\begin{aligned} {\hat{F}}_S^{\dagger }{\hat{F}}_S({\hat{F}}_S^{\dagger }{\hat{F}}_S)^\dagger {\hat{F}}_S^\top x={\hat{F}}_S^{\dagger }{\hat{F}}_S{\hat{F}}_S^{\dagger }{\hat{F}}_S {\hat{F}}_S^\top x={\hat{F}}_S^{\dagger }{\hat{F}}_S {\hat{F}}_S^\top x={\hat{F}}_S^\top ({\hat{F}}_S^\dagger )^\top {\hat{F}}_S^\top x={\hat{F}}_S^\top x, \end{aligned}$$

    and, therefore, this condition is satisfied as well.

  • \(t\ge x^\top F W^\dagger F^\top x\Leftrightarrow t\ge x_S^\top {\hat{F}}_S ({\hat{F}}_S^\dagger {\hat{F}}_S)^\dagger {\hat{F}}_S^\top x_S=x_S^\top {\hat{F}}_S {\hat{F}}_S^\dagger {\hat{F}}_S{\hat{F}}_S^\top x_S=x_S^\top {\hat{F}}_S{\hat{F}}_S^\top x_S\), which is precisely the nonlinear constraint defining set X and is thus satisfied.

\(\square \)

While the proofs of Propositions 1 and 3 are similar in spirit, we highlight a critical difference. In the proof of Proposition 1, with the assumption \(Q\succ 0\), constraints \(WW^\dagger x=x\) enforce the complementarity constraints \(x\circ (\varvec{e}-z)=0\), and therefore, such constraints are excluded in (4). In contrast, in the proof of Proposition 3, with Q potentially of low-rank, constraints \(WW^\dagger F^\top x=F^\top x\) alone are not sufficient to enforce \(x\circ (\varvec{e}-z)=0\), and therefore, they are included in (11) and are used to prove the validity of the mixed-integer formulation. Indeed, if there exist \({\hat{\varvec{e}}}_{S} \in Z\) and \({\bar{x}}\in \mathbb {R}^{n}\) such that \({\bar{x}}_S\ne 0\), \({\bar{x}}_{[n]\setminus S}=0\) and \(F^\top {\bar{x}}=0\), then for any \((x,z,t)\in X\) we find that

$$\begin{aligned} \lim _{\lambda \rightarrow 0^+}(1-\lambda )(x,z,t)+\lambda ((1/\lambda ){\bar{x}},{\hat{\varvec{e}}}_S,0)=(x+{\bar{x}}, z,t)\in \text {cl conv}(X). \end{aligned}$$

In particular, the point \((x+{\bar{x}},z,t)\), which may not satisfy the complementarity constraints, cannot be separated from \(\text {cl conv}(X)\), or any closed relaxation. On the other hand, if matrix Q is full-rank, then \(F^\top {\bar{x}}=0\implies {\bar{x}}=0\) (as shown in the proof of Proposition 1); therefore, the complementarity constraints are enforced by the conic constraint.
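The limiting argument can be traced on a two-variable instance with a rank-one Q. In the sketch below, \(F=(1,1)^\top \), \({\bar{x}}=(1,-1)\) satisfies \(F^\top {\bar{x}}=0\), and the convex combinations of two points of X approach a point that violates complementarity. The instance, the choice \(Z=\{0,1\}^2\), and the helper names `in_X` and `combination` are assumptions for illustration.

```python
import numpy as np

F = np.array([[1.0], [1.0]])         # n = 2, k = 1, so Q = F F^T is singular
Q = F @ F.T

def in_X(x, z, t, tol=1e-9):
    """Membership in X with Z = {0,1}^2 (assumed for this example)."""
    binary = np.all((z == 0) | (z == 1))
    return bool(binary and t >= x @ Q @ x - tol and np.allclose(x * (1 - z), 0))

x, z, t = np.array([2.0, 0.0]), np.array([1.0, 0.0]), 4.0     # a point of X
x_bar, e_S = np.array([1.0, -1.0]), np.array([1.0, 1.0])      # S = {1, 2}

def combination(lam):
    """(1 - lam) * (x, z, t) + lam * (x_bar / lam, e_S, 0)."""
    return (1 - lam) * x + x_bar, (1 - lam) * z + lam * e_S, (1 - lam) * t

limit = (x + x_bar, z, t)

def gap(lam):
    """Largest componentwise distance between the combination and the limit."""
    cx, cz, ct = combination(lam)
    return max(np.abs(cx - limit[0]).max(), np.abs(cz - limit[1]).max(), abs(ct - limit[2]))

gaps = [gap(lam) for lam in (1e-1, 1e-3, 1e-6)]
```

The limit point has \(x_2=-1\) with \(z_2=0\), so it lies outside X even though it belongs to \(\text {cl conv}(X)\).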

Recall that \(\pi _S: \mathbb {R}^{n} \rightarrow \mathbb {R}^{S}\) is the projection onto the subspace indexed by S. We now consider the natural convex relaxation of (11) obtained by dropping constraint (11d), and show that it is ideal under certain technical conditions on F and the set Z, as stated in Theorem 2 below.

Theorem 2

Let \(Q = F F^{\top }\), where \(F \in \mathbb {R}^{n \times k}\) is a full-column rank matrix satisfying \(\textrm{col}(F) = \bigcap _{{\hat{\varvec{e}}}_{S} \in Z} \pi _S^{-1}(\textrm{col}(F_S))\). Then

$$\begin{aligned} \text {cl conv}(X) \; = \; \{(z, x, t) \in [0,1]^{n} \times \mathbb {R}^{n+1} \; | \; \exists W \in \mathbb {R}^{k \times k} \text { s.t. } \text {(11b), (11c)}\}. \end{aligned}$$

Proof

Clearly, constraints (11b),(11c) define a closed convex set. Consider the two optimization problems:

$$\begin{aligned} \min \quad&a^{\top } x + b^{\top } z + \tfrac{1}{2}t \end{aligned}$$
(12a)
$$\begin{aligned} \text {s.t.} \; \;&(x,z,t) \in \text {cl conv}(X), \end{aligned}$$
(12b)

and

$$\begin{aligned} \min \quad&a^{\top } x + b^{\top } z + \tfrac{1}{2} t \end{aligned}$$
(13a)
$$\begin{aligned} \text {s.t.} \; \;&\begin{pmatrix}W&{} F^\top x \\ x^\top F &{}t\end{pmatrix} \succcurlyeq {\textbf{0}}, \end{aligned}$$
(13b)
$$\begin{aligned}&(z,W)\in P_F,\; x\in \mathbb {R}^n, t\in \mathbb {R}. \end{aligned}$$
(13c)

It suffices to show that problems (12) and (13) always attain the same optimal value. Consider the following two cases:

  • \(F F^{\dagger } a \ne a\): In other words, a is not in the column space of F, i.e., \(a \notin \textrm{col}(F)\). In this case, by the condition \(\textrm{col}(F) = \bigcap _{{\hat{\varvec{e}}}_{S} \in Z} \pi _{S}^{-1}(\textrm{col}(F_S))\), there exists some \({\hat{\varvec{e}}}_{S} \in Z\) such that \(a_{S} \notin \textrm{col}(F_S)\). Then, let z be such that \(z_i = 1, \; \forall i \in S\). Since \(a_{S} \notin \textrm{col}(F_{S})\), there exists x such that \(x_i = 0\) for all \(i \in [n]\backslash S\), \(x_{S}\) is in the orthogonal complement of \(\textrm{col}(F_S)\) and \(a_S^\top x_S < 0\). Clearly, z and x satisfy the constraint \(x_i (1 - z_i) = 0\) for all \(i = 1, \dots ,n\). Complementarity holds for \(\lambda x\) for \(\lambda > 0\) as well. Since, by construction, \(x^{\top } F F^{\top } x = 0\), the objective \(b^\top z + \lambda \langle a, x \rangle + \tfrac{\lambda ^2}{2} (x^{\top } F F^{\top } x)\) evaluated at \((\lambda x, z)\) tends to \(-\infty \) as \(\lambda \rightarrow \infty \). Thus problem (12) is unbounded and, since problem (13) is a convex relaxation of (12), problem (13) is unbounded as well.

  • \(F F^{\dagger } a = a\): For problem (13), we can project out t using the relation

    $$\begin{aligned} \begin{pmatrix} W &{} F^\top x \\ x^{\top } F &{} t \end{pmatrix} \succcurlyeq 0 \quad \text {iff} \quad W W^{\dagger } F^{\top } x = F^{\top } x \; \; \text {and} \; \; t \ge x^{\top } F W^{\dagger } F^{\top } x. \end{aligned}$$

    Therefore, problem (13) is equivalent to

    $$\begin{aligned} \min \quad&a^{\top } x + b^{\top } z + \tfrac{1}{2} x^{\top } F W^{\dagger } F^{\top } x \end{aligned}$$
    (14a)
    $$\begin{aligned} \text {s.t.} \; \;&W W^{\dagger } F^{\top } x = F^{\top } x \end{aligned}$$
    (14b)
    $$\begin{aligned}&(z,W)\in P_F,\; x\in \mathbb {R}^n. \end{aligned}$$
    (14c)

Since \(F F^{\dagger } a = a\), we can write \(a^{\top } x = (F^{\dagger } a)^{\top } F^{\top } x\). Define \({\tilde{a}} = F^{\dagger } a\), then \(a^{\top } x = {\tilde{a}}^{\top } F^{\top } x\). Substituting \(F^{\top } x\) with a new variable \(u \in \mathbb {R}^{k}\) and since F has full column rank, problem (14) is equivalent to

$$\begin{aligned} \min \quad&b^{\top } z + {\tilde{a}}^{\top } u + \tfrac{1}{2} u^{\top } W^{\dagger } u \end{aligned}$$
(15a)
$$\begin{aligned} \text {s.t.} \; \;&W W^{\dagger } u = u \end{aligned}$$
(15b)
$$\begin{aligned}&(z,W)\in P_F, u\in \mathbb {R}^k. \end{aligned}$$
(15c)

Using identical arguments as in the proof of Theorem 1, we find that there exists \({\hat{\varvec{e}}}_{S}\in Z\) such that \((u^*,z^*,W^*)=(-{\hat{F}}_S^\dagger {\hat{F}}_S {\tilde{a}}, {\hat{\varvec{e}}}_S, {\hat{F}}_S^\dagger {\hat{F}}_S)\) is optimal for (15). We now construct an optimal solution for (14). Let \(x^*\) be defined as \(x_S^*=-(F_{S}^{\dagger })^{\top } F_{S}^{\dagger } a_{S}\) and \(x_{[n]\setminus S}^*=0\), and observe that \((x^*,z^*)\) is feasible for (12), with objective \(\sum _{i \in S} b_i - \frac{1}{2} \Vert F_{S}^{\dagger } a_{S} \Vert _2^2\). Substituting \(W^{*} = {\hat{F}}_{S}^{\dagger } {\hat{F}}_{S}\), the optimal value of problem (13) equals \(\sum _{i \in S} b_i - \frac{1}{2} \Vert F_{S}^{\dagger } F_{S} F^{\dagger } a \Vert _2^2\). Note that both \(\alpha _1 = F^{\dagger } a\) and \(\alpha _2 = F_{S}^{\dagger } a_{S}\) satisfy the equation \(F_{S} \alpha = a_{S}\) and thus \(\alpha _1 - \alpha _2\) is orthogonal to the row space of \(F_{S}\) which means \(F_{S}^{\dagger } F_{S} \alpha _1 = F_{S}^{\dagger } F_{S} \alpha _2 = \alpha _2\). Hence, we conclude that the optimal values of problem (12) and problem (13) coincide. \(\square \)
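The last step of the proof can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using randomly generated data (the matrix F, the vector a, and the support S below are hypothetical and not taken from the paper). It verifies that \(F_{S}^{\dagger } F_{S} \alpha _1 = \alpha _2\) and that the point \(x^*\) constructed in the proof attains the value \(-\frac{1}{2} \Vert F_{S}^{\dagger } a_{S} \Vert _2^2\) in the quadratic part of the objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
F = rng.standard_normal((n, k))        # full column rank with probability one
a = F @ rng.standard_normal(k)         # a lies in col(F), so F F^+ a = a
S = [0, 2]                             # hypothetical support with |S| < k

F_S, a_S = F[S, :], a[S]
F_S_pinv = np.linalg.pinv(F_S)
alpha1 = np.linalg.pinv(F) @ a         # alpha_1 = F^+ a
alpha2 = F_S_pinv @ a_S                # alpha_2 = F_S^+ a_S
row_proj = F_S_pinv @ F_S              # projector onto the row space of F_S

# x* from the proof: supported on S, zero elsewhere
x = np.zeros(n)
x[S] = -F_S_pinv.T @ alpha2
value = a @ x + 0.5 * x @ F @ F.T @ x  # objective without the b^T z term

print(np.allclose(F_S @ alpha1, a_S), np.allclose(F_S @ alpha2, a_S))
print(np.allclose(row_proj @ alpha1, alpha2))
print(np.isclose(value, -0.5 * alpha2 @ alpha2))
```

The support is chosen with \(|S|<k\) so that \(F_S^{\dagger } F_S\) is a proper projection rather than the identity, which makes the check nontrivial.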

Remark 5

From the first case analysis of the proof of Theorem 2, one sees that the technical condition \(\textrm{col}(F) = \bigcap _{{\hat{\varvec{e}}}_{S} \in Z} \pi _S^{-1}(\textrm{col}(F_S))\) is equivalent to stating that the mixed-integer optimization problem and the proposed convex relaxation are unbounded at the same time. The condition is automatically satisfied if \(\varvec{e}\in Z\). Moreover, if matrix Q is rank-one, then this condition is equivalent to the nondecomposability condition on Z given in [34]. If it fails to hold, the convexification presented is still valid but may be weak: the convex relaxation may be unbounded even if the mixed-integer optimization problem is bounded. We provide an example illustrating this phenomenon in Sect. 3.2.3. \(\square \)

Remark 6

An immediate consequence of Theorem 2 is that if matrix Q is rank-deficient, i.e., \(k<n\), then the extended formulation describing \(\text {cl conv}(X)\) is simpler than in the full-rank case, i.e., it has fewer additional variables and lower-dimensional conic constraints. \(\square \)

We now illustrate Theorem 2 by providing an alternative proof of the main result of [5] using our unifying framework.

3.2.1 Rank-one quadratic functions

Consider the rank-one set

$$\begin{aligned} X_{R1}=\left\{ (x,z,t)\in \mathbb {R}^n\times \{0,1\}^n\times \mathbb {R}: t\ge \left( h^\top x\right) ^2,\; x\circ (\varvec{e}-z)=0\right\} , \end{aligned}$$

where we assume \(h_i\ne 0\) for all \(i\in [n]\).

Proposition 4

([5]) The closure of the convex hull of \(X_{R1}\) is

$$\begin{aligned} \text {cl conv}(X_{R1})=&\Bigg \{(x,z,t)\in \mathbb {R}^{2n+1}: \begin{pmatrix}\min \{1,\varvec{e}^\top z\}&{}h^\top x\\ h^\top x&{}t\end{pmatrix}\succeq 0,\;0\le z\le \varvec{e}\Bigg \}. \end{aligned}$$

Proof

In the case of a rank-one function, we have \(F=h\) and \(W \in \mathbb {R}^1\). Note that the pseudoinverse of vector \({\hat{h}}_S\) is given by

$$\begin{aligned} {\hat{h}}_S^\dagger ={\left\{ \begin{array}{ll}0&{}\text {if }{\hat{h}}_S=0\\ {\hat{h}}_S^\top /({\hat{h}}_S^\top {\hat{h}}_S)&{}\text {otherwise,}\end{array}\right. } \end{aligned}$$

and, in particular, we find that \({\hat{h}}_S^\dagger {\hat{h}}_S=1\) if \(S\ne \emptyset \), and \({\hat{h}}_S^\dagger {\hat{h}}_S=0\) otherwise. Thus, \({\hat{h}}_S^\dagger {\hat{h}}_S=\max \{z_1,\dots ,z_n\}\), and \(P_F\) is described by the linearization \(0\le W\le \min \{1,\varvec{e}^\top z\}\). Projecting out variable W, we arrive at the result. \(\square \)

We discuss generalizations of \(X_{R1}\) with arbitrary constraints on the indicator variables in Sect. 4.

3.2.2 An example with a rank-two quadratic function

In order to illustrate how convexification methods for polyhedra can be directly utilized to convexify the mixed-integer nonlinear set X, we consider a special rank-two quadratic function with three variables and the associated set

$$\begin{aligned} X_{3}=\left\{ (x,z,t)\in \mathbb {R}^3\times \{0,1\}^3\times \mathbb {R}: t\ge (x_1+x_2+x_3)^2+x_3^2,\; x\circ (\varvec{e}-z)=0\right\} . \end{aligned}$$

In this case, \(Q = F F^\top \) with \(F^\top = \begin{pmatrix}1 &{} 1 &{}1 \\ 0&{}0 &{} 1 \end{pmatrix}\). The extreme points of \(P_F\) are given in Table 2. Using PORTA [32] to switch from the extreme point representation of \(P_F\) to its facial description, we obtain the closure of the convex hull of \(X_{3}\):

Table 2 Extreme points of \(P_F\) corresponding to set \(X_3\)
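The extreme points of \(P_F\) for this example can be enumerated directly. The sketch below assumes, as in the proofs of Theorem 2 and Proposition 4, that each extreme point has the form \(({\hat{\varvec{e}}}_S, {\hat{F}}_S^\dagger {\hat{F}}_S)\), where \({\hat{F}}_S\) is obtained from F by zeroing the rows not indexed by S. It uses NumPy and 0-based indices, and the helper name `vertex` is introduced here only for illustration.

```python
import itertools
import numpy as np

# F for X_3, so that x^T F F^T x = (x1 + x2 + x3)^2 + x3^2
F = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0]])
n, k = F.shape

def vertex(S):
    """Return (z, W) with z the indicator of S and W = pinv(F_hat_S) @ F_hat_S."""
    z = np.zeros(n)
    z[list(S)] = 1.0
    F_hat = F * z[:, None]              # zero out the rows not indexed by S
    W = np.linalg.pinv(F_hat) @ F_hat   # k-by-k projector onto the row space
    return z, W

vertices = {}
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        z, W = vertex(S)
        vertices[S] = (z, W)
        print(S, z.tolist(), np.round(W, 3).tolist())
```

Since W is the orthogonal projector onto the row space of \(F_S\), several supports share the same W: any support on which the selected rows of F span \(\mathbb {R}^2\) gives the identity matrix.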

3.2.3 An example where the technical condition fails

Consider the set

$$\begin{aligned} X_{R1}^{C1} \!=\!\left\{ \!(x,z,t)\in \mathbb {R}^n \! \times \! \{0,1\}^n \! \times \! \mathbb {R}: t\ge \left( h^\top x\right) ^2, x\circ (\varvec{e}-z)=0, \sum _{i=1}^n z_i\le 1\right\} \end{aligned}$$

with \(h_i\ne 0\) for \(i\in [n]\). In this case, \(F=h\) and, for every \(i\in [n]\), \(\textrm{col}(F_{\{i\}})=\mathbb {R}\) and \(\pi _{\{i\}}^{-1}(\textrm{col}(F_{\{i\}}))=\mathbb {R}^n\). Thus, \( \bigcap _{{\hat{\varvec{e}}}_{S}\in Z} \pi _S^{-1}(\textrm{col}(F_S))=\mathbb {R}^n\), while \(\textrm{col}(F) = \{x\in \mathbb {R}^n: x=\lambda h \text { for some }\lambda \in \mathbb {R}\}\), and the technical assumption is not satisfied.

The relaxation induced by (11b), (11c), (11e), which is constructed as outlined in Proposition 4, results in the set induced by bound constraints \(0\le z\le 1\), \(\varvec{e}^\top z\le 1\) and \(t\ge (h^\top x)^2/(\varvec{e}^\top z)\). Moreover, the corresponding optimization problem

$$\begin{aligned} \min _{x,z}\;a^\top x+ b^\top z+(h^\top x)^2/(\varvec{e}^\top z) \text { s.t. }\varvec{e}^\top z\le 1,\; x\in \mathbb {R}^n,\; z\in [0,1]^n \end{aligned}$$

is unbounded unless \(a\in \textrm{col}(F)\).

In contrast, \(\text {cl conv}(X_{R1}^{C1})\) is described via constraint \(t\ge \sum _{i=1}^n h_i^2x_i^2/z_i\) [33, 34] (similar to the result described in Sect. 3.1.2), and the corresponding optimization problem is always bounded.
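The contrast between the two relaxations can be seen on a small instance. The following sketch uses hypothetical data with \(n=2\), \(h=(1,1)\), \(a=(1,-1)\) and \(b=0\), so that \(a\notin \textrm{col}(F)\), and evaluates both objectives along the ray \(x=\lambda (-1,1)\) at the fractional point \(z=(0.5,0.5)\). The function names are introduced here for illustration only.

```python
h = [1.0, 1.0]
a = [1.0, -1.0]          # not a multiple of h
z = [0.5, 0.5]           # fractional point with e^T z = 1

def weak_obj(x):
    """Objective with t replaced by (h^T x)^2 / (e^T z), taking b = 0."""
    hx = sum(hi * xi for hi, xi in zip(h, x))
    return sum(ai * xi for ai, xi in zip(a, x)) + hx ** 2 / sum(z)

def hull_obj(x):
    """Objective with t replaced by sum_i h_i^2 x_i^2 / z_i, taking b = 0."""
    return sum(ai * xi + (hi * xi) ** 2 / zi
               for ai, hi, xi, zi in zip(a, h, x, z))

for lam in [1.0, 10.0, 100.0]:
    x = [-lam, lam]      # h^T x = 0 and a^T x = -2 * lam
    print(lam, weak_obj(x), hull_obj(x))
```

Along this ray the first objective equals \(-2\lambda \) and decreases without bound, whereas the second equals \(-2\lambda +4\lambda ^2\) and grows, in line with the discussion above.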

4 Convexification in the original space

We now turn our attention to describing \(\text {cl conv}(X)\) in the original space of variables. The discussion of this section is based on projecting out the matrix variable W in the canonical description of \(\text {cl conv}(X)\) given in Theorem 1 for \(Q\succ 0\). Identical arguments hold for the representation in Theorem 2 for low-rank matrices.

Suppose that a minimal description of polyhedron P is given by the facet-defining inequalities

$$\begin{aligned} \langle \Gamma _i,W \rangle -\gamma _i^\top z\le \beta _i, \quad i=1,\dots ,m_1,\end{aligned}$$
(16)

and equalities

$$\begin{aligned}\langle \Gamma _i,W \rangle -\gamma _i^\top z= \beta _i, \quad i=m_1+1,\dots ,m,\end{aligned}$$

where \(\Gamma _i\in \mathbb {R}^{n \times n},\) \(\beta _i\in \mathbb {R}\) and \(\gamma _i\in \mathbb {R}^n\). Theorem 3 describes \(\text {cl conv}(X)\) in the original space of variables. Note that, in practice, a complete description of P may not be explicitly available, in which case one can use a partial description to derive valid inequalities.

Before we give the description in the original space, we define a set of feasible coefficients used to derive the inequalities. Let

$$\begin{aligned} {\mathcal {Y}} {\mathop {=}\limits ^{\text {def}}}\bigg \{y\in \mathbb {R}_+^{m_1}\times \mathbb {R}^{m-m_1}: \sum _{i=1}^m \Gamma _iy_i\succeq 0,\;\sum _{i=1}^m \text {Tr}(\Gamma _i)y_i\le 1 \bigg \}. \end{aligned}$$

Theorem 3

If \(Q\succ 0\), point \((x,z,t)\in \text {cl conv}(X)\) if and only if \(z\in {\text {conv}}(Z)\), \(t\ge 0\) and

$$\begin{aligned} t\ge \;&\frac{x^\top \left( \sum _{i=1}^m \Gamma _iy_i\right) x}{y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z},&\forall y\in {\mathcal {Y}}, \end{aligned}$$
(17)

or equivalently,

$$\begin{aligned} t\ge \max _{y\in {\mathcal {Y}}}\;&\frac{x^\top \left( \sum _{i=1}^m \Gamma _iy_i\right) x}{y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z} \cdot \end{aligned}$$
(18)

Proof

A point \((x,z,t)\in \text {cl conv}(X)\) if and only if

$$\begin{aligned} 0\ge \min _{W,\lambda } \;&\lambda \\ \text {s.t.}\;&\langle \Gamma _i,W \rangle \le \beta _i+\gamma _i^\top z, \ \ i=1,\dots ,m_1\\&\langle \Gamma _i,W \rangle = \beta _i+\gamma _i^\top z, \ \ i=m_1+1,\dots ,m\\&W-xx^\top /t+\lambda I\succeq 0, \ \lambda \ge 0. \end{aligned}$$

Strong duality holds since there exists \((z,W)\in P\) that satisfies the facet-defining inequalities strictly, and we can always increase \(\lambda \) to find a strictly feasible solution to the above minimization problem. Substituting \(V=W-xx^\top /t+\lambda I\), the optimization problem simplifies to

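$$\begin{aligned} 0\ge \min _{V,\lambda } \;&\lambda \\ \text {s.t.}\;&\langle \Gamma _i,V \rangle -\lambda \text {Tr}(\Gamma _i)\le \beta _i+\gamma _i^\top z-\langle \Gamma _i,xx^\top /t\rangle , \ \ i=1,\dots ,m_1\\&\langle \Gamma _i,V \rangle -\lambda \text {Tr}(\Gamma _i)= \beta _i+\gamma _i^\top z-\langle \Gamma _i,xx^\top /t\rangle , \ \ i=m_1+1,\dots ,m\\&V\succeq 0, \ \lambda \ge 0. \end{aligned}$$
(19)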

Letting \(y\in \mathbb {R}_+^{m_1}\times \mathbb {R}^{m-m_1}\) denote the dual variables, we find the equivalent representation

$$\begin{aligned} 0\ge \max _{y\in \mathbb {R}_+^{m_1}\times \mathbb {R}^{m-m_1}}\;&\sum _{i=1}^m y_i\left( -\beta _i-\gamma _i^\top z+\langle \Gamma _i, xx^\top /t\rangle \right) \end{aligned}$$
(20a)
$$\begin{aligned} \text {s.t.}\;&-\sum _{i=1}^m y_i\Gamma _i\preceq 0,\; \sum _{i=1}^m\text {Tr}(\Gamma _i)y_i\le 1. \end{aligned}$$
(20b)

In particular, inequality (20a) is valid for any fixed feasible y. Multiplying both sides of the inequality by t, we find the equivalent conic quadratic representation

$$\begin{aligned} t\left( y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z\right) \ge \langle \sum _{i=1}^m y_i\Gamma _i,xx^\top \rangle . \end{aligned}$$
(21)

Note that validity of inequalities (21) implies that \(y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z\ge 0\) for any primal feasible z and dual feasible y; dividing both sides of the inequality by \(y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z\), the theorem is proven. \(\square \)

Note that even if inequalities (16) are not facet-defining or are insufficient to describe P, the corresponding inequalities (17) are still valid for \(\text {cl conv}(X)\).

We also state the analogous result for low-rank matrices, without proof, where \((\Gamma _i,\gamma _i,\beta _i), i\in [m]\) defines \(P_F\).

Theorem 4

Let \(Q = F F^{\top }\), where \(F \in \mathbb {R}^{n \times k}\) is a full-column rank matrix satisfying \(\textrm{col}(F) = \bigcap _{{\hat{\varvec{e}}}_{S} \in Z} \pi _S^{-1}(\textrm{col}(F_S))\). Then point \((x,z,t)\in \text {cl conv}(X)\) if and only if \(z\in {\text {conv}}(Z)\), \(t\ge 0\) and

$$\begin{aligned} t\ge \;&\frac{x^\top F \left( \sum _{i=1}^m \Gamma _iy_i\right) F^\top x}{y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z},&\forall y\in {\mathcal {Y}}, \end{aligned}$$
(22)

or equivalently,

$$\begin{aligned} t\ge \max _{y\in {\mathcal {Y}}}\;&\frac{x^\top F \left( \sum _{i=1}^m \Gamma _iy_i\right) F^{\top }x}{y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z} \cdot \end{aligned}$$
(23)

We now illustrate Theorem 3 for the set \(X_{2\times 2}\) discussed in Sect. 3.1.1.

Example 2

(Description of \(\text {cl conv}(X_{2\times 2})\) in the original space) From Proposition 2, we find that for \(X_{2\times 2}\), a minimal description of polyhedron P is given by the bound constraints \(0\le z\le 1\) and

figure c
figure d
figure e
figure f
figure g
figure h

Then, an application of Theorem 3 yields the inequality

$$\begin{aligned} t\ge \max _{y\in \mathbb {R}_+^6}\;&\frac{y_1x_1^2+y_2x_2^2+(-y_1/d_1-y_2/d_2-y_3-y_4+y_5+y_6)x_1x_2}{(1/\Delta )y_4+(y_1/d_1-y_4/\Delta +y_5/\Delta )z_1+(y_2/d_2-y_4/\Delta +y_6/\Delta )z_2} \end{aligned}$$
(24a)
$$\begin{aligned} \text {s.t.}\;&4y_1y_2\ge (-y_1/d_1-y_2/d_2-y_3-y_4+y_5+y_6)^2,\; y_1+y_2\le 1. \end{aligned}$$
(24b)

Note that variables \(y_1,y_2\) are originally free, as they are dual variables for equality constraints; however, the positive semidefiniteness constraint in \({\mathcal {Y}}\) forces them to be nonnegative. In Appendix A we provide an independent verification that inequality (24) is indeed valid, and reduces to the quadratic inequality \(t\ge d_1x_1^2+d_2x_2^2-2x_1x_2\) at integral z. \(\square \)

From Theorem 3, we see that \(\text {cl conv}(X)\) can be described by an infinite number of fractional quadratic/affine inequalities (17). More importantly, the convex hull is finitely generated: the infinite number of quadratic and affine functions are obtained from conic combinations of a finite number of base matrices \(\Gamma _i\) and vectors \((\gamma _i,\beta _i)\), which correspond precisely to the minimal description of P. To solve the resulting semi-infinite problem in practice, one can employ a delayed cut generation scheme, where at each iteration, the problem with a subset of inequalities (17) is solved to obtain \(({\bar{x}},{\bar{z}},{\bar{t}})\). Then, the separation problem of finding a most violated inequality (i.e., y) at \(({\bar{x}},{\bar{z}},{\bar{t}})\), if one exists, is a convex optimization problem given by the inner maximization problem in (18).

Example 3

(Rank-one function with constraints) Given \(Z\subseteq \{0,1\}^{n}\), consider the set

$$\begin{aligned} X_{R1}^Z=\left\{ (x,z,t)\in \mathbb {R}^n\times Z\times \mathbb {R}: t\ge \left( h^\top x\right) ^2,\; x\circ (\varvec{e}-z)=0\right\} , \end{aligned}$$

that is, a rank-one function with arbitrary constraints on the indicator variables z defined by Z. As discussed in the proof of Proposition 4, \(P_F\subseteq \mathbb {R}^{n+1}\) with one additional variable \(W \in \mathbb {R}^1\) which, at integer points, is given by \(W=\max \{z_1,\dots ,z_n\}\). For simplicity, assume that \(0\in Z\), and that both \({\text {conv}}(Z)\) and \({\text {conv}}(Z{\setminus } \{0\})\) are full-dimensional. Finally, consider all facet-defining inequalities of \({\text {conv}}(Z\setminus \{0\})\) of the form \(\gamma _i^\top z\ge 1\) (that is, inequalities that cut off point 0), for \(i=1,\dots ,m\). Now consider the inequalities

$$\begin{aligned} W\le \gamma _i^\top z,\qquad \forall i\in [m]. \end{aligned}$$
(25)

First, observe that inequalities (25) are valid for \(P_F\): given \(z\in Z\), if \(z=0\), then \(W=0\); otherwise, \(z\in Z{\setminus }\{0\}\implies \gamma _i^\top z\ge 1=W\). Second, note that inequalities (25) are facet-defining for \(P_F\). Indeed, given \(i\in [m]\), consider the face \(Z_i=\{z\in {\text {conv}}(Z{\setminus } \{0\}): \gamma _i^\top z=1\}\) of \({\text {conv}}(Z\setminus \{0\})\): since \({\text {conv}}(Z\setminus \{0\})\) is full-dimensional and \(\gamma _i^\top z\ge 1\) is facet-defining, there are n affinely independent points \(\{z^j\}_{j=1}^n\) such that \(z^j\in Z_i\). Thus, we find that points \((z^j,1)_{j=1}^n\) and (0, 0) are \(n+1\) affinely independent points satisfying (25) at equality. Moreover, one can easily verify that inequality \(W\le 1\) is facet-defining as well. Thus, from (23) (adapted to the factorable representation discussed in Sect. 3.2), we conclude that the inequality

$$\begin{aligned} \displaystyle t\ge \max _{y\in \mathbb {R}_+^{m+1}}\;&\left\{ \frac{\left( \sum _{i=0}^m y_i\right) (h^\top x)^2}{y_0+\sum _{i=1}^my_i(\gamma _i^\top z)} \ \ \text {s.t. }\;\sum _{i=0}^m y_i\le 1 \right\} \end{aligned}$$
(26)

is valid for \(\text {cl conv}(X_{R1}^Z)\). Moreover, an optimal solution to optimization problem (26) is obtained by setting a single entry of y to one and all others to zero, where the chosen index attains the smallest denominator: \(y_0=1\) if \(\gamma _i^\top z\ge 1\) for all \(i\in [m]\), and \(y_i=1\) for some \(i\in \arg \min _{i\in [m]}\{\gamma _i^\top z\}\) otherwise. We conclude that inequalities \(t\ge (h^\top x)^2\) and \(t\ge (h^\top x)^2/(\gamma _i^\top z), i\in [m]\) are valid for \(\text {cl conv}(X_{R1}^Z)\). Indeed, as shown in [34], these inequalities along with \(z\in {\text {conv}}(Z)\) fully describe \(\text {cl conv}(X_{R1}^Z)\) (when a nondecomposability condition holds). \(\square \)
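For the rank-one case of Example 3, the maximization in (26) therefore reduces to picking the smallest denominator among 1 and the values \(\gamma _i^\top z\). A minimal sketch of this evaluation follows; the function name `rank_one_bound` and the data in the usage note are hypothetical, and index 0 is used here to denote the inequality \(t\ge (h^\top x)^2\).

```python
def rank_one_bound(h, x, z, gammas):
    """Return (index, value) of the largest right-hand side among
    t >= (h^T x)^2 (index 0) and t >= (h^T x)^2 / (gamma_i^T z) (index i >= 1)."""
    hx2 = sum(hi * xi for hi, xi in zip(h, x)) ** 2
    denoms = [1.0] + [sum(g * zi for g, zi in zip(gamma, z)) for gamma in gammas]
    best = min(range(len(denoms)), key=denoms.__getitem__)
    if denoms[best] <= 0.0:
        # zero denominator: the inequality forces h^T x = 0
        return best, (0.0 if hx2 == 0.0 else float("inf"))
    return best, hx2 / denoms[best]
```

As a usage example, take \(Z=\{0,1\}^2\), for which the only facet of \({\text {conv}}(Z{\setminus }\{0\})\) cutting off the origin is \(z_1+z_2\ge 1\). Calling `rank_one_bound([1.0, 1.0], [1.0, 1.0], [0.25, 0.25], [(1.0, 1.0)])` returns `(1, 8.0)`, which agrees with the bound \(t\ge (h^\top x)^2/\min \{1,\varvec{e}^\top z\}\) implied by Proposition 4.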

4.1 Connection with decomposition methods

From Theorem 3, we see that \(\text {cl conv}(X)\) is obtained by adding conic quadratic inequalities \( t \ge \frac{x^\top \left( \sum _{i=1}^m \Gamma _iy_i\right) x}{y^\top \beta +\left( \sum _{i=1}^my_i\gamma _i\right) ^\top z}\), which are built from the simpler quadratic terms \(x^{\top } \Gamma _i x\) (corresponding to inequalities describing P). In particular, the intuition is similar to that behind convexifications obtained from decompositions (3). We now show how the theory presented in this paper sheds light on the strength of the aforementioned decompositions.

Suppose inequalities (16), which we repeat for convenience:

$$\begin{aligned} \langle \Gamma _i,W \rangle -\gamma _i^\top z\le \beta _i, \quad i=1,\dots ,m, \end{aligned}$$
(27)

are valid for P and, additionally, \(\Gamma _i\succeq 0\) for all \(i\in [m]\). Since P is not full-dimensional in general, positive semidefiniteness conditions may not be as restrictive as they initially seem.

Example 4

(Description of \(\text {cl conv}(X_{2\times 2})\), continued) None of the matrices in the facets of P for \(\text {cl conv}(X_{2\times 2})\) given in Example 2 are positive semidefinite. Nonetheless, the inequalities below also describe P (we abuse notation and encode using variables y how each inequality is obtained):

figure i
figure j
figure k
figure l
figure m
figure n

In particular, the last two inequalities satisfy positive semidefiniteness. Moreover, the relaxation of the first two equalities obtained by replacing them with inequalities also satisfies positive semidefiniteness. Finally, if Q is sufficiently diagonally dominant and \(d_1d_2\ge 4\), then the third and fourth inequalities satisfy positive semidefiniteness as well. \(\square \)

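Now suppose that in (18), we fix \(y_i=\lambda /(\beta _i+\gamma _i^\top z)\), where \(\lambda \) is small enough to ensure that constraint \(\sum _{i=1}^m\text {Tr}(\Gamma _i)y_i\le 1\) is satisfied. Then the numerator equals \(\lambda \sum _{i=1}^m x^\top \Gamma _i x/(\beta _i+\gamma _i^\top z)\) and the denominator equals \(\sum _{i=1}^m y_i(\beta _i+\gamma _i^\top z)=m\lambda \), so inequality (18) reduces to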

$$\begin{aligned} mt\ge \sum _{i=1}^m \frac{x^\top \Gamma _i x}{\beta _i+\gamma _i^\top z}, \end{aligned}$$

which is precisely the relaxation obtained from a decomposition (3). We make the following two important observations.

Observation 1 Relaxations obtained by fixing a given decomposition (3) [19, 20] are, in general, insufficient to describe \(\text {cl conv}(X)\). Indeed, from Theorem 3, describing \(\text {cl conv}(X)\) requires one inequality per extreme point of the region \({\mathcal {Y}}\), whereas a given decomposition corresponds to a single point in this region.

Observation 2 On the other hand, the strong “optimal” or “dynamic” relaxations [5, 17, 35], where the decomposition is not fixed but instead is chosen dynamically, are more than what is needed to describe \(\text {cl conv}(X)\). Indeed, they are of the form (17) for every possible (rank-one, \(2\times 2\), remainder) matrix, and are not finitely generated, whereas our results imply that the necessary inequalities are finitely generated.

We conclude this section with an analysis of rank-one decompositions, where we assume for simplicity that \(Q\succ 0\): given a subset \({\mathcal {T}}\subseteq 2^{[n]}\), rank-one relaxations are given by

$$\begin{aligned} t\ge \sum _{T\in {\mathcal {T}}}\frac{({\hat{h}}_T^\top x)^2}{{\hat{\varvec{e}}}_T^\top z }+x^\top Rx, \end{aligned}$$
(28)

where \(R=Q-\sum _{T\in {\mathcal {T}}}{\hat{h}}_T{\hat{h}}_T^\top \succeq 0\), and \({\hat{h}}_T\in \mathbb {R}^n\) are given vectors that are zero in entries not indexed by T. Relaxation (28) can be interpreted as a decomposition obtained from valid inequalities for P of the form

$$\begin{aligned} \langle {\hat{h}}_T{\hat{h}}_T^\top ,W \rangle \le \gamma {\hat{\varvec{e}}}_T^\top z, \end{aligned}$$
(29)

where \(\gamma \ge 0\). Note that inequality (29) is valid for P if

$$\begin{aligned} \gamma \ge \max _{{\hat{\varvec{e}}}_{S}\in Z} \frac{1}{|S \cap T|}\langle {\hat{h}}_T{\hat{h}}_T^\top , {\hat{Q}}_S^{-1} \rangle .\end{aligned}$$
(30)

Proposition 5

If \(\gamma = \max _{{\hat{\varvec{e}}}_{S}\in Z} \frac{1}{|S \cap T|} \langle {\hat{h}}_T{\hat{h}}_T^\top , {\hat{Q}}_S^{-1} \rangle \), then inequality (29) defines a face of P of dimension at least \(\textrm{dim}(P_0)+1\), where

$$\begin{aligned} P_0=\left\{ (z,W)\in P: z_T=0\text { and } W_T=0\right\} . \end{aligned}$$

Proof

There are \(\text {dim}(P_0)+1\) affinely independent points in \(P_0\), and all satisfy (29) at equality. Letting \(S^*\in \mathop {\mathrm {arg\,max}}\limits _{{\hat{\varvec{e}}}_{S}\in Z} \frac{1}{|S \cap T|} \langle {\hat{h}}_T{\hat{h}}_T^\top , {\hat{Q}}_S^{-1} \rangle \), we find that \(({\hat{\varvec{e}}}_{S^*}, {\hat{Q}}_{S^*}^{-1})\) is an additional affinely independent point satisfying (29) at equality. \(\square \)

Note that if optimization problem (30) has multiple optimal solutions, then one can find additional affinely independent points. In particular, (29) is guaranteed to define a high-dimensional face of P if |T| is small. Indeed, inequalities (29) were found to be particularly effective computationally if \({\mathcal {T}}=\left\{ T\subseteq [n]: |T|\le \kappa \right\} \) for some small \(\kappa \) [5], although a theoretical justification of this observation has been missing until now.
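For small n, the smallest coefficient \(\gamma \) in (30) can be computed by enumerating supports. The sketch below is only illustrative: it assumes \(Z=\{0,1\}^n\), a positive definite Q, NumPy, and 0-based indices, and it skips supports with \(S\cap T=\emptyset \), for which both sides of (29) vanish. The helper names are introduced here for illustration.

```python
import itertools
import numpy as np

def q_hat_inv(Q, S):
    """n-by-n matrix equal to inv(Q[S, S]) on the S-by-S block and zero elsewhere."""
    W = np.zeros(Q.shape)
    if len(S) > 0:
        idx = np.ix_(S, S)
        W[idx] = np.linalg.inv(Q[idx])
    return W

def smallest_gamma(Q, h_T, T):
    """Right-hand side of (30) for Z = {0,1}^n, by enumeration of supports S."""
    n = Q.shape[0]
    best = 0.0
    for r in range(1, n + 1):
        for S in itertools.combinations(range(n), r):
            overlap = len(set(S) & set(T))
            if overlap == 0:
                continue  # both sides of (29) are zero for this S
            val = h_T @ q_hat_inv(Q, list(S)) @ h_T / overlap
            best = max(best, val)
    return best

Q = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
h_T = np.array([1.0, 0.0, 0.0])   # supported on T = {0}
gamma = smallest_gamma(Q, h_T, [0])
print(gamma, np.linalg.inv(Q)[0, 0])
```

When \(|T|=1\), the maximum is attained at \(S=[n]\), since diagonal entries of \({\hat{Q}}_S^{-1}\) never exceed those of \(Q^{-1}\) (see Lemma 2 in Sect. 5), so the two printed numbers coincide.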

Remark 7

(Description of \(\text {cl conv}(X_{2\times 2})\), continued) Consider again the facet-defining inequalities given in Example 4. The last two inequalities correspond to a rank-one strengthening with \(|T|=1\), which leads to relaxations of \(X_{2\times 2}\) similar to the perspective relaxation. Thus, we may argue that the perspective relaxation is required to describe \(\text {cl conv}(X_{2\times 2})\). \(\square \)

5 A mixed-integer linear formulation for P

The polyhedron P can (in theory) be studied using standard methods from mixed-integer linear optimization. However, the vertex representation of P is often not convenient, as most techniques require that the polyhedron be described explicitly via linear inequalities. Thus, in this section, we present a mixed-integer linear formulation, stated explicitly via linear inequalities, for the vertices of polytope P when the Hessian matrix Q is positive definite.

First, we describe the linear equalities necessary for P. Throughout this section, for ease of exposition, for a given \(S\subseteq [n]\), we permute the rows and columns of Q such that indices in S appear first.

Proposition 6

For any \((z, W) \in P\),

$$\begin{aligned} \sum _{k} Q_{ik} W_{ki} = z_i, \quad \forall i \in [n]. \end{aligned}$$
(31)

Proof

For any \(S \subseteq [n]\) such that \(({\hat{\varvec{e}}}_{S}, {\hat{Q}}_{S}^{-1}) \in P\), we have

$$\begin{aligned} {\hat{Q}}_{S}^{-1}Q = \begin{pmatrix} Q_S^{-1} &{} 0 \\ 0 &{} 0 \end{pmatrix} \begin{pmatrix} Q_{S} &{} Q_{S, [n] \backslash S} \\ Q_{S, [n] \backslash S}^{\top } &{} Q_{[n] \backslash S} \end{pmatrix} = \begin{pmatrix} I_{|S|} &{} Q_{S}^{-1} Q_{S, [n] \backslash S} \\ 0 &{} 0 \end{pmatrix}.\end{aligned}$$
(32)

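Observe that the \(i^{th}\) diagonal entry of \({\hat{Q}}_{S}^{-1} Q\) is one if \(i \in S\) and zero otherwise. Since at all extreme points of P we have \(z={\hat{\varvec{e}}}_S\) and \(W={\hat{Q}}_S^{-1}\) for some \(S\subseteq [n]\), it follows that \((WQ)_{ii}=({\hat{Q}}_{S}^{-1} Q)_{ii}=z_i\) at every extreme point. Because Q and W are symmetric, \((WQ)_{ii}=\sum _{k} Q_{ik} W_{ki}\); and because equalities (31) are linear in (z, W), they hold at every point of P. \(\square \)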
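Equalities (31) are easy to confirm numerically at the extreme points. The following is a minimal sketch on a randomly generated positive definite matrix (hypothetical data, not from the paper), assuming \(Z=\{0,1\}^n\) and NumPy; the helper `q_hat_inv` is introduced for illustration and builds \({\hat{Q}}_S^{-1}\) by inverting the principal submatrix indexed by S and padding with zeros.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)            # hypothetical positive definite Q

def q_hat_inv(Q, S):
    """inv(Q[S, S]) embedded in an n-by-n zero matrix."""
    W = np.zeros(Q.shape)
    if len(S) > 0:
        idx = np.ix_(S, S)
        W[idx] = np.linalg.inv(Q[idx])
    return W

checks = []
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        z = np.zeros(n)
        z[list(S)] = 1.0
        W = q_hat_inv(Q, list(S))
        # (31): sum_k Q_ik W_ki = z_i, i.e. diag(Q W) = z
        checks.append(np.allclose(np.diag(Q @ W), z))
print(all(checks), len(checks))
```

Because the equalities are linear, checking them on all \(2^n\) extreme points is enough to conclude that they hold on the whole polytope.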

Since P satisfies n linearly independent equalities, we immediately get insights into the dimension of P.

Corollary 1

The dimension of P is at most \(n(n+1)/2\). If \(Q_{ij}\ne 0\) for all \(i,j\in [n]\) and \(Z=\{0,1\}^n\), then this bound is tight.

Proof

Polyhedron P has \(n+n^2\) variables, but symmetry constraints \(W_{ij}=W_{ji}\) and equalities (31) imply the upper bound on the dimension. If \(Q_{ij}\ne 0\) for all \(i,j\in [n]\) and \(Z=\{0,1\}^n\), the set of points \(({\hat{\varvec{e}}}_{\{i,j\}},{\hat{Q}}_{\{i,j\}}^{-1})_{i\ne j}\) and \(({\hat{\varvec{e}}}_{\{i\}}, {\hat{Q}}_{\{i\}}^{-1})_{i \in [n]}\) are \(n(n+1)/2\) affinely independent points of P, because each point indexed by a pair \(i\ne j\) is the only one satisfying \(W_{ij}\ne 0\), while the points indexed by singletons have linearly independent z components. Together with point \((0,\textbf{0})\), where \(\textbf{0}\) represents the null matrix, we find the required \(n(n+1)/2+1\) affinely independent points in P. \(\square \)

From Corollary 1, we see that (under mild conditions) there are no other equalities in the description of P. In order to construct a mixed-integer linear formulation for the vertices of P, we will use big-M constraints. Lemmas 2 and 3 are necessary to identify valid bounds for coefficients M.

Lemma 2

For any \(S \subseteq [n]\), \(Q^{-1} \succeq {\hat{Q}}_{S}^{-1}\) and \(\Vert {\hat{Q}}_{S}^{-1}\Vert _{\max } \le \lambda _{\max }(Q^{-1})\).

Proof

To prove \(Q^{-1} \succeq {\hat{Q}}_{S}^{-1}\) for \(S \subseteq [n]\), it suffices to show \(I \succeq Q^{1/2} {\hat{Q}}_{S}^{-1} Q^{1/2}\). Since switching the order of matrix multiplication does not change the set of nonzero eigenvalues, the nonzero eigenvalues of \(Q^{1/2} {\hat{Q}}_{S}^{-1}Q^{1/2}\) coincide with those of \({\hat{Q}}_{S}^{-1}Q \). From (32) one sees that \({\hat{Q}}_{S}^{-1}Q = \begin{pmatrix} I_{|S|} &{} Q_{S}^{-1} Q_{S, [n] \backslash S} \\ 0 &{} 0 \end{pmatrix}\) is a block upper triangular matrix whose eigenvalues are all zero or one. Since \(Q^{1/2} {\hat{Q}}_{S}^{-1} Q^{1/2}\) is symmetric, we conclude that \(I \succeq Q^{1/2} {\hat{Q}}_{S}^{-1} Q^{1/2}\) and thus \(Q^{-1} \succeq {\hat{Q}}_{S}^{-1}\).

For the second part, the first part implies that for \(i \in [n]\), \(({\hat{Q}}_{S}^{-1})_{ii}\le (Q^{-1})_{ii} \le \lambda _{\max } (Q^{-1})\). Since \({\hat{Q}}_{S}^{-1} \succeq 0\), for any \(i,j \in [n]\), \(({\hat{Q}}_{S}^{-1})_{ij}^2\le ({\hat{Q}}_{S}^{-1})_{ii}({\hat{Q}}_{S}^{-1})_{jj}\). As \(\lambda _{\max }(Q^{-1})\) gives a uniform bound on the diagonal elements of \({\hat{Q}}_{S}^{-1}\), \(\lambda _{\max }(Q^{-1})\) also bounds the absolute value of the off-diagonal elements of \({\hat{Q}}_{S}^{-1}\). \(\square \)

Next, we define

$$\begin{aligned} M {\mathop {=}\limits ^{\text {def}}}\lambda _{\max }(Q^{-1}) \max _{i \in [n]} \big \{\Vert Q_{[n],\{i\}}\Vert _{2} \big \} \end{aligned}$$
(33)

and prove that M provides a bound for the off-diagonal elements of \({\hat{Q}}_{S}^{-1} Q\) for any \(S \subseteq [n]\) in the following lemma.

Lemma 3

For any \(S \subseteq [n]\), the off-diagonals of \({\hat{Q}}_{S}^{-1} Q\) are bounded by M.

Proof

Note that \({\hat{Q}}_{S}^{-1} Q = \begin{pmatrix} I_{|S|} &{} Q_{S}^{-1} Q_{S, [n]\backslash S} \\ 0 &{} 0 \end{pmatrix}\). For any \(j \notin S\),

$$\begin{aligned}\Vert Q_{S}^{-1} Q_{S, \{j\}}\Vert _{\infty } \le \Vert Q_{S}^{-1} Q_{S, \{j\}}\Vert _{2}&\le \lambda _{\max }(Q_{S}^{-1}) \Vert Q_{S, \{j\}}\Vert _{2} \\&= \lambda _{\max } ({\hat{Q}}_{S}^{-1}) \Vert Q_{S, \{j\}}\Vert _{2} \le \lambda _{\max }(Q^{-1}) \Vert Q_{[n],\{j\}}\Vert _{2}, \end{aligned}$$

where the last inequality follows from Lemma 2. \(\square \)
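The bounds of Lemmas 2 and 3 can be checked by enumeration on a small instance. The sketch below uses a randomly generated positive definite Q (hypothetical data) with NumPy, computes M as in (33), and records the worst case over all nonempty supports S.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + np.eye(n)                          # hypothetical positive definite Q
Q_inv = np.linalg.inv(Q)
lam_max = np.linalg.eigvalsh(Q_inv).max()
M = lam_max * np.linalg.norm(Q, axis=0).max()    # definition (33): largest column 2-norm

min_gap, worst_entry, worst_offdiag = np.inf, 0.0, 0.0
for r in range(1, n + 1):
    for S in itertools.combinations(range(n), r):
        idx = np.ix_(S, S)
        W = np.zeros((n, n))
        W[idx] = np.linalg.inv(Q[idx])           # W = Q_hat_S^{-1}
        # Lemma 2, first part: Q^{-1} - W is positive semidefinite
        min_gap = min(min_gap, np.linalg.eigvalsh(Q_inv - W).min())
        # Lemma 2, second part: entries of W are bounded by lam_max
        worst_entry = max(worst_entry, np.abs(W).max())
        # Lemma 3: off-diagonal entries of W Q are bounded by M
        P = W @ Q
        worst_offdiag = max(worst_offdiag, np.abs(P - np.diag(np.diag(P))).max())

print(min_gap >= -1e-9, worst_entry <= lam_max + 1e-9, worst_offdiag <= M + 1e-9)
```

The small tolerances account for floating-point error; for \(S=[n]\) the difference \(Q^{-1}-{\hat{Q}}_S^{-1}\) is exactly zero in theory, so its computed smallest eigenvalue may be slightly negative.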

One can make a few observations about the extreme points \(\{({\hat{\varvec{e}}}_{S}, {\hat{Q}}_{S}^{-1})\}_{{\hat{\varvec{e}}}_{S} \in Z}\) of P. Note that at extreme points of P, \(W={\hat{Q}}_S^{-1}\) for some S. Thus, for any extreme point \((z, W) \in P\), \(W_{ij}\) is nonzero only if \(z_i = z_j = 1\). Moreover, for any \(S \subseteq [n]\) such that \(({\hat{\varvec{e}}}_{S}, {\hat{Q}}_{S}^{-1}) \in P\), we have \(Q {\hat{Q}}_{S}^{-1} = QW=\begin{pmatrix} I_{|S|} &{} 0 \\ Q_{S, [n] \backslash S}^{\top } Q_{S}^{-1} &{} 0 \end{pmatrix}\), and the off-diagonal entries in the \(i^{th}\) row of QW are all zero if \(i \in S\). These two observations lead to the formulation in the following proposition.

Proposition 7

The extreme points of P are described as

$$\begin{aligned} \left\{ ({\hat{e}}_{S}, {\hat{Q}}_{S}^{-1})_{{\hat{e}}_S \in Z} \right\} = \Big \{(z, W)&\in Z \times \mathbb {R}^{n \times n} \; | \; \sum _{k = 1}^{n} Q_{ik} W_{ki} = z_i, \; \forall i \in [n], \\&-M(1 - z_i) \le \sum _{k = 1}^{n} Q_{ik} W_{kj} \le M(1 - z_i), \; \forall i \ne j, \\&|W_{ij}| \le \lambda _{\max }(Q^{-1}) \min \{z_i, z_j\}, \; \forall i,j \in [n]\Big \}. \end{aligned}$$

Proof

For any \(z = {\hat{e}}_{S} \in Z\), the constraint

$$\begin{aligned} |W_{ij}| \le \lambda _{\max }(Q^{-1}) \min \{z_i, z_j\}, \quad \forall i,j \in [n], \end{aligned}$$

implies that \(W_{ij} = 0\) if either i or j is not in S. For \(i \in S\), we have

$$\begin{aligned}&\sum _{k = 1}^{n} Q_{ik} W_{ki} = 1 \end{aligned}$$
(34)
$$\begin{aligned}&\sum _{k = 1}^{n} Q_{ik} W_{kj} = 0, \quad \forall j \ne i. \end{aligned}$$
(35)

Equalities (34) and (35) imply that \(\begin{pmatrix} Q_{S}&Q_{S, [n] \backslash S}\end{pmatrix} \begin{pmatrix} W_{S} \\ W_{S, [n] \backslash S}^{\top }\end{pmatrix} = I\). Since \(W_{S, [n] \backslash S} = 0\), we have \(Q_S W_{S} = I\) and \(W = {\hat{Q}}_{S}^{-1}\). Therefore, \(Q {\hat{Q}}_{S}^{-1} = \begin{pmatrix} I &{} 0 \\ Q_{S, [n]\backslash S}^{\top } Q_S^{-1} &{} 0 \end{pmatrix}\). It is clear that the off-diagonal elements in the \(i^{th}\) row are all zero if \(i \in S\); otherwise (if \(i\not \in S\)), they are bounded in absolute value by M according to Lemma 3. In other words, constraints

$$\begin{aligned}&-M (1 - z_i) \le \sum _{k = 1}^{n} Q_{ik} W_{kj} \le M(1 - z_i), \quad \forall j \ne i \end{aligned}$$

hold. Moreover, thanks to Lemma 2, the constraints

$$\begin{aligned} |W_{ij}| \le \lambda _{\max }(Q^{-1}) \min \{z_i, z_j\}, \; \forall i,j \in [n] \end{aligned}$$
(36)

hold at \(W = {\hat{Q}}_{S}^{-1}\) and \(z = {\hat{\varvec{e}}}_{S}\) as well. \(\square \)

Proposition 7 allows us to give a mixed-integer linear formulation for the MIQO problem (1). Substituting the mixed-integer linear representation of P in Proposition 7 in the equivalent MIQO formulation (8), we arrive at:

$$\begin{aligned} \min _{z, W} \;&-\frac{1}{2} a^{\top } W a + b^{\top } z \end{aligned}$$
(37a)
$$\begin{aligned} \text {s.t.} \;&\sum _{k = 1}^{n} Q_{ik} W_{ki} = z_i, \quad \forall i \in [n] \end{aligned}$$
(37b)
$$\begin{aligned} \text {(MILO)} \quad \quad \ \ \ {}&-M (1 - z_i) \le \sum _{k = 1}^{n} Q_{ik} W_{kj} \le M(1 - z_i), \quad \forall i \ne j \end{aligned}$$
(37c)
$$\begin{aligned}&|W_{ij}| \le \lambda _{\max }(Q^{-1}) \min \{z_i, z_j\}, \quad \forall i,j \in [n] \end{aligned}$$
(37d)
$$\begin{aligned}&z \in Z, \end{aligned}$$
(37e)

where M is defined in (33). MILO is the first polynomial-size explicit mixed-integer linear formulation given for (1).
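As a sanity check of formulation (37), one can enumerate all supports on a small instance and compare against problem (1). The sketch below is not a solver implementation: it assumes \(Z=\{0,1\}^n\), uses randomly generated data (hypothetical, not from the paper) with NumPy, verifies that each point \(({\hat{\varvec{e}}}_S,{\hat{Q}}_S^{-1})\) satisfies (37b)–(37d), and checks that the best value of (37a) matches the best value of (1) obtained by solving for x on each support.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + np.eye(n)                           # hypothetical positive definite Q
a = rng.standard_normal(n)
b = rng.uniform(0.0, 1.0, size=n)

lam_max = np.linalg.eigvalsh(np.linalg.inv(Q)).max()
M = lam_max * np.linalg.norm(Q, axis=0).max()     # definition (33)
tol = 1e-9

feasible = True
best_milo, best_miqo = np.inf, np.inf
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        S = list(S)
        z = np.zeros(n)
        z[S] = 1.0
        W = np.zeros((n, n))
        x = np.zeros(n)
        if S:
            idx = np.ix_(S, S)
            W[idx] = np.linalg.inv(Q[idx])         # W = Q_hat_S^{-1}
            x[S] = -np.linalg.solve(Q[idx], a[S])  # minimizer of (1) on support S
        QW = Q @ W
        off = QW - np.diag(np.diag(QW))
        ok_b = np.allclose(np.diag(QW), z)                                        # (37b)
        ok_c = bool(np.all(np.abs(off) <= M * (1.0 - z)[:, None] + tol))          # (37c)
        ok_d = bool(np.all(np.abs(W) <= lam_max * np.minimum.outer(z, z) + tol))  # (37d)
        feasible = feasible and ok_b and ok_c and ok_d
        best_milo = min(best_milo, -0.5 * a @ W @ a + b @ z)            # (37a)
        best_miqo = min(best_miqo, a @ x + b @ z + 0.5 * x @ Q @ x)     # (1) with t = x^T Q x

print(feasible, np.isclose(best_milo, best_miqo))
```

Enumeration is exponential in n and is used here only to illustrate the equivalence; in practice, (37) would be passed to a mixed-integer linear solver.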

We point out that the mixed-integer representation of P in Proposition 7 relies on big-M constraints and, therefore, it is not a strong formulation. Nonetheless, advanced mixed-integer linear optimization solvers have a plethora of built-in techniques to improve such formulations. Preliminary computations using Gurobi indicate the following findings:

  1. The natural relaxation of (37) is very weak and, therefore, (37) results in worse performance than alternative (nonlinear) formulations for problem (1) in most cases.

  2. In some cases, however, and notably when the matrix Q is sparse, Gurobi improves the relaxation in presolve to the point where the problems are solved at the root node, faster than existing formulations for (1). This situation illustrates that (in some cases) existing methods can improve even weak relaxations, whereas similar improvements are not currently available for nonlinear formulations.

Detailed computational results are presented in Appendix B. Overall, the results illustrate the potential benefits of reducing convexification to describing a polyhedral set, but also indicate that much work remains to be done for deriving better relaxations of P.

6 Conclusion

In this paper, we first describe the convex hull of the epigraph of a convex quadratic function with indicators in an extended space, which is given by one semidefinite constraint and an exponential system of linear inequalities describing a polytope, P (or \(P_F\)). We then derive the convex hull description in the original space as a semi-infinite conic quadratic program. Furthermore, we give a compact mixed-integer linear representation of the vertices of the polytope P that results in the first compact mixed-integer linear formulation of MIQO problems. While this is a weak formulation, our preliminary computational experience indicates that, for a class of sparse problems, off-the-shelf solvers are able to take advantage of the developments in MILO to improve the formulation substantially, making it competitive with, if not better than, state-of-the-art approaches. To translate our theoretical developments into effective practical methods, it is crucial to exploit the structure of P. In our ongoing work, we explore the case when Q is a Stieltjes matrix, for which P has a nice structure that allows us to use our results directly without resorting to the MILO formulation. Our results provide a unifying framework for several convex relaxations of MIQO problems in the literature and can also be used to evaluate their strength.