On different Versions of the Exact Subgraph Hierarchy for the Stable Set Problem

Let $G$ be a graph with $n$ vertices and $m$ edges. One of several hierarchies towards the stability number of $G$ is the exact subgraph hierarchy (ESH). On the first level it computes the Lov\'{a}sz theta function $\vartheta(G)$ as a semidefinite program (SDP) with a matrix variable of order $n+1$ and $n+m+1$ constraints. On the $k$-th level it adds all exact subgraph constraints (ESCs) for subgraphs of order $k$ to the SDP. An ESC ensures that the submatrix of the matrix variable corresponding to the subgraph is in the correct polytope. By including only some ESCs into the SDP the ESH can be exploited computationally. In this paper we introduce a variant of the ESH that computes $\vartheta(G)$ through an SDP with a matrix variable of order $n$ and $m+1$ constraints. We show that it makes sense to include the ESCs into this SDP and introduce the compressed ESH (CESH) analogously to the ESH. Computationally the CESH seems favorable as the SDP is smaller. However, we prove that the bounds based on the ESH are always at least as good as those of the CESH. In computational experiments sometimes they are significantly better. We also introduce scaled ESCs (SESCs), which are a more natural way to include exactness constraints into the smaller SDP, and we prove that including an SESC is equivalent to including an ESC for every subgraph.


Introduction
One of the most fundamental problems in combinatorial optimization is the stable set problem. Given a graph G = (V, E), a subset of vertices S ⊆ V is called a stable set if no two vertices of S are adjacent. A stable set is called a maximum stable set if there is no stable set of larger cardinality. The cardinality of a maximum stable set is called the stability number of G and is denoted by α(G). The stable set problem asks for a stable set of size α(G). It is an NP-hard and well-studied problem; see for example the survey of Bomze, Budinich, Pardalos and Pelillo [3].
In this paper we show that it makes sense to consider this new hierarchy, which we call the compressed ESH (CESH) because the underlying SDP is smaller. We prove that both the ESH and the CESH are equal to ϑ(G) on the first level and equal to α(G) on the n-th level. Furthermore, the SDP has a smaller matrix variable and fewer constraints, so intuitively the CESH is computationally favorable. However, we prove that the bounds obtained by including an ESC into (T_{n+1}) are always at least as good as those obtained from including the same ESC into (T_n), demonstrating that the bounds obtained from the ESH are at least as good as those from the CESH. Furthermore, it turns out in our computational comparison that the bounds are sometimes significantly worse for the CESH, while the running times do not decrease significantly. Hence, we confirm that the ESH has the better trade-off between the quality of the bound and the running time.
The intuition behind the SDP (T_n) differs from that of (T_{n+1}), in particular for the solutions representing stable sets. We show in this paper that there is an alternative intuitive definition of exact subgraphs for (T_n). This leads to our new definition of scaled ESCs (SESCs) and our introduction of another new hierarchy, the scaled ESH (SESH). We prove that SESCs coincide with the original ESCs for (T_n), which implies that the CESH and the SESH coincide.
To summarize, in this paper we confirm that even though our new hierarchies based on exactness seem more intuitive and computationally favorable, with off-the-shelf SDP solvers the best option is to consider the ESH in the way it has been done so far. Our findings are in accordance with the results of [16], where it is observed that (T_{n+1}) typically gives stronger bounds when strengthened.
The rest of the paper is organized as follows. In Section 2 we give rigorous definitions of ESCs and the ESH and explain how they can be exploited computationally. In Section 3 we introduce the CESH and compare it to the ESH, also in the light of the results of [16]. Then we introduce SESCs in Section 4 and investigate how they are related to the ESCs. In Section 5 we present computational results and we conclude our paper in Section 6.
We use the following notation. We denote by N_0 the natural numbers starting with 0. By 1_d and 0_d we denote the vector or matrix of all ones and all zeros of size d, respectively. Furthermore, by S^n we denote the set of symmetric matrices in R^{n×n}. We denote the convex hull of a set S by conv(S) and the trace of a matrix X by trace(X). Moreover, diag(X) extracts the main diagonal of the matrix X into a vector. By x^T and X^T we denote the transpose of the vector x and the matrix X, respectively. Moreover, we denote the i-th entry of the vector x by x_i and the entry of X in the i-th row and the j-th column by X_{i,j}. Furthermore, we denote the inner product of two vectors x and y by ⟨x, y⟩ = x^T y. The inner product of two matrices X = (X_{i,j})_{1 ≤ i,j ≤ n} and Y = (Y_{i,j})_{1 ≤ i,j ≤ n} is defined as ⟨X, Y⟩ = Σ_{i=1}^n Σ_{j=1}^n X_{i,j} Y_{i,j}. Furthermore, the t-dimensional simplex is given as ∆_t = {λ ∈ R^t : Σ_{i=1}^t λ_i = 1, λ_i ≥ 0 ∀ 1 ≤ i ≤ t}.

The Exact Subgraph Hierarchy
In this section we recall exact subgraph constraints and the exact subgraph hierarchy for combinatorial optimization problems that have an SDP relaxation, introduced by Adams, Anjos, Rendl and Wiegele in 2015 [1]. We detail everything for the stable set problem, because in [1] they focused on Max-Cut. Besides motivation and definitions, we provide new examples, discuss the representation of exact subgraph constraints and compare the exact subgraph hierarchy to other hierarchies from the literature.

Lovász Theta Function
We start by presenting the Lovász theta function. To do so, it is handy to consider the incidence vectors of stable sets and the polytope they span.
Then the set of all stable set vectors S(G) and the stable set polytope STAB(G) are defined as

S(G) = {s ∈ {0, 1}^n : s_i s_j = 0 ∀{i, j} ∈ E}   and   STAB(G) = conv{s : s ∈ S(G)}.
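To make these definitions concrete, the following brute-force enumeration computes S(G) and α(G) for a small example. This is only an illustrative sketch (the helper functions and the edge list are our own), since the enumeration is exponential in n.

```python
# Brute-force computation of the stable set vectors S(G) and the stability
# number alpha(G) of a small graph; exponential in n, for illustration only.
from itertools import product

def stable_set_vectors(n, edges):
    """All s in {0,1}^n with s_i * s_j = 0 for every edge {i,j}."""
    return [s for s in product((0, 1), repeat=n)
            if all(s[i] * s[j] == 0 for (i, j) in edges)]

def alpha(n, edges):
    """Stability number: largest cardinality of a stable set."""
    return max(sum(s) for s in stable_set_vectors(n, edges))

# C_5, the 5-cycle: the maximum stable sets have size 2.
edges_c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(alpha(5, edges_c5))  # -> 2
```

For C_5 the enumeration finds 11 stable sets: the empty set, the 5 singletons and the 5 pairs of non-adjacent vertices.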
It is easy to see that the stability number α(G) is obtained by solving

α(G) = max{1_n^T x : x ∈ STAB(G)},

but unfortunately STAB(G) is very hard to describe in general. Several linear relaxations of STAB(G) have been considered, like the so-called fractional stable set polytope and the clique constraint stable set polytope. We refer to [18] for further details. We focus on another relaxation, namely the Lovász theta function ϑ(G), which is an upper bound on α(G). Grötschel, Lovász and Schrijver [18] proved

ϑ(G) = max{1_n^T x : diag(X) = x, X_{i,j} = 0 ∀{i, j} ∈ E, X − xx^T ⪰ 0}   (T_{n+1})

and hence provided an SDP formulation of ϑ(G). This SDP has a matrix variable of order n + 1. Furthermore, there are m constraints of the form X_{i,j} = 0, n constraints to make sure that diag(X) = x and one constraint ensuring that in the matrix of order n + 1 the entry in the first row and first column is equal to 1. Hence, there are n + m + 1 linear equality constraints in (T_{n+1}).
To formulate (T_{n+1}) in a more compact way we use the well-known fact that X − xx^T ⪰ 0 holds if and only if the matrix of order n + 1 with the blocks 1, x^T, x and X is positive semidefinite, see Boyd and Vandenberghe [4, Appendix A.5.5] on Schur complements. Thus, the feasible region of (T_{n+1}) is

TH^2(G) = {(x, X) : diag(X) = x, X_{i,j} = 0 ∀{i, j} ∈ E, X − xx^T ⪰ 0}.

Clearly for each element (x, X) of TH^2(G) the projection of X onto its main diagonal is x. The set of all these projections is called the theta body TH(G). More information on TH(G) can be found for example in Conforti, Cornuéjols and Zambelli [8]. It is easy to see that STAB(G) ⊆ TH(G) holds for every graph G, see [18]. Thus, ϑ(G) is a relaxation of α(G).
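The Schur complement fact used here can be checked numerically on small examples; the sketch below (with helper names of our own) compares the positive semidefiniteness of X − xx^T with that of the lifted matrix of order n + 1.

```python
# Numerical illustration of the Schur complement fact:
# X - x x^T is PSD  iff  the lifted matrix [[1, x^T], [x, X]] is PSD.
import numpy as np

def is_psd(M, tol=1e-9):
    """Check positive semidefiniteness via the smallest eigenvalue."""
    return float(np.linalg.eigvalsh(M)[0]) >= -tol

def lift(x, X):
    """Build the (n+1) x (n+1) matrix [[1, x^T], [x, X]]."""
    return np.block([[np.ones((1, 1)), x[None, :]], [x[:, None], X]])

x = np.array([0.5, 0.5, 0.0])
X_psd = np.outer(x, x) + 0.1 * np.eye(3)   # X - xx^T = 0.1 I, PSD
X_bad = np.outer(x, x) - 0.1 * np.eye(3)   # X - xx^T = -0.1 I, not PSD

for X in (X_psd, X_bad):
    print(is_psd(X - np.outer(x, x)), is_psd(lift(x, X)))
# -> True True, then False False
```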

Introduction of the Exact Subgraph Hierarchy
In order to present the exact subgraph hierarchy we need a modification of the stable set polytope STAB(G), namely the squared stable set polytope.
Definition 2. Let G = (V, E) be a graph. The squared stable set polytope STAB^2(G) of G is defined as

STAB^2(G) = conv{ss^T : s ∈ S(G)}.

The matrices of the form ss^T for s ∈ S(G) are called stable set matrices.
Note that the elements of STAB(G) are vectors in R^n, whereas the elements of STAB^2(G) are matrices in R^{n×n}. In comparison to STAB(G) the structure of STAB^2(G) is more sophisticated and less studied. Only if G has no edges does a projection of STAB^2(G) coincide with a well-studied object, the boolean quadric polytope, see Padberg [27]. In particular, by putting the upper triangle together with the main diagonal into a vector for all elements of STAB^2(G) we obtain the elements of the boolean quadric polytope.
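This projection can be sketched as follows: vectorizing the upper triangle (with the main diagonal) of the stable set matrices of the edgeless graph on k vertices yields all 2^k vertices of the boolean quadric polytope. The snippet is our own illustration.

```python
# Project the stable set matrices of the edgeless graph on k vertices
# onto their upper triangles: this yields the boolean quadric vertices.
from itertools import product
import numpy as np

k = 3
iu = np.triu_indices(k)                      # upper triangle incl. main diagonal
points = set()
for s in product((0, 1), repeat=k):          # every s is stable in the edgeless graph
    S = np.outer(s, s)                       # stable set matrix s s^T
    points.add(tuple(int(v) for v in S[iu])) # projected boolean quadric point
print(len(points))  # -> 8 = 2^3 distinct vertices for k = 3
```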
Let us now turn back to ϑ(G). The following lemma turns out to be the key ingredient for defining the exact subgraph hierarchy.
Lemma 1. If we add the constraint X ∈ STAB^2(G) to (T_{n+1}) for a graph G, then the optimal objective function value is α(G), so

α(G) = max{1_n^T x : (x, X) ∈ TH^2(G), X ∈ STAB^2(G)}.   (1)

Proof. Let (P_E) be the SDP on the right-hand side of (1), let z_E be its optimal objective function value and let S(G) = {s_1, . . . , s_t}.
Let without loss of generality s_t be the incidence vector of a maximum stable set of G. Then clearly x = s_t and X = s_t s_t^T is feasible for (P_E) and has objective function value α(G), so α(G) ≤ z_E holds.
Furthermore, any feasible solution (x, X) of (P_E) can be written as

X = Σ_{i=1}^t λ_i s_i s_i^T

for some λ ∈ ∆_t because X ∈ STAB^2(G) holds. Thus, x can be written as

x = diag(X) = Σ_{i=1}^t λ_i s_i.

In consequence, the objective function value of (x, X) for (P_E) is equal to

1_n^T x = Σ_{i=1}^t λ_i 1_n^T s_i ≤ Σ_{i=1}^t λ_i α(G) = α(G),

and hence z_E ≤ α(G) holds, which finishes the proof.
Lemma 1 implies that if we add the constraint X ∈ STAB^2(G) to (T_{n+1}), then we get the best possible bound on α(G), namely α(G) itself. Unfortunately, depending on the representation of the constraint, we either include an exponential number of new variables (if we use a formulation as convex hull) or inequality constraints (if we include inequalities representing facets of STAB^2(G), see Section 2.3) into the SDP. In order to only partially include X ∈ STAB^2(G) we exploit a property of stable sets, namely that a stable set of G induces a stable set in each subgraph of G. To formalize this in an observation, we first need the following definition.

Definition 3. Let I ⊆ V be a subset of the vertices of the graph G = (V, E) with |V| = n and let k_I = |I|. We denote by G_I the subgraph of G that is induced by I. Furthermore, we denote by X_I = (X_{i,j})_{i,j∈I} the submatrix of X ∈ R^{n×n} which is indexed by I.
Observation 1. Let G = (V, E) be a graph and let X ∈ S^n. Then X ∈ STAB^2(G) holds if and only if X_I ∈ STAB^2(G_I) holds for all I ⊆ V.

Proof. As X_I ∈ STAB^2(G_I) for all I ⊆ V implies X ∈ STAB^2(G) for I = V, one direction of the equivalence is trivial. For the other direction note that X ∈ STAB^2(G) implies that X is a convex combination of ss^T for stable set vectors s ∈ S(G). From this one can easily extract a convex combination of ss^T for s ∈ S(G_I) representing X_I, thus X_I ∈ STAB^2(G_I) for all I ⊆ V.
Observation 1 implies that adding the constraint X ∈ STAB^2(G) to (T_{n+1}) as in Lemma 1 makes sure that the constraint X_I ∈ STAB^2(G_I) is fulfilled for all subgraphs G_I of G. This gives rise to the following definition.

Definition 4. Let G = (V, E) be a graph and let I ⊆ V. Then the exact subgraph constraint (ESC) for G_I is defined as X_I ∈ STAB^2(G_I).
Definition 5. Let G = (V, E) be a graph with |V| = n and let J be a set of subsets of V. Then z^E_J(G) is the optimal objective function value of (T_{n+1}) with the ESC for every subgraph induced by a set in J, so

z^E_J(G) = max{1_n^T x : (x, X) ∈ TH^2(G), X_I ∈ STAB^2(G_I) ∀I ∈ J}.   (2)

Furthermore, for k ∈ N_0 with k ≤ n let J_k = {I ⊆ V : |I| = k}. Then the k-th level of the exact subgraph hierarchy (ESH) is defined as z^E_k(G) = z^E_{J_k}(G).

In other words, the k-th level of the ESH is the SDP for calculating the Lovász theta function (T_{n+1}) with additional ESCs for every subgraph of order k. Due to Lemma 1 every level of the ESH is a relaxation of (1).
Note that Adams, Anjos, Rendl and Wiegele did not give the hierarchy a name. However, they called the collection of ESCs for all subgraphs of order k, and therefore the constraint to add at the k-th level of the ESH, the k-projection constraint.
Let us briefly look at some properties of z^E_k(G). For example, the next lemma shows that the bound obtained from the ESH improves as the level of the ESH increases.

Lemma 2. Let G = (V, E) be a graph with |V| = n. Then ϑ(G) = z^E_0(G) = z^E_1(G), z^E_k(G) ≤ z^E_{k−1}(G) for all 1 ≤ k ≤ n and z^E_n(G) = α(G).
Proof. For k = 0 we do not add any additional constraint to (T_{n+1}). For k = 1 the ESC for I = {i} boils down to X_{i,i} ∈ [0, 1], which is already enforced by X ⪰ 0. Therefore, ϑ(G) = z^E_0(G) = z^E_1(G) holds. Additionally, due to Observation 1 whenever all subgraphs of order k are exact, also all subgraphs of order k − 1 are exact, which yields the desired monotonicity. Finally, for k = n the ESC for I = V is the constraint X ∈ STAB^2(G), so Lemma 1 yields z^E_n(G) = α(G).
Next, we consider an example in order to get a feeling for the ESH and how good the bounds on α(G) obtained with it are.
Example 1. We consider z^E_k(G) for k ≤ 8 for a Paley graph, a Hamming graph [10] and a random graph G_{60,0.25} from the Erdős-Rényi model in Table 1. It is possible to compute z^E_2(G) exactly. For k ≥ 3 we use relaxations (i.e. we compute z^E_J(G) by including the ESCs only for a subset J of the set of all subgraphs of order k and determine the sets J as described in more detail in Section 5) to get an upper bound on z^E_k(G) or deduce the value. For hamming6-4 already for k = 2 the upper bound is an excellent bound on α(G) for this graph. For G_{60,0.25} the value z^E_k(G) improves little by little as k increases. For k = 4 the floor value of z^E_k(G) decreases, which is very important in a branch-and-bound framework, where this potentially reduces the size of the branch-and-bound tree drastically. For the Paley graph on 61 vertices the value of z^E_k(G) improves towards α(G) only for k ≥ 6. This example represents one of the worst cases, where including ESCs for subgraphs of small order does not give an improvement of the upper bound.

Example 1 shows that there are graphs where including ESCs for subgraphs of small order improves the bound very much, little by little, or not at all. It is not surprising that the ESH does not give outstanding bounds for all instances, as the stable set problem is NP-hard.

Representation of Exact Subgraph Constraints
Next, we briefly discuss the implementation of ESCs. In Definition 2 we introduced STAB^2(G) as a convex hull, so the most natural way to formulate the ESC is as a convex combination as in the proof of Lemma 1. We start with the following definition.

Definition 6. Let G be a graph and let G_I be the subgraph induced by I ⊆ V. Furthermore, let |S(G_I)| = t_I and let S(G_I) = {s^I_1, . . . , s^I_{t_I}}. Then the i-th stable set matrix S^I_i of G_I is defined as S^I_i = s^I_i (s^I_i)^T.

Now the ESC X_I ∈ STAB^2(G_I) can be rewritten as

X_I = Σ_{i=1}^{t_I} λ^I_i S^I_i,   Σ_{i=1}^{t_I} λ^I_i = 1,   λ^I_i ≥ 0 ∀ 1 ≤ i ≤ t_I,

and it is natural to implement the ESC for the subgraph G_I in exactly this way. This implies that for the implementation of the ESC for G_I we include t_I additional non-negative variables, one additional equality constraint for λ^I and a matrix equality constraint of size k_I × k_I that couples X_I and λ^I into (T_{n+1}).
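As a sketch, this convex-combination formulation can be checked directly with a feasibility LP: find λ ≥ 0 with Σλ = 1 whose combination of stable set matrices equals X_I. The helper functions are our own, and we assume scipy is available.

```python
# Membership test for an ESC via the convex-combination formulation:
# feasibility LP over the coefficients lambda of the stable set matrices.
from itertools import product
import numpy as np
from scipy.optimize import linprog

def stable_set_matrices(k, edges):
    """All stable set matrices s s^T of the graph on k vertices."""
    return [np.outer(s, s) for s in product((0, 1), repeat=k)
            if all(s[i] * s[j] == 0 for (i, j) in edges)]

def in_stab2(X_I, k, edges):
    mats = stable_set_matrices(k, edges)
    # One equality row per matrix entry, plus the row sum(lambda) = 1.
    A_eq = np.vstack([np.array([M.flatten() for M in mats]).T,
                      np.ones((1, len(mats)))])
    b_eq = np.append(X_I.flatten(), 1.0)
    res = linprog(c=np.zeros(len(mats)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(mats))
    return res.status == 0          # status 0 means a feasible lambda exists

edges = [(0, 1)]                             # subgraph: a single edge
X_in = np.array([[0.5, 0.0], [0.0, 0.5]])    # midpoint of two stable set matrices
X_out = np.array([[0.5, 0.5], [0.5, 0.5]])   # violates X_{1,2} = 0
print(in_stab2(X_in, 2, edges), in_stab2(X_out, 2, edges))  # -> True False
```

Here `X_out` is rejected because every stable set matrix of the single-edge graph has a zero off-diagonal entry.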
There is also a different possibility to represent ESCs that uses the following fact. The polytope STAB^2(G_I) is given by its extreme points, which are the stable set matrices of G_I. Due to the Minkowski-Weyl theorem it can also be represented by its facets, i.e. by finitely many inequalities. A priori different subgraphs induce different stable set matrices and hence also different squared stable set polytopes. The next result allows us to consider the squared stable set polytope of only one graph for a given order.

Lemma 3. Let G = (V, E) be a graph with |V| = n and let G^0_n be the graph on n vertices without edges. Then X ∈ STAB^2(G) holds if and only if X ∈ STAB^2(G^0_n) and X_{i,j} = 0 for all {i, j} ∈ E.
Proof. If X ∈ STAB^2(G), then by definition X is a convex combination of stable set matrices of G. Then it is also a convex combination of stable set matrices of G^0_n, which are all possible stable set matrices of order n, and X_{i,j} = 0 for all {i, j} ∈ E holds by the definition of stable set matrices of G.

Conversely, let X ∈ STAB^2(G^0_n) with X_{i,j} = 0 for all {i, j} ∈ E, so X is a convex combination of all possible stable set matrices of order n. Consider an edge {i, j} ∈ E; then by assumption X_{i,j} = 0. Since all entries of stable set matrices are 0 or 1, this implies that whenever the entry (i, j) of a stable set matrix in the convex combination is not equal to zero, its coefficient is zero. Therefore, in the convex combination only stable set matrices which are also stable set matrices of G have non-zero coefficients, and thus X ∈ STAB^2(G).
As a consequence of Lemma 3 we can replace the ESC X_I ∈ STAB^2(G_I) by the constraint X_I ∈ STAB^2(G^0_{k_I}) whenever we add the ESC to (T_{n+1}). Thus, it is enough to have a facet representation of STAB^2(G^0_{k_I}) in order to include the ESC for G_I represented by inequalities into (T_{n+1}).
In order to obtain all facets of STAB^2(G^0_k) for a given k we can use the fact that a projection of STAB^2(G^0_k) is the boolean quadric polytope of size k, as already explained in Section 2.2. Deza and Laurent [9] called the boolean quadric polytope of size k the correlation polytope of size k. They showed that the correlation polytope of size k is in one-to-one correspondence with the cut polytope of size k + 1 via the so-called covariance map. Moreover, they presented a complete list of the facets of the cut polytopes up to size k + 1 = 7, gave several references to other lists of facets and furthermore linked to a web page. The current version of this web page is maintained by Christof [6], and a conjectured complete facet description of the cut polytope of size k + 1 = 8 and a possibly complete description of the cut polytope of size k + 1 = 9 can be found there. Therefore, we could take this list and go back via the covariance map to transfer it into a complete list of facets of STAB^2(G^0_k). However, we take a more direct path and use the software PORTA [7] in order to obtain all inequalities that represent facets of STAB^2(G^0_k) from its extreme points for a given k. The number of facets for all k ≤ 6 is presented in Table 2.
For a subgraph G_I of order k_I = 3 with I = {i, j, ℓ} the ESC is equivalent to (3) for all three sets {i, j}, {i, ℓ} and {j, ℓ} together with the inequalities (4), so 3 · 4 + 4 = 16 inequalities, which matches Table 2. We come back to these inequalities in Section 2.4 and Section 3.4.
To summarize, we have discussed two different options to represent ESCs: one as a convex combination and one via inequalities that represent facets.

Comparison to Other Hierarchies
In this section we compare the ESH for the stable set problem to other hierarchies, which has not been done before.
The most prominent hierarchies of relaxations for general 0-1 programming problems are the hierarchies by Sherali and Adams [29], by Lovász and Schrijver [25] and by Lasserre [22]. We refer to Laurent [23] for rigorous definitions, comparisons and for details of applying them to the stable set problem.
In fact the Lasserre hierarchy is a refinement of the Sherali-Adams hierarchy, which is a refinement of the SDP based Lovász-Schrijver hierarchy. All three hierarchies are exact at level α(G), so after at most α(G) steps STAB(G) is obtained.
Silvestri [30] observed that z^E_2(G) is at least as good as the upper bound obtained at the first level of the SDP hierarchy of Lovász-Schrijver. This is easy to see, because this SDP is (T_{n+1}) with non-negativity constraints for X, and every X_I ∈ STAB^2(G_I) is entry-wise non-negative due to (3a). Furthermore, Silvestri proved that the bound on the k-th level of the Lasserre hierarchy is at least as good as z^E_k(G), so the Lasserre hierarchy yields stronger relaxations than the ESH.
A drawback of all the above hierarchies is that the size of the SDPs to solve grows at each level. In particular, the SDP at the k-th level of the Lasserre hierarchy has a matrix variable with one row for each subset of i vertices of the n vertices for every 0 ≤ i ≤ k. Therefore, the matrix variable is of order Σ_{i=0}^k (n choose i). For the ESH this order remains n + 1 on each level and only the number of constraints increases.
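For illustration, these matrix orders can be compared directly; we take n = 60 as in Example 1.

```python
# Order of the matrix variable: Lasserre level k vs. the ESH (always n + 1).
from math import comb

n = 60
for k in (1, 2, 3):
    lasserre_order = sum(comb(n, i) for i in range(k + 1))
    print(k, lasserre_order, n + 1)
# -> 1 61 61, then 2 1831 61, then 3 36051 61
```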
Another big advantage of the ESH over the Lasserre hierarchy is that it is possible to include partial information of the k-th level of the hierarchy, which was exploited by Gaar and Rendl [13,14,15]. In the case of the Lasserre hierarchy one needs the whole huge matrix in order to incorporate the information. Due to that, Gvozdenović, Laurent and Vallentin [20] introduced a new hierarchy where they only consider suitable principal submatrices of the huge matrix.
Eventually we want to compare the ESH with other relaxations that tighten ϑ(G) towards α(G). Lovász and Schrijver [25] proposed to add inequalities that boil down to (3a), and inequalities of the form (4c) and (4d) whenever {i, j} ∈ E. Hence, z^E_k(G) is at least as good as this bound for all k ≥ 3. Furthermore, Gruber and Rendl [19] proposed to add inequalities of the form (4c) and (4d) also if {i, j} ∉ E, hence the k-th level of the ESH is at least as strong as this relaxation for every k ≥ 3.
Note that Fischer, Gruber, Rendl and Sotirov [12] add triangle inequalities into an SDP relaxation of Max-Cut. Therefore, applying the ESH to the Max-Cut relaxation as it is done in [15] can be viewed as a generalization of the approach in [12].
For a discussion of other approaches for improving a relaxation by including information of smaller polytopes into the relaxation see [1].

The Compressed Exact Subgraph Hierarchy
In this section we newly introduce a variant of the ESH, namely the compressed ESH, which at first sight is computationally favorable to the ESH, as it starts from a smaller SDP formulation of the Lovász theta function. Additionally, we compare this new hierarchy to the ESH and to other hierarchies from the literature.

Two SDP Formulations of the Lovász Theta Function
The starting point of the new compressed ESH is an SDP formulation of the Lovász theta function ϑ(G) by Lovász [24], namely

ϑ(G) = max{⟨1_{n×n}, X⟩ : trace(X) = 1, X_{i,j} = 0 ∀{i, j} ∈ E, X ⪰ 0}.   (T_n)

As the feasible region of (T_n) will be used later, we define

CTH^2(G) = {X ∈ S^n : trace(X) = 1, X_{i,j} = 0 ∀{i, j} ∈ E, X ⪰ 0}.

Before we continue, we compare the two SDP formulations (T_{n+1}) and (T_n) of ϑ(G). As already mentioned, (T_{n+1}) is an SDP with a matrix variable of order n + 1 and n + m + 1 equality constraints. The formulation (T_n) has a matrix variable of order n and m + 1 constraints, so both the number of variables and the number of constraints are smaller. Hence, in computations (T_n) seems favorable.
So far, there has been a lot of work on comparing (T_{n+1}) and (T_n). Gruber and Rendl [19] showed the following. If (x*, X*) is a feasible solution of (T_{n+1}), then X′ = (1/trace(X*)) X* is a feasible solution of (T_n) which has at least the same objective function value. Hence, an optimal solution of (T_{n+1}) can be transformed into an optimal solution of (T_n). They also proved that whenever X′ is optimal for (T_n), then X* = ⟨1_{n×n}, X′⟩ X′ is optimal for (T_{n+1}). Furthermore, Yildirim and Fan-Orzechowski [31] gave a transformation from a feasible solution X′ of (T_n) to obtain x* of a feasible solution (x*, X*) of (T_{n+1}) with at least the same objective function value. Galli and Letchford [16] showed how to construct a corresponding X*. For an optimal X′ the obtained optimal (x*, X*) coincides with the one of Gruber and Rendl. Further details can be found in [16], where also the influence of adding certain cutting planes into (T_{n+1}) and (T_n) is discussed. We come back to that later in Section 3.4.
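The direction from (T_{n+1}) to (T_n) can be illustrated on a rank-one feasible point built from a stable set; the graph and the stable set below are our own example.

```python
# For a stable set vector s, (x*, X*) = (s, s s^T) is feasible for (T_{n+1});
# scaling X' = X* / trace(X*) yields a feasible point of (T_n) with the same
# objective value 1^T s.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # C_5
s = np.array([1.0, 0.0, 1.0, 0.0, 0.0])            # stable set {0, 2}
X_star = np.outer(s, s)

X_prime = X_star / np.trace(X_star)
# Feasibility for (T_n): trace one, zeros on edges, PSD (rank one).
print(np.trace(X_prime))                            # -> 1.0
print(all(X_prime[i, j] == 0 for (i, j) in edges))  # -> True
print(np.ones(5) @ X_prime @ np.ones(5))            # -> 2.0, equal to 1^T s
```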

Introduction of the Compressed Exact Subgraph Hierarchy
Next, we newly introduce the compressed exact subgraph hierarchy, a hierarchy similar to the ESH, but starting from (T_n) instead of (T_{n+1}). First, we verify that it makes sense to build such a hierarchy.
Lemma 4. If we add the constraint X ∈ STAB^2(G) to (T_n) for a graph G, then the optimal objective function value is α(G), so

α(G) = max{⟨1_{n×n}, X⟩ : X ∈ CTH^2(G), X ∈ STAB^2(G)}.   (5)

Proof. Let (P_C) be the SDP on the right-hand side of (5), let z_C be its optimal objective function value and let S(G) = {s_1, . . . , s_t}.
Let without loss of generality s_t be the incidence vector of a maximum stable set of G, and s_1 be the incidence vector of the empty set, which is of course stable. Then clearly

X = (1/α(G)) s_t s_t^T + (1 − 1/α(G)) s_1 s_1^T

is feasible for (P_C) and has objective function value (1_n^T s_t)^2 / α(G) = α(G), so α(G) ≤ z_C holds.

Furthermore, any feasible solution X of (P_C) can be written as

X = Σ_{i=1}^t λ_i s_i s_i^T

for some λ ∈ ∆_t, where trace(X) = Σ_{i=1}^t λ_i s_i^T s_i = 1. In consequence, the objective function value of X for (P_C) is equal to

⟨1_{n×n}, X⟩ = Σ_{i=1}^t λ_i (1_n^T s_i)^2 ≤ α(G) Σ_{i=1}^t λ_i s_i^T s_i = α(G),

and hence z_C ≤ α(G) holds, which finishes the proof.
Lemma 4 corresponds to Lemma 1 for the ESH and justifies the introduction of the compressed exact subgraph hierarchy.

Definition 7. Let G = (V, E) be a graph with |V| = n and let J be a set of subsets of V. Then z^C_J(G) is the optimal objective function value of (T_n) with the ESC for every subgraph induced by a set in J, so

z^C_J(G) = max{⟨1_{n×n}, X⟩ : X ∈ CTH^2(G), X_I ∈ STAB^2(G_I) ∀I ∈ J}.   (6)

For k ∈ N_0 with k ≤ n the k-th level of the compressed exact subgraph hierarchy (CESH) is defined as z^C_k(G) = z^C_{J_k}(G).

As in the case of the ESH we can deduce the following result for the CESH.
Lemma 5. Let G = (V, E) be a graph with |V| = n. Then ϑ(G) = z^C_0(G) = z^C_1(G), z^C_k(G) ≤ z^C_{k−1}(G) for all 1 ≤ k ≤ n and z^C_n(G) = α(G).

Proof. Analogous to the proof of Lemma 2.
Hence, due to Lemma 2 and Lemma 5 both the ESH and the CESH start at ϑ(G) at level 1 and reach α(G) on level n.

Comparison to Other Hierarchies
Before we continue to consider the differences between the ESH and the CESH, we compare the CESH with other relaxations of α(G) based on (T n ).
Schrijver [28] suggested to add non-negativity constraints to (T_n) to obtain stronger bounds. Galli and Letchford [16] proved that it is equivalent to include non-negativity constraints into (T_{n+1}) and into (T_n), so z^E_2(G) is a stronger bound than this one because it induces non-negativity in (T_{n+1}). Lemma 3 implies that also for (T_n) it is equivalent to include X_I ∈ STAB^2(G_I) and X_I ∈ STAB^2(G^0_{k_I}), so z^C_2(G) induces non-negativity due to (3a). Hence, also z^C_2(G) is at least as good as the bound of Schrijver.
Dukanovic and Rendl [11] proposed to add so-called triangle inequalities to (T_n). Silvestri [30] showed that z^C_3(G) is at least as good an upper bound as the bound of Dukanovic and Rendl. This is intuitive, because the triangle inequalities correspond to (4a), (4b) and (4c) and therefore represent faces of STAB^2(G_I) for k_I = 3. As a result, the CESH can be seen as a generalization of the relaxation of [11].

Comparison of the CESH and the ESH
Now we continue our comparison of the bounds based on the ESH and our new CESH.
Theorem 1. Let G = (V, E) be a graph with |V| = n and let J be a set of subsets of V. Then z^E_J(G) ≤ z^C_J(G).

Proof. We consider the transformation of an optimal solution of (T_{n+1}) into an optimal solution of (T_n) by Gruber and Rendl [19]. We show that this transformation applied to the optimal solution of (2) yields a feasible solution of (6) with at least the same objective function value, thus z^E_J(G) ≤ z^C_J(G).

Towards that end, let (x*, X*) be an optimal solution of (2) and γ = z^E_J(G) = 1_n^T x* its objective function value. Let X′ = (1/γ) X*. First, we show that X′ is feasible for (6). Clearly X* − x*(x*)^T ⪰ 0 and γ ≥ 0 imply X′ ⪰ 0. Furthermore, due to X*_{i,j} = 0 for all {i, j} ∈ E we have X′_{i,j} = 0 for all {i, j} ∈ E, and trace(X′) = (1/γ) trace(X*) = (1/γ) 1_n^T x* = 1.

What is left to check for feasibility are the ESCs. We can rewrite X*_I ∈ STAB^2(G_I) as X*_I = Σ_{i=1}^{t_I} λ^I_i S^I_i with Σ_{i=1}^{t_I} λ^I_i = 1 and λ^I_i ≥ 0 for all 1 ≤ i ≤ t_I. Let w.l.o.g. S^I_1 be the zero matrix of dimension k_I × k_I, i.e. the first stable set matrix corresponds to the empty set. Then we define

λ̃^I_1 = (γ − 1)/γ + λ^I_1/γ   and   λ̃^I_i = λ^I_i/γ for all 2 ≤ i ≤ t_I.

It is easy to see that λ̃^I_i ≥ 0 for all 1 ≤ i ≤ t_I and that Σ_{i=1}^{t_I} λ̃^I_i = 1 holds. Furthermore, because S^I_1 is a zero matrix and so ((γ − 1)/γ) S^I_1 = 0, we have

X′_I = (1/γ) X*_I = Σ_{i=1}^{t_I} λ̃^I_i S^I_i.

As a consequence X′_I ∈ STAB^2(G_I) and thus X′ is feasible for (6). It remains to determine the objective function value of X′ for (6). From X* − x*(x*)^T ⪰ 0 it follows that 1_n^T (X* − x*(x*)^T) 1_n ≥ 0 and hence ⟨1_{n×n}, X*⟩ ≥ (1_n^T x*)^2 = γ^2.
This implies that

⟨1_{n×n}, X′⟩ = (1/γ) ⟨1_{n×n}, X*⟩ ≥ γ

holds. To summarize, X′ is a feasible solution of (6) with objective function value at least γ = z^E_J(G). Therefore, the optimal objective function value of the maximization problem (6) is at least z^E_J(G), so z^E_J(G) ≤ z^C_J(G).

Theorem 1 states that the bound obtained by starting from (T_{n+1}) and including some ESCs is always at least as good as the bound obtained by starting from (T_n) and including the same ESCs. In particular, this implies that the relaxation on the k-th level of the ESH is at least as good as the relaxation on the k-th level of the CESH, which is formalized in the following corollary.
Corollary 1. Let G = (V, E) be a graph with |V| = n and let k ∈ N_0 with k ≤ n. Then z^E_k(G) ≤ z^C_k(G).

We now further investigate the theoretical difference between the ESH and the CESH, especially in the light of the results of Galli and Letchford [16]. They proved that whenever a collection of homogeneous inequalities is added to (T_{n+1}), the resulting optimal solution yields a feasible solution for (T_n) with the same collection of inequalities, which has at least the same objective function value. This implies that adding homogeneous inequalities to (T_{n+1}) gives bounds on α(G) that are at least as strong as those obtained by adding the same inequalities to (T_n).
If we consider the ESCs in more detail as in Section 2.3, then it turns out that for k_I = 2 the inequalities (3a), (3b) and (3c) are homogeneous, while (3d) is inhomogeneous, so inhomogeneous inequalities are needed to represent ESCs.
Next, we give an intuition for the different behavior of inhomogeneous inequalities for the two SDP formulations of the Lovász theta function (T_{n+1}) and (T_n). Let (x*, X*) be an optimal solution of (T_{n+1}) with the additional constraints (3) and let γ be its objective function value. From the proof of Theorem 1 we know that X′ = (1/γ) X* is a feasible solution of (T_n) with the additional constraints (3). Indeed, the homogeneous inequalities (3a), (3b) and (3c) are preserved under scaling, matching [16]. Scaling (3d) with 1/γ yields that X′ satisfies the inequality with right-hand side 1/γ, and since 1/γ ≤ 1 it follows that X′ satisfies (3d). If X′ is an optimal solution of (T_n) with the additional constraints (3) and we use the transformation X* = γX′, then clearly X* satisfies (3a), (3b) and (3c). Scaling (3d) with γ yields that the inequality with right-hand side γ holds for X*. This does not imply that X* fulfills (3d), as γ ≥ 1.
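This scaling behavior is easy to check numerically. The sketch below assumes the four inequalities for k_I = 2 are the boolean quadric facets X_{i,j} ≥ 0, X_{i,j} ≤ X_{i,i}, X_{i,j} ≤ X_{j,j} and the inhomogeneous X_{i,i} + X_{j,j} − X_{i,j} ≤ 1; the example matrix is our own.

```python
# Scaling a point satisfying the four 2x2 facet inequalities:
# down-scaling preserves all of them, up-scaling can break the
# inhomogeneous one only.
import numpy as np

def facets_k2(X):
    """Check the four facet inequalities on a 2x2 submatrix."""
    i, j = 0, 1
    return (X[i, j] >= 0 and X[i, j] <= X[i, i] and X[i, j] <= X[j, j]
            and X[i, i] + X[j, j] - X[i, j] <= 1 + 1e-12)

X = np.array([[0.5, 0.2], [0.2, 0.5]])   # satisfies all four inequalities
gamma = 2.0
print(facets_k2(X))            # -> True
print(facets_k2(X / gamma))    # -> True: all four survive down-scaling
print(facets_k2(gamma * X))    # -> False: the inhomogeneous facet breaks
```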
To summarize, this consideration confirms that the ESCs for k_I = 2 yield a stronger restriction in (T_{n+1}) than they do in (T_n). This gap in the bounds can get even larger for larger k_I; for example for k_I = 3 also the inequality (4d) is inhomogeneous. This concludes our investigation of the new CESH.

The Scaled Exact Subgraph Hierarchy
In Section 3 we saw that including an ESC into (T_{n+1}) as in the ESH gives a stronger bound than including the same ESC into (T_n) as in the CESH. In this section we investigate whether this is due to a suboptimal definition of the ESCs for the latter case. In particular, we go back to the intuition behind ESCs for (T_{n+1}) and transfer this intuition to (T_n). This leads to the new definitions of scaled ESCs and the scaled ESH. We explore this hierarchy and compare the CESH and the scaled ESH in detail.

Introduction of the Scaled Exact Subgraph Hierarchy
To start, observe the following. It can easily be confirmed that both (T_{n+1}) and (T_n) yield upper bounds on α(G). Let s ∈ S(G) be a stable set vector that corresponds to a maximum stable set. Then x* = s and X* = ss^T is feasible for (T_{n+1}) and has objective function value α(G). Therefore, intuitively STAB^2(G) defines exactly the appropriate polytope for (T_{n+1}).
For (T_n) the matrix X′ = (1/(s^T s)) ss^T yields a feasible solution with objective function value α(G), whereas X* = ss^T is not feasible unless α(G) = 1. Hence, for (T_n) it intuitively makes more sense to consider the polytope spanned by the matrices of the form (1/(s^T s)) ss^T for s ∈ S(G) than to consider STAB^2(G). This leads to the following definition.

Definition 8. Let G = (V, E) be a graph with |V| = n. Then the scaled squared stable set polytope SSTAB^2(G) of G is defined as

SSTAB^2(G) = conv({(1/(s^T s)) ss^T : s ∈ S(G), s ≠ 0_n} ∪ {0_{n×n}}),

where the zero matrix corresponds to the empty stable set.

The goal of this section is to investigate a new modified version of the CESH based on the scaled squared stable set polytope, defined in the following way.

Definition 9. Let G = (V, E) be a graph and let I ⊆ V. Then the scaled exact subgraph constraint (SESC) for G_I is defined as X_I ∈ SSTAB^2(G_I). Furthermore, let |V| = n and let J be a set of subsets of V. Then z^S_J(G) is the optimal objective function value of (T_n) with the SESC for every subgraph induced by a set in J, so

z^S_J(G) = max{⟨1_{n×n}, X⟩ : X ∈ CTH^2(G), X_I ∈ SSTAB^2(G_I) ∀I ∈ J}.   (7)

For k ∈ N_0 with k ≤ n the k-th level of the scaled exact subgraph hierarchy (SESH) is defined as z^S_k(G) = z^S_{J_k}(G).

Note that with the considerations above it does not make sense to include the SESC for the whole graph G into (T_{n+1}), as the resulting SDP does not yield an upper bound on α(G), because all solutions corresponding to α(G) are not feasible. Hence, we introduce a hierarchy based on SESCs only starting from (T_n) and not from (T_{n+1}).
Additionally, note that a priori we do not know whether the SESH has as nice properties as the ESH and the CESH.

Comparison of the SESH and the CESH
The next lemma is the key ingredient to compare the SESH to the CESH.

Lemma 6. Let $G = (V, E)$ be a graph. Then $X \in \mathrm{SSTAB}^2(G)$ holds if and only if $X \in \mathrm{STAB}^2(G)$ and $\operatorname{trace}(X) \leq 1$.
Proof. If $X \in \mathrm{SSTAB}^2(G)$, then $X$ can be written as
$$X = \lambda_1 \cdot 0 + \sum_{i=2}^{t} \lambda_i \tfrac{1}{s_i^Ts_i}\, s_i s_i^T$$
for $\lambda \in \Delta_t$ and non-zero stable set vectors $s_2, \dots, s_t \in S(G)$, and therefore $X \in \mathrm{STAB}^2(G)$, as each matrix $\tfrac{1}{s_i^Ts_i}\, s_i s_i^T$ is a convex combination of $s_i s_i^T$ and the zero matrix. Hence, $X \in \mathrm{SSTAB}^2(G)$ implies that $X \in \mathrm{STAB}^2(G)$ and $\operatorname{trace}(X) = \sum_{i=2}^{t} \lambda_i \leq 1$ hold.

Now assume $X \in \mathrm{STAB}^2(G)$ and $\operatorname{trace}(X) \leq 1$. Then $X$ can be rewritten as
$$X = \sum_{i=1}^{t} \lambda_i s_i s_i^T = \sum_{i=2}^{t} \lambda_i s_i^Ts_i \cdot \tfrac{1}{s_i^Ts_i}\, s_i s_i^T$$
for $\lambda \in \Delta_t$ and stable set vectors $s_1 = 0$ and $s_2, \dots, s_t \neq 0$. We define $\bar{\lambda}_i = \lambda_i s_i^Ts_i$ for $2 \leq i \leq t$ and $\bar{\lambda}_1 = 1 - \sum_{i=2}^{t} \bar{\lambda}_i$. Then clearly $\bar{\lambda}_i \geq 0$ holds for $2 \leq i \leq t$. Furthermore, $\operatorname{trace}(X) = \sum_{i=2}^{t} \bar{\lambda}_i \leq 1$ implies that $\bar{\lambda}_1 \geq 0$ holds, so $\bar{\lambda} \in \Delta_t$ and therefore $X \in \mathrm{SSTAB}^2(G)$.

Lemma 6 now allows us to prove the following.
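The scaling argument of the proof can be replayed numerically. A small sketch for $C_5$, where the stable sets and the weights $\lambda$ are illustrative choices:

```python
import numpy as np

# Stable set vectors of C5: the empty set, {0} and {0, 2}.
S = [np.zeros(5), np.eye(5)[0], np.array([1.0, 0.0, 1.0, 0.0, 0.0])]
lam = np.array([0.5, 0.3, 0.2])  # lambda in the simplex

# X in STAB^2(C5) with trace(X) <= 1.
X = sum(l * np.outer(s, s) for l, s in zip(lam, S))
assert np.trace(X) <= 1 + 1e-12

# Scaled weights as in the proof: bar_lam_i = lam_i * s_i^T s_i for i >= 2,
# and bar_lam_1 = 1 - sum of the rest picks up the slack at the zero matrix.
bar = np.array([l * (s @ s) for l, s in zip(lam[1:], S[1:])])
bar1 = 1.0 - bar.sum()
assert bar1 >= -1e-12 and np.all(bar >= 0)  # bar_lambda lies in the simplex

# The same X as a convex combination of scaled stable set matrices,
# certifying X in SSTAB^2(C5).
X_scaled = sum(b * np.outer(s, s) / (s @ s) for b, s in zip(bar, S[1:]))
assert np.allclose(X, X_scaled)
```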
Theorem 2. Let $G = (V, E)$ be a graph and let $J$ be a set of subsets of $V$. Then $z^S_J(G) = z^C_J(G)$.

Proof. By Lemma 6, for every $I \in J$ the SESC $X_I \in \mathrm{SSTAB}^2(G_I)$ can be replaced by the ESC $X_I \in \mathrm{STAB}^2(G_I)$ together with $\operatorname{trace}(X_I) \leq 1$. The latter is redundant, as $\operatorname{trace}(X) = 1$ is fulfilled by all $X \in \mathrm{CTH}^2(G)$ and all elements on the main diagonal of $X$ are non-negative because $X \succeq 0$.

Thus, Theorem 2 implies that the SESH and the CESH coincide and in particular that the SESH has the same properties as the CESH stated in Lemma 2, which we now formulate explicitly.
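The redundancy claim is easy to verify numerically: every principal submatrix of a positive semidefinite matrix with unit trace has trace at most one, since the diagonal entries are non-negative and the full diagonal sums to one. A minimal numpy sketch over a random PSD matrix standing in for an element of $\mathrm{CTH}^2(G)$:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# A random positive semidefinite matrix normalized to trace one.
A = rng.standard_normal((6, 6))
X = A @ A.T
X /= np.trace(X)

# Every principal submatrix X_I inherits trace(X_I) <= 1.
for k in range(1, 7):
    for I in combinations(range(6), k):
        assert np.trace(X[np.ix_(I, I)]) <= 1 + 1e-12
```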
Hence, even though intuitively it makes more sense to add SESCs into $(T_n)$ instead of ESCs, both versions give the same bound and the SESH and the CESH coincide.

Computational Comparison
In the previous sections we have theoretically investigated first the original ESH, which starts from $(T_{n+1})$ and includes ESCs. Next, we introduced the CESH, which starts from $(T_n)$ and includes ESCs, and finally the SESH, which starts from $(T_n)$ and includes SESCs. Each of these hierarchies can be exploited computationally by including a wisely chosen subset $J$ of all possible ESCs or SESCs. We denote the resulting bounds based on the ESH, the CESH and the SESH by $z^E_J(G)$, $z^C_J(G)$ and $z^S_J(G)$, respectively. So far we have proven in Lemma 1 and Theorem 2 that $z^S_J(G) = z^C_J(G) \geq z^E_J(G)$ holds for all graphs $G$ and for all sets of subsets $J$, hence the bounds based on the CESH and the SESH coincide and the bounds based on the ESH are always at least as good as those bounds.
In this section we compare the ESH and the CESH computationally. We refrain from computations with the SESH, since both the obtained bounds and the sizes of the SDPs are the same for the SESH and the CESH. First, we are interested in whether $z^E_J(G)$ is significantly better than $z^C_J(G)$. Second, we are interested in the running times. In theory, the running times for $z^C_J(G)$ should be shorter, because the matrix variable is of order $n$ instead of $n+1$ and the number of equality constraints is smaller by $n$.
We consider several graphs in various settings. Some graphs are from the Erdős–Rényi model $G(n, p)$ for different values of $n$ and $p$ (the probability that an edge is present in the graph), some are complement graphs of graphs of the second DIMACS implementation challenge [10] and some come from the House of Graphs collection [5]. Furthermore, there is a spin glass graph (see [12]), a Paley graph, a circulant graph and a cubic graph among the instances. In the computations we always compare including all ESCs of the same set $J$ into $(T_{n+1})$ and $(T_n)$, so we compute $z^E_J(G)$ and $z^C_J(G)$. The source code and all the used graphs are available online at https://arxiv.org/src/2003.13605/anc.
All computations are done on an Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz with 32 GB RAM with MATLAB. We use the interior point solver MOSEK [26] for solving the SDPs. Note that there is a lot of research on how to solve SDPs of the form (2) much faster using the bundle method, see Gaar [13] and Gaar and Rendl [14, 15]. We refrain from using these involved methods, as we are interested in comparing the bounds in a simple way.
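As a solver-independent cross-check of the $\vartheta(G)$ values, recall that for regular edge-transitive graphs Lovász's eigenvalue formula $\vartheta(G) = -n\lambda_{\min}/(\lambda_{\max} - \lambda_{\min})$ is exact. A small numpy sketch; cycles are used here purely for illustration and are not among the benchmark instances:

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the cycle C_n."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def theta_eigenvalue(A):
    """Lovasz's eigenvalue formula -n*lmin/(lmax - lmin); exact theta
    for regular edge-transitive graphs such as cycles."""
    ev = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    lmin, lmax = ev[0], ev[-1]
    return -A.shape[0] * lmin / (lmax - lmin)

# theta(C5) = sqrt(5) and theta(C4) = alpha(C4) = 2, with no SDP solver needed.
assert abs(theta_eigenvalue(cycle_adjacency(5)) - np.sqrt(5)) < 1e-9
assert abs(theta_eigenvalue(cycle_adjacency(4)) - 2.0) < 1e-9
```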
In the first experiment, we compare levels of the ESH and the CESH. For including all possible ESCs of order $k$ for a graph of order $n$ we add $\binom{n}{k}$ ESCs to the SDPs $(T_{n+1})$ and $(T_n)$, so these computations are out of reach rather quickly. Table 3 summarizes the values of $z^C_J(G)$ and $z^E_J(G)$ when including all ESCs of order $k$ for $k \in \{0, 2, 3, 4\}$ and presents the running times in seconds to solve the corresponding SDPs.
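The growth of $\binom{n}{k}$ makes this concrete; for instance, for a graph of order $n = 100$ (a size occurring among the instances) the number of order-$k$ ESCs in a full level is:

```python
from math import comb

# One ESC per k-subset of the n vertices when taking a full level.
n = 100
counts = {k: comb(n, k) for k in (2, 3, 4)}

# Already at k = 4 there are almost four million constraints.
assert counts == {2: 4950, 3: 161700, 4: 3921225}
```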
First, we note that indeed the computation of $\vartheta(G)$ (corresponding to the column $k = 0$) yields the same value via $(T_{n+1})$ and via $(T_n)$. Furthermore, the computations confirm that $z^E_J(G) \leq z^C_J(G)$ holds for all graphs $G$. On the second level of the ESH and the CESH the two values coincide for almost all graphs. Only the instances HoG 34272, HoG 34274 and HoG 34276 show a significant difference. On the third and fourth level the difference is more substantial. This is not surprising, as there are more inhomogeneous facets defining $\mathrm{STAB}^2$ in these cases. In the running times there is almost no difference for small graphs with not so many ESCs. Only if the number of ESCs becomes larger, the computation time for $z^C_J(G)$ is typically significantly shorter. However, most of the time this comes with a worse bound.
Computing the $k$-th level of the ESH and the CESH by including all ESCs of order $k$ is beyond reach rather soon, so in the next experiments we want to include the ESCs only for some subgraphs of a given order $k$. In order to determine the set $J$ of subgraphs for which to include the ESCs we follow the approach of Gaar and Rendl [14, 15]. In particular, we start with $J = \emptyset$ and iteratively solve an SDP for computing the Lovász theta function (either $(T_{n+1})$ or $(T_n)$) with the already determined ESCs induced by $J$. Then we use the optimal solution of the SDP in order to search for violated ESCs. To find potentially violated subgraphs we perform a heuristic search among all subgraphs that tries to minimize the inner product of the submatrix of the optimal solution corresponding to a subgraph and certain matrices (e.g., matrices that induce facets of the squared stable set polytopes of graphs of order $k$). We refer to [14, 15] for more details. We perform 10 iterations, including at most 200 ESCs of order $k$ in each iteration, so in the end for each graph and for each $k$ we have a set $J$ of at most 2000 ESCs. Of course it makes a difference whether we do the search starting from $(T_{n+1})$ or from $(T_n)$, as different subgraphs might be violated. We denote by $J^E$ and $J^C$ the sets of subsets obtained by using $(T_{n+1})$ and $(T_n)$, respectively, in order to search for violated subgraphs. The used sets $J^E$ and $J^C$ are available online at https://arxiv.org/src/2003.13605/anc.

Table 4 summarizes the cardinalities of $J^E$ and $J^C$. The values of $z^E_J(G)$ and $z^C_J(G)$ and the running times for the sets $J = J^E$ can be found in Table 5 and Table 6. The analogous computational results when considering $J = J^C$ are presented in Table 7 and Table 8. First, observe in Table 4 that the cardinality of $J^C$ is typically larger than the cardinality of $J^E$. This is plausible, because due to the additional row and column and the SDP constraint in the formulation $(T_{n+1})$ some ESCs might be satisfied which are violated in the version with $(T_n)$.
When we turn to the values of $z^E_J(G)$ and $z^C_J(G)$ in Table 5 and Table 7 we observe for both $J = J^E$ and $J = J^C$ that (a) the larger $k$ becomes, the better the bounds are, (b) for $k = 0$, so for computing $\vartheta(G)$, we have $z^E_J(G) = z^C_J(G)$ as expected, (c) for a fixed set $J$ we have $z^E_J(G) \leq z^C_J(G)$ in accordance with the theory derived earlier and (d) typically the difference between $z^E_J(G)$ and $z^C_J(G)$ increases with increasing $k$. This behavior is observable for both $J = J^E$ and $J = J^C$, hence the choice of the set $J$ has no significant influence on the behavior of the values of $z^E_J(G)$ and $z^C_J(G)$. However, we observe that usually the values of $z^E_J(G)$ for $J^E$ are the best bounds, $z^E_J(G)$ for $J^C$ are the second best bounds, $z^C_J(G)$ for $J^C$ are the third best bounds and $z^C_J(G)$ for $J^E$ yields the worst bounds, even if the differences are typically very small. This behavior is not surprising, because we know that for a fixed set $J$ we have $z^E_J(G) \leq z^C_J(G)$, and it makes sense that the final bounds are better when the same formulation of $\vartheta(G)$ is used both to determine $J$ and to compute the bounds.
Looking at the running times in Table 6 and Table 8 we see that our expectations are not met: even though the order of the matrix variable and the number of constraints of the SDP to compute $z^C_J(G)$ are smaller than those to compute $z^E_J(G)$, the running times are typically larger. So apparently the highly sophisticated interior point solver MOSEK can deal better with the SDP for $z^E_J(G)$. If we compare the running times for the sets $J = J^E$ and $J = J^C$ we see that the running times for $J^E$ are typically shorter, but there are also instances (e.g., G 100 025 for $k = 6$) where the computation of both $z^E_J(G)$ and $z^C_J(G)$ is faster for $J = J^C$ than it is for $J = J^E$.
As a result, we confirm that tightening the Lovász theta function towards the stability number with the help of ESCs typically works better when starting from the formulation $(T_{n+1})$ (as in the ESH) than when starting from the formulation $(T_n)$ (as in the CESH), even though this is not obvious at first sight, as the latter SDPs are smaller. However, in some cases it can be advantageous to use the CESH, but then also the subset $J$ should be determined using $(T_n)$.

Conclusions
In this paper we derived two new SDP hierarchies from the Lovász theta function towards the stability number. The classical ESH from the literature starts from the SDP $(T_{n+1})$ and adds ESCs. We introduced the new CESH starting from $(T_n)$ and including ESCs. We proved that this new hierarchy shares several properties with the ESH. Moreover, we showed that the bounds based on the ESH are at least as good as those from the CESH, not only when including all ESCs of a certain order, but also when including only some of them.
We also newly introduced SESCs, which are a more natural formulation of exactness for $(T_n)$. Including them into $(T_n)$ yields the new SESH. Even though SESCs are more intuitive, the bounds based on the CESH and the SESH coincide.
In our computational results with an off-the-shelf interior point solver we typically obtain the best bounds with the shortest running times when using the ESH. However, for some instances using the CESH is beneficial.
It would be interesting to derive a specialized solver for the CESH, as was done by Gaar and Rendl [14, 15] for the ESH. They dualize the ESCs, use the bundle method and, instead of solving a huge SDP with all ESCs, iterate and solve $(T_{n+1})$ with a modified objective function in each iteration. Since $(T_n)$ has a smaller matrix order and fewer constraints, this approach presumably works even better for the CESH. Such a solver would allow comparing the running times for the ESH and the CESH in a more sophisticated way.
Another open question is the more precise relationship between the ESH and the CESH. In this paper we have shown that $z^C_k(G) \geq z^E_k(G)$ holds for all $k \in \{1, \dots, n\}$. It would be interesting to know whether there is some constant $\ell \geq 1$ such that $z^E_k(G) \geq z^C_{k+\ell}(G)$ holds for all graphs $G$ and for all $k \in \{1, \dots, n\}$, so that it suffices to add $\ell$ levels to the CESH to reach the quality of the ESH.
Finally, it would be interesting to investigate what implications it has for the ESH and the CESH to impose the positive semidefiniteness constraint not on the whole matrix $X$, but only on a submatrix of $X$, as has been done in the recent work [2].
Table 3: The values of $z^E_J(G)$ and $z^C_J(G)$ for different graphs $G$ when including all ESCs of order 0 (corresponding to $\vartheta(G)$), 2, 3 and 4, and the running times to compute the values


Table 5: The values of $z^E_J(G)$ and $z^C_J(G)$ for different graphs $G$ and sets $J = J^E$ of subgraphs of order $k$ for $k \in \{0, 2, 3, 4, 5, 6\}$

Table 6: The running times for the results of Table 5

Table 8: The running times for the results of Table 7