Minimal Constraints in the Parity Formulation of Optimization Problems

As a means to solve optimization problems using quantum computers, the problem is typically recast into a Ising spin model whose ground-state is the solution of the optimization problem. An alternative to the Ising formulation is the Lechner-Hauke-Zoller model, which has the form of a lattice gauge model with nearest neighbor 4-body constraints. Here we introduce a method to find the minimal strength of the constraints which are required to conserve the correct ground-state. Based on this, we derive upper and lower bounds for the minimal constraints strengths. We find that depending on the problem class, the exponent ranges from linear $\alpha \propto 1$ to quadratic $\alpha \propto 2$ scaling with the number of logical qubits.

The overhead in qubits for all-to-all pair interactions is quadratic and this increased number of degrees of freedom is compensated by constraints. Arranging the physical spins on a 2D lattice, allows to construct the constraints from 4-local interaction on individual plaquettes consisting of 4 neighbouring spins. Figure 1 sketches the layout and the labeling of the physical qubits (i, j) and plaquettes [i, j], where the labels only run over pairs with i < j. The 4-body constraints ensure that the low energy sub-space of the physical system H phys is exactly the spectra of the logical Hamiltonian H logic by adjusting the constraint strengths c ij ∈ R. The constraints c ij have to be chosen large enough to separate the allowed logical subspace, i.e. states which do have a translation back into the logical picture, from states which do not have a counterpart in the logical model. The scaling of the constraint strengths is also crucial for the performance of the quantum annealing protocol [26].
In this paper, we determine the minimum constraint strengthsĉ ij that satisfy the lowest and first exited state of the problem Hamiltonian. In favour to reduce the magnitude of the constraint strengths we drop the requirement for a full separation between logical sub-spectra and the other eigenvalues. We show that finding the minimal constraints can be rewritten as a linear program. In the homogeneous setting c ij = c, we derive a series of upper and lower bounds to the optimal constraint strength c allowing to approximate the optimal values. Different classes of optimization problems are modelled by considering J ij as independent and identically distributed (i.i.d.) random variables with probability density function (pdf) f (µ, σ 2 ). In the case µ/σ → ±∞ we derive analytic solutions to the minimal constraint problem, which are naturally related to the problem of solving MaxCut on the complete graph K n or the total ferromagnetic problem respectively. By a simple argument the authors in [26] concluded, that in the antiferromagnetic case the constraints should grow at least linearly with the system size. We show, that in this case the constraints even have to grow quadratic in system size. Also for random J ij ∈ {−1, 1} the authors of [26] expect the constraints to scale linearly with the size of the problem. We find, for the case of µ/σ finite, the large size scaling of the ex-/2, as denoted in Eq. (1). If the corresponding strengths cij are chosen large enough, the original spectra is well separated from all the other eigenenergies (b , left). The required strengths can be lowered by allowing unwanted energy levels as low as the the first excited state e = l0 + ∆ l . For cij = c, finding the minimal strength c involves minimizing over subspaces with defined number of violated constraints and then taking the largest c. A systematic scheme to construct violating states is shown in panel (c) and (d). Starting from a state with no parity constraint violated and flipping the spins in the blue shaded region, one can construct all states with certain parity constraints violated.
pected optimal constraint strength is mainly determined by the sign of the expectation value µ. If µ is negative, the large size scaling becomes linear. Furthermore, in the case µ positive the scaling becomes quadratic. The point µ = 0 is interesting for symmetry reasons. By relying on results from extreme value theory we argue, that for standard Gaussian couplings, choosing the constraint strengths of order √ n log(n) could be enough to ensure that the physical ground state faithfully represents the logical ground state.

II. CONSTRAINTS
The constraints separate the subspace of allowed configurations from the unphysical subspace. The strength of the constraints have to be large compared to the energy of the local field energies in the system. In the following we derive both, upper and lower bounds for the constraint strength.

A. Minimal constraint problem
Our goal is to find the minimal strength of the constraints, such that lowest and first excited states of the physical and logical Hamiltonian coincide w.r.t. the parity translation. Intuitively, this can be understood as follows: In the extreme case, where the constraints are set to zero, each spin would point in the direction of the local field acting on the spin in order to minimize the energy of the system. On the contrary, if the constraints are infinitely large, these states are generally forbidden and the condition of an even number of spins up per plaquette is enforced.
We consider the case of finite constraints where the local field term and the constraint energies are competing. In this case it might be energetically favorable to violate a constraint for rearranging the spins with respect to their local fields. Our goal is to to find the lowest constraint energy such that this case can be ruled out. The minimal energy and the scaling w.r.t. the number of qubits depends on the statistics of the local fields which in turn is associated with classes of optimization problems. Therefore, we derive the minimal constraints for different classes of optimization problems.
We consider n logical spins [n] := {1, .., n} with all-toall connectivity. Thus, the system contains m := n(n − 1)/2 interactions that are mapped to m spins in the parity scheme. The interaction strengths can be viewed as weights on the edges of a complete Graph K n = (V n , E n ) with V n := [n] and E n := {(i, j) ∈ [n] 2 , i < j}. We label the physical spins with elements of E n [cf. Fig. 1(c)]. Plaquettes are labeled by elements from E n−1 and to distinguish them from sites, we replace the curly brackets ( ) with square ones [ ]. Furthermore, we denote the sample mean of a random variable X by X.
The space of physical states {−1, 1} m can be decomposed into a family of subspaces (S ω ) ω⊆En−1according to their pattern on individual plaquettes. In that sense S 0 := S {} should denote the logical subspace where the local constraints on every plaquette are satisfied.
More general, given a tuple ω ⊆ E n−1 of plaquettes, the subspaces S k are defined as states being simultaneous eigenstates to the stabilizers 1 2 if (i, j) ∈ ω and − 1 2 otherwise. Note, on the lowest row of plaquettes indexed with j = i + 1, the stabilizers P [i,j] are given by 3-local terms Fig. 1(c) we show an example for a state satisfying all constraints beside the one corresponding to the loop 23 − 34 − 42. That state belongs to in the subspace S [2,3] . Likewise Fig. 1(d) shows another state with two unsatisfied constraints i.e. being element of S { [2,3], [3,5]} .
The constraint strengths c ij can be chosen either to be all identical (homogeneous case) or we can individually set them to the optimal constraint strength for each plaquette. In the latter case, the objective cost function depends on all individual constraints cost(c 12 , ..., c n−1,n ), which in the linear case is the sum of the constraint strengths. Let us write H phys = H J + H c , with H J the part of the physical Hamiltonian related to the local fields and H c the term related to the constraints. We define the value a ω as the lowest eigenenergy of H J restricted to states belonging to the subspace S ω . With this definition, the problem of minimizing the constraints strengths can be written as a linear program: To this end, the costfunction has to be minimized under the restrictions where the value e denotes the first exited eigenenergy of the problem Hamiltonian.
In the homogeneous case c ij = c, the linear program Eq. (2) reduces to were a k := min{a ω : ω ⊆ E n−1 , |ω| = k} and q denotes the number of plaquettes. Here, the penalty Hamiltonian H c does not discriminate between states with the same number of parity constraints violated. Since a 1 is the lowest eigenenergy of H J w.r.t. the subspace , where a single parity condition is unsatisfied, the corresponding state gets a penalty of c. This penalty has to be chosen large enough to bridge the gap between a 1 and e, which explains the first term in Eq. (3). Similar to S 1 we define S 2 as the subspace of states with two unsatisfied parity constraints. a 2 is then given as the lowest eigenvalue of H J restricted to S 2 . Since all these states will penalised twice by H c , the strength of c has to be at least half the difference of a 2 and e. This explains the second term in Eq. (3). Finally, other cases with k > 2 follow by including states with more than two unsatisfied parity constraints. In general, every term appearing in Eq. (3) is of the form c −k := (e − a k )/k and can be seen as a lower bound for the optimal constraint strength. To get upper bounds, we consider the fact that the spectrum of a series of upper bounds can be derived according to Note, that we included the trivial bound 2|p 0 | and defined c 0 := e − p 0 in Eq. (5).

B. Single violator approximation
In order to make the problem numerically more accessible, we focus on the first lower bound c −1 rather than c. Thus, only states with one parity constraints violated are considered. We like to call them the single violators states [cf. Fig. 1(c)]. This is numerically well justified since for all models studied in this manuscript we observe the ordering [cf. Fig. 3 and Fig. 4].
It is easy to see, if the J ij are m i.i.d. random variables, the expected minimal constraint strength c cannot grow faster than quadratic in n, since by the central limit theorem we have for n → ∞, where N (µ, σ) denotes the normal distribution and µ abs and σ 2 abs are the mean and variance of the positive random variables |J ij |. Therefore, the trivial upper bound scales quadratic 2|p 0 | = Θ(n 2 ), and with c ≤ 2|p 0 | one further concludes that the minimal constraint strength cannot grow faster than quadratic in n i.e. c = O(n 2 ).
Furthermore, if J ij are i.i.d. random variables, with pdf f µ,σ 2 (x), the problem of determining the scaling of c does only depends on the ratio µ/σ. This can be seen by noting that a rescaling of the pdf f (x) → f (k −1 x) is equivalent to multiplying the random variables by a constant factor J ij → kJ ij . Hence, the strengths of the optimal constraints are multiplied by an overall factor of k whereas the functional dependency on the size, i.e. the scaling of the optimal constraints, is not affected. On the other hand, for each random variable it is true that (kJ ij ) = kJ ij and var(kJ ij ) = k 2 var(J ij ) i.e. rescaling of J ij does not alter µ/σ. In conclusion, the scaling of the optimal constraints can only depend on the ratio µ/σ.

III. RESULTS
Using the bounds Eq. (4) we evaluate the optimal constraints, for general ensembles of systems with different specific connectivity, bias and variance. In particular, the scaling of the average optimal constraint strength c with the system size for classes of problems. Let us first introduce two examples for typical optimization problems.
Let G = (E, V ) denote a simple graph. Then, the MaxCut problem asks for two disjoint sets of vertices V 1 and V 2 with V 1 ∪ V 2 = V , such that the number of cutting edges is maximal i.e. (e 1 , e 2 ) with e 1 ∈ V 1 and e 2 ∈ V 2 . As second example, the MinBisection (or graph-bipartitioning) problem for a graph with even number of nodes, requires to minimize the number of cutting edges while balancing the size of the two subsets These graph partitioning problems can be easily mapped onto an Ising problem by introducing one spin per node. The MaxCut problem can be reformulated as an antiferromagnetic Ising model, i.e. J ij = 1 for all (i, j) ∈ E, where the ground state corresponds to the solution of the optimization problem. If l 0 denotes the smallest eigenvalue of then the maximal cut is given by cut max = (−l 0 + |E|)/2. Similarly, the MinBisection problem can be encoded into an ferromagnetic Ising model with magnetization fixed to zero, i.e. J ij = −1 for all (i, j) ∈ E, with σ i z = 0. The corresponding Hamiltonian reads as where the second term of Eq. (12) guarantees, that the magnetization of the ground state is zero, given the energy penalty u is larger than min(4d max , n)/4, with d max the maximal degree of G [1].

A. Numerical Results
The general case of randomly distributed J ij values with a given bias µ and standard deviation σ can be treated numerically. In the following we investigate and compare three different distributions.
1.) Normal distribution N (µ, σ) with mean µ and variance  For our numerical results we sample from random instances of particular optimization problems, calculate their ground state and lowest single violator energies and interpolate the sample mean with a powerlaw fit. Finding a 1 involves minimisation over subspaces with a single parity defect. These single violator states we enumerate them by flipping spins starting from a state from the logical subspace [cf. Fig. 1(c)]. This allows us to do reasonable statistics up to sizes of n = 25. In the parity   picture this corresponds to m = 300 physical spins and q = 276 plaquettes.
Assuming i.i.d. random variables J ij = J, homogeneous constraint strengths c ij = c and single violator approximation c ≈ c −1 , the numerics in Figure 2 suggests that the scaling of the minimal constraint strength mainly depends on µ, the expectation value of J. The limits µ/σ → ±∞ are analytically well understood, showing a linear and a quadratic scaling respectively. Furthermore, at µ/σ = 0 all three analyzed distributions show a linear scaling, including the SK-SpinGlass model where J ij are standard normal distributed random variables [cf. Fig. 3]. Note, that this behaviour may only occur in small systems, as we will further elaborate in section III B(c).
As paradigmatic examples of combinatorial optimization problems -satisfying the assumption of independent random variables -we consider now two graph partitioning problems on random ErdsRnyi graphs, where each edge has a fixed probability p of being present. a. MaxCut: Solving MaxCut for random graphs, corresponds to independently choosing J ij with probability p to be either 1 or 0, respectively. Since all strengths fulfill J ij ≥ 0, it follows that µ/σ ≥ 0. Fig. 4 shows our numerical results for system sizes up to n = 25, where we find a sub-quadratic increase of c −1 with the system size. As we will see in the following analysis, this subquadratic scaling becomes quadratic for larger problem sizes. To this end we derive an efficiently calculable lower bound by utilizing a semidefinite program relaxation of finding the ground state configuration of Hamiltonian (8) with σ i z ∈ {−1, 1}. If opt sdp denotes the optimal value of the semidefinite program max (i,j)∈E 1 − X ij 2 , X ii = 1 ∀i ∈ [n], X 0 (10) then cut max ≤ opt sdp [27].
To be computationally efficient, we upper bound the minimal single-violator contribution a 1 ≤ a + 1 by sampling from quadratically many single-defect states. To achieve this, the set of vertices V = [n] is partitioned into three disjoint sets containing consecutive nodes A = {1, 2, ..., k}, B = {k + 1, ...j} and C = {j + 1, ..., n}, where all sets contain at least two elements. The lower right part of Fig. 5, demonstrates what such a partition of n = 6 nodes into sets A = {1, 2}, B = {3, 4} and C = {5, 6} looks like in the parity picture. Note, that this particular state has a single unsatisfied constraint at plaquette [2,4]. With l 0 + 2 ≤ l 2 and cut max = (−l 0 + |E|)/2 we can derive a lower bound on c −1 = l 2 − a 1 ≥ c −1,sdp by defining c −1,sdp := −2 · opt sdp + |E| + 2 − a + 1 . Figure 4 includes this lower bound c −1,sdp for the class of random graphs with edge probability p = 0.4 up to system sizes of n = 100. We find, that after a sub-quadratic increase, the growth rate of c becomes quadratic for large sizes n. b. MinBisection: It turns out, that the special case of MaxCut problems on the complete graph K n , is key to understand the behaviour of the minimal constraints in the MinBisection problem for large system sizes n. For fixed p, u scales at least linearly in n. Hence, the inverse u −1 faster approaches zero than n −1 . This means, if n is large enough is well approximated by the first term (i,j)∈En σ i z σ j z , i.e. equals the Hamiltonian for MaxCut problems, when restricted to the class of complete graphs K n . Therefore, for large n, the expected value of c −1 grows quadratic, independent of the choice of p, as we will further elaborate in the subsequent section [cf. Fig 4(b)].

Now we investigate two limiting cases (µ/σ → ±∞)
where the minimal constraint problem can be solved analytically. Furthermore, we discuss the case of normally distributed interaction strengths with µ = 0 and σ = 0.
a. Antiferromagnetic limit: For µ/σ → ∞ the minimal constraint problem can be connected to the Max-Cut problem on the complete graph K n . In this case all interactions are antiferromagnetic, and can be set to J ij = 1 due to the rescaling property mentioned above. Assuming n = |V | to be an even number of vertices, then the maximal cut is given by (n/2) 2 and thus, the lowest eigenvalue of the corresponding Ising Hamiltonian is given by l 0 = −n/2 and has a gap of 2 i.e. independent of n. For the instructive example of n = 3k (k ∈ N), the minimal single violator energy is given by a 1 = − n 2 1 + n 3 . Then the lower bounds c −i can be analytically found where the largest is given by The scaling of c = c −1 is therefore Θ(n 2 ) i.e. quadratic in n. For a graphical representation of the relevant states see Fig. 5.
b. Ferromagnetic limit: The limit µ/σ → −∞ is reached when setting J ij = −1 i.e. all pair interactions are ferromagnetic. In this case, the lowest eigenvalue is given by l 0 = −n(n − 1)/2 and shows a gap of 2(n − 1). In the parity picture, violating a single constraint can be done at minimal cost of a single spinflip. Starting from the ground state in S 0 flipping the spin (1, n) results in a state from S [1,n−1] with energy l 0 + 2 and thus c = c −1 = 2n − 4. The scaling of c is therefore Θ(n) i.e. linear in n. c. SK-SpinGlass: Of special interest is the case µ/σ = 0. The numerical results for c −1 as function of n are shown in Figure 2. The following arguments suggest that in the Gaussian case the scaling of c −1 goes as √ n log(n).
To begin with, we note, that for every fixed spin configuration σ z ∈ {−1, 1} n the eigenvalues of H logic ( σ z ) can be seen as a random variable w.r.t. the distribution determining the interaction strengths. If this distribution is Gaussian, then -according to the central limit theorem -the eigenvalues are also normally distributed with variance σ 2 = n(n − 1)/2. However, the 2 n different eigenvalues are clearly not independent. They are strongly correlated with covariance matrix [28] Note, that finding the minimum min σz H logic ( σ z ) is challenging due to the presents of these correlations.
In the case of independent random variables, the limiting order statistics can be classified into one of three universally classes according to extreme value theory [29]. Moreover, for m Gaussian variables {G 1 , ..., G m } the limiting distribution of their minimum M = min(G 1 , .., G m ) is given by the Gumbel distribution Gumbel(α, β) with Here e is the Euler constant and F −1 denotes the quantil function, which in the case of standard normal variables is given by the probit √ 2erf −1 (2p − 1). By fixing the distribution, the parameters α and β only depend on the number of variables m. We want to emphasise, that setting m = 2 n models all eigenvalues as independent. Since the factor σ −1 normalizes each eigenvalue, M ind := −σ(α + Γβ) is the expected minimal energy in the independent case (Γ is the Euler-Mascheroni constant).
Numerically we see, that choosing the eigenvalues to be independent, results (on average) in too small energies l 0 > M ind (m = 2 n ). However, we observe that by introducing a free parameter δ < 1 in order to decrease the number of realizations m = 2 nδ gives a reasonably good approximation l 0 ≈ M ind (m = 2 nδ ) for δ ≈ 0.798158 (4) [cf. Fig. 6] [30]. If one allows δ to be a function of n then the statement is trivial, but interestingly a constant δ gives rise to a good approximation. To which extend our analysis captures the large n behaviour is outside the scope of the present work. However, as we show in the appendix M ind ≈ − δ log(2)n 3 2 for large n. Since δ log(2) ≈ 0.743 this is in accordance with Parisis result [31,32] Hence, minimizing over all plaquettes gives the minimal single-violator energy a 1 . Due to symmetry, −J ij is again a standard normal Gaussian variable and therefore the eigenenergies associated to single violators are distributed according to N (0, σ 2 ). Similar to Eq. (14) these eigenstates are highly correlated, but nevertheless we find a numerically well justified approximation a 1 ≈ M ind (m = 2 nδ n(n + 1)/12), with δ as above. Here, the quadratic terms count the average number of spins that have to be flipped to induce a parity defect according to Fig. 1(c). As we show in appendix A, the energy difference between single violator ground states and logical ground state l 0 − a 1 scales to leading order in n as f 1 (n) := 1 2 δ log(2) √ n log n(n + 1) 12 .
Since the gap ∆ l can be neglected for large n, the scaling of c −1 ∝ l 0 − a 1 [cf. Fig. 6]. Similar arguments can be used for the scaling analysis of the remaining lower bounds 1 k (l 0 − a k ). Here, the subspace S k spanned by states with k parity constraints violated has O(n 2k ) elements.
Setting p k (n) to a polynomial of order 2k and modelling a k as 2 δn p k (n) i.i.d. Gaussian variables results in a scaling of c −k as 1 k √ n log(n 2k ) ∝ √ n log(n) analog to Eq. (18).
d. General case: Finally, let us discuss the case when J ij are neither centered distributed random variables nor close to the limits µ/σ = ±∞. In general the bound c −1 depends on the lowest eigenvalue l 0 , the gap ∆ 1 and the smallest single violator energy a 1 .
If µ = 0, the distributions of eigenenergies of the logical Hamiltonian are shifted in contrast to the previously considered case of µ = 0. As an example we consider in the following Gaussian distributed couplings J ij ∼ N (µ, 1). Our argument builds on the fact that two normal variables with parameters (µ 1 , σ 2 1 ) and (µ 2 , σ 2 2 ) add up to a single normal variable N (µ 1 + µ 2 , σ 2 1 + σ 2 2 ). Thus, the eigenvalue corresponding to the all-ones logical state (1, ..., 1) is distributed via N (µm, m), where m denotes the number of physical spins. Likewise, the distribution of a state with equally many −1 and +1 is centered around zero ∼ N (0, m). More generally, for fixed k, there are 2 n k combinations of eigenvalues, where every combination is distributed according to a Gaussian with mean and variance m. This 'splitting' is the reason for the different behaviour of the smallest eigenvalue in the two cases µ > 0 and µ < 0. In the case µ > 0, the probability that the lowest energies originate from one of the 2 2 n/2 states centered around zero is largest. Since these are exponentially many states, we assume the expectation value for the ground state energy to be similar to the µ = 0 case. The single violator ground state for the limit µ/σ → ∞, denoted by φ ∞ [cf. Fig. 5(bottom)], is Gaussian distributed w.r.t. J ij with mean −µn 2 ( 1 6 + O(n −1 )) and variance m. Therefore, the single violator ground-state energy lies on average quadratically deeper than the logical ground state energy.
On the contrary, for µ < 0, the larger |µ| is, the more likely it is, that the smallest eigenvalue is one of the ferromagnetic states (1, ..., 1) or (−1, ..., −1). As shown, in the total ferromagnetic case, the gap ∆ l grows linearly with n. Since l 0 − a 1 = 2 this linear increase is the main contribution to c −1 . In the large-n limit we expect that the gap scales linearly. Since our numerical simulation is limited up to n = 25, we see this linear behaviour only if |µ| is large enough. As one can further see in Fig. 2 as µ gets smaller, the scaling exponent drops due to the fact that the expected difference l 0 − a 1 tends to get smaller 8 for more negative µ. However, by further increasing |µ| the linear scaling of the gap becomes apparent.

IV. DISCUSSION
We have shown numerically and analytically the scaling of the minimal constraint strengths in the parity based encoding for various classes of optimization problems. In the parity scheme, the optimization problem is encoded in the local fields only, and thus the classes of problems differ only in the statistics of the local fields. We have shown that the large size scaling is mainly determined by the sign of the bias µ. The observed differences stem from two distinct effects: (a) linear growth of the gap and (b) quadratic deep lying single violator states. While the scaling of the gap is the important factor for problems with predominantly negative couplings, the properties of the single violator states govern the regime with predominately positive couplings. Thus, the large size scaling of the expected constraint strengths grows linearly or quadratically with the system size in the respective cases.
One special point in between these regimes arises when the couplings are centered with mean zero. Finally, a non-rigorous analysis suggests a sub-linear behaviour given by √ n log(n). We want to emphasize, that the minimization of the constraint strengths could have direct influence on the minimal gap during AQO. In the QAOA setting, the constraint values could serve as additional variational parameters. In the latter case the optimized values can serve as good initial choice of these parameters.