Evidence of scaling advantage for the quantum approximate optimization algorithm on a classically intractable problem

The quantum approximate optimization algorithm (QAOA) is a leading candidate algorithm for solving optimization problems on quantum computers. However, the potential of QAOA to tackle classically intractable problems remains unclear. Here, we perform an extensive numerical investigation of QAOA on the low autocorrelation binary sequences (LABS) problem, which is classically intractable even for moderately sized instances. We perform noiseless simulations with up to 40 qubits and observe that the runtime of QAOA with fixed parameters scales better than branch-and-bound solvers, which are the state-of-the-art exact solvers for LABS. The combination of QAOA with quantum minimum finding gives the best empirical scaling of any algorithm for the LABS problem. We demonstrate experimental progress in executing QAOA for the LABS problem using an algorithm-specific error detection scheme on Quantinuum trapped-ion processors. Our results provide evidence for the utility of QAOA as an algorithmic component that enables quantum speedups.


INTRODUCTION
Quantum computers have been shown to have the potential to speed up the solution of optimization problems. At the same time, only a small number of algorithmic primitives are known that provide broadly applicable speedups. These include amplitude amplification 1,2 and quantum walks more generally, 3,4 as well as the recently introduced short path algorithm. 5,6 The dearth of provable speedups in quantum optimization motivates the development of heuristics. A leading candidate for demonstrating a heuristic speedup in quantum optimization is the quantum approximate optimization algorithm (QAOA). 7,8 QAOA uses two operators applied in alternation p times to prepare a quantum state such that, upon measuring it, a high-quality solution to the problem is obtained with high probability. A pair of such operators is commonly referred to as one QAOA "layer." The first operator evolves the state with a diagonal Hamiltonian encoding the optimization problem, and the second with a non-diagonal transverse-field Hamiltonian. In this work, we consider the evolution times to be hyperparameters that are set by using a fixed, predetermined rule, analogously to the choice of a schedule in simulated annealing.
While QAOA has been studied extensively, [9][10][11][12] little is known about its potential to provide a scaling advantage over classical solvers. A recent numerical study 11 of random k-SAT with N ≤ 20 variables has shown that the time-to-solution (TTS) of QAOA with fixed parameters and constant depth grows as 1.23^N. When QAOA is combined with amplitude amplification, the quantum TTS grows as 1.11^N, 11 whereas the best classical heuristic has a TTS that grows as 1.25^N. 11 Our work is motivated by this preliminary numerical evidence on small instances, which indicates that QAOA may potentially scale better than classical solvers when executed on an idealized quantum computer.
We study the scaling of QAOA TTS with the problem size on the Low Autocorrelation Binary Sequences (LABS) problem, 13,14 also known as the Bernasconi model in statistical physics. 15,16 The LABS problem has applications in communications engineering, where low autocorrelation sequences are used for designing radar pulses. 13,17 To solve LABS, one has to produce a sequence of N bits that minimizes a specific quartic objective.
We choose LABS to study the scaling of QAOA TTS for the following three reasons. First, the complexity of LABS grows rapidly, with optimal solutions known only for N ≤ 66 and the best heuristics producing approximate solutions whose quality decays with N for N ⪆ 200. 18,19 This makes it a promising candidate problem, since only a few hundred qubits are required to tackle classically intractable instances. Second, the performance of classical solvers for LABS has been benchmarked 18,19 in terms of the scaling of their TTS with problem size. We reproduce these results and observe that the scaling of classical solvers at N ≤ 40 matches the behavior at large N reported in the literature. This provides evidence that the scaling we observe for QAOA at N ≤ 40 will similarly extrapolate to large N. Third, LABS has only one instance per problem size N. Combined with the hardness of LABS, this makes it possible to reliably study the scaling of QAOA at large problem sizes, where simulating tens or hundreds of random instances would be computationally infeasible.
We obtain the scaling by performing noiseless exact simulation of QAOA with fixed schedules. Our results are enabled by a custom algorithm-specific GPU simulator, 20 which we execute using up to 1,024 GPUs per simulation on the Polaris supercomputer accessed through the Argonne Leadership Computing Facility. We find that the TTS of QAOA with number of layers p = 12 grows as 1.46^N, which is improved to 1.21^N when combined with quantum minimum-finding. This scaling is better than that of the best classical heuristic, which has a TTS that grows as 1.34^N. We note that QAOA is a general quantum optimization heuristic with broad applicability, and no problem-specific modifications were made to adapt it to the LABS problem.
Our numerical evidence indicates that the proposed quantum algorithm scales better than the best classical heuristic in an idealized setting. However, we do not claim that QAOA is the best theoretically possible algorithm for the LABS problem. In particular, it may be possible to quadratically accelerate the best-known classical heuristic (Memetic Tabu 21) by applying ideas similar to those used in quantum simulated annealing. 3,22,23 Nonetheless, our results highlight the potential of QAOA to act as a useful algorithmic component that enables super-Grover quantum speedups.
As a first step toward execution of QAOA for the LABS problem, we implement QAOA on Quantinuum trapped-ion quantum processors 24,25 on problems with up to N = 18. We further implement an algorithm-specific error detection scheme inspired by Pauli error detection 26,27 and demonstrate that it can reduce the impact of noise on solution quality by up to 65%. Our experiments highlight the continuing improvements to quantum computing hardware and the potential of algorithm-specific techniques to reduce the overhead of error detection and correction.

PROBLEM STATEMENT
We begin by formally defining the LABS problem, discussing the state of the art of classical LABS solvers, and describing how QAOA is applied to solve the problem.
For a given sequence of spins s_i ∈ {±1}, i = 1, …, N, the autocorrelation at distance k is given as

A_k(s) = \sum_{i=1}^{N-k} s_i s_{i+k}.

The goal of the LABS problem is to find a sequence of spins that minimizes the so-called "sidelobe" energy

E_{sidelobe}(s) = \sum_{k=1}^{N-1} A_k(s)^2,

or, equivalently, maximizes the merit factor

F(s) = \frac{N^2}{2 E_{sidelobe}(s)}.

The time-to-solution (TTS) is defined as the time a solver takes to produce this sequence. The energy E_sidelobe(s) is a polynomial containing terms of degree 2 and 4, visualized in Fig. 1a. It can be encoded on qubits by the following Hamiltonian:

H_C = \sum_{k=1}^{N-1} \Big( \sum_{j=1}^{N-k} z_j z_{j+k} \Big)^2,

where z_j is a Pauli z operator acting on qubit j.
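These definitions are easy to check numerically. The sketch below (plain Python, illustrative only; the function names are ours, not from the paper's codebase) computes the sidelobe energy and merit factor of a spin sequence and brute-forces the optimum for tiny N.

```python
import itertools

def sidelobe_energy(s):
    """E_sidelobe(s) = sum_{k=1}^{N-1} A_k(s)^2, with A_k(s) = sum_i s_i s_{i+k}."""
    n = len(s)
    return sum(sum(s[i] * s[i + k] for i in range(n - k)) ** 2
               for k in range(1, n))

def merit_factor(s):
    """F(s) = N^2 / (2 * E_sidelobe(s))."""
    return len(s) ** 2 / (2 * sidelobe_energy(s))

def brute_force_optimum(n):
    """Exhaustively minimize the sidelobe energy over all 2^n spin sequences."""
    best = min(itertools.product((-1, 1), repeat=n), key=sidelobe_energy)
    return best, sidelobe_energy(best)
```

For example, at N = 3 the sequence (1, 1, -1) achieves the minimum sidelobe energy of 1, i.e., a merit factor of 4.5.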
The runtimes of state-of-the-art classical solvers for the LABS problem scale exponentially, with clear exponential scaling present at N ≤ 40 as shown in Fig. 1b. The best-known exact solvers are branch-and-bound methods that have a running time that scales as 1.73^N. 19 The best-known heuristic for general LABS is tabu search initialized with a memetic algorithm (Memetic Tabu), 21 and has a running time that scales as 1.34^N. 28 To see why LABS is harder to solve than other commonly studied problems such as MaxCut, we can examine the correlation between the Hamming distance to the optimal solution and the objective. The comparison is shown in Fig. 1c. This correlation is one example of problem structure used by both classical and quantum heuristics to solve the problem quickly. 7 The absence of this correlation highlights the hardness of LABS compared with other commonly considered problems such as MaxCut.
As a consequence of the exponential scaling, the LABS problem becomes classically intractable at moderate sizes. Specifically, the value of the best-known merit factor decreases significantly for high N, whereas the asymptotic limit predicts that the merit factor should stay approximately constant. 29 This failure of state-of-the-art heuristics has been observed for N > 200. 18,19 The clear failure of the classical methods to obtain high-quality solutions even at small sizes makes LABS an appealing candidate problem for quantum optimization heuristics. 29 In this work, we tackle the LABS problem using QAOA. As shown in the circuit diagram Fig. 1d, QAOA solves optimization problems by preparing a parameterized state

|γ, β⟩ = e^{-iβ_p \sum_j x_j} e^{-iγ_p H_C} ⋯ e^{-iβ_1 \sum_j x_j} e^{-iγ_1 H_C} |+⟩^{⊗N},

where |+⟩^{⊗N} is a uniform superposition over computational basis states, H_C is the diagonal Hamiltonian encoding the problem, and x_j is a Pauli x operator acting on qubit j. The operator e^{-iγ H_C} is commonly referred to as the phase operator and e^{-iβ \sum_{j=1}^N x_j} as the mixing operator. The evolution times β, γ are hyperparameters chosen to maximize some figure of merit, such as the expected quality of the measurement outcomes or the probability of measuring the optimal solution. While β, γ can be optimized independently for each problem size, we consider them to be hyperparameters and use one fixed set of parameters for the LABS problem with a given QAOA depth p regardless of size. The fixed set of parameters is obtained by optimizing β, γ numerically for a number of small problem sizes and introducing an averaging and rescaling procedure to extrapolate parameters to any problem size (see the Methods section).

[Fig. 1 caption, panels c and d:] c, The distribution over 21 ≤ N ≤ 31 (for LABS) and 34 random instances (for MaxCut on random 3-regular graphs with 20 nodes) of Pearson product-moment correlation coefficients relating the Hamming distance of bitstrings from the optimal solution with the objective value of the bitstring. LABS has a much lower correlation between the Hamming distance and objective, indicating that it is much harder than the commonly considered MaxCut problem. d, Diagram of the QAOA circuit for a 5-qubit example. Starting from a uniform superposition of the computational basis states, we apply p layers of phase and mixing operators, followed by measurement in the computational basis.
When choosing the parameters β, γ and evaluating the quality of the solution obtained by QAOA, two figures of merit are commonly considered. The first one is the expected merit factor of the sampled binary strings, given by

⟨C⟩_{MF} = \frac{N^2}{2 ⟨γ, β| H_C |γ, β⟩}.

We will refer to ⟨C⟩_MF as the "QAOA energy" as a shorthand. The second figure of merit is the probability of sampling the exact optimal solution, denoted by p_opt and equal to the sum of squared absolute values of amplitudes of basis states corresponding to exactly optimal solutions.
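As a toy illustration of the two figures of merit, the snippet below computes the energy expectation and p_opt from a hand-picked 2-qubit statevector and cost vector (the values are made up for illustration; they do not correspond to an actual LABS instance).

```python
import numpy as np

# Toy 2-qubit "QAOA state" and diagonal of a cost Hamiltonian (illustrative
# values only, not actual LABS energies); the minimum cost is at index 2.
psi = np.array([0.1, 0.2, 0.5, np.sqrt(1 - 0.01 - 0.04 - 0.25)], dtype=complex)
costs = np.array([4.0, 2.0, 1.0, 2.0])

probs = np.abs(psi) ** 2
energy = float(probs @ costs)                      # <psi| H_C |psi>
mf = 2 ** 2 / (2 * energy)                         # merit-factor-style figure
p_opt = float(probs[costs == costs.min()].sum())   # overlap with optimal states
```

Here `p_opt` is exactly the summed probability of the minimum-cost basis states, matching the definition in the text.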
In the numerical experiments below, we follow the protocol of Ref. 11 and focus on the scaling of the QAOA TTS with problem size N as the QAOA depth p is held constant. QAOA TTS is defined as 1/p_opt, i.e., the expected number of measurements required to obtain an optimal solution from the QAOA state. Ref. 11 rigorously shows that, for random k-SAT, the runtime of constant-depth QAOA grows exponentially with N at any fixed p, with the scaling exponent depending on p. While the nature of the LABS problem makes it difficult to obtain analytical results analogous to Ref. 11, our numerical results also show clear exponential scaling of TTS. We note that, in practice, the TTS of QAOA is Θ(N^2) · 1/p_opt, where the Θ(N^2) prefactor comes from the cost of implementing the LABS phase oracle. 30 However, we do not include it in our analysis because it does not affect the scaling exponent.

[Fig. 2 caption, panels d–f:] d, The gain at p = 1 is over the random guess. Only one line is plotted for amplitude amplification since the lines for the values of N considered are visually indistinguishable. For small p, a QAOA layer gives orders of magnitude larger gain than a step of AA. e, Fixed QAOA parameters for p = 30 chosen with respect to the QAOA energy ⟨C⟩_MF ("MF") and the probability of sampling the optimal solution ("p_opt"). Different choices of optimization objective give different resulting parameters. f, Probability of obtaining a binary string corresponding to a given energy level of the LABS problem (the zeroth energy level is the ground state or optimal solution; lower is better). When parameters are optimized with respect to the expected merit factor (labeled "MF"), the QAOA output state is concentrated around the mean and fails to obtain a high overlap with the ground state. On the other hand, when parameters are optimized with respect to p_opt (labeled "p_opt"), the QAOA state has a high overlap with both the ground state and higher energy states. The probability of obtaining the ground state is 27.3 times greater for QAOA with parameters optimized with respect to p_opt at p = 40.

SCALING OF QUANTUM TIME-TO-SOLUTION FOR LABS PROBLEM
We now present the numerical results demonstrating the scaling of TTS of QAOA and QAOA augmented with quantum minimum-finding ("QAOA+QMF"). The results are summarized in Table I. Throughout this section, we present the numerical results obtained using exact noiseless simulations. The runtime scaling is obtained by evaluating QAOA once with fixed parameters β, γ (i.e., with no overhead of parameter optimization) and computing the value p_opt with high precision. We discuss the parameter setting procedure and the details of simulation in the Methods section.
We are interested in the scaling of the runtime of QAOA for large problem sizes N. An important question to address is the choice of the smallest N to include in the scaling analysis, since the algorithm's behavior at small sizes may not be predictive of its behavior at large sizes. Note that the largest N we include is limited by the capability of the classical simulator. We use the quality of the fit as the criterion for the choice of the cutoff on N. Figure 2a shows that if we set the cutoff at N ≥ 28, we obtain a robust high-quality fit (R^2 > 0.94), with the quality of the fit remaining stable as p grows. On the other hand, if smaller N are included, the quality of fit begins to decay with p. Therefore we include only N ≥ 28, obtaining the fit presented in Fig. 2b. We observe that the TTS of QAOA grows as 1.46^N with problem size at constant QAOA depth p = 12. We present evidence that the scaling exponent for QAOA at p = 12 is not sensitive to the choice of N_min in the Supplementary Information. 29

As QAOA is a quantum optimization heuristic with constant depth, on a fault-tolerant quantum computer the QAOA performance can be improved by using amplitude amplification 11,30 or, more specifically, quantum minimum-finding 31 (see Methods). The resulting scaling of TTS of QAOA augmented with quantum minimum-finding ("QAOA+QMF") is 1.21^N. We observe that, beyond a certain value (p ≈ 12), increasing the QAOA depth does not lead to better scaling of TTS. This behavior is demonstrated in Fig. 2c.

[Table I caption fragment:] … 19) as well as the much shorter time to find an optimal solution (TTS). We observe that QAOA with constant depth of p = 12 augmented with quantum minimum-finding ("QAOA+QMF") has better time-to-solution scaling than the best known classical heuristics.

At sufficiently large depth, an additional QAOA layer does not give any scaling advantage over amplitude amplification. This behavior is illustrated in Fig. 2d, which shows the increase in the success probability p_opt from applying a given step of QAOA and amplitude amplification. For amplitude amplification, at step p we have p_opt = sin((2p + 1) arcsin √p_0)^2, where p_0 = 8/2^N is the initial (random guess) success probability. 32 Note that the 8 in the numerator is a consequence of a dihedral group symmetry, namely, D_4. While asymptotically equivalent, amplitude amplification performs better than a realistic generalized minimum-finding algorithm, 31 as the formula used here considers the scenario where we know which states to amplify (i.e., the optimal merit factor is known). We observe that for small p, a step (layer) of QAOA gives orders of magnitude larger increase in success probability than does a step of amplitude amplification, implying an even larger improvement over direct application of quantum minimum-finding. We provide additional details on the comparison between QAOA and amplitude amplification in the Supplementary Information. 29

We observe that the QAOA dynamics with parameters optimized for expected solution quality ⟨C⟩_MF and success probability p_opt are different. We plot the optimized parameters in Fig. 2e. We note that the parameters optimized with respect to one metric give performance that is far from optimal with respect to the other metric. This can be seen in Fig.
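The quoted amplitude-amplification formula can be evaluated directly. The sketch below plugs in the D_4-corrected random-guess probability p_0 = 8/2^N; for tiny p_0, one AA step roughly triples the amplitude, i.e., multiplies the success probability by about 9.

```python
import math

def aa_success_prob(p0, steps):
    """p_opt after `steps` amplitude-amplification iterations, starting from
    initial success probability p0: sin((2*steps + 1) * arcsin(sqrt(p0)))^2."""
    return math.sin((2 * steps + 1) * math.asin(math.sqrt(p0))) ** 2

N = 20
p0 = 8 / 2 ** N  # random-guess success probability; the 8 is from D4 symmetry
```

The formula assumes the marked states are known, matching the idealized comparison in the text.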
2f, which plots the energy distribution (with respect to the cost Hamiltonian) of the states appearing in the QAOA wavefunction, weighted by probability. With the parameters optimized for ⟨C⟩_MF, the QAOA output distribution is concentrated around its mean, and the overlap with the ground state, i.e., p_opt, is very small. On the other hand, when the parameters are optimized with respect to p_opt, the wavefunction is not concentrated and has large probability weight on the target ground state (i.e., high p_opt). This comes at the cost of significant overlap with high-energy states, which leads to poor expected solution quality. In the Supplementary Information, 29 we discuss the behavior of QAOA with parameters optimized with respect to different objectives.

EXPERIMENTS ON TRAPPED-ION SYSTEM
We now present the experimental results demonstrating the algorithmic and hardware progress toward the practical implementation of QAOA. Implementation of the phase operator is especially challenging for currently available quantum processors, as it requires a large number of geometrically nonlocal two-qubit gates executed with high fidelity.
Recent progress in trapped-ion platforms based on the QCCD architecture 24,25,29 has led to a rapid increase in the number of qubits while maintaining high fidelity, enabling large-scale QAOA demonstrations. 33,34 These systems implement two-qubit gates between arbitrary pairs of qubits by transporting ions into physically separate gate zones, resulting in high-fidelity two-qubit gates with low crosstalk. We leverage this progress to execute QAOA circuits for the LABS problem on Quantinuum H-series trapped-ion systems.
To implement the QAOA circuit shown in Fig. 1d, we have to implement the phase operator. The four-body terms in the phase operator are decomposed into cnot gates and the native R_zz(θ) = e^{-iθ/2 z⊗z} rotation as shown in Fig. 3a. To reduce the cost of implementing both the two-qubit and four-qubit interaction terms, we optimize the circuit by greedily canceling cnot gates (for algorithm details and gate count reduction see the Supplementary Information 29). The resulting circuit containing cnots and R_zz's is then transpiled into the two-qubit R_zz gates and single-qubit gates that can be natively implemented by the trapped-ion system. 29,35-37 In this work we execute QAOA circuits with p = 1 using parameters β, γ optimized in noiseless simulation, followed by a projective measurement in the computational basis. In Fig. 3b, we show the energy probability distribution of measured bitstrings for N = 13. We observe a broad distribution due to the limited number of layers and experimental imperfections. Nevertheless, even at high N, where the two-qubit gate count is high and the gate errors can be significant, we observe a clear signal indicating that QAOA outperforms random guess. This is shown in Fig. 3c, which presents the experimentally obtained expected merit factors for problem sizes up to N = 18. We note that the merit factor drops quickly for larger N and approaches random guess because of experimental imperfections. We also note that at this scale LABS is easy for classical heuristics, which obtain optimal merit factors in < 1 second. 29 Implementing QAOA for LABS instances that are hard for classical solvers would likely require error correction, as the current implementation leads to an estimated two-qubit gate count of ≈ 7.5 × 10^5 already at N = 67 and p = 12.
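The standard CNOT-ladder decomposition of a four-body phase rotation (of the kind shown in Fig. 3a) can be checked on computational basis states, since e^{-iθ/2 zzzz} is diagonal. The sketch below compares the exact phase with the phase produced by a ladder-plus-R_z construction; this verifies only the textbook decomposition, not the paper's full gate-cancellation optimization.

```python
import cmath
import itertools

def zzzz_phase(bits, theta):
    """Exact phase that exp(-i*theta/2 * zzzz) applies to basis state |bits>:
    z z z z |b> = (-1)^(parity of b) |b>."""
    parity = sum(bits) % 2
    return cmath.exp(-1j * theta / 2 * (-1) ** parity)

def ladder_phase(bits, theta):
    """Phase from the CNOT-ladder decomposition: CNOTs accumulate the parity
    onto the last qubit, Rz(theta) = diag(e^{-i theta/2}, e^{i theta/2}) acts
    there, then the ladder is undone (uncomputation adds no phase)."""
    b = list(bits)
    for i in range(3):          # compute total parity into qubit 3
        b[i + 1] ^= b[i]
    return cmath.exp(-1j * theta / 2) if b[3] == 0 else cmath.exp(1j * theta / 2)

agree = all(abs(zzzz_phase(b, 0.7) - ladder_phase(b, 0.7)) < 1e-12
            for b in itertools.product((0, 1), repeat=4))
```

Since both operators are diagonal, agreement on all 16 basis states proves equality of the unitaries.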
29 To improve the performance in the presence of noise, we implement an algorithm-specific error detection scheme. Since only the phase operator requires two-qubit gates, we focus on detecting errors that occur in the corresponding part of the circuit. Our scheme is based on the Pauli sandwiching error-detecting procedure of Ref. 26, which uses pairs of parity checks to detect some but not necessarily all errors that occur in a given part of the circuit. Following Refs. 38 and 39, we use the symmetries of the optimization problem to construct the parity checks. Specifically, we note that the LABS Hamiltonian preserves both z and x parities, that is,

[H_C, ⊗_{i=1}^N z_i] = [H_C, ⊗_{i=1}^N x_i] = 0.

We compute the parities onto ancillary qubits and perform mid-circuit measurements to determine whether an odd number of z- or x-flip errors occurred during the circuit execution. The circuit with one check is shown in Fig. 3d. In the hardware experiments shown in Fig. 3e, we use up to three parity checks and observe consistent improvements in QAOA performance after postselecting on their outcomes. After postselection, the difference in merit factor between experimental results and noiseless simulation is reduced by 54% on average and by up to 65% for specific N. In the Supplementary Information 29 we present additional details on the error-detecting scheme performance, including how performance improves with the number of parity checks and the reduction in the algorithm runtime. We note that while error detection does not directly give samples with better merit factors, the potential improvement in runtime can be translated into performance gains at the algorithm level, for example by being able to take more samples within a given time budget. 29 In our experiments, in all but two cases the optimal bitstring could be found within the post-selected sample, and in all cases within the total sample. 29

DISCUSSION
Our main finding is that quantum minimum-finding enhanced with QAOA scales better than the best known classical heuristics for the LABS problem. This provides evidence for the potential of QAOA to act as a building block that provides algorithmic speedups on an idealized fault-tolerant quantum computer. We envision QAOA being used in a variety of algorithmic settings, similarly to how amplitude amplification acts as a subroutine in quantum algorithms for backtracking, branch-and-bound, and so on.
We take the first step toward the execution of QAOA for the LABS problem by implementing an algorithm-specific error-detection scheme on a trapped-ion quantum processor. However, further improvements in quantum error correction and hardware are necessary to implement quantum minimum-finding augmented with QAOA. In particular, the overheads of fault tolerance 40 must be significantly reduced to realize the quantum speedup.

Quantum minimum-finding enhanced with QAOA
In this work, we present the scaling results for QAOA combined with amplitude amplification (AA) or, more specifically, with quantum minimum-finding ("QAOA+QMF" in Table I). This reduces the scaling exponent by half compared to directly sampling the QAOA output. We now discuss in detail how QAOA is combined with the generalized quantum minimum-finding algorithm of Ref. 31 to obtain the stated scaling.
We begin by noting that standard AA is not sufficient. This is because the LABS problem is framed as optimization and not search, i.e., there is no oracle for marking a global minimum. The trick for handling optimization is to perform a standard reduction from optimization to feasibility. The reduction is performed by introducing a threshold on the cost as a constraint and performing a binary search using AA as a subroutine. The oracle used by AA marks the elements below the current threshold. This reduction was first introduced by Dürr and Høyer (DH). 1 However, the quantum minimum-finding algorithm of Dürr and Høyer utilizes standard Grover search, i.e., it requires the initial state to be the uniform superposition. A modification is required to leverage the improved success probability afforded by QAOA.
Ref. 31 provided a simple extension of DH that allows arbitrary initial states, with the overall cost scaling inversely with the overlap between the initial state and the state encoding the optimal solution. We leverage this extension in our quantum algorithm. We use constant-depth QAOA to prepare the initial state for the quantum minimum-finding algorithm. As the QAOA state has an overlap with the optimal state that is much larger than that of the uniform superposition 29 and scales more favorably, we obtain better performance than the direct minimum-finding of Dürr and Høyer. Specifically, we provide numerical evidence that our algorithm obtains a super-Grover speedup over exhaustive search for the LABS problem and scales better than the best known classical heuristics. We present our modification to include QAOA for outputting an optimal solution x* to the LABS problem in Algorithm 1 below. It is based on the generalized minimum-finding procedure outlined in Lemma 48 of Ref. 31. To keep the current work self-contained, we include the analysis of the algorithm below. We will use the following standard quantum subroutine based on Grover search that searches for an element with unknown probability in a quantum state. [Algorithm 1 pseudocode omitted; x_res is initialized to an empty list.]
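The control flow of the DH reduction can be sketched classically: repeatedly sample an element strictly below the current threshold and lower the threshold to its cost. The stand-in `sampler` below (a hypothetical name of ours) replaces the amplitude-amplification subroutine, so this illustrates only the reduction from optimization to search, not the quantum speedup: the quantum version draws each sample with roughly 1/sqrt(p) oracle queries instead of the classical 1/p.

```python
import random

def threshold_minimum_search(costs, sampler, rng, max_rounds=100):
    """Classical control-flow sketch of Durr-Hoyer minimum finding: 'mark'
    elements strictly below the current threshold, sample one of them, and
    tighten the threshold; stop when nothing is below the threshold."""
    threshold = float("inf")
    best = None
    for _ in range(max_rounds):
        marked = [i for i, c in enumerate(costs) if c < threshold]
        if not marked:
            break                        # current best is a global minimum
        best = sampler(marked, rng)      # stand-in for the AA subroutine
        threshold = costs[best]
    return best

rng = random.Random(0)
costs = [5, 3, 8, 1, 7, 1]
idx = threshold_minimum_search(costs, lambda marked, r: r.choice(marked), rng)
```

The threshold strictly decreases each round, so the loop terminates with `idx` pointing at a minimum-cost element.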
Note that m_t can be coherently evaluated using one query to V_LABS. Lemma: Suppose the state prepared by QAOA encodes an optimal solution to the N-bit LABS problem in a computational basis state with probability p_opt, and assume that p_opt ≥ 1/N. Then, running Algorithm 1 with parameters M ≥ 1/√p_opt and failure probability δ runs with a gate complexity of O(poly(N) log(1/δ) M) and finds x* with probability at least 1 − δ.
Proof. See Supplementary Information. 29

Choice of QAOA parameters β, γ

Our strategy for setting the QAOA parameters β, γ used in our experiments is twofold. First, we optimize QAOA parameters for small N using the FOURIER reparameterization scheme of Ref. 9. Second, we use the optimized parameters for small N to compute fixed QAOA parameters that are then used for larger N. To apply the fixed parameters to an instance with a given size N, we rescale the parameters γ by N. 29 We discuss the parameter optimization scheme and the parameter rescaling in the Supplementary Information. 29 We note that the results presented above can be improved if better parameter setting strategies are used.
The procedure for obtaining the set of fixed QAOA parameters is visualized in Fig. 4. Specifically, we optimize QAOA parameters for a set of small instances with sizes {N_j}_{j=1}^M attainable in simulation and set the fixed parameters to be the mean over the optimized parameters:

β^{Fixed} = \frac{1}{M} \sum_{j=1}^M β^*_{N_j}, \qquad γ^{Fixed} = \frac{1}{M} \sum_{j=1}^M N_j γ^*_{N_j},

where β*_{N_j}, γ*_{N_j} are the QAOA parameters optimized for the LABS instance of size N_j and M is the number of optimized instances. Then the parameters used in QAOA for size N are given by β^Fixed, γ^Fixed/N. We use 24 ≤ N_j ≤ 31 (M = 8).
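A minimal sketch of the averaging-and-rescaling rule, under our reading that each γ*_{N_j} is pre-multiplied by N_j before averaging so that dividing by N later recovers a size-appropriate schedule (this specific form is our assumption; the paper's exact procedure is in its Supplementary Information):

```python
def fixed_parameters(optimized, sizes):
    """Average per-size optimized schedules (beta*_Nj, gamma*_Nj) into one
    fixed schedule; gamma*_Nj is pre-multiplied by N_j (assumption)."""
    M = len(sizes)
    p = len(optimized[sizes[0]][0])
    beta_fixed = [sum(optimized[N][0][l] for N in sizes) / M for l in range(p)]
    gamma_fixed = [sum(N * optimized[N][1][l] for N in sizes) / M for l in range(p)]
    return beta_fixed, gamma_fixed

def parameters_for_size(beta_fixed, gamma_fixed, N):
    """Schedule actually used for a LABS instance of size N: gamma / N."""
    return beta_fixed, [g / N for g in gamma_fixed]

# Synthetic optimized parameters with gamma*_N proportional to 1/N,
# so the averaged-and-rescaled schedule is exactly recovered.
optimized = {10: ([0.2, 0.1], [0.030, 0.050]),
             20: ([0.4, 0.3], [0.015, 0.025])}
beta_f, gamma_f = fixed_parameters(optimized, [10, 20])
```

With this synthetic data, gamma_f is [0.3, 0.5], and the size-30 schedule divides it by 30.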

Error detection by symmetry verification
The error detection scheme relies on the symmetry of the phase operator defined by Eq. 2. As it commutes with both the ⊗_{i=1}^N z_i and ⊗_{i=1}^N x_i operators, one can measure the values of these operators and perform postselection on the measurement outcomes. That is, the state after the phase operator should have the same z and x parity as before it. In the presence of an odd number of bit-flip or phase-flip errors occurring during the implementation of the phase operators, the resulting state will not be in the +1 eigenspace of the two syndrome operators.
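The x-parity symmetry has a simple classical sanity check: conjugating the diagonal Hamiltonian by ⊗x corresponds to a global spin flip, which leaves every autocorrelation A_k, and hence the LABS energy, unchanged (the z-parity case is trivial, since H_C is diagonal). A small exhaustive check, using our own illustrative helper names:

```python
import itertools

def sidelobe_energy(s):
    """E_sidelobe(s) = sum_{k=1}^{N-1} (sum_i s_i s_{i+k})^2."""
    n = len(s)
    return sum(sum(s[i] * s[i + k] for i in range(n - k)) ** 2
               for k in range(1, n))

def global_flip(s):
    """Global spin flip, the classical shadow of conjugation by x...x."""
    return tuple(-x for x in s)

# Every A_k is a sum of pairwise products, so flipping all spins cancels out.
x_parity_symmetric = all(
    sidelobe_energy(s) == sidelobe_energy(global_flip(s))
    for s in itertools.product((-1, 1), repeat=6))
```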
Experimentally, we divide the whole phase operator into m splits such that each split has approximately the same number of two-qubit gates, and we perform syndrome checks at the end of each split to detect errors. The syndrome operators are mapped to ancillary qubits via sequential controlled-x or controlled-z gates and Hadamard gates applied before and after the partial phase operator. Since the number of two-qubit gates for the phase operators is higher than the number of gates used for the mapping by 2-3 orders of magnitude, additional errors introduced by the ancillae are negligible. Furthermore, the crosstalk error probability during mid-circuit measurements is on the order of 10^-5, considerably lower than the typical two-qubit-gate infidelity of 2 × 10^-3 for the trapped-ion systems we used. 25 As a result, our error detection scheme leads to large improvements in QAOA performance on hardware, at the cost of the number of repetitions growing exponentially with the number of checks. 26 We note that the performance of the error detection scheme can be further improved by implementing parity checks using fault-tolerant constructions. 42

Scaling of classical solvers
All scaling coefficients are obtained by fitting a least-squares linear regression on the logarithm of TTS. The confidence intervals on the scaling coefficients are obtained by using the Student's t-distribution and are reported with 95% confidence.
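A minimal version of this fit (omitting the Student's-t confidence interval) regresses log TTS on N and exponentiates the slope; the helper below is illustrative only.

```python
import math

def fit_scaling_base(sizes, tts):
    """Least-squares slope of log(TTS) vs N; returns the base b in TTS ~ b^N."""
    xs, ys = list(sizes), [math.log(t) for t in tts]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return math.exp(slope)

# Synthetic data exactly following 1.34^N recovers the base.
sizes = list(range(28, 41))
base = fit_scaling_base(sizes, [1.34 ** N for N in sizes])
```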
We use commercial state-of-the-art branch-and-bound solvers in numerical experiments. Figure 1b and Table I show results obtained using Gurobi, 43 although we obtain similar results for CPLEX 44 (see the Supplementary Information). The use of commercial branch-and-bound solvers is motivated by the observation that their scaling closely matches that reported in Ref. 19. Specifically, we observe that for both solvers the time to produce a certificate of optimality (TTO) scales with an exponent within a 95% confidence interval of the 1.73 exponent reported in Ref. 19. We note that, unlike the solver presented in Ref. 19, commercial solvers are not parallelizable and can take advantage of only one CPU with at most tens of cores. Since QAOA is a heuristic and does not guarantee optimality, we additionally run the branch-and-bound solvers until a solution with an exactly optimal merit factor is found, at which point the execution is stopped. The resulting TTS scales more favorably: for Gurobi, the scaling is 1.615^N, with a 95% confidence interval of (1.571, 1.659). All the numbers reported correspond to the mean CPU time, with the mean taken over 100 random seeds for N ≤ 32 and 10 random seeds for N > 32. We present additional details of classical solver benchmarking in the Supplementary Information.
29 Branch-and-bound algorithms are the best-known exact solvers for the LABS problem. In the regime where proving optimality is out of reach and the goal is simply to efficiently obtain sequences with high merit factors, heuristic algorithms are preferable. The best runtimes and runtime scaling reported in the literature 21 are from an algorithm known as Memetic Tabu. Memetic Tabu is a memetic algorithm, that is, an evolutionary algorithm augmented by local search. Specifically, an evolutionary algorithm is used to find initializations for tabu search, a metaheuristic that augments local neighborhood search with a data structure (known as the tabu list) that filters possible local moves if the potential solutions have been recently visited or diversification rules are violated. 45 In terms of the runtime required to find optimal solutions in the regime where exact solutions have been found using branch-and-bound methods, 19 Memetic Tabu has been observed to outperform both non-evolutionary methods and memetic algorithms that use simpler neighborhood search schemes such as steepest descent. To verify the scaling of tabu search in the regime of interest for comparison with QAOA, we use the implementation of Memetic Tabu from Ref. 18. For each length, we average the runtime over 50 random seeds, obtaining a time-to-solution scaling of 1.35^N with a 95% confidence interval of (1.33, 1.38). This scaling closely matches the one reported in Ref. 28. We also note that solvers based on self-avoiding random walks 18 have been shown to be competitive with or to outperform Memetic Tabu when the task is to find skew-symmetric 29 sequences with the lowest autocorrelation. These solvers are specialized to search for skew-symmetric sequences and do not naturally extend to the unrestricted LABS problem.
High-performance simulation of QAOA

Our numerical results are enabled by a custom scalable high-performance algorithm-specific QAOA simulator. We briefly describe the simulator here; for additional details and for benchmarks comparing the developed simulator with state-of-the-art methods for simulating QAOA, the reader is referred to Ref. 20.
In this work, the main goal of the numerical simulation of QAOA is to evaluate the expectation of the cost Hamiltonian ⟨C⟩ MF and the success probability p opt. Since p opt is exponentially small, it has to be evaluated with high precision. While many techniques can be leveraged for exact simulation, we opt to directly simulate the full quantum state as it is propagated through the QAOA circuit. We note in particular that tensor network techniques do not provide a benefit in this case, since the circuit we simulate is deep and fully connected (see Ref. 20 for a detailed comparison).
First, we leverage the observation that the cost Hamiltonian, and hence the phase-separation operator, is diagonal. This allows us to precompute the cost function evaluated at every binary input and multiply the exponentiated costs elementwise with the statevector to simulate the application of the phase-separation operator. This operation is easily parallelized, since it is an elementwise operation local to each element of the statevector. The same precomputed vector of cost function values is used to compute ⟨C⟩ MF by taking an inner product with the final QAOA state. The cost of precomputation is amortized over the large number of objective evaluations performed during parameter optimization and is thereby negligible.
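The elementwise structure described above can be illustrated with a minimal Python sketch (ours, not the authors' simulator; the toy cost vector below is a random stand-in for the precomputed LABS energies):

```python
import numpy as np

# Illustrative sketch of the diagonal phase-separation operator: the
# precomputed cost of every binary input is applied to the statevector
# by a single elementwise multiplication.
n = 10                                  # toy number of qubits
rng = np.random.default_rng(0)

# Placeholder costs standing in for the precomputed sidelobe energies.
costs = rng.integers(0, 50, size=2**n).astype(float)

# Random normalized statevector standing in for the current QAOA state.
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)

gamma = 0.1
psi = np.exp(-1j * gamma * costs) * psi  # phase separation, elementwise

# The same cost vector gives the energy expectation via an inner product.
expectation = np.vdot(psi, costs * psi).real
```

Because the operator is diagonal, it preserves the norm of the state and touches each amplitude independently, which is what makes the GPU parallelization described above communication-free.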
Second, we note that the mixing operator consists of a uniform x rotation applied to each qubit. Each rotation can therefore be computed by multiplying a fixed 2 × 2 unitary matrix with a 2 × 2^(n−1) matrix constructed by reshaping the statevector. This step is parallelized by grouping the pairs of indices on which the 2 × 2 unitary is applied.
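The reshaping trick can be sketched as follows (an illustrative, unoptimized Python version; the loop over qubits and the einsum contraction are our implementation choices, not the authors' code):

```python
import numpy as np

# Sketch of the mixing operator: a uniform Rx rotation on every qubit,
# implemented by reshaping the statevector so that each single-qubit
# rotation becomes a small matrix product.
def apply_rx_all(psi, beta, n):
    c, s = np.cos(beta), np.sin(beta)
    rx = np.array([[c, -1j * s], [-1j * s, c]])  # exp(-i*beta*X)
    for i in range(n):
        # Group amplitudes into pairs differing only in bit i.
        psi = psi.reshape(2**i, 2, 2**(n - 1 - i))
        psi = np.einsum('ab,ibj->iaj', rx, psi)   # apply rx along axis 1
        psi = psi.reshape(-1)
    return psi

n = 8
psi = np.zeros(2**n, dtype=complex)
psi[0] = 1.0
out = apply_rx_all(psi, 0.3, n)
```

For `beta = pi/2` each rotation maps |0⟩ to −i|1⟩, so the all-zeros input is taken (up to phase) to the all-ones basis state, which is a convenient correctness check.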
We perform the simulations on the Polaris supercomputer at the Argonne Leadership Computing Facility. We distribute the simulation over 256 Polaris nodes, each with four NVIDIA A100 GPUs and one AMD EPYC CPU; the CPU is used to manage the communication and the assembly of final results. Each GPU hosts a chunk of the full statevector and a chunk of the integer cost operator vector. Application of the cost operator does not require any communication, since it is local to each element. The grouping in the mixing operator depends on the index i of the operator x_i, analogous to the grouping in the fast Walsh-Hadamard transform. 46 For i ≤ n − log2(1024) = 29 the pairing is local within each GPU. For i > 29 we use CUDA-enabled MPI to exchange full chunks between nodes, which requires space for two statevector chunks to be reserved on each GPU.
The submitted manuscript includes contributions from UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

The problem of finding sequences with low sidelobe energies attracted attention in the 1960s and 1970s due to its applications to the reduction of the peak power of radar pulses. 1,2 The merit factor F was first introduced by Golay 3 and defined as the ratio of central to sidelobe energy of a binary sequence. Improved merit factors were obtained over the years by exhaustive 4,5 and non-exhaustive 6 search methods. Explicit sequences asymptotically achieving a merit factor of F ≈ 6.34 are known. 7 The conjectured asymptotic limit of F → 12.32 as N → ∞ was derived using arguments from statistical mechanics in Ref. 8. Bernasconi reframed the LABS problem as a spin model with long-range 4-spin interactions in order to apply simulated annealing to it. 9 However, simulated annealing failed to obtain high-quality solutions, a failure attributed to the "golf-course type" energy landscape. 9 Bernasconi further conjectured that this property of the landscape would prevent stochastic search procedures from obtaining high-quality solutions for long sequences. 9 This conjecture has held up so far.
A commonly considered class of sequences is that of skew-symmetric sequences, which for odd length N = 2k − 1 are defined by s_{k+l} = (−1)^l s_{k−l}, l ∈ {1, . . ., k − 1}. Skew-symmetric sequences are known to be optimal for many odd N. Since skew-symmetry reduces the search space from 2^N to 2^(N/2), searching only this subspace leads to better runtime scaling, and many algorithms are therefore restricted to it. The best-known heuristic for skew-symmetric LABS uses a sequence of self-avoiding walk segments and has a running time that scales as 1.15^N. 10 In this work, we target the general LABS problem, and therefore we do not consider solvers that can tackle only skew-symmetric instances.

SII. QAOA AS AN EXACT AND APPROXIMATE OPTIMIZATION ALGORITHM
We now provide additional numerical results highlighting the differences in QAOA behavior with parameters optimized for approximate and for exact solutions. In this work, we use QAOA as an exact solver, with time-to-solution as the target metric. However, QAOA is typically used as an approximation algorithm, 11 with most theoretical results focusing on the expected solution quality. On the LABS problem, we observe that QAOA can provide poor approximations in polynomial time while still offering speedups as an exact exponential-time solver.
We begin by discussing the parameters themselves. QAOA parameters are typically chosen with respect to the QAOA energy [11][12][13][14][15][16][17][18] or the probability of sampling the optimal solution. 19 Figure S1a shows the difference in QAOA parameters between the two scenarios for the specific case of the LABS problem. First, we note that QAOA parameters that maximize ⟨C⟩ MF are significantly different from those maximizing p opt. Specifically, while the values of γ are similar, the values of β for p opt are much larger than those for ⟨C⟩ MF. Qualitatively, this implies that the probability amplitudes are allowed to "mix" more, making the QAOA state less concentrated with respect to Hamming distance. This behavior is seen when examining the QAOA output distribution, which is shown for both parameter schedules in Fig. S2 and discussed in detail below.

FIG. S1. a, Fixed QAOA parameters for p = 30 chosen with respect to the QAOA energy ⟨C⟩MF ("MF") and the probability of sampling the optimal solution ("p opt"). When the parameters are optimized with respect to p opt, the value of β is significantly larger throughout the QAOA evolution. Subfigure reproduced from the main text. b, QAOA performance for N = 25, p = 30 with parameters linearly interpolated between the fixed parameters for p opt (t = 0) and for ⟨C⟩MF (t = 1). QAOA parameters optimized for ⟨C⟩MF give very poor values of p opt and vice versa.

FIG. S2. The probability of obtaining a binary string corresponding to a given energy level of the LABS problem (the zeroth energy level is the ground state, i.e. the optimal solution; lower is better) for varying p (a-d). When parameters are optimized with respect to the expected merit factor (labeled "QAOA MF"), the QAOA output state is concentrated around the mean and fails to obtain a high overlap with the ground state. On the other hand, when parameters are optimized with respect to p opt (labeled "QAOA p opt"), the QAOA state has a high overlap with both the ground state and higher energy states. The probability of obtaining the ground state is 27.3 times greater for QAOA with parameters optimized with respect to p opt at p = 40 (d).
Second, QAOA parameters that give good performance with respect to one metric are far from optimal with respect to the other. Fig. S1b shows QAOA performance with parameters linearly interpolated between the parameters (β^{⟨C⟩MF}, γ^{⟨C⟩MF}) that give a high value of ⟨C⟩ MF and (β^{p opt}, γ^{p opt}) that give high p opt: γ = tγ^{⟨C⟩MF} + (1 − t)γ^{p opt} and β = tβ^{⟨C⟩MF} + (1 − t)β^{p opt}. We note that the parameters β^{⟨C⟩MF}, γ^{⟨C⟩MF} (t = 1) give a very low value of p opt and vice versa. This suggests that significant performance gains are possible if parameters are chosen separately for the two figures of merit, rather than using one as a proxy for the other, as is commonly done in QAOA research. 20,21 Similar observations have been made in Refs. 22 and 23, though the difference between the two figures of merit is more drastic in our case due to the hardness of the problem considered.
In the numerical experiments presented in the main text, we use time-to-solution as the target metric. For completeness, we include results showing QAOA performance as an approximation algorithm. We observe that QAOA performs poorly on the LABS problem with respect to the expected merit factor ⟨C⟩ MF; specifically, it fails to outperform even simple classical techniques at high depth. Fig. S3 shows the expected merit factor of QAOA at fixed p = 100 as a function of N, as well as examples of how the expected merit factor grows with p for two fixed values of N. We observe that as N grows, QAOA at p = 100 achieves an expected merit factor ⟨C⟩ MF of less than 5; note that explicit analytical sequences achieving a merit factor above 6 are known. 7 Moreover, ⟨C⟩ MF grows increasingly slowly as N and p increase, suggesting that a prohibitively large value of p would be required to achieve a high expected merit factor.
We can understand this behavior by examining the values of the merit factor attainable by binary strings (in physics terms, by examining the spectrum of the LABS Hamiltonian). For the N = 25 problem, presented in Fig. S2, only the 4 lowest energy levels correspond to a merit factor better than 6. This means that QAOA must concentrate nearly all of the wave function mass on a superposition of a small number of computational basis states. Because QAOA preserves the D4 symmetry of the problem, 24 this is a highly entangled state, which is hard to prepare. 25

FIG. S3. a, Performance of QAOA as an approximation algorithm. Explicit constructions of skew-symmetric sequences exist that achieve F ≈ 6.34 for large N. 7 Simulated annealing achieves F ≈ 5 for large N. 9 For QAOA, the expected value of the merit factor ⟨C⟩MF is plotted; it is below the values easily attainable classically. b, For both N = 24 and N = 32, the optimal merit factor is 8 (dashed line).

FIG. S4. a, Time-to-solution (TTS) of QAOA with parameters chosen with respect to p opt. b, Improvement over random guess. For the largest case considered numerically, the N = 39 problem at p = 33, the expected number of shots required to solve it is 1.2 × 10^4, a factor of 5.4 × 10^6 improvement over random guess.
As a result, QAOA requires a very large value of p to obtain a high expected merit factor. If, on the other hand, we choose the probability of sampling the exact optimal solution p opt (the overlap between the QAOA state and the ground state of the LABS Hamiltonian) as the target metric, a much lower value of p is needed to obtain a good success probability. Qualitatively, Fig. S2 shows how this state preparation succeeds by allowing a significant part of the QAOA state to "leak" into high energy levels: the population of energy levels > 50 stays relatively high for QAOA with parameters optimized with respect to p opt, but becomes negligible if the QAOA parameters are chosen with respect to ⟨C⟩ MF.
The success of QAOA in preparing states with a large overlap with the ground state of the LABS Hamiltonian (i.e., states having a high probability of yielding the exactly optimal solution upon measurement) motivates the choice of time-to-solution as the target metric for QAOA evaluation. We show the time-to-solution at the largest p explored numerically (p = 33) in Figure S4. We remark that QAOA succeeds at achieving high overlap for all N ≤ 40.
Our results suggest a new way of viewing the potential of QAOA to provide algorithmic speedups and provide an important caveat to theoretical results bounding the approximation quality attainable by QAOA at constant depth. 26 Even in the regime where the expected solution quality of QAOA is bounded, QAOA can still be useful as a tool for obtaining a high probability of measuring the exact optimal solution.

SIII. DETAILS OF NUMERICAL STUDIES
We now present in detail how the fixed parameters were chosen. Our procedure is as follows. First, we optimize the QAOA parameters using the FOURIER 12 reparameterization. We show evidence that the optimization of the reparameterized QAOA gives the same performance as the more extensive optimization of the standard parameterization. Second, we set our fixed parameters to be the arithmetic mean of the (appropriately rescaled) optimized parameters for smaller N. We provide evidence that for smaller N, where directly optimized parameters are available, the fixed parameters lead to QAOA performance that is close to that with the optimized parameters. We note that better parameter setting schemes can only improve our performance.
A. Optimized QAOA parameters for LABS change with N

First, we observe that the optimized QAOA parameters are not invariant with N. Specifically, we observe that the optimized value of γ* goes down with N as 1/N; Fig. S5a plots this for p = 1. We note that β* is roughly constant with N. We observe this scaling for all p. As an example, Fig. S5b plots optimized parameters for two different values of N; after rescaling γ* by N, the two sets of parameters are visually indistinguishable. Below, we take advantage of this scaling of γ* in two ways. First, we improve the convergence of local optimization runs by rescaling the initialization and the initial step size of the local optimizer. Second, we use it to correctly account for scale when executing QAOA with fixed parameters.
The scaling of γ* with extremal properties of the objective function has been observed before for other problems. For example, the normalized value of the maximum cut on D-regular graphs grows with D (to first order), and γ* decreases as 1/√D. 27,28 Analogous scaling has been observed for weighted problems. 29 In LABS, the energy E_sidelobe(s) grows as N^2, so γ* decreases as 1/N. Formally establishing this connection is a promising direction for future research.

B. QAOA parameter optimization with FOURIER scheme
For a given figure of merit, we optimize the QAOA parameters as follows. In all cases below, we use the nlopt 30 implementation of the BOBYQA 31 gradient-free local optimization algorithm. In all cases, we run BOBYQA until convergence, with convergence specified by relative tolerances of 10^(-8) on changes in the parameters and in the objective function value. BOBYQA has been shown to outperform other local optimizers on the task of optimizing QAOA parameters. 13 We expect similar results with any other local hill-climbing algorithm, albeit at a potentially different cost in terms of the number of iterations.
For p = 1, we optimize the QAOA parameters exhaustively by running the local optimizer from 400 initial points. We set the initial step size (rhobeg) to 0.01/N. The initial points β_init, γ_init are chosen uniformly at random from fixed ranges when optimizing with respect to ⟨C⟩ MF, and from β_init ∈ [0.15, 0.3], γ_init ∈ [0.6, 1.2/N] when optimizing with respect to p opt. The regions for initialization are read off from grid search results for p = 1.

FIG. S6. QAOA performance with directly optimized parameters (⟨C⟩MF) and with parameters optimized using the FOURIER[∞, 0] scheme (⟨C⟩F MF) for N > 12. We observe that the difference in the quality of the parameters obtained by the two optimization schemes is small.
For p > 1, we follow the FOURIER[∞, 0] scheme of Ref. 12. Specifically, we change the QAOA parameterization to the frequency domain:

γ_i = Σ_{k=1}^{p} u_k sin[(k − 1/2)(i − 1/2)π/p],  β_i = Σ_{k=1}^{p} v_k cos[(k − 1/2)(i − 1/2)π/p].

Then we optimize over the new parameters u, v. We take the optimized parameters u*_{p−1}, v*_{p−1} for depth p − 1 and run one local optimization from u*_p = (u*_{p−1}, 0), v*_p = (v*_{p−1}, 0). The initial step size (rhobeg) for the local optimizer is set to 0.01/N. An initial step size that is small and decreasing with N is central to the robust convergence of a local optimizer, since the QAOA parameters have different scales for different N. We do not find objective function rescaling of the type explored in Refs. 29 and 32 necessary for obtaining high-quality parameters, though we expect that it may reduce the number of iterations required by the local optimizer to converge.
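As a sketch, the frequency-domain parameterization and the depth-extension step can be written as follows (the sine/cosine basis below reflects our reading of the FOURIER scheme of Ref. 12 and should be treated as an assumption; the numerical values are arbitrary placeholders):

```python
import numpy as np

# Sketch of the FOURIER reparameterization: p schedule parameters
# (gamma_i, beta_i) are generated from frequency-domain parameters
# (u_k, v_k) through fixed sine/cosine basis matrices.
def fourier_to_schedule(u, v, p):
    i = np.arange(1, p + 1)
    k = np.arange(1, len(u) + 1)
    # Entry (i, k) couples QAOA layer i to frequency k.
    S = np.sin(np.outer(i - 0.5, k - 0.5) * np.pi / p)
    C = np.cos(np.outer(i - 0.5, k - 0.5) * np.pi / p)
    return S @ np.asarray(u), C @ np.asarray(v)

# FOURIER[inf, 0]-style extension: append a zero frequency when going
# from depth p-1 to p; a local optimizer would then refine (u, v).
u, v = [0.4, 0.1], [0.5, -0.05]
gamma, beta = fourier_to_schedule(u + [0.0], v + [0.0], p=6)
```

Appending a zero frequency leaves the generated schedule unchanged, which is what makes the extended parameters a natural warm start for the next local optimization.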

C. Evidence of the success of the FOURIER reparameterization heuristic
To evaluate the success of the FOURIER parameter optimization heuristic, we compare the quality of the optimized parameters it finds with the quality of the parameters obtained by running a local optimizer with 100p seeds from initial points sampled uniformly at random. We find that this much more expensive direct optimization performs very similarly to the single local optimization run used in the FOURIER scheme, as shown in Figure S6. Specifically, the mean difference between the two schemes is < 0.05%, and in the worst case considered, FOURIER gives parameters that are only < 0.5% worse. Therefore, below we simply use parameters optimized with the FOURIER[∞, 0] scheme.

D. Procedure for obtaining the fixed parameters
The procedure we follow for obtaining the fixed parameters is described in the main text; we reiterate it here for completeness. We optimize QAOA parameters for smaller values of N, for which the simulation is relatively inexpensive, and set the fixed parameters to be the mean over the optimized parameters:

β^Fixed = mean_N β*_N,  γ^Fixed = mean_N (N γ*_N),

where β*_N, γ*_N are the QAOA parameters optimized for size N. The fixed parameters used in QAOA applied to a LABS instance of size N are then given by (β^Fixed, γ^Fixed/N). This process is visualized for parameters optimized with respect to ⟨C⟩ MF in Figure S7. We use optimized parameters for 20 ≤ N ≤ 27 when computing the fixed parameters for ⟨C⟩ MF and 24 ≤ N ≤ 31 for p opt.
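A minimal sketch of this averaging procedure (with placeholder schedules standing in for the optimized parameters, which we do not reproduce here):

```python
import numpy as np

# Sketch of the fixed-parameter construction: since gamma* scales as
# 1/N, it is rescaled by N before averaging and divided by the target
# N when applied. The "optimized" schedules below are random
# placeholders, not data from the paper.
rng = np.random.default_rng(1)
p = 4
optimized = {N: (rng.normal(0.25, 0.01, p),        # beta*_N, ~const in N
                 rng.normal(0.6, 0.02, p) / N)     # gamma*_N ~ 1/N
             for N in range(20, 28)}

beta_fixed = np.mean([b for b, g in optimized.values()], axis=0)
gamma_fixed = np.mean([N * g for N, (b, g) in optimized.items()], axis=0)

def schedule_for(N):
    # Parameters actually used for a LABS instance of size N.
    return beta_fixed, gamma_fixed / N

b, g = schedule_for(40)
```

The rescaling is what allows a single averaged schedule to transfer to sizes N well outside the range used for averaging.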

E. Evidence of the success of the fixed parameter scheme
To evaluate the quality of the fixed parameters, we compare QAOA performance with fixed parameters and with directly optimized parameters; Figure S8 presents the comparison. We observe that the two are very close for small p, with the ratio between them growing at higher p. Specifically, for parameters optimized with respect to the expected merit factor ⟨C⟩ MF, the median difference in ⟨C⟩ MF between QAOA evaluated with fixed and with directly optimized parameters is less than 0.01% for p = 50, with the difference even lower for smaller p (Fig. S8a). For parameters optimized with respect to p opt, the difference in p opt is larger and grows with p (Fig. S8c). Note that due to the exponentially small value of p opt, small absolute differences (including those due to precision limitations) translate into large relative differences. Nonetheless, we observe good performance with fixed parameters at high N, as visualized for N = 32 in Figure S8d. QAOA performance with fixed parameters improves monotonically with p, with respect to both figures of merit, for all values of p considered. As the performance gap between fixed and optimized parameters grows with p, further improvements to the fixed-parameter scheme are likely to yield even better scaling of the QAOA TTS.
F. Scaling coefficient of QAOA TTS is not sensitive to the choice of Nmin

In the main text we motivate the choice of the cutoff N_min for the range of Ns included in the fit by examining the stability of the fit quality with varying p. We obtain N_min = 28 as the minimum value required to maintain a stable fit and estimate the QAOA scaling at 1.46^N at p = 12. We now provide evidence that this estimate is not sensitive to the choice of N_min.
Fig. S9 shows the scaling exponent of the QAOA TTS for varying choices of the cutoff N_min. Taking the average over the exponents obtained when performing the fit with 20 ≤ N_min ≤ 35 gives an estimated scaling of 1.45^N (Fig. S9a), slightly better than the one reported in the main text. As shown in Fig. S9b, for sufficiently large p and small N_min the exponent changes as N_min is increased, indicating that a larger range of N must be explored to obtain a stable linear fit.

FIG. S8. a,c, Shaded area shows the 95% confidence interval. b,d, Despite the relative differences between performance with optimized and transferred parameters growing with p, we observe that for all cases considered QAOA performance improves monotonically, as expected. Since the absolute differences are small, the performance with fixed and directly optimized parameters is visually indistinguishable.

G. Scaling coefficient of QAOA TTS does not follow a power law
One of the important findings of this work is that for the LABS problem, increasing the QAOA depth p beyond a certain small constant does not lead to better scaling. This puts our findings in contrast with those of Refs. 19, 33, and 34. We observe that, unlike in Ref. 19, the scaling coefficient c in TTS = Θ(2^(cN)) does not follow a power law in p. This is shown in Figure S10. For the coefficient c to follow a power law, it must depend on p as c_1 × p^(c_2) for some constants c_1, c_2. When the cutoff is chosen to ensure a good fit (N_min = 28), we see a clear deviation from a power law. We note that if we include smaller values of N in the fit and ignore the low quality of the fit, we observe a clear power-law scaling of c ≈ 0.77 × p^(−0.18).

H. Comparison of performance between QAOA and amplitude amplification
In this work we propose a strategy for using QAOA as a building block for algorithmic speedups by combining it with amplitude amplification (AA) or, more specifically, generalized minimum finding as described in Appendix C of Ref. 35. Specifically, we propose running the quantum minimum-finding algorithm with constant-depth QAOA as a subroutine. If the QAOA circuit prepares a state with overlap p opt with the ground state of the LABS Hamiltonian, the minimum-finding algorithm needs to apply the QAOA unitary and an oracle for computing the LABS cost function O(1/√p opt) times to obtain a constant probability of measuring a bitstring corresponding to the optimal solution. We note that our numerical results suggest that increasing the QAOA depth is always beneficial compared to performing fewer QAOA steps and more amplitude amplification steps. This is visualized in Figure S11: for sufficiently high p, the gain from a step of QAOA is very close to the gain from amplitude amplification, which upper-bounds the expected gain from one step of minimum finding. At the same time, one step of QAOA is much simpler to implement, as it requires only the phase oracle and a layer of one-qubit gates (the mixer). We do not provide exact gains from a step of the minimum-finding algorithm, as they are nontrivial to estimate; asymptotically, however, it is equivalent to amplitude amplification with an oracle that marks optimal solutions.
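The quadratic advantage at stake can be illustrated with a back-of-envelope comparison (illustrative numbers only; p_opt below is chosen to be of the same order as the N = 39, p = 33 result quoted earlier in this supplement):

```python
import math

# Illustrative comparison: repeat-until-success sampling needs roughly
# 1/p_opt circuit executions, while amplitude amplification / minimum
# finding needs O(1/sqrt(p_opt)) executions, up to constant factors.
p_opt = 1 / 1.2e4  # example per-shot success probability

shots_sampling = math.ceil(1 / p_opt)               # plain sampling
calls_amplified = math.ceil(1 / math.sqrt(p_opt))   # amplified, up to constants

assert calls_amplified < shots_sampling
```

The two counts differ by roughly a factor of √(1/p_opt), which is the source of the improved empirical scaling reported for QAOA combined with quantum minimum finding.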

FIG. S11.
The ratio between the gains in the probability of measuring the optimal solution from one step of QAOA and amplitude amplification (AA).QAOA provides orders of magnitude larger improvements for the first few steps, and then behaves approximately like amplitude amplification.

I. Proof of Theorem 1
We now present the proof of Theorem 1 of the main text. Our technique is based on the generalized minimum-finding procedure outlined in Lemma 48 of Ref. 35. For the reader's convenience and completeness, we begin by restating our algorithm and the theorem.
Lemma 1 (Exponential Quantum Search, Ref. 36). Let |ψ⟩ = U|0⟩^⊗N be a quantum state in a 2^N-dimensional Hilbert space with computational basis elements indexed by N-bit bitstrings, and let m : {0, 1}^N → {0, 1} be a marking function such that Σ_{x : m(x)=1} |⟨ψ|x⟩|^2 ≥ p. There exists a quantum algorithm EQSearch(U, m, δ) that outputs an element x* such that m(x*) = 1 with probability at least 1 − δ using O((1/√p) log(1/δ)) applications of U, U†, and m.

Proof. We first analyze the randomized Algorithm 1 as a Las Vegas algorithm, i.e., we assume that the internal while loop runs indefinitely. In the upcoming analysis we assume that every call to EQSearch in the internal loop behaves as intended. Note that we choose the failure probability of each such call to be 1/(6 · 2^N). The total number of calls cannot exceed 2^N, so by the union bound every call succeeds except with probability at most 1/6.
The algorithm generates a monotonically decreasing sequence of samples, using EQSearch in each iteration to search for a sample that is strictly smaller than the previous one. Since the sequence of samples is strictly decreasing and there are finitely many distinct values, the algorithm eventually returns the minimum. Since correctness is eventually guaranteed, we can simply bound the expected number of iterations before the sequence reaches the minimum. If this expected number is m, we can run the internal loop 2m times to ensure that we find the minimum with probability at least 1/2; consequently, 2m log(1/δ) iterations suffice to find the minimum with probability at least 1 − δ. It remains to show that the expected number of iterations before the minimum is found is at most O(1/√p opt).
For this argument, we define the following quantities. Let (x_1 = x*, x_2, . . ., x_n) be the list of bitstrings sorted in ascending order of the value of E_sidelobe, and let P(ξ(X)) denote the probability of event ξ(X) when X is a bitstring sampled by measuring |ψ⟩ in the computational basis. Suppose in some iteration t the sample returned by EQSearch is s_t; in the next iteration, EQSearch searches for an element with sidelobe energy less than s_t. The central observation is the following: for any x_k with k ∈ [n], the probability that x_k ever appears in the list of obtained samples is P(X = x_k)/P(X ≤ x_k). To see this, we observe, as in Ref. 35, that a value can be sampled at most once and the minimum is obtained within n steps, so x_k appears as a sample exactly when the first measured value among {x_1, . . ., x_k} equals x_k.
To bound the expected number of queries before the minimum x_1 is found, we associate with each bitstring x_l ∈ {x_n, x_{n−1}, . . ., x_2} the probability that it appears as a sample in some iteration and the cost of performing the corresponding search for an element smaller than x_l. For our chosen parameters, the cost of this search is at most C log(6 · 2^N)/√P(X < x_l) ≤ 2CN/√P(X < x_l), where the last inequality holds whenever N ≥ 3.
The total expected number of queries before the minimum x_1 is found is then obtained by summing these contributions over x_l. If we run the inner loop of Algorithm 1 more than 3 times the expected number of queries required to find x*, as prescribed when M ≥ 1/√p opt, we fail to find x* with probability at most 1/3 by Markov's inequality, as long as no query to EQSearch fails. Additionally, by the earlier discussion, a query in the internal loop fails with probability at most 1/6. Therefore, by a union bound, each run of the inner loop finds x* with probability at least 1/2. Repeating the inner loop log(1/δ) times ensures that x* is found with probability at least 1 − δ (if not, the inner loop would have to fail to find x* in log(1/δ) independent trials).
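The record-value identity at the heart of the proof can be sanity-checked classically (a Monte Carlo sketch of the sampling process, not part of the quantum algorithm; the toy distribution is arbitrary):

```python
import random

# Classical check of the record-value identity: when sampling i.i.d.
# and keeping only strictly decreasing records, x_k is ever a record
# with probability P(X = x_k) / P(X <= x_k), so the expected number of
# records until the minimum is the sum of these ratios.
random.seed(0)
values = [1, 2, 3, 4, 5]
probs = [0.1, 0.3, 0.2, 0.25, 0.15]

expected_records = sum(
    p / sum(q for w, q in zip(values, probs) if w <= v)
    for v, p in zip(values, probs)
)

trials, total = 20000, 0
for _ in range(trials):
    best, records = float('inf'), 0
    # Repeat-until-minimum with fresh i.i.d. samples; each strictly
    # smaller sample is one "record" (one successful EQSearch call).
    while best != values[0]:
        x = random.choices(values, probs)[0]
        if x < best:
            best, records = x, records + 1
    total += records
```

The empirical mean of the record count converges to the analytic sum, mirroring the bound on the expected number of EQSearch iterations used above.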

J. Details of the classical solver scaling
The scaling of the commercial branch-and-bound solvers is presented in Figure S12. For each solver, we run it with 100 random seeds for N ≤ 32 and 10 random seeds for N > 32 and report the mean. The minimum N to include in the fit was chosen to maximize the quality of fit. We set the Gurobi parameters Cuts=0 and Heuristics=0; for the other parameters of Gurobi and of CPLEX the defaults are used. We observe that the performance of the two commercial solvers is within a 95% confidence interval of each other, with the TTO scaling matching that reported in Ref. 37.
We report complete results for the Memetic Tabu solver in Figure S13. The scaling is obtained by extrapolating the number of cost function evaluations made by the Memetic Tabu algorithm at different lengths. This quantity is fixed for a given seed and length, unlike the execution time, which may fluctuate depending on the runtime environment. The fluctuation in running time is much larger for Memetic Tabu than for branch-and-bound due to the lower absolute runtime (< 1 sec for N = 40). The time to evaluate the cost function on a sequence of length N scales only as N^2, which does not produce a consistent effect on runtime scaling at small lengths. The TTS scaling is therefore essentially the same as the scaling of the number of function evaluations, 10 and we choose to report the latter, more stable quantity. The seeds chosen for the runs are a contiguous block of 50 integers starting from a random point. The Memetic Tabu solver is run in single-threaded mode for all our experiments to avoid the overestimation of cost function evaluations arising from race conditions between exploration and termination checks.

SIV. EXPERIMENTS ON TRAPPED-ION SYSTEMS

A. Experimental system
The experiments in this work were performed on the Quantinuum H1 and H2 platforms. 38,39 The system design is based on the QCCD architecture with multiple separate gate zones. Each gate zone is used to perform operations on an arbitrary pair of qubits at a time, suppressing crosstalk and maintaining high fidelity. The hyperfine approximate clock states of 171Yb+ in the 2S1/2 manifold are used to encode the qubit, namely |0⟩ ≡ |F = 0, mF = 0⟩ and |1⟩ ≡ |F = 1, mF = 0⟩. After loading, the qubits are prepared in |0⟩ via optical pumping. 38,39 The systems have all-to-all connectivity, with two-qubit gates between pairs of qubits implemented by ion transport, which brings the pair into the same gate zone. To implement two-qubit gates, a phase-sensitive Mølmer-Sørensen (MS) gate sandwiched between single-qubit wrapper pulses is used. This yields the zz gate R_zz(γ) = exp(−iγzz/2), where the rotation angle can be precisely controlled by changing the parameters of the MS gate. 38,39 Both the H1-1 and H2 systems used in this work have a typical average two-qubit infidelity of 2 × 10^(−3), with single-qubit infidelity two orders of magnitude smaller. The qubit state is read out via state-dependent resonance fluorescence measurement. Mid-circuit measurement and reset can be implemented at the cost of a small crosstalk error due to stray light from the measurement and reset laser beams.

B. Circuit compilation and optimization
We now present the circuit compilation procedure for the experiments on the trapped-ion quantum processor. The H-series devices used in this work have at most five gate zones, 39 i.e., at most five two-qubit gates can be executed in parallel. This implies that optimizing the circuit for full parallelism may yield diminishing returns past five parallel gates. In any case, the highest-error operation on the devices is the two-qubit gate, so the primary limiting factor for achieving high-fidelity results is the two-qubit gate count, and we therefore optimize it first. In addition, the cost operator is a composition of diagonal gates, so we are free to apply them in any order; we optimize the order to maximize the number of gate cancellations. A similar approach has been used in Ref. 40 for devices with nearest-neighbor connectivity.
We start by decomposing the four-body terms R_zzzz(γ) into four cnots and a single R_zz(γ), where R_zzzz(γ) and R_zz(γ) denote evolution under zzzz and zz coupling with angle γ, respectively. Note that R_zz(γ) = e^{−iγzz/2} is the native gate for the Quantinuum H-series trapped-ion processors. The goal of the first step of the compilation procedure is to schedule the R_zzzz(γ) gates corresponding to the four-body terms so as to greedily cancel as many cnots as possible (see Figure S14 for the decomposition). The second step deals with the two-body terms and attempts to schedule each R_zz(γ) gate near a four-body term in which one of the cnots acts on the same qubits as the two-body term, in order to leverage the two-qubit gate resynthesis implemented in tket.41 These steps are described in detail in Algorithm 2. The resulting circuit is then passed to the tket circuit optimizer to transpile it into the H-series native two-qubit gates R_zz(γ) and R_zz(π/2). The preliminary step of greedily optimizing the layout of the interactions reduced the two-qubit gate count by a factor of 1.7 on average compared to tket alone. The improvement in two-qubit gate count from the greedy cnot cancellation is shown in Fig. S15. Lastly, we compared a variety of circuit optimizers and found that tket resulted in the most significant gate-count reduction.
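To illustrate the gate-counting logic behind the greedy ordering, the sketch below uses a simplified model (not the compilation pipeline used in this work): each R_zzzz term costs four cnots and one native R_zz under the Figure S14 decomposition, and the trailing cnot pair of one term is assumed to cancel against the leading cnot pair of the next whenever they act on the same qubit pairs.

```python
def two_qubit_gate_count(terms):
    """Count two-qubit gates for an ordered list of four-body terms
    (i, j, k, l), assuming each R_zzzz costs 4 cnots + 1 R_zz and that
    adjacent terms sharing the top pair (i, j) or the bottom pair (k, l)
    cancel one uncompute/compute cnot pair (2 gates)."""
    count = 0
    prev = None
    for term in terms:
        i, j, k, l = term
        count += 5  # 4 cnots + 1 native R_zz
        if prev is not None:
            if prev[:2] == (i, j):
                count -= 2  # top cnot pair cancels
            if prev[2:] == (k, l):
                count -= 2  # bottom cnot pair cancels
        prev = term
    return count

# Ordering terms so that neighbors share a cnot pair reduces the count:
good = [(0, 1, 2, 3), (0, 1, 3, 4), (0, 1, 4, 5)]  # share top pair (0, 1)
bad = [(0, 1, 2, 3), (1, 2, 3, 4), (0, 1, 3, 4)]
assert two_qubit_gate_count(good) < two_qubit_gate_count(bad)
```

This toy model captures why the ordering, rather than the term set itself, determines the final two-qubit gate count.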

Algorithm 2 Greedy cost-operator circuit optimization
Require: A list of four-tuples (i, j, k, l), with i < j < k < l, where (i, j) and (k, l) correspond to the qubits of the top two cnot_ij and bottom two cnot_lk in the decomposition of R_zzzz presented in Figure S14 (note that the indices corresponding to the control and target are reversed between (i, j) and (k, l)); and a list of two-tuples (i, j), with i < j, indicating the qubits to which each R_zz rotation is applied.
Ensure: A single list containing all of the terms (both four- and two-body) in the order in which they should be applied, according to the greedily optimized circuit.

We note that there exists an alternative proposal42 for reducing the cost of implementing the LABS cost operator. In this approach, the phase operator is replaced by evolution under the Hamiltonian corresponding to the absolute values of the autocorrelations.
FIG. S16. Overview of one step of the error-detection scheme. A part of the circuit U_phase is "sandwiched" between two parity checks. Any error on the data qubits that does not commute with the check is guaranteed to be detected.
This is in contrast to the approach mentioned in the main text: the Hamiltonian corresponding to the absolute values of the autocorrelations has the same ground-state space and energy levels as the one in Equation S9. While this reduces the asymptotic complexity of computing the energy to Θ(N^2), it requires quantum arithmetic, putting it beyond the capability of current hardware. Thus, we focus on optimizing the cost operator corresponding to Equation S9. Further gate-count reductions may be possible by fixing some of the variables and applying the techniques of Ref. 43, though doing so is outside the scope of this work.

C. Summary of the error-detection scheme
We now briefly summarize the error-detection scheme. The proof that our scheme is capable of detecting an arbitrary single-qubit error in the phase-operator circuit, under the assumption of noiselessly implemented parity checks, is a special case of Ref. 44, Theorem 1. We include a brief discussion of the scheme here for completeness and refer interested readers to Ref. 44 for a detailed discussion. We first consider the case when just one parity is checked (either x or z), and then show how the analysis generalizes to the case when both parities are checked simultaneously. Figure S16 presents an overview of the circuit. Let the state of the data qubits before the check be ρ_init, and assume the ancilla is perfect and initialized to the pure state |0⟩. The z-parity check is given by the operator C_z = 1 ⊗ |0⟩⟨0| + z^⊗N ⊗ |1⟩⟨1|. Denote the part of the phase operator sandwiched between the checks as U_phase; note that U_phase can encompass the full phase operator or only a part of it. At (1), the state is ρ^(1) = ρ_init ⊗ |0⟩⟨0|. After a Hadamard gate on the ancilla, at (2) the state is ρ^(2) = ρ_init ⊗ |+⟩⟨+|. After the first check is applied, at (3) the state becomes ρ^(3) = C_z (ρ_init ⊗ |+⟩⟨+|) C_z†. Suppose now that a Pauli error that anticommutes with z^⊗N occurs during U_phase. The second check then imprints a relative phase of −1 between the two branches of the ancilla, so the final Hadamard maps the ancilla to |1⟩. Measuring the ancillary qubit will then return 1, and the error will be detected. A Pauli error that commutes with the check, however, will go undetected. When both x^⊗N and z^⊗N parities are checked, no single-qubit Pauli commutes with both checks, so any single-qubit Pauli error will be detected by the implemented scheme. If the checks are noiseless, increasing the frequency of the checks by reducing the size of the circuit U_phase between them can only improve the final fidelity. In practice, however, the checks are noisy, introducing a trade-off between the increase in errors detected and the errors introduced by the checks themselves.
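The detection condition can be checked combinatorially: a Pauli error is flagged by the z (resp. x) check if and only if it anticommutes with z^⊗N (resp. x^⊗N), i.e., it contains an odd number of X or Y (resp. Z or Y) factors. The short sketch below, an illustration under these assumptions rather than the hardware implementation, verifies that every single-qubit Pauli error is caught by at least one of the two checks.

```python
def detected(pauli):
    """pauli: string over 'IXYZ', one letter per data qubit.
    True if the error anticommutes with at least one of the global
    checks z^N or x^N (and is therefore flagged by a parity check)."""
    anti_z = sum(p in 'XY' for p in pauli) % 2 == 1  # flips the z-check
    anti_x = sum(p in 'ZY' for p in pauli) % 2 == 1  # flips the x-check
    return anti_z or anti_x

# Every single-qubit Pauli error on N = 5 data qubits is detected,
# while e.g. the weight-2 error X x X on qubits 0, 1 commutes with both checks.
N = 5
singles = [''.join('XYZ'[e] if q == pos else 'I' for q in range(N))
           for pos in range(N) for e in range(3)]
assert all(detected(s) for s in singles)
assert not detected('XXIII')
```

The last line illustrates why the guarantee is stated for single-qubit errors: some even-weight errors commute with both global parity checks.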

D. Performance of the error-detection scheme
In this section, we provide details on the implementation of the proposed error-detection scheme. For a circuit with m checks, we separate the phase operator into m parts, each with approximately the same number of two-qubit gates. We observe that increasing the frequency of the parity checks up to m = 3 improves the QAOA performance. Fig. S17a shows the improvement in the expected merit factor after post-selection. More measurements increase the ratio of samples with detected errors, increasing the overhead of the error-detection scheme in terms of the number of repetitions. Fig. S17b shows how this overhead grows with N by plotting the decay of the ratio of samples with no detected error to all samples (the "post-selection ratio"). This ratio drops below 10% at N ≥ 15 and m = 3. To trade off the number of repetitions required against the fidelity of the final result, we set m = 3 in our experiments. We further note that in our experiments, post-selection typically keeps the bitstring with the highest merit factor sampled, as shown in Fig. S18. Note that in a practical optimization setting, the best of all bitstrings corresponding to valid solutions would be chosen as the output.
An important benefit of our error-detection scheme is a reduced time to a high-quality sample, i.e., a sample with no errors detected. The time improvement comes from the ability to stop the execution as soon as a mid-circuit measurement detects an error. This capability is particularly relevant to trapped-ion systems, whose relatively low clock speeds and very long coherence times enable such classical feedback. Although the available hardware supports this feature, we do not implement early stopping in our hardware experiments; the time savings given below are estimates.
We denote by p_i the probability that no detectable error occurs during part i, and set p_0 = 1. Without any mid-circuit syndrome measurement, the average time needed to obtain a measurement result with a high merit factor, i.e., no parity error detected in any of the check measurements, is

t_1 = t_0 / (∏_{i=1}^m p_i),   (S21)

where t_0 is the total circuit time. With mid-circuit checks and feed-forward discarding of the remainder of the circuit conditioned on the check result, and assuming each of the m parts takes time t_0/m, the average time to obtain a bitstring with a high merit factor reads

t_2 = [ Σ_{i=1}^m (i t_0/m)(1 − p_i) ∏_{j=0}^{i−1} p_j + t_0 ∏_{i=1}^m p_i ] / (∏_{i=1}^m p_i).

Here we neglect the gate time between the data qubits and the ancillary qubits. The comparison between the two is shown in Fig. S19, indicating that our error-check method would reduce the time to obtain a bitstring with a high merit factor.
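A quick numerical check of the two estimates can be sketched as follows. This is a sketch under the simplifying assumption that each of the m parts takes time t_0/m; the check-pass probabilities p_i below are hypothetical values, not measured ones.

```python
import math

def expected_time_no_feedback(t0, probs):
    """t_1 = t_0 / prod(p_i): rerun the full circuit of duration t0
    until every one of the m checks passes."""
    return t0 / math.prod(probs)

def expected_time_early_stop(t0, probs):
    """Abort as soon as a check fails; each of the m parts is assumed
    to take time t0/m. Expected time of one attempt divided by the
    probability that an attempt passes all checks."""
    m = len(probs)
    attempt = 0.0
    survive = 1.0  # running product of p_j for j < i (p_0 = 1)
    for i, p in enumerate(probs, start=1):
        attempt += (i * t0 / m) * survive * (1 - p)  # abort after part i
        survive *= p
    attempt += t0 * survive  # all checks passed
    return attempt / survive

probs = [0.8, 0.8, 0.8]  # hypothetical per-part pass probabilities
t1 = expected_time_no_feedback(1.0, probs)
t2 = expected_time_early_stop(1.0, probs)
assert t2 < t1  # early stopping reduces the average time per good sample
```

When no early stopping is possible, every attempt costs the full t_0, which recovers Equation S21 as a special case.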

FIG. 1. Classical and quantum algorithms applied to the LABS problem. a, Diagram of the LABS problem (with an example of N = 5). The problem involves non-local two-body (black lines) and four-body (blue lines) interactions. b, Time-to-solution (TTS) of classical solvers. For the sizes considered, we observe clear exponential scaling with exponents matching their asymptotic values reported in the literature (see Table I). c, The distribution over 21 ≤ N ≤ 31 (for LABS) and 34 random instances (for MaxCut on random 3-regular graphs with 20 nodes) of Pearson product-moment correlation coefficients relating the Hamming distance of bitstrings from the optimal solution to the objective value of the bitstring. LABS has a much lower correlation between Hamming distance and objective, indicating that it is much harder than the commonly considered MaxCut problem. d, Diagram of the QAOA circuit for a 5-qubit example. Starting from a uniform superposition of the computational basis states, we apply p layers of phase and mixing operators, followed by measurement in the computational basis.

FIG. 2. QAOA runtime scaling and dynamics under different parameter schedules. a, The quality of the exponential fit for different choices of the minimum N to include in the fit. N ≥ 28 results in a robust fit, the quality of which does not deteriorate with p. N = 40 is omitted as it was only simulated up to p = 22. b, TTS of QAOA at p = 12. Clear exponential scaling is observed. c, The scaling exponent of the QAOA runtime for different QAOA depths p. The shaded area shows the 95% confidence interval. Increasing p beyond p ≈ 12 does not lead to better scaling. d, The gain in success probability p_opt from applying step p of QAOA and amplitude amplification (AA). The gain is defined as p_opt at step p divided by p_opt at step (p − 1); the gain at p = 1 is over a random guess. Only one line is plotted for amplitude amplification, since the lines for the values of N considered are visually indistinguishable. For small p, a QAOA layer gives an orders-of-magnitude larger gain than a step of AA. e, Fixed QAOA parameters for p = 30 chosen with respect to the QAOA energy ⟨C⟩MF ("MF") and the probability of sampling the optimal solution ("p_opt"). Different choices of optimization objective give different resulting parameters. f, Probability of obtaining a binary string corresponding to a given energy level of the LABS problem (the zeroth energy level is the ground state, or optimal solution; lower is better). When parameters are optimized with respect to the expected merit factor (labeled "MF"), the QAOA output state is concentrated around the mean and fails to obtain a high overlap with the ground state. On the other hand, when parameters are optimized with respect to p_opt (labeled "p_opt"), the QAOA state has a high overlap with both the ground state and higher energy states. The probability of obtaining the ground state is 27.3 times greater for QAOA with parameters optimized with respect to p_opt at p = 40.

FIG. 3. Experimental results on the trapped-ion system. a, Decomposition of four-body interaction terms into a two-body Rzz gate and four two-body cnot gates, which can be realized via native Rzz gates. b, Energy density plot from experimentally measured bitstrings for N = 13. Energy indices are arranged in ascending order of energy. For comparison, the distributions for the noiseless p = 1 QAOA simulation and random guessing (assuming a uniform distribution over all possible bitstrings) are shown. c, Experimental results up to 18 qubits on a trapped-ion quantum device (H1-1) with QAOA depth p = 1 and optimized QAOA parameters. The error bars are calculated with 99% confidence intervals, here and in what follows. d, Illustration of the parity-check circuit. The z and x parities of the state are mapped to ancillary qubits after the implementation of the full (or part of the) phase operator via cz and cnot gates, respectively, followed by mid-circuit measurement of the ancillary qubits to extract the parity syndrome. e, Experimental results for the circuit with parity checks. Three mid-circuit z-parity and x-parity checks were performed using six ancillary qubits. The ancillae can also be reused after an appropriate reset during the circuit. The red data points were run on the Quantinuum H2 hardware, while the blue data are from the H1-1 device. Data run on the H1-1 device without any ancillary qubits are shown in grey. Circles (diamonds) are the data without (with) post-selection. The abbreviation ED refers to error detection via the parity checks. The number of mid-circuit parity checks is fixed to two for N = 10, 11 and three for all other N. An improvement in the expected merit factor after post-selection according to the parity syndrome measurement is observed.

FIG. 4. Visualization of how the fixed parameters are obtained. a, Optimized QAOA parameters β (top lines) and γ (bottom lines) for p = 21. γ is scaled by N/24, with the constant factor of 1/24 added for figure readability in both subfigures. b, Fixed parameters obtained by taking the arithmetic mean over the optimized parameters.
devised the project. J. Larson, N. Kumar, and R. Shaydulin implemented QAOA parameter optimization and the parameter setting schemes. R. Shaydulin and Y. Sun implemented the single-node version of the QAOA simulator. D. Lykov implemented the distributed version of the QAOA simulator and executed the large-scale simulations on Polaris. R. Shaydulin analyzed the simulation results. D. Herman and S. Hu developed the circuit optimization pipeline. C. Li implemented and analyzed the error detection scheme. M. DeCross and D. Herman executed the experiments on trapped-ion hardware, and C. Li analyzed the results. S. Chakrabarti, D. Lykov and P. Minssen benchmarked classical solvers. S. Chakrabarti, D. Herman and R. Shaydulin analyzed the generalized quantum minimum-finding enhanced with QAOA. J. Dreiling, J.P. Gaebler, T.M. Gatterman, J.A. Gerber, K. Gilmore, D. Gresh, N. Hewitt, C.V. Horst, J. Johansen, M. Matheny, T. Mengle, M. Mills, S.A. Moses, B. Neyenhuis, and P. Siegfried built, optimized, and operated the trapped-ion hardware. M. Pistoia led the overall project. All authors contributed to technical discussions and the writing of the manuscript.

DISCLAIMER

This paper was prepared for informational purposes with contributions from the Global Technology Applied Research center of JPMorgan Chase & Co., Argonne National Laboratory and Quantinuum LLC. This paper is not a product of the Research Department of JPMorgan Chase & Co. or its affiliates. Neither JPMorgan Chase & Co.

CONTENTS
SI. Background on the LABS problem
SII. QAOA as an exact and approximate optimization algorithm
SIII. Details of numerical studies
  A. Optimized QAOA parameters for LABS change with N
  B. QAOA parameter optimization with FOURIER scheme
  C. Evidence of the success of the FOURIER reparameterization heuristic
  D. Procedure for obtaining the fixed parameters
  E. Evidence of the success of the fixed parameter scheme
  F. Scaling coefficient of QAOA TTS is not sensitive to the choice of N_min
  G. Scaling coefficient of QAOA TTS does not follow power law
  H. Comparison of performance between QAOA and amplitude amplification
  I. Proof of Theorem 1
  J. Details of the classical solver scaling
SIV. Experiments on trapped-ion systems
  A. Experimental system
  B. Circuit compilation and optimization
  C. Summary of the error-detection scheme
  D. Performance of the error-detection scheme
References

arXiv:2308.02342v1 [quant-ph] 4 Aug 2023

SI. BACKGROUND ON THE LABS PROBLEM

FIG. S1. a, Fixed QAOA parameters for p = 30 chosen with respect to the QAOA energy ⟨C⟩MF ("MF") and the probability of sampling the optimal solution ("p_opt"). When the parameters are optimized with respect to p_opt, the value of β is significantly larger throughout the QAOA evolution. Subfigure reproduced from the main text. b, QAOA performance for N = 25, p = 30 with parameters linearly interpolated between the fixed parameters for p_opt (t = 0) and ⟨C⟩MF (t = 1). QAOA parameters optimized for ⟨C⟩MF give very poor values of p_opt, and vice versa.
FIG. S2. The probability of obtaining a binary string corresponding to a given energy level of the LABS problem (the zeroth energy level is the ground state, or optimal solution; lower is better) for varying p (a-d). When parameters are optimized with respect to the expected merit factor (labeled "QAOA MF"), the QAOA output state is concentrated around the mean and fails to obtain a high overlap with the ground state. On the other hand, when parameters are optimized with respect to p_opt (labeled "QAOA p_opt"), the QAOA state has a high overlap with both the ground state and higher energy states. The probability of obtaining the ground state is 27.3 times greater for QAOA with parameters optimized with respect to p_opt at p = 40 (d).
FIG. S3. a, Performance of QAOA as an approximation algorithm. Explicit constructions of skew-symmetric sequences exist that achieve F ≈ 6.34 for large N.7 Simulated annealing achieves F ≈ 5 for large N.9 For QAOA, the expected value of the merit factor ⟨C⟩MF is plotted. The expected merit factor of the QAOA output is below the values easily attainable classically. b, For both N = 24 and N = 32, the optimal merit factor is 8 (dashed line).
FIG. S5. a, Scaling of the optimized value of the QAOA parameter γ with N for p = 1 when optimized with respect to the expected merit factor ⟨C⟩MF ("QAOA MF") and the probability of obtaining the optimal solution p_opt ("QAOA p_opt"). The optimized γ* decreases with N as 1/N. b, QAOA parameters optimized with respect to ⟨C⟩MF for N ∈ {22, 25}. For N = 22, the parameters γ are scaled by 22/25. After rescaling, the parameters for N = 22 and N = 25 are visually indistinguishable.
FIG. S6. The ratio between the expected merit factor of QAOA with parameters optimized by directly running local optimization from many initial points (⟨C⟩^O_MF) and with parameters optimized using the FOURIER[∞, 0] scheme (⟨C⟩^F_MF) for N > 12. We observe that the difference in the quality of the parameters obtained by the two optimization schemes is small.
FIG. S7. Visualization of how the fixed parameters with respect to ⟨C⟩MF are obtained. The visualization for p_opt is presented in the main text. a, Optimized QAOA parameters β (top line) and γ (bottom line) for p = 21. γ is scaled by N/24, with the constant factor of 1/24 added for figure readability. b, Fixed parameters obtained by taking the arithmetic mean over the optimized parameters.
FIG. S8. Comparison of QAOA performance with fixed (⟨C⟩^fix_MF, p_opt^fix) and optimized (⟨C⟩^O_MF, p_opt^O) parameters. a,c, The shaded area shows the 95% confidence interval. b,d, Despite the relative differences between performance with optimized and transferred parameters growing with p, we observe that in all cases considered, QAOA performance improves monotonically as expected. Since the absolute differences are small, the performance with fixed and directly optimized parameters is visually indistinguishable.
FIG. S9. The scaling exponent of QAOA TTS for varying choices of N_min included in the range of values of N used for the fit at a, p = 12 and b, varying p. If smaller N_min are included, the scaling exponent continues to improve with p, with the quality of the fit decaying with p (see the main text and Fig. S10). For sufficiently high N_min, the exponent does not improve beyond p ≈ 10. At p = 12, the scaling exponent is not sensitive to the choice of N_min (a).
FIG. S10. The scaling coefficient c in TTS = Θ(2^{−cN}) as a function of p. The blue line is included for illustration only, as including N < 28 leads to a low-quality fit. When the quality of the fit is high (28 ≤ N ≤ 38), the scaling coefficient does not follow a power law.

Theorem 1. Suppose a constant-depth QAOA circuit U_QAOA prepares a state |ψ⟩ = U_QAOA|0⟩^⊗N with N ≥ 3 such that |⟨x*|ψ⟩| ≥ √p_opt, where |x*⟩ encodes an optimal solution to the N-bit LABS problem in a computational basis state, and we assume that p_opt ≥ 1/N. Then Algorithm 1 with parameters M ≥ 1/√p_opt and failure probability δ runs with a gate complexity of O(poly(N) log(1/δ)M) and finds x* with probability at least 1 − δ.
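To make the objects in Theorem 1 concrete, the sketch below implements the LABS sidelobe energy E_sidelobe together with a purely classical analogue of the threshold-descent loop in Algorithm 1. Plain sampling stands in for measurements of the QAOA state and for EQSearch; this is an illustration of the loop structure, not the quantum algorithm.

```python
import random

def sidelobe_energy(s):
    """LABS sidelobe energy E(s) = sum_{k=1}^{N-1} A_k(s)^2, where
    A_k(s) = sum_i s_i s_{i+k} for a +/-1 sequence s. The merit factor
    is F = N^2 / (2 E(s))."""
    N = len(s)
    return sum(sum(s[i] * s[i + k] for i in range(N - k)) ** 2
               for k in range(1, N))

def minimum_finding(sample, budget):
    """Classical analogue of Algorithm 1's threshold loop: draw samples
    (here from `sample`, standing in for the QAOA state) and lower the
    threshold s_t whenever a strictly smaller E_sidelobe is seen."""
    s_t, best = float('inf'), None
    for _ in range(budget):
        x = sample()
        e = sidelobe_energy(x)
        if e < s_t:
            s_t, best = e, x
    return best, s_t

# Barker sequence of length 3 has E = 1 (merit factor 4.5)
assert sidelobe_energy([1, 1, -1]) == 1

random.seed(0)
N = 8
best, e = minimum_finding(
    lambda: [random.choice([-1, 1]) for _ in range(N)], budget=2000)
assert e == sidelobe_energy(best)
```

The quantum algorithm replaces the sampling loop with amplitude-amplified search below the current threshold, which is the source of the quadratic speedup over repeated sampling.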

FIG. S14. Decomposition of four-body interaction terms into a two-body Rzz gate and four cnots. Figure reproduced from the main text.
FIG. S15. Comparison of the two-qubit gate count (number of Rzz(γ) + Rzz(π/2)) of the QAOA circuit at p = 1 with random term ordering and with the greedy optimization. The "random" line is the average over 20 random orderings of the two- and four-body terms. Both the random and greedy orderings are further optimized by tket,41 and the resulting gate count is plotted. A cubic fit line is added to guide the eye.
FIG. S17. a, Expected merit factor as a function of the number of error checks used when post-selecting the data-qubit results for the N = 15 experiment on the H2 hardware system with 5000 repetitions. Here m = 3 is the number of z- and x-parity checks. The error bars become larger as we discard more shots, due to more errors being detected with more parity checks. The first data point corresponds to the case where we keep all the results from measurement of the data qubits and do not post-select. b, Post-selection ratio when splitting the phase operator into three parts and performing parity syndrome measurement at the end of each part. The curves show simulation results using realistic parameters of the Quantinuum H2 trapped-ion device. The measured results on both H1 and H2 hardware with three checks are in good agreement with the simulations.
FIG. S18. The experimentally sampled bitstring with the highest merit factor for different N. The number of shots taken is 2000 for N = 8, 9 and 5000 for the other N. Circles (diamonds) are the data without (with) post-selection; they have exactly the same values except for N = 13 and N = 17. The best sampled bitstrings before post-selection have the same merit factor as the true optimal bitstrings for all instances studied here.

TABLE I. Scaling exponents for quantum and classical algorithms. Confidence intervals (CIs) are 95%. The reported asymptotic exponential scaling of classical state-of-the-art solvers is reproduced at N ≤ 40. For branch-and-bound, we include both the time to obtain a certificate of optimality (TTO, reported in Ref.) and the time-to-solution (TTS).

Algorithm 1 Generalized quantum minimum finding with QAOA
Require: Unitary U_QAOA acting on C^{2^N} such that |⟨x*|U_QAOA|0⟩^⊗N| ≥ √p_opt for unknown p_opt; V_LABS for computing E_sidelobe into a register; δ ∈ (0, 1); a positive number M ≤ 2^N; and C, the constant corresponding to the O(·) in Lemma 1.
Ensure: If M ≥ 1/√p_opt, output x* with probability ≥ 1 − δ using O(log(1/δ)M) calls to U_QAOA and V_LABS (and their inverses).
x_res is set to an empty list.
for i ← 1 to ⌈log(1/δ)⌉ do
    t ← 0; s_0 ← ∞
    while the number of calls to U_QAOA and V_LABS is < 3CMN do
        t ← t + 1
        Define m_t : {0, 1}^N → {0, 1} such that m_t(x) = 1 if and only if E_sidelobe(x) < s_{t−1}. Note that m_t can be coherently evaluated using one query to V_LABS.
        Set s_t = EQSearch(U_QAOA, m_t, 1/(6 · 2^N)).

Algorithm 2 Greedy cost-operator circuit optimization (body)
circuit ← empty list
for each collection of four-body terms (i, j, k, l) grouped by locality d := j − i (= l − k for LABS) do
    current ← uniformly randomly sample (and remove) a term (i, j, k, l) from the collection of terms of locality d
    add current to the circuit list
    tops ← list initialized with the tuple (i, j)
    bottoms ← list initialized with the tuple (k, l)
    while there are still more terms in the collection do
        for each term (r, s, t, v) in the collection do
            if (r, s) ∈ tops or (t, v) ∈ bottoms then
                assign the term a score of +1
            else
                assign the term a score of −1
            end if
            if ∃ m such that (m, r) ∈ bottoms or ∃ a such that (a, t) ∈ tops then
                subtract 1 from the term's score // inserting this term would mean that some cnot_mr or cnot_ta currently in the circuit can never be cancelled
            end if
for each four-body term (r, s, t, v) in circuit do
    insert (i, j) after (r, s, t, v) in circuit if (i, j) = (r, s) or (i, j) = (t, v), and break the inner loop
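The greedy ordering above can be sketched compactly in Python. This is a simplified illustration, not the implementation used in this work: it keeps only the +1/−1 adjacency score, and it omits the cancellation penalty and the two-body insertion pass.

```python
import random

def greedy_order(four_body_terms, seed=0):
    """Order R_zzzz terms within each locality class d = j - i so that
    consecutive terms tend to share a top pair (i, j) or bottom pair
    (k, l), enabling cnot cancellation in the Fig. S14 decomposition."""
    rng = random.Random(seed)
    by_locality = {}
    for term in four_body_terms:
        i, j, k, l = term
        by_locality.setdefault(j - i, []).append(term)

    circuit = []
    for d in sorted(by_locality):
        terms = by_locality[d]
        # random starting term, as in the pseudocode
        current = terms.pop(rng.randrange(len(terms)))
        circuit.append(current)
        tops, bottoms = {current[:2]}, {current[2:]}
        while terms:
            def score(t):
                # +1 if the term can reuse an existing cnot pair, else -1
                return 1 if t[:2] in tops or t[2:] in bottoms else -1
            best = max(terms, key=score)
            terms.remove(best)
            circuit.append(best)
            tops.add(best[:2])
            bottoms.add(best[2:])
    return circuit

terms = [(0, 1, 2, 3), (0, 1, 3, 4), (2, 4, 5, 7), (1, 3, 4, 6)]
ordered = greedy_order(terms)
assert sorted(ordered) == sorted(terms)  # a permutation of the input
```

The returned list is the term schedule; the actual cancellation and resynthesis are then left to the downstream tket passes described above.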