The effect of classical optimizers and Ansatz depth on QAOA performance in noisy devices

The Quantum Approximate Optimization Algorithm (QAOA) is a variational quantum algorithm for noisy intermediate-scale quantum (NISQ) computers that provides approximate solutions to combinatorial optimization problems. The QAOA utilizes a quantum-classical loop, consisting of a quantum ansatz and a classical optimizer, to minimize a cost function computed on the quantum device. This paper investigates the impact of realistic noise on the classical optimizer and determines the optimal circuit depth for the QAOA in the presence of noise. We find that, while there is no significant difference in the performance of classical optimizers in a state vector simulation, the Adam and AMSGrad optimizers perform best in the presence of shot noise. Under the conditions of real noise, the SPSA optimizer, along with Adam and AMSGrad, emerges as a top performer. The study also reveals that the quality of solutions to some 5-qubit minimum vertex cover problems increases for up to around six layers in the QAOA circuit, after which it begins to decline. This analysis shows that increasing the number of layers in the QAOA in an attempt to increase accuracy may not work well on a noisy device.


Introduction
Many promising quantum algorithms, offering polynomial and exponential speed-ups over their classical counterparts, have been proposed [1]. These algorithms, including Grover's unstructured search algorithm [2] and Shor's algorithm for finding the prime factors of an integer [3], have generated much excitement over the prospects of quantum computing. However, their practical realization is generally accepted to be a long-term project due to constraints such as noise and state fidelity [1]. Error-correction schemes that would yield fault-tolerant quantum computers have been devised, but they require quantum computers with many more qubits than are available at present [4].
In the meantime, there is significant interest in quantum algorithms that are applicable to the noisy intermediate-scale quantum (NISQ) computers available now and in the near future [4]. These algorithms are predominantly variational and use hybrid quantum-classical routines to leverage existing quantum resources. They include the Variational Quantum Eigensolver (VQE) [5] and the Quantum Approximate Optimization Algorithm (QAOA) [6]. As the number of qubits in these near-term quantum computers increases, they become increasingly difficult to simulate with classical computers [1].
The VQE and QAOA algorithms utilize parameterized quantum circuits U(θ) to prepare trial states for the Hamiltonian H representing the problem of interest. Using the expectation value ⟨U(θ)|H|U(θ)⟩, where |U(θ)⟩ denotes the state prepared by the circuit, a classical optimizer is used to train the parameters of the quantum circuit. In this way, the algorithm uses the ansatz to prepare trial solutions to the problem, and the classical optimizer searches for better approximations of the ideal solutions [5,6].
In QAOA, the ansatz is constructed from p layers of exponentiated cost and mixer Hamiltonians obtained from the problem cost definition [6]. As p → ∞, the solution prepared by QAOA approaches the ideal solution, but in the context of NISQ computing it is not feasible to utilize such deep circuits due to the effects of noise and state decoherence [4,6]. The noise in NISQ computers also adversely affects the efficacy of the classical optimization procedure. The performance of the classical optimizer has recently been studied on the QAOA [7], as well as in other variational quantum algorithms [8]. Optimization protocols of several variations of the QAOA have also been recently studied [9]. In this work, we investigate classical optimizers and circuit depths p to find the optimal optimizer choice and ansatz depth for the minimum vertex cover problem under realistic device noise. We utilize a noise model sampled from the IBM Belem quantum computer to simulate the effects of noise on the efficacy of the algorithm. To the best of our knowledge, this is the first investigation of optimal circuit depth for QAOA with noise.
The remainder of the paper is structured as follows: Section 1 reviews the details of the minimum vertex cover problem and the QAOA algorithm, Section 2 describes the optimizers used and the test methodology followed in this work, and Section 3 presents our findings and discusses their significance.

Minimum Vertex Cover Problem
The Minimum Vertex Cover problem is an example of a binary optimization problem whose decision version is NP-complete. A vertex cover of a graph G = (V, E) is a set of vertices V′ ⊆ V such that for every edge e = (u, v) ∈ E, u ∈ V′ or v ∈ V′. A minimum vertex cover is a set of vertices V* that is the smallest possible set satisfying the above condition for a given graph G. The minimum vertex cover problem is to find such a set V*.
The minimum vertex cover problem can be formulated as the following binary optimization problem:

$$\min_{x} \sum_{v \in V} x_v$$

subject to

$$x_u + x_v \geq 1 \quad \forall (u, v) \in E$$

and

$$x_v \in \{0, 1\} \quad \forall v \in V.$$

In Figure 1, we provide examples of graphs that illustrate the minimum vertex cover problem. Binary optimization problems, like the minimum vertex cover problem, have their solutions encoded in a bit string and require an algorithm capable of finding the bit string that minimizes the cost function. For the minimum vertex cover, each bit in the bit string corresponds to a vertex in the problem graph. A bit value of 1 indicates that the vertex is in the cover set, and a bit value of 0 indicates that it is not. The QAOA is one such quantum algorithm capable of finding an approximate solution in the form of a bit string, read out of the quantum device directly through measurement. Each qubit corresponds to a vertex in the graph, and the measured values of 0 or 1 form a bit-string solution to the problem.
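The bit-string encoding described above can be illustrated with a brute-force search over all bit strings. This is only a sketch for very small graphs; the 5-vertex path graph and the function names are hypothetical choices for illustration and are not taken from the test set used in this work.

```python
from itertools import product

def is_vertex_cover(bits, edges):
    """Return True if every edge has at least one endpoint whose bit is 1."""
    return all(bits[u] == 1 or bits[v] == 1 for u, v in edges)

def minimum_vertex_covers(num_vertices, edges):
    """Enumerate all bit strings and return the smallest valid covers."""
    covers = [bits for bits in product((0, 1), repeat=num_vertices)
              if is_vertex_cover(bits, edges)]
    best_size = min(sum(bits) for bits in covers)
    return best_size, [bits for bits in covers if sum(bits) == best_size]

# Hypothetical example graph: the path 0-1-2-3-4.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
best_size, best_covers = minimum_vertex_covers(5, path_edges)
```

The enumeration visits all 2^N bit strings, which is only practical for small N; it serves here as a reference against which an approximate bit string can be checked.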

Quantum Approximate Optimization Algorithm
The quantum approximate optimization algorithm (QAOA) is used to solve combinatorial optimization problems using a hybrid quantum-classical framework [6]. Many real-world problems can be formulated such that the solutions are N-bit binary strings of the form

$$z = z_1 z_2 \ldots z_N,$$

which minimize the classical cost function

$$C(z) = \sum_{\alpha=1}^{m} C_\alpha(z)$$

for m clauses, where $C_\alpha(z) = 1$ if clause α is satisfied by z and 0 otherwise [6]. Through the substitution of spin operators $\sigma^z_i$ for each $z_i$ in z, one can build the cost Hamiltonian $H_C$,

$$H_C = C(\sigma^z_1, \sigma^z_2, \ldots, \sigma^z_N).$$

The cost Hamiltonian for the minimum vertex cover problem is given by

$$H_C = A \sum_{(u,v) \in E} (1 - x_u)(1 - x_v) + B \sum_{v \in V} x_v$$

for an appropriate choice of A and B [11], where each binary variable $x_v$ is expressed through the spin operator $\sigma^z_v$ of the corresponding qubit. The weighting constants A and B are introduced because the minimum vertex cover problem contains hard constraints, which are not compatible with the QAOA, which solves only quadratic unconstrained binary optimization (QUBO) problems. B weights the primary objective, minimizing the size of the vertex cover, while A weights a penalty term enforcing that every edge has at least one of its vertices in the cover. Since QUBO problems are unconstrained by nature, the constraints must be imposed as soft constraints in the form of such penalty terms. Next, one can define a mixer Hamiltonian $H_M$,

$$H_M = \sum_{i=1}^{N} \sigma^x_i.$$

Through the application of layers of alternating cost and mixer Hamiltonians to the initial state $|+\rangle^{\otimes N}$, an equally weighted superposition of all states in the computational basis, the QAOA circuit from Figure 2 is constructed [6]. This yields

$$|\psi_p(\vec{\gamma}, \vec{\beta})\rangle = e^{-i\beta_p H_M} e^{-i\gamma_p H_C} \cdots e^{-i\beta_1 H_M} e^{-i\gamma_1 H_C} |+\rangle^{\otimes N},$$

where p ≥ 1 is the number of layers in the circuit with 2p parameters, $\gamma_i$ and $\beta_i$ with i = 1, 2, ..., p. A classical optimizer can be used to alter the parameters to minimize the expectation value

$$F_p(\vec{\gamma}, \vec{\beta}) = \langle \psi_p(\vec{\gamma}, \vec{\beta}) | H_C | \psi_p(\vec{\gamma}, \vec{\beta}) \rangle.$$

If $\vec{\gamma}^*$ and $\vec{\beta}^*$ minimize $F_p$, and if the true solution is $z^*$ with cost $C(z^*)$, then the approximation ratio is given by

$$r = \frac{F_p(\vec{\gamma}^*, \vec{\beta}^*)}{C(z^*)}.$$

The approximation of the solution $z^*$ can then be obtained by sampling the state prepared with the optimal parameters $\vec{\gamma}^*$ and $\vec{\beta}^*$. In a fully fault-tolerant setting, the performance of QAOA improves as p increases; however, due to limitations in the hardware currently available, the behaviour of QAOA at lower values of p is of great interest [10]. Some previous works have investigated strategies for improving the performance of QAOA at these small values of p, such as using heuristic strategies for selecting the initial parameters, which reduce the length of training required [10].
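As a rough illustration of the construction described in this section, the following NumPy sketch simulates the QAOA state vector for a small graph and evaluates the expectation value of the cost. It assumes a penalty cost of the form A Σ_{(u,v)∈E} (1 − x_u)(1 − x_v) + B Σ_v x_v with illustrative weights A = 2 and B = 1, a triangle graph, and hypothetical function names; none of these are taken from the implementation used in this work.

```python
import numpy as np

def cost_vector(num_qubits, edges, A=2.0, B=1.0):
    """Penalized cost of every computational basis state (bit i = vertex i)."""
    costs = np.zeros(2 ** num_qubits)
    for index in range(2 ** num_qubits):
        bits = [(index >> i) & 1 for i in range(num_qubits)]
        uncovered = sum((1 - bits[u]) * (1 - bits[v]) for u, v in edges)
        costs[index] = A * uncovered + B * sum(bits)
    return costs

def apply_mixer(state, num_qubits, beta):
    """Apply exp(-i * beta * X) to every qubit of the state vector."""
    c, s = np.cos(beta), -1j * np.sin(beta)
    psi = state.reshape([2] * num_qubits)
    for axis in range(num_qubits):
        psi0 = np.take(psi, 0, axis=axis)
        psi1 = np.take(psi, 1, axis=axis)
        psi = np.stack([c * psi0 + s * psi1, s * psi0 + c * psi1], axis=axis)
    return psi.reshape(-1)

def qaoa_expectation(gammas, betas, num_qubits, edges):
    """Expectation of the diagonal cost in the p-layer QAOA state."""
    costs = cost_vector(num_qubits, edges)
    dim = 2 ** num_qubits
    state = np.full(dim, 1 / np.sqrt(dim), dtype=complex)  # |+> on every qubit
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * costs) * state  # cost layer (diagonal phase)
        state = apply_mixer(state, num_qubits, beta)  # mixer layer
    return float(np.real(np.vdot(state, costs * state)))

# Hypothetical example: a triangle graph on three qubits.
triangle = [(0, 1), (1, 2), (0, 2)]
# With all angles zero the state stays the uniform superposition,
# so the expectation is the mean cost over all bit strings.
f_uniform = qaoa_expectation([0.0], [0.0], 3, triangle)
f_trial = qaoa_expectation([0.4], [0.3], 3, triangle)
```

In a full run, a classical optimizer would adjust the angles to lower this value. The sketch is an exact state vector calculation, so neither shot noise nor device noise is modelled.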

Classical Optimizers
Variational quantum algorithms, QAOA included, find solutions to given problems through the optimization of the ansatz parameters. There are a variety of classical optimizers that can be employed in these variational quantum algorithms. These optimizers can be grouped broadly into two categories: gradient-based and gradient-free. Gradient-based methods use gradient values during the optimization process. The gradient of the expectation value of a quantum ansatz on a qubit Hamiltonian can be evaluated analytically through the parameter-shift rule [12], or, more primitively, estimated through finite differences. The parameter-shift rule is preferable because the analytic gradient is calculated through much larger variations of the ansatz parameters (and is therefore less susceptible to noise than finite differences), while still requiring only the original ansatz circuit to calculate the gradient (the same as finite differences). Gradient-free methods require only cost function evaluations, operating as black-box optimizers. The optimizers compared are listed in Table 1.
This paper utilizes the parameter-shift rule for gradient calculations in all the gradient-based optimizers.
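The difference between the two gradient estimates can be seen on a single-parameter toy example. The sketch below assumes a single-qubit rotation generated by a Pauli operator, for which the expectation value is cos(θ) and the two-term shift of ±π/2 is exact; the function names and the value of θ are hypothetical, and the QAOA cost layers in this work may need a more general form of the rule.

```python
import math

def expectation(theta):
    """<Z> after RX(theta) acts on |0>, which equals cos(theta)."""
    return math.cos(theta)

def parameter_shift_gradient(f, theta):
    """Analytic gradient from two evaluations shifted by +/- pi/2."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))

def finite_difference_gradient(f, theta, step=1e-3):
    """Central finite-difference estimate with a small step."""
    return (f(theta + step) - f(theta - step)) / (2 * step)

theta = 0.7
exact = -math.sin(theta)
shift_grad = parameter_shift_gradient(expectation, theta)
fd_grad = finite_difference_gradient(expectation, theta)
```

Both estimates use two evaluations of the same circuit. If each evaluation carries an additive error of size ε, the finite-difference estimate inherits an error of order ε divided by the step, whereas the parameter-shift estimate inherits an error of order ε, which is the sense in which the larger shift is less susceptible to noise.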

Classical Optimizer Comparison
The comparison of the aforementioned classical optimizers employed in the QAOA on the Min-Vertex Cover problem is described as follows.
The QAOA problem test set contains all 21 non-isomorphic connected 5-vertex graphs. The QAOA is applied to the Min-Vertex Cover problem ten times for each graph in the problem test set. This is repeated for each optimizer and each noise level. Three levels of noise are considered, all resulting from the type of quantum simulation applied. These simulations are state vector, shot-based fault-tolerant, and shot-based with a sampled noise model, resulting in the noise levels of noise-free, shot noise only, and realistic device noise respectively.
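The distinction between a state vector simulation and a shot-based simulation can be sketched with a toy distribution: the former returns the exact expectation value, while the latter estimates it from a finite number of samples. The two-outcome distribution, shot counts, and names below are hypothetical and chosen only for illustration.

```python
import random
import statistics

def shot_estimate(values, probabilities, shots, rng):
    """Estimate an expectation value from a finite number of sampled outcomes."""
    samples = rng.choices(values, weights=probabilities, k=shots)
    return statistics.fmean(samples)

# Hypothetical two-outcome cost distribution, for illustration only.
values = [3.0, 1.0]
probabilities = [0.25, 0.75]
# Exact expectation, as a state vector simulation would report it.
exact = sum(p * v for p, v in zip(probabilities, values))

rng = random.Random(0)
spread = {}
for shots in (100, 2500):
    # Repeat the shot-based estimate to measure how much it fluctuates.
    estimates = [shot_estimate(values, probabilities, shots, rng) for _ in range(200)]
    spread[shots] = statistics.pstdev(estimates)
```

The fluctuation of the estimate shrinks roughly as one over the square root of the number of shots, so a classical optimizer in the shot-based setting sees a cost function that changes slightly between repeated evaluations at the same parameters.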

QAOA Ansatz Depth Comparison
Once the most suitable classical optimizer is found, the next comparison finds the optimal depth of the QAOA ansatz circuit. In a state vector simulation of the QAOA, the accuracy increases as the number of layers increases, with the exact answer being achieved in the limit as the number of layers approaches infinity. On a noisy quantum device, a deeper ansatz circuit is more affected by noise, and hence its output deviates further from that of the state vector simulation. This creates a trade-off between the theoretical accuracy gained by increasing the number of layers and the accuracy lost to the reduced noise resistance that comes with the increased depth of the ansatz. More layers mean a more expressive ansatz (hence a better answer), but also more effects from noise (hence a worse answer).
For the three graphs in Figure 3, we run a set of one hundred noisy simulations for each depth from 1 to 10 layers, with the same noise model used in the optimizer comparison. We run the same simulations on a state vector simulator to show that the theoretical convergence is achieved as the number of layers increases.

QAOA Ansatz Depth Recommendation Verification
Once the optimal depth is estimated from the experiments above, we seek to verify that it does indeed maximize the performance of the QAOA on the Min-Vertex Cover problem for graphs of this size (5 qubits, 5 edges), shown in Figure 4. We compare the QAOA with differing numbers of layers to test the expectation that the solutions sampled from the QAOA with optimized parameters are, on average, better when the optimal number of layers is used.
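One simple way to quantify how good the sampled solutions are is the fraction of measurement shots that return a minimum vertex cover. The sketch below assumes measurement counts stored as a dictionary from bit strings to shot counts; the counts are made up for illustration and are not results from this work, and the function name is hypothetical.

```python
def success_probability(counts, optimal_bitstrings):
    """Fraction of measurement shots that returned an optimal bit string."""
    total_shots = sum(counts.values())
    optimal_shots = sum(n for bits, n in counts.items() if bits in optimal_bitstrings)
    return optimal_shots / total_shots

# Made-up counts for illustration only (not results from the paper).
# Character i of each key is the measured bit of vertex i on the path graph 0-1-2-3-4,
# whose unique minimum vertex cover is the bit string "01010".
example_counts = {"01010": 412, "11010": 301, "10101": 95, "00000": 216}
p_success = success_probability(example_counts, {"01010"})
```

Comparing this quantity across circuits with different numbers of layers gives a depth-by-depth view of how often the optimized ansatz yields a correct solution.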

Comparison of Classical Optimizers
The results for the different classical optimizers, gradient-free and gradient-based, for each ansatz depth from 1 layer to 5 layers are shown in Figure 5 for the state vector, shot-based and noisy simulations in 5a, 5b and 5c respectively.
In the state vector simulation, all ten optimizers appear to perform similarly, with no major distinction between the gradient-free and gradient-based optimizers. There is a clear trend that more layers yield a better approximation ratio.
When shot noise is accounted for, the differences between the optimizers become noticeable. SPSA slightly outperforms the other optimizers in this setting. The gradient-based optimizers all perform equally well, equivalent to the performance of Powell. COBYLA and Nelder-Mead are noticeably affected by the inclusion of shot noise. For COBYLA, the approximation ratio for five layers is equivalent to the approximation ratio achieved with four layers, suggesting that this optimizer had trouble making effective use of the extra parameters in the ansatz due to the shot noise. All other optimizers show noticeable improvement as the number of layers increases.
Finally, when the realistic noise model is incorporated into the simulation, the differences between classical optimizers become even more pronounced. SPSA is the best performing optimizer overall, achieving the best 4- and 5-layer approximation ratios in the presence of realistic noise levels. COBYLA and Nelder-Mead are seriously negatively affected by noise: the approximation ratio for 5 layers is worse than that with 4 layers. All gradient-based methods show similar performance, equivalent to that of Powell.
Following these results, SPSA is used to run the noisy simulations for the QAOA ansatz depth comparison in Section 3.2 and the ansatz depth verification in Section 3.3. COBYLA is used for the accompanying state vector simulations, as it was found to be the best optimizer in the state vector simulation.
As an interesting note, when the number of function evaluations permitted to the classical optimizer is limited, the gradient-based optimizers' ability to converge accurately is severely affected. Because these optimizers require more calls to the quantum device to evaluate the gradient, limiting the number of calls causes the observed degradation in performance. Thus, when running algorithms with an ansatz based on Hamiltonian simulation on quantum hardware, it is not recommended that one use these gradient-based optimizers. Similar results were reported for different types of variational circuits in [23,24].
Figures 6, 7 and 8 give the comparison between the state vector and noisy simulations for graphs 1, 2 and 3 respectively.

QAOA Ansatz Depth Comparison
In the state vector simulations, the expected trend is clearly apparent: the approximation ratio of the QAOA improves steadily as additional layers are added. It is also clear that each additional layer improves the approximation ratio less than the layer before it, so there is a diminishing return on accuracy as layers are added.
In the noisy simulations, the effect of noise becomes apparent as the number of layers increases, with the best approximation ratio achieved at around six layers for all graphs. Beyond this point, additional layers begin to slowly worsen the approximation ratio.

QAOA Ansatz Depth Recommendation Verification
Figures 9a, 9b and 9c show the probabilities of sampling the correct solution for the minimum vertex cover problem on graphs 4, 5 and 6 respectively, for the noisy and state vector simulations.
It is clear from these graphs that six layers appear to be optimal for these instances too, allowing the correct solution for the minimum vertex cover on these graphs to be sampled with the greatest probability. After six layers, additional layers decrease the probability of sampling the correct solution for the minimum vertex cover problem.

Discussion and Conclusion
The results from Section 3.1, the comparison of classical optimizers in the QAOA, show that the choice of classical optimizer has a significant effect on the algorithm in the presence of noise. A classical optimizer's performance in a state vector simulation does not accurately reflect its performance in a realistic noise setting. It appears that SPSA is the best classical optimizer for current levels of noise. This is a result of both its built-in stochastic nature, which makes it more resistant to noise, and its efficient gradient approximation, which requires only two cost function evaluations for any ansatz. These results are similar to those found in [8], where SPSA was found to be the best classical optimizer in the noisy simulation, while COBYLA and Nelder-Mead were found to be the worst.
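The two-evaluation gradient approximation mentioned above can be sketched as a minimal SPSA loop. This is an illustrative implementation, not the Qiskit optimizer used in this work: the gain constants a and c, the noisy quadratic test function, and all names are assumptions, while the decay exponents 0.602 and 0.101 follow commonly recommended values for SPSA.

```python
import random

def spsa_minimize(cost, theta, iterations=200, a=0.2, c=0.1, seed=0):
    """Minimal SPSA loop: two cost evaluations per iteration in any dimension."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # decaying step size
        c_k = c / k ** 0.101  # decaying perturbation size
        # Perturb all parameters at once along a random +/-1 direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = cost([t + c_k * d for t, d in zip(theta, delta)])
        minus = cost([t - c_k * d for t, d in zip(theta, delta)])
        scale = (plus - minus) / (2.0 * c_k)
        # Gradient estimate for component i is scale / delta_i.
        theta = [t - a_k * scale / d for t, d in zip(theta, delta)]
    return theta

# Hypothetical noisy cost: a quadratic with minimum at (1, 1) plus Gaussian noise.
noise_rng = random.Random(1)

def noisy_quadratic(x):
    return sum((xi - 1.0) ** 2 for xi in x) + noise_rng.gauss(0.0, 0.01)

result = spsa_minimize(noisy_quadratic, [0.0, 0.0])
```

Because every parameter is perturbed simultaneously, the number of cost evaluations per iteration stays at two regardless of how many parameters the ansatz has, in contrast to per-parameter gradient rules whose cost grows with the number of parameters.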
The results from Section 3.2, the comparison of depths for the QAOA ansatz, show that while in theory more layers in the QAOA give greater accuracy, in the presence of noise in the circuit there is an optimal number of layers that provides the greatest accuracy. It is therefore important to use the correct number of layers in order to utilize the QAOA to its full potential. Previous works have suggested other guidelines for ansatz depth based on factors such as time complexity of execution [25]. Section 3.3 shows that the probability of sampling the correct solution for the QAOA problem is also greatest at the optimal number of layers, and decreases as the number of layers increases beyond that.
It is left as future work to fully characterize the trade-off between the level of noise in the circuit and the accuracy improvement yielded by adding layers to the QAOA. It is hoped that it will then be possible to estimate the optimal number of layers for a QAOA circuit for a given problem and a device's noise level.

Figure 1. In the three graphs above, the red nodes show the set of vertices forming each graph's minimum vertex cover. Each edge in the graph under consideration must have at least one endpoint in the cover. A cover is a minimum cover of a graph when it contains the fewest possible vertices while ensuring each edge is still incident to at least one vertex in the cover.

Figure 2. The QAOA circuit consists of p layers of the cost and mixer Hamiltonians, $H_C$ and $H_M$ respectively. The initial $|+\rangle^{\otimes N}$ state is prepared, and every qubit is measured after applying the QAOA circuit.

Figure 3. The three graphs used in the QAOA depth comparison, referred to as graph 1, 2 and 3 respectively.

Figure 4. The three graphs used in the QAOA depth recommendation comparison, referred to as graph 4, 5 and 6 respectively.

Figure 5. A comparison of the approximation ratio using 10 classical optimizers on the QAOA, each with depths 1 to 5 from left to right, using a state vector, shot-based, and noisy simulation in (a), (b) and (c) respectively. Each box is made up of the 10 runs for each of the 21 non-isomorphic graphs. The three panels capture the change in performance as one moves from a noiseless regime to realistic noise from a quantum device. This demonstrates the change in performance that could be expected on a real device: as more noise is introduced to the system, the performance of the classical optimizers decreases, with some optimizers being more affected than others. Each problem instance optimization is done with a maximum of 5000 cost function evaluations.

Figure 6. Number of layers vs approximation ratio of a state vector (a) and noisy (b) simulation of QAOA for the minimum vertex cover on graph 1, with a maximum of 5000 cost function evaluations per problem instance. Each bar represents 100 runs of the QAOA for each number of layers, for the same minimum vertex cover problem.

Table 1. Classical optimizers considered. The implementations of all classical optimizers are taken from the SciPy (α) and Qiskit (β) Python libraries, under the respective functions scipy.optimize.minimize and qiskit.aqua.components.optimizers. (References are given in the far-right column.)
Gradient-free | Constrained Optimization by Linear Approximation (COBYLA) | α