Characterizing local noise in QAOA circuits

Recently, Xue et al. [arXiv:1909.02196] demonstrated numerically that QAOA performance varies as a power law in the amount of noise under certain physical noise models. In this short note, we provide a deeper analysis of the origin of this behavior. In particular, we provide approximate closed-form expressions for the fidelity and cost in terms of the noise rate, system size, and circuit depth. As an application, we show that these expressions accurately model the trade-off in which larger circuits attain better cost values at the expense of greater degradation due to noise.


I. INTRODUCTION
We study noise in Quantum Approximate Optimization Algorithm (QAOA) circuits [2]. We consider the original formulation of QAOA, with a transverse field mixer, though a generalization to the Quantum Alternating Operator Ansatz exists [3]. In the version we study, a QAOA circuit of $N$ qubits is specified by a cost Hamiltonian $H_c$ diagonal in the computational basis [4] as well as $2d$ angles $(\vec{\gamma}, \vec{\beta})$, $d$ being the high-level depth of the circuit $U$ (i.e., number of QAOA rounds):
$$U = \prod_{k=1}^{d} e^{-i\beta_k H_x} e^{-i\gamma_k H_c}.$$
The Hamiltonian $H_x$ with off-diagonal elements, known as the 'mixing' Hamiltonian, is given by $H_x = \sum_{i=1}^{N} \sigma^x_i$. We will assume throughout (though this is not necessary) that the initial state is an equal superposition over all computational basis states, $|\psi_0\rangle = |+\rangle^{\otimes N}$, with $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. The goal of QAOA is to find approximate solutions to optimization problems phrased as Eq. (1), which can be achieved by finding 'good' angles $(\vec{\gamma}, \vec{\beta})$ in order to minimize or maximize the expected cost,
$$\langle H_c \rangle = \langle\psi_0| U^\dagger H_c U |\psi_0\rangle.$$
A very natural question to ask is how noise affects the performance of QAOA. There are of course several possible sources to consider, ranging from misspecification of angles to state preparation and measurement errors and environmental noise. Recently Xue et al. [1] introduced a model for how local environmental noise (acting on each qubit separately and independently) affects the circuit output. The model is amenable to mathematical analysis, by decomposing the density matrix as a sum of pure states. Following this work, we take a simple, yet realistic model to study the behaviour of the cost expectation $\langle H_c \rangle$. This simplification allows us to answer in more detail how local noise causes deviations in the expected cost, and what the relative trade-offs are with respect to system size, circuit size (depth) and the noise rate.
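As a concrete illustration of the definitions above, the following is a minimal NumPy sketch of a noiseless QAOA circuit acting on $|+\rangle^{\otimes N}$ and of the expected cost. It assumes the quadratic, field-free cost Hamiltonian $H_c = \sum_{i<j} J_{ij}\sigma^z_i\sigma^z_j$ described in note [4]; the function names (`cost_diagonal`, `apply_mixer`, `qaoa_state`, `expected_cost`) are hypothetical and are not taken from the simulations reported here.

```python
import numpy as np

def cost_diagonal(J):
    """Diagonal of H_c = sum_{i<j} J_ij sz_i sz_j in the computational basis."""
    N = J.shape[0]
    z = np.arange(2 ** N)
    # spins[k, i] = +1 if bit i of basis state k is 0, and -1 otherwise
    spins = 1 - 2 * ((z[:, None] >> np.arange(N)[None, :]) & 1)
    return np.einsum("ki,ij,kj->k", spins, np.triu(J, 1), spins).astype(float)

def apply_mixer(psi, beta, N):
    """Apply exp(-i beta H_x) = prod_i exp(-i beta sx_i) to a state vector."""
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    psi = psi.reshape([2] * N)
    for q in range(N):
        # contract the single-qubit rotation with tensor axis q, then restore the axis order
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def qaoa_state(J, gammas, betas):
    """Return U|psi_0> for d = len(gammas) rounds, plus the cost diagonal."""
    N = J.shape[0]
    c = cost_diagonal(J)
    psi = np.full(2 ** N, 2 ** (-N / 2), dtype=complex)  # |+>^{(x) N}
    for g, b in zip(gammas, betas):
        psi = apply_mixer(np.exp(-1j * g * c) * psi, b, N)  # cost phase, then mixer
    return psi, c

def expected_cost(psi, c):
    """<psi| H_c |psi> for a diagonal H_c."""
    return float(np.real(np.vdot(psi, c * psi)))
```

Since $H_c$ is diagonal, the cost unitary is applied as an elementwise phase, and the mixer factorizes into identical single-qubit rotations, so the qubit ordering convention does not affect the result.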

II. NOISE MODEL
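Similar to Ref. [1], we assume that after each QAOA round a layer of local noise $\mathcal{E}_p$ is applied to each qubit (see Fig. 1), of the form
$$\rho_1 = \mathcal{E}_p^{\otimes N}\left[U \rho_0 U^\dagger\right],$$
$$\mathcal{E}_p[\rho] = (1-p)\,\rho + \frac{p}{M}\sum_{j=1}^{M} K_j \rho K_j^\dagger, \tag{4}$$
where we just specify one set of angles $\gamma, \beta$ (i.e. $d = 1$), and write $\rho_0 = |\psi_0\rangle\langle\psi_0|$. Here the $K_j$ are the $M$ noise operators of the channel [6, 7]. Note that the noise layer can also be written as
$$\mathcal{E}_p^{\otimes N} = \prod_{n=1}^{N} \mathcal{E}_p^{(n)}, \tag{5}$$
with $\mathcal{E}_p^{(n)}$ denoting $\mathcal{E}_p$ acting on the $n$-th qubit [8]. At this level, we can interpret the probability of the noise not acting at all as simply $(1-p)^N$, or, if one repeats this for $d$ rounds, $(1-p)^{dN}$.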
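The channel above can be sketched numerically as follows, for the depolarizing case of note [7] ($M = 4$, with the $K_j$ being the identity and the three Pauli operators). This is only an illustrative dense-matrix sketch; the helper names `embed`, `local_channel` and `noise_layer` are assumptions and not part of the original simulations.

```python
import numpy as np

# Single-qubit operators K_j for depolarizing noise (M = 4): identity and Paulis.
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, SX, SY, SZ]

def embed(K, n, N):
    """Operator acting as K on qubit n (0-indexed) and as the identity elsewhere."""
    return np.kron(np.kron(np.eye(2 ** n), K), np.eye(2 ** (N - n - 1)))

def local_channel(rho, p, kraus):
    """E_p[rho] = (1 - p) rho + (p / M) sum_j K_j rho K_j^dagger."""
    M = len(kraus)
    noisy = sum(K @ rho @ K.conj().T for K in kraus)
    return (1 - p) * rho + (p / M) * noisy

def noise_layer(rho, p, N, kraus=PAULIS):
    """Apply the local channel to each of the N qubits in turn."""
    for n in range(N):
        rho = local_channel(rho, p, [embed(K, n, N) for K in kraus])
    return rho
```

For a single qubit, `local_channel(rho, p, PAULIS)` reduces to $(1-p)\rho + p\,I/2$, which is the interpretation given in note [7]. For a pure input state, the fidelity of the output with the input is bounded below by the no-noise weight $(1-p)^N$.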
We focus initially on $d = 1$. Let us define $|\psi_1\rangle = U|\psi_0\rangle$ (i.e. the ideal, noiseless QAOA-1 output). We can interpret the map $\mathcal{E}_p^{(n)}[|\psi_1\rangle\langle\psi_1|]$ statistically, as in Ref. [1], as applying the noise operator $K_j^{(n)}$ (defined as the identity on all qubits, and $K_j$ on the $n$-th qubit [9]) with probability $p_j^{(n)} = \frac{p}{M}\langle\psi_1|K_j^{(n)\dagger}K_j^{(n)}|\psi_1\rangle$. For ease of notation (though this restriction is not necessary), and considering we will later focus on such models, let us assume the $K_j$ are unitary (e.g. as in depolarizing noise), in which case we can drop the normalization coefficients, so that $p_j^{(n)} = p/M$. Let us also adopt the notation $|\psi^{(\vec{n})}_{\vec{j}}\rangle = K^{(n_m)}_{j_m} \cdots K^{(n_1)}_{j_1}|\psi_1\rangle$, where the vector notation for $\vec{j}, \vec{n}$ should be clear.
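Then, acting with Eq. (5) on $|\psi_1\rangle\langle\psi_1|$, the noisy output can be written as
$$\rho_1 = \sum_{m=0}^{N} (1-p)^{N-m}\left(\frac{p}{M}\right)^{m} \sum_{\vec{n}_m, \vec{j}_m} |\psi^{(\vec{n}_m)}_{\vec{j}_m}\rangle\langle\psi^{(\vec{n}_m)}_{\vec{j}_m}|, \tag{6}$$
where $\vec{n}_m, \vec{j}_m$ are length-$m$ vectors, with each entry in $\vec{n}_m$ distinct (i.e. no repeats). That is, $m$ specifies the total number of noise operators acting: $m = 0$ is the case where no noise acts, and the other extreme, $m = N$, means every qubit has a noise operator applied. The second sum in Eq. (6) is therefore over $M^m \times \binom{N}{m}$ unique terms, using that each index in $\vec{j}$ can run from 1 to $M$ (repeats allowed), and that $\vec{n}$ has elements ranging from 1 to $N$, with the restriction that each element is unique.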
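The term counting in Eq. (6) can be checked by brute force on a few qubits. The sketch below, assuming unitary (Pauli) noise operators as in the depolarizing case, enumerates every choice of $m$ distinct qubits and $m$ operator labels, and compares the resulting mixture with the qubit-by-qubit application of the channel. All function names are hypothetical, and the enumeration is only feasible for small $N$.

```python
import itertools
import math
import numpy as np

PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def embed(K, n, N):
    """Operator acting as K on qubit n (0-indexed) and as the identity elsewhere."""
    return np.kron(np.kron(np.eye(2 ** n), K), np.eye(2 ** (N - n - 1)))

def noise_layer(rho, p, N, kraus=PAULIS):
    """Reference result: apply the local channel to each qubit in turn."""
    M = len(kraus)
    for n in range(N):
        ops = [embed(K, n, N) for K in kraus]
        rho = (1 - p) * rho + (p / M) * sum(K @ rho @ K.conj().T for K in ops)
    return rho

def rho_from_expansion(psi1, p, N, kraus=PAULIS):
    """Build rho_1 as the explicit sum over m, qubit subsets and operator labels."""
    M = len(kraus)
    rho = np.zeros((2 ** N, 2 ** N), dtype=complex)
    counts = []  # number of pure-state terms for each m
    for m in range(N + 1):
        weight = (1 - p) ** (N - m) * (p / M) ** m
        n_terms = 0
        for qubits in itertools.combinations(range(N), m):     # distinct qubits
            for js in itertools.product(range(M), repeat=m):   # repeats allowed
                phi = psi1
                for n, j in zip(qubits, js):
                    phi = embed(kraus[j], n, N) @ phi
                rho += weight * np.outer(phi, phi.conj())
                n_terms += 1
        counts.append(n_terms)
    return rho, counts
```

For $N = 3$ and $M = 4$ the counts are $1, 12, 48, 64$, i.e. $M^m\binom{N}{m}$, and the weights sum to one by the binomial theorem.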

III. APPROXIMATE ANALYTICAL EXPRESSIONS FOR QAOA PERFORMANCE UNDER NOISE
A natural question is how $\rho_1$ compares to the ideal QAOA-1 output, $|\psi_1\rangle$. Two key quantities of interest are the fidelity between the two, and the difference in expected cost.
FIG. 2. Mean overlap $\overline{|\langle\psi_1|\psi^{(\vec{n}_m)}_{\vec{j}_m}\rangle|^2}$ for a typical $N = 8$ problem instance [4] under depolarizing noise. Solid blue is through the exact mean overlap, $f_m$, and red dash is the fit $\kappa^{-m}$ to this using non-linear least-squares (on the log of the data). The horizontal black dash-dot line is the Haar random overlap value. The fit gives $\alpha = 0.9958$; $\kappa = 2.71$. Inset: We plot, for the depolarizing channel, the exact (blue) fidelity (Eq. (7)) as a function of $p$. We see the fitted curve (red dash), Eq. (10), with parameters extracted from $f_m$ in the main figure, matches very well for nearly all $p$.

A. Fidelity
First, let us consider the fidelity, or overlap, between $\rho_1$ and $|\psi_1\rangle$,
$$F_1 = \langle\psi_1|\rho_1|\psi_1\rangle. \tag{7}$$
In Ref. [1], it is posited, and shown numerically for the parameters studied, that the fidelity fits a generic polynomial form, $F_1 = (1-p)^{\delta N}$. One aim of this note is to justify such an equation in the case of local unitary noise, such as depolarizing/dephasing noise. First we write, using Eqs. (6) and (7),
$$F_1 = \sum_{m=0}^{N} (1-p)^{N-m}\left(\frac{p}{M}\right)^{m} \sum_{\vec{n}_m, \vec{j}_m} |\langle\psi_1|\psi^{(\vec{n}_m)}_{\vec{j}_m}\rangle|^2 = \sum_{m=0}^{N} \binom{N}{m} (1-p)^{N-m} p^{m}\, \overline{|\langle\psi_1|\psi^{(\vec{n}_m)}_{\vec{j}_m}\rangle|^2}, \tag{8}$$
where the 'overline' represents the average (mean) value of the overlaps, i.e. over all $M^m\binom{N}{m}$ possible $\vec{n}_m, \vec{j}_m$. A reasonable assumption is that the average on the right depends only on the parameter $m$, and can therefore be written as a function $f_m = \overline{|\langle\psi_1|\psi^{(\vec{n}_m)}_{\vec{j}_m}\rangle|^2}$. With this, we can write the fidelity Eq. (7) as
$$F_1 = \sum_{m=0}^{N} \binom{N}{m} (1-p)^{N-m} p^{m} f_m. \tag{9}$$
An example of this is shown in Fig. 2 for a typical problem instance. It now becomes clearer where the binomial form discovered in Ref. [1] comes from. First note we have $f_0 = 1$ by definition, and we expect $f_N \approx 2^{-N}$ (the overlap squared of two random states). While it is tempting to set $f_m = 2^{-m}$ here, the scaling is in general noise dependent, so we use the slightly less restrictive form (satisfying $f_0 = 1$) and write instead $f_m \approx 1 + \alpha(\kappa^{-m} - 1)$. Then, by the binomial theorem,
$$F_1 \approx 1 - \alpha + \alpha\left(1 - p\,\frac{\kappa - 1}{\kappa}\right)^{N}. \tag{10}$$
Due to the generality of the form $f_m$ we expect that many systems of interest will follow Eq. (10), though we mention that in our simulations here, we only consider depolarizing noise. Notice that this equation, derived from a few reasonable approximations, predicts a slightly different dependence on $N$ compared to the assumed form from Ref. [1], though both are equivalent for sufficiently small noise rate $p$. Indeed, in the limit of small $p \ll 1$, as may be expected in a reasonable quantum circuit, both forms reduce to $1 - \delta N p$ at first order, so one can relate the exponent $\delta$ found in Ref. [1] as $\delta = \alpha\frac{\kappa - 1}{\kappa}$. If we take, as in Fig. 2, $\alpha = 0.996$, $\kappa = 2.71$, then $\delta = 0.63$. In this case, we nearly have $f_m = \kappa^{-m}$, and simply
$$F_1 \approx \left(1 - p\,\frac{\kappa - 1}{\kappa}\right)^{N}. \tag{11}$$
As demonstrated in the inset of Fig. 2, our formula (Eq. (10)) also applies when the noise is dominant (e.g. $p > 0.5$), and not just for 'small' $p$.
Notice that the full curve over $p \in [0, 1]$ cannot be replicated by a formula $(1-p)^{\delta N}$, which does not give the appropriate behaviour in the large $p$ limit (it would diverge to $-\infty$ if plotted on the same scale as the inset).
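The binomial-theorem step leading to Eq. (10), and the small-$p$ relation for $\delta$, can be verified numerically. The sketch below uses the fitted values quoted for Fig. 2 ($\alpha = 0.996$, $\kappa = 2.71$, $N = 8$); the function names are hypothetical.

```python
import math

def fidelity_sum(p, N, f):
    """Eq. (9): sum_m C(N, m) (1 - p)^(N - m) p^m f_m."""
    return sum(math.comb(N, m) * (1 - p) ** (N - m) * p ** m * f(m)
               for m in range(N + 1))

def fidelity_closed(p, N, alpha, kappa):
    """Eq. (10): 1 - alpha + alpha (1 - p (kappa - 1) / kappa)^N."""
    return 1 - alpha + alpha * (1 - p * (kappa - 1) / kappa) ** N

def fidelity_power_law(p, N, delta):
    """Form posited in Ref. [1]: (1 - p)^(delta N)."""
    return (1 - p) ** (delta * N)

alpha, kappa, N = 0.996, 2.71, 8                 # values quoted for Fig. 2
f_model = lambda m: 1 + alpha * (kappa ** (-m) - 1)
delta = alpha * (kappa - 1) / kappa              # small-p exponent, about 0.63

for p in (0.01, 0.5, 1.0):
    print(p,
          fidelity_sum(p, N, f_model),           # agrees with the closed form
          fidelity_closed(p, N, alpha, kappa),
          fidelity_power_law(p, N, delta))       # agrees only for small p
```

At $p = 1$ the closed form saturates at $1 - \alpha + \alpha\kappa^{-N}$, whereas the power law goes to zero, which is the large-$p$ discrepancy noted above.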

B. Cost
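The expected cost for our noisy QAOA-1 is given by $C^{(1)}_{\mathrm{noise}} = \langle H_c \rangle = \mathrm{Tr}[H_c \rho_1]$. For well chosen angles $(\gamma, \beta)$ using QAOA to minimize, we expect the cost to increase under the noise, i.e. $C^{(1)}_{\mathrm{noise}} \geq C^{(1)}_{\mathrm{ideal}}$. Ref. [1] posits that the dependence can be written as $C^{(1)}_{\mathrm{noise}} = (1-p)^{\eta N} C^{(1)}_{\mathrm{ideal}}$ for some $\eta > 0$, where we use, matching the condition in the large noise limit $p \to 1$, the Haar random expectation, which here is 0 [11]. Using Eq. (6), we can write
$$C^{(1)}_{\mathrm{noise}} = \sum_{m=0}^{N} \binom{N}{m} (1-p)^{N-m} p^{m}\, \overline{\langle\psi^{(\vec{n}_m)}_{\vec{j}_m}|H_c|\psi^{(\vec{n}_m)}_{\vec{j}_m}\rangle}. \tag{12}$$
For $m = 0$, the average term on the right is precisely $C^{(1)}_{\mathrm{ideal}}$, and for $m = N$ it can be approximated by the Haar random expectation, which here is 0. As before, if this quantity on the right only depends (to a good enough approximation) on $m$, we can replace it by a function $c_m = \overline{\langle\psi^{(\vec{n}_m)}_{\vec{j}_m}|H_c|\psi^{(\vec{n}_m)}_{\vec{j}_m}\rangle}$. This structure suggests writing, as before, $c_m \approx \alpha + \tilde{\alpha}\chi^{-m}$, where $\alpha + \tilde{\alpha} \approx C^{(1)}_{\mathrm{ideal}}$. With this, again by the binomial theorem,
$$C^{(1)}_{\mathrm{noise}} \approx \alpha + \tilde{\alpha}\left(1 - p\,\frac{\chi - 1}{\chi}\right)^{N}, \tag{13}$$
showing a slightly different form from Ref. [1]. In the small $p$ limit however, matching the terms linear in $p$, one can extract the relation $\eta = \frac{\tilde{\alpha}}{\alpha + \tilde{\alpha}}\,\frac{\chi - 1}{\chi}$. Using the values from Fig. 3, we get $\eta = 0.28$ for this example.
FIG. 3. Mean cost $c_m$ for a typical $N = 8$ problem instance [4] under depolarizing noise. Blue (solid) line is through the exact $c_m$, and red (dash) is the fit $\chi^{-m}$ to this using non-linear least-squares. The fit gives $\alpha = 1.04$; $\tilde{\alpha} = -7.41$; $\chi = 1.32$. Inset: We plot, for the depolarizing channel, the exact (blue solid) cost as a function of $p$. We see the fitted curve (red dash), Eq. (13), with parameters extracted from $c_m$ in the main figure, matches very well for nearly all $p$. Note, the ground-eigenenergy of $H_c$ is $-14$.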
As in the case of fidelity, our equation also works to good accuracy over nearly the full range $p \in [0, 1]$, including the regime $p > 0.5$ where $p$ is not a 'small' parameter, as shown in the inset of Fig. 3.
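As a worked check of the small-$p$ relation for $\eta$, the sketch below evaluates the binomial closed form for the cost with the fitted values quoted for Fig. 3 ($\alpha = 1.04$, $\tilde{\alpha} = -7.41$, $\chi = 1.32$, $N = 8$), and compares it with the form posited in Ref. [1]. It assumes $C^{(1)}_{\mathrm{ideal}} \approx \alpha + \tilde{\alpha}$, as stated in the text; the function names are hypothetical.

```python
def cost_closed(p, N, alpha, alpha_t, chi):
    """Binomial closed form: alpha + alpha_t (1 - p (chi - 1) / chi)^N."""
    return alpha + alpha_t * (1 - p * (chi - 1) / chi) ** N

def cost_power_law(p, N, eta, c_ideal):
    """Form posited in Ref. [1]: (1 - p)^(eta N) C_ideal."""
    return (1 - p) ** (eta * N) * c_ideal

alpha, alpha_t, chi, N = 1.04, -7.41, 1.32, 8    # values quoted for Fig. 3
c_ideal = alpha + alpha_t                        # assumed noiseless QAOA-1 cost
eta = (alpha_t / c_ideal) * (chi - 1) / chi      # about 0.28

for p in (0.0, 0.01, 0.5, 1.0):
    print(p, cost_closed(p, N, alpha, alpha_t, chi),
          cost_power_law(p, N, eta, c_ideal))
```

The two forms agree closely at $p = 0.01$ and separate as $p$ grows, with the closed form approaching $\alpha + \tilde{\alpha}\chi^{-N}$ at $p = 1$.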

C. QAOA-d
For QAOA with $d$ rounds (QAOA-$d$) we can extend the above analysis, which sheds light on the depth vs. noise trade-off. Again, the model we follow is a round of QAOA, followed by $N$ local noise channels, repeated $d$ times, as in Fig. 1. Each noise channel is assumed to act identically on each qubit, and in each round. This means we can write an exact expression for $\rho_d$ (i.e. in the form of Eq. (6)), where one replaces $N$ with $Nd$, since this is the total possible number of locations at which the noise terms (the $K_j$) can act:
$$\rho_d = \sum_{m=0}^{Nd} (1-p)^{Nd-m}\left(\frac{p}{M}\right)^{m} \sum_{\vec{l}_m, \vec{n}_m, \vec{j}_m} |\psi^{(\vec{l}_m, \vec{n}_m)}_{\vec{j}_m}\rangle\langle\psi^{(\vec{l}_m, \vec{n}_m)}_{\vec{j}_m}|,$$
where the second sum has a new index, $l$, representing the QAOA layer (round) in which the noise operator acts. That is, the elements of the length-$m$ vectors $(\vec{l}_m, \vec{n}_m, \vec{j}_m)$ tell us respectively in which layer, on which qubit, and with which operator the noise acts. The second sum is over $M^m\binom{Nd}{m}$ terms. For notational transparency, the general form of the term in the second sum is
$$|\psi^{(\vec{l}_m, \vec{n}_m)}_{\vec{j}_m}\rangle = \mathcal{K}_d U_d \cdots \mathcal{K}_2 U_2\, \mathcal{K}_1 U_1 |\psi_0\rangle,$$
where $U_k$ is the ideal QAOA unitary in round $k$, and $\mathcal{K}_k$ is the product of the $m_k$ noise operators acting in that round (with $\sum_k m_k = m$). Following the form of Eqs. (10), (13), one would simply replace '$N$' by '$Nd$' for $F_d$ and $C^{(d)}_{\mathrm{noise}}$. Note that the fitting parameters, e.g. $\alpha, \chi$, also depend on the depth $d$, since different depths correspond to completely different circuits.
FIG. 4. $C^{(d)}_{\mathrm{noise}}(p)$ as a function of depolarizing probability for various circuit depths $d$, for a typical $N = 6$ instance [4]. For larger values of $p > 0.3$ the curves do not cross again, and all converge to the Haar random cost of 0 as $p \to 1$. The black dash-dot line is the optimal cost (ground state), which is found to a good approximation for $d \geq 4$ in the noiseless case ($p = 0$). Each colored dash line is from a non-linear least squares fit to a function of the form of $C^{(d)}_{\mathrm{noise}}$.
Of practical interest is the trade-off between performing a greater number of rounds of QAOA, thus ideally obtaining a lower cost, and the additional noise that each extra round introduces, which acts to raise the cost. Clearly for large $p$, there is no benefit to going beyond $d = 1$ since the output will be close to the maximally mixed state.
In Fig. 4 we see the complex interplay between depth d and noise level p. Only for very small noise levels is it beneficial to go to large depths. Once the depolarizing probability exceeds 2% (p = 0.02) there is no reason to go beyond d = 3. For this example, if p > 0.25, the curves have inverted from the order at p = 0, i.e., here d = 1 is always the best choice.
If one can estimate the parameters $(\alpha, \chi)$ for a few circuit depths of interest (e.g. as is done in Fig. 3), it is therefore possible to determine the optimal circuit depth for a given noise rate $p$.
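A minimal sketch of this depth selection is given below, assuming the cost at depth $d$ follows the closed form of Eq. (13) with $N$ replaced by $Nd$. The per-depth parameters in `HYPOTHETICAL_FITS` are placeholder numbers chosen purely for illustration; they are not fitted values from this work, and in practice they would come from a fit at each depth. The function names are likewise hypothetical.

```python
def cost_noise(p, N, d, alpha, alpha_t, chi):
    """Closed form with N replaced by N d: alpha + alpha_t (1 - p (chi - 1) / chi)^(N d)."""
    return alpha + alpha_t * (1 - p * (chi - 1) / chi) ** (N * d)

def best_depth(p, N, fits):
    """Return the depth d minimizing the modeled noisy cost at noise rate p."""
    return min(fits, key=lambda d: cost_noise(p, N, d, *fits[d]))

# Placeholder (alpha, alpha_t, chi) per depth d; NOT values from the paper.
HYPOTHETICAL_FITS = {
    1: (0.5, -5.0, 1.3),
    2: (0.5, -6.5, 1.3),
    3: (0.5, -7.5, 1.3),
}

N = 6
for p in (0.0, 0.02, 0.1, 0.5):
    d_opt = best_depth(p, N, HYPOTHETICAL_FITS)
    print(f"p = {p}: modeled best depth d = {d_opt}")
```

With these placeholder parameters the largest depth is preferred at $p = 0$ and $d = 1$ is preferred at $p = 0.5$, mirroring qualitatively the inversion of the curves described for Fig. 4.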

IV. CONCLUSION
We have extended previous work on noise in QAOA. Ref. [1] analyzed noise in QAOA circuits by decomposing the density operator as a sum of pure states with different numbers of noise operators acting. We follow this approach and, through approximations backed up by numerical simulation, show that a slightly different binomial form is obtained, thus generalizing the posited fidelity and cost functions found in that previous work. We find this equation predicts a trade-off between noise level and depth (i.e., accuracy of optimization), showing that for large enough noise rates, it is best to keep a shorter circuit.
This line of research opens up the route for further studies into noise in optimization circuits. In particular, the proposed approach can be used as one of many benchmarks in a suite of easily implementable algorithms tailored to assess performance of quantum devices. Since the method is scalable and tunable with respect to noise parameters (depth and number of qubits can play a proxy role for noise strength), one may compare theoretical predictions (subject to a reasonable noise model) with experimental outcomes.

ACKNOWLEDGMENTS
We are grateful for support from NASA Ames Research Center, the AFRL Information Directorate under grant F4HBKC4162G001, the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA), via IAA 145483, and NASA Academic Mission Services, Contract No. NNA16BD14C. We used QuTiP in our simulations [12]. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

[4] ... in the $\sigma^z$, though in our simulations we only consider up to quadratic terms. In our simulations we use fully connected problems of various sizes, without local fields ($h_i = 0$), where the $J_{ij} \in \{-1, 1\}$ are randomly chosen.
[5] We also find the same general effect in our simulations: optimal angles in the noiseless case are approximately optimal angles when performing optimization with noise. We use the BFGS algorithm to optimize QAOA angles, using random initial angles each time. We also repeat over several initializations to help avoid being trapped in local optima. Note that optimized angles for different circuit depths are in general completely different, since we optimize over the entire set of angles, and not round by round. In our simulations the angles are in the range $[0, 2\pi]$.
[6] For now we assume no restriction on the $K_j$ other than that $\mathcal{E}_p$ is a quantum map, and so to preserve the trace, $\sum_j K_j^\dagger K_j = M I$.
[7] For qubit-local depolarizing noise, $M = 4$, with $K_i = \sigma_i$, the identity and three Pauli $x, y, z$ operators. Although we could move the identity term out of the sum in Eq. (4) (and change '$1 - p$' to '$1 - 3p/4$'), it is convenient to leave it in, as the second term is then simply the maximally mixed state $I/2$, which is i) easier to handle mathematically, and ii) provides the natural interpretation of the channel, that one either receives the ideal state with probability $(1 - p)$, or the maximally mixed state.
[8] To be more formal, we can write $\mathcal{E}_p^{(n)} = \mathrm{Id} \otimes \cdots \otimes \mathrm{Id} \otimes \mathcal{E}_p \otimes \mathrm{Id} \otimes \cdots \otimes \mathrm{Id}$, with the map $\mathcal{E}_p$ in the $n$-th position, and the remaining $N - 1$ identity channels $\mathrm{Id}$ acting on the other qubits, $\mathrm{Id}\,X = X$.
[10] In Ref. [1], the '$N$' in $F_1$ is the number of noise operations applied (number of gates). In our model, for QAOA-1, $N$ is indeed the number of qubits, since we apply noise after each round and individually on each qubit.
[11] In Ref. [1], if there are terms proportional to the identity, $aI$, in $H_c$, so that $\mathrm{Tr}[H_c] \neq 0$, the effect is to add a term $(1 - (1-p)^{\eta N})a$ to $C_{\mathrm{noise}}$. In our analysis we do not need to do this as the equation we derive handles such a case through the freedom of parameters.