The dominant eigenvector of a noisy quantum state

Although near-term quantum devices have no comprehensive solution for correcting errors, numerous techniques have been proposed for achieving practical value. Two works have recently introduced the very promising error suppression by derangements (ESD) and virtual distillation (VD) techniques. The approach exponentially suppresses errors and ultimately allows one to measure expectation values in the pure state given by the dominant eigenvector of the noisy quantum state. This dominant eigenvector is, however, different from the ideal computational state, and it is the aim of the present work to comprehensively explore the following fundamental question: how significantly different are these two pure states? The motivation for this work is two-fold. First, comprehensively understanding the effect of this coherent mismatch is of fundamental importance for the successful exploitation of noisy quantum devices. As such, the present work rigorously establishes that in practically relevant scenarios the coherent mismatch is exponentially less severe than the incoherent decay of the fidelity, where the latter can be suppressed exponentially via the ESD/VD technique. Second, the above question is closely related to central problems in mathematics, such as bounding eigenvalues of a sum of two matrices (Weyl inequalities), the solution of which was a major breakthrough. The present work can be viewed as a first step towards extending the Weyl inequalities to eigenvectors of a sum of two matrices, and it completely resolves this problem for the special case of the considered density matrices.


I. INTRODUCTION
Quantum devices can already prepare complex quantum states whose behaviour cannot be simulated using classical computers with practical levels of resource [1,2]. Sufficiently advanced quantum computers may have the potential to perform useful tasks of value to society that cannot be performed by other means, such as simulating molecular systems [3]. However, the early devices are incapable of error correction as required for the fault-tolerant universal systems that we expect to emerge eventually. Since the implementation of general quantum error correcting codes (QECs) is prohibitively expensive, the early machines do not have a comprehensive solution to accumulating noise [4]. Nevertheless, very promising applications have been proposed for exploiting Noisy Intermediate-Scale Quantum (NISQ) devices: variational quantum eigensolvers (VQE) and similar variants are expected to be able to solve important, practically relevant problems, such as finding ground states or optimising probe states for quantum metrology and beyond [5-19]. Refer also to the recent reviews [20-22].
The control of errors is thus fundamental to the successful exploitation of quantum devices and numerous proposals have been put forward for mitigating errors in noisy machines. These typically aim to learn the effect of imperfections on expectation values of observables and try to predict their ideal, noise-free values. A very promising approach has recently been introduced by two independent works and was named Error Suppression by Derangements (ESD [23]) and Virtual Distillation (VD [24]). This technique prepares $n$ copies of the noisy quantum state and in turn allows one to suppress errors in expectation values exponentially when increasing $n$. The approach relies on the assumption that the dominant eigenvector of a noisy quantum state, as modelled by a density matrix $\rho$, approximates the ideal computational state.
* balint.koczor@materials.ox.ac.uk
This brings us to the core question of the present work. Given a noisy quantum state $\rho$, how well does its dominant eigenvector approximate the state that one would obtain from a perfect, noise-free computation? It was already noted in refs. [23,24] that even incoherent errors will in general introduce a drift in the dominant eigenvector. This drift was named 'coherent mismatch' and 'noise floor' by the two works. The aim of the present work is to comprehensively answer the above question by deriving rigorous lower and upper bounds and scaling results. The motivation for this work is two-fold.
First, the very promising ESD/VD approach crucially relies on the above assumption that the dominant eigenvector is a good approximation of the ideal computational state. However, a drift in the dominant eigenvector, the coherent mismatch, can crucially influence the efficacy of the error suppression as illustrated in Fig. 1. It is therefore vital for the successful exploitation of the technique to comprehensively understand the drift in the dominant eigenvector. Fig. 1 also shows that the previous 'pessimistic' upper bound $\sqrt{c}$ is quadratically reduced to $c$ if the aim is to prepare eigenstates (see Sec. II A). This is very encouraging since in fact most near-term quantum algorithms aim to prepare eigenstates [20-22].
Second, understanding how noise affects quantum states is of fundamental importance. While the mathematical formalism for describing noise processes has been much investigated in the literature, there are still open questions. Indeed, understanding noise in quantum systems is vital for the successful exploitation of noisy quantum devices; moreover, the appropriate modelling of quantum systems has significant implications in mathematics. As such, the present work makes exciting connections to important problems in mathematics, such as bounding eigenvalues of a sum of two matrices and bounding norms of commutators.
Let us now briefly summarise the most important results in relation to the above two points, ordering them thematically; a more detailed discussion of the results, which follows their order of appearance in the manuscript, is presented in Sec. V. In Sec. III we explicitly construct a family of worst-case and best-case extremal quantum states that saturate the present upper and lower bounds of the coherent mismatch. These extremal states then allow us to generally understand the coherent effect of incoherent noise channels in quantum systems and to argue about the efficacy of the ESD/VD approach in complete generality; prior perturbative approximations [23,24] fail in this regime, as discussed in Sec. III A 2. As such, in Sec. III A 5 we rigorously prove that even in the worst-case scenario one needs at least 3-4 copies in practice to suppress incoherent errors to the level of the coherent mismatch: near-term quantum devices are thus guaranteed to be oblivious to such coherent effects if they are limited to preparing only a small number of copies.
In Sec. IV we analyse typical quantum circuits used in near-term quantum devices: we derive guarantees that the coherent mismatch decreases when increasing the size of the computation (even exponentially when increasing Rényi entropies of the errors, see Sec. III A 3). We finally conclude that, when increasing the circuit error rate, the coherent mismatch is exponentially less severe than the incoherent decay of the fidelity, where the latter can be suppressed exponentially with the ESD/VD approach. We also prove in Sec. III B that our lower and upper bounds nearly coincide in the practically most important regions, thus tightly confining the possible values the coherent mismatch can take.
As mentioned above, the present work is closely related to important themes in mathematics, such as bounding the eigenvalues of a sum of two matrices (Weyl's inequalities), and we discuss these connections in Sec. II B. As such, the present work can be viewed as a first step towards extending Weyl's inequalities for eigenvalues to the highly non-trivial case of the eigenvectors of a sum of two matrices, and we present a complete resolution of this problem for the special case of the considered density matrices. Furthermore, another open question in mathematics was concerned with bounding the norm of a commutator and this problem was only very recently solved [25-31]. The present work significantly tightens those bounds for the special case of the considered density matrices in Sec. III B 1.
We note that the following sections of the manuscript will gradually build on each other and the appearance of results might differ from the thematic ordering of the above summary. Let us now introduce the core problem in more detail in Sec. I A and then recapitulate the most important notions in the context of the ESD/VD approach in Sec. I B.
FIG. 1. An important application of the present work is that it allows one to determine the ultimate precision of the ESD/VD error suppression technique [23,24]. The trace distance (blue line) used in ref. [24] is given by the square root of the coherent mismatch $c$ from [23] and generally upper bounds the error $|\langle\psi_{\mathrm{id}}|O|\psi_{\mathrm{id}}\rangle - \langle\psi|O|\psi\rangle| \le 2\sqrt{c}$ in estimating expectation values (with $\|O\|_\infty = 1$) with the ideal computational state $|\psi_{\mathrm{id}}\rangle$ vs. the dominant eigenvector $|\psi\rangle$. Most randomly generated quantum states (blue dots) are significantly below this pessimistic general bound (blue line), which was already noted in ref. [24]. We show that this error bound is quadratically smaller as $2c$ (orange line) in the specific but pivotal case of preparing eigenstates (orange rectangles), the aim of most near-term quantum algorithms. Refer to Sec. II A.

A. Problem definition
Let us first introduce the most important notions used in this work. Recall that a pure quantum state $|\psi_{\mathrm{id}}\rangle$ is an element of a $d$-dimensional Hilbert space. In an ideal quantum computation this quantum state is prepared by a unitary quantum circuit (unitary transformation) as $|\psi_{\mathrm{id}}\rangle := U_c|0\rangle$ that acts on a reference state. The quantum circuit is typically decomposed into a product of (universal) gates as $U_c = U_\nu \cdots U_2 U_1$. In a realistic setting where the quantum gates are imperfect (or when the errors are not corrected) the actual quantum state needs to be modelled by a density matrix $\rho := \Phi_c \rho_0$ that is prepared via a completely positive trace-preserving (CPTP) [32] map $\Phi_c$. For example, this noisy circuit is typically decomposed into a series of individual noisy gates as $\Phi_c = \Phi_\nu \cdots \Phi_2 \Phi_1$, but this in general is only an approximation due to the presence of possible correlated noise.
Let us introduce another 'representation of noise': we show in Appendix A that a large class of density matrices admit the decomposition $\rho = \eta\rho_{\mathrm{id}} + (1-\eta)\rho_{\mathrm{err}}$ for some constant $\eta > 0$. Here $\rho_{\mathrm{err}}$ is a valid density matrix that can be interpreted as an error state that occurs with probability $1-\eta$ and is incoherently mixed with the ideal computational state $\rho_{\mathrm{id}} := |\psi_{\mathrm{id}}\rangle\langle\psi_{\mathrm{id}}|$, which occurs with probability $\eta$.
Let us now consider a simple, but practically very important example to illustrate the previous point: an error model $\Phi_c = \Phi_\nu \cdots \Phi_2 \Phi_1$ in which errors happen during the execution of an individual quantum gate with probability $\epsilon$, and thus the corresponding Kraus-map representation of the $k$th noisy quantum gate can be defined as
$$\Phi_k \rho := (1-\epsilon)\, U_k \rho U_k^\dagger + \epsilon \sum_{j=1}^{K} M_{jk}\, \rho\, M_{jk}^\dagger. \tag{1}$$
Here $M_{jk}$ corresponds to some (arbitrary) error event and $K$ determines the Kraus rank of the error model while $U_k$ is the ideal unitary gate. A large class of noise channels that are typically used to model errors in quantum circuits admit this form, for example dephasing, bit flip and depolarising errors [32]. Within this error model we can straightforwardly obtain the decomposition in Eq. (4) into an ideal state $\rho_{\mathrm{id}}$ and an error density matrix via the probability $\eta = (1-\epsilon)^\nu$; indeed the error matrix via $(1-\eta)\rho_{\mathrm{err}} = \Phi_c \rho_0 - \eta\, U_c \rho_0 U_c^\dagger$ can be shown to be a valid density matrix. The completely general case is discussed in Appendix A.
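To get a feel for how quickly the weight $\eta = (1-\epsilon)^\nu$ of the ideal state decays with the number of gates, the following minimal sketch evaluates it for a few circuit sizes. The function name `ideal_weight` and the numerical values of $\epsilon$ and $\nu$ are illustrative assumptions and are not taken from the analysis in this work.

```python
import math

def ideal_weight(eps: float, nu: int) -> float:
    """Weight eta = (1 - eps)**nu of the ideal state after nu noisy gates."""
    return (1.0 - eps) ** nu

eps = 1e-3  # assumed per-gate error probability (illustrative value)
for nu in (10, 100, 1000):
    eta = ideal_weight(eps, nu)
    # For small eps, eta is close to exp(-eps * nu), so it is governed
    # mainly by the product eps * nu.
    print(nu, round(eta, 4), round(math.exp(-eps * nu), 4))
```

For small $\epsilon$ the weight is governed essentially by the product $\epsilon\nu$, so a deep circuit can carry only a small ideal component even when individual gates are of high quality.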
Before stating our main problem, let us recall that a density matrix is a trace-class operator with trace norm $\|\rho\|_1 = 1$ and trace $\mathrm{Tr}\,\rho = 1$, and thus it can be written in terms of its spectral resolution as
$$\rho = \lambda\, |\psi\rangle\langle\psi| + \sum_{k=2}^{d} \lambda_k\, |\psi_k\rangle\langle\psi_k|, \tag{2}$$
where $|\psi\rangle$, $|\psi_k\rangle$ are eigenvectors and $\lambda$, $\lambda_k$ are non-negative eigenvalues. We assume descending order throughout this work as $\lambda > \lambda_2 \ge \lambda_3 \ge \dots$ and assume that the density matrix has a distinguished, unique dominant eigenvalue $\lambda$ (no degeneracy).
The core problem considered in the present work is the following: the dominant eigenvector $|\psi\rangle$ of the noisy quantum state $\rho$ will be different from the ideal computational state $|\psi_{\mathrm{id}}\rangle$ (and from eigenvectors of $\rho_{\mathrm{err}}$), except in the special case when $\rho_{\mathrm{err}}$ and $\rho_{\mathrm{id}}$ commute. The reason is that in the commuting case the two density matrices share the same eigenvectors and thus their sum will share the same eigenvectors too. However, in realistic physical systems $\rho_{\mathrm{err}}$ and $\rho_{\mathrm{id}}$ are highly unlikely to commute. Surprisingly, even a completely incoherent noise channel, such as depolarising and dephasing as described below Eq. (1), can introduce a coherent mismatch resulting in a coherently shifted dominant eigenvector as $|\psi\rangle = \sqrt{1-c}\,|\psi_{\mathrm{id}}\rangle + \sqrt{c}\,|\psi_\perp\rangle$ in Eq. (2). Our aim is to characterise and generally upper bound this coherent mismatch $c$. Let us first formalise our definition of the coherent mismatch $c$ and then briefly motivate this work via important scenarios where this coherent mismatch plays a crucial role. For example, the present problem is very closely related to the well-known case of bounding the eigenvalues of a sum of two matrices which we discuss in Sec. II B.
Definition 1. We define the coherent mismatch as the infidelity between the dominant eigenvector $|\psi\rangle$ of a noisy quantum state $\rho$ from Eq. (2) and the ideal computational state $|\psi_{\mathrm{id}}\rangle$ as
$$c := 1 - |\langle\psi_{\mathrm{id}}|\psi\rangle|^2. \tag{3}$$
Here we also define the fidelity $F := \langle\psi_{\mathrm{id}}|\rho|\psi_{\mathrm{id}}\rangle$. For some of the arguments later we will make use of the decomposition
$$\rho = \eta\,\rho_{\mathrm{id}} + (1-\eta)\,\rho_{\mathrm{err}} = \eta\,|\psi_{\mathrm{id}}\rangle\langle\psi_{\mathrm{id}}| + (1-\eta)\sum_{k=1}^{d} \mu_k\, |\chi_k\rangle\langle\chi_k| \tag{4}$$
into a sum (for some $\eta > 0$) of the ideal computational state $\rho_{\mathrm{id}} := |\psi_{\mathrm{id}}\rangle\langle\psi_{\mathrm{id}}|$ and a suitable error density matrix $\rho_{\mathrm{err}}$ (see text above) with eigenvalues $\mu_k$ and eigenvectors $|\chi_k\rangle$. For this decomposition we can define the ratio of eigenvalues as $\delta := (\eta^{-1} - 1)\,\mu_1$, where $\mu_1$ is the largest eigenvalue of $\rho_{\mathrm{err}}$.
Notice that the eigenvectors $|\chi_k\rangle$ above are generally different from the ones in Eq. (2) (non-commuting case). Let us remark that while the decomposition in Eq. (4) is very useful for illustrating and motivating the present problem, it is not necessary and some of the later results in this work will be independent of this decomposition. Refer to Appendix A for more details.
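The definition can be made concrete with a single-qubit sketch in which the ideal state is the basis vector $(1, 0)^T$ and the error state has real eigenvectors rotated by an angle $\theta$. The parametrisation, the function names and the numerical values below are illustrative assumptions; the 2×2 eigenproblem is solved in closed form, so only the standard library is needed.

```python
import math

def dominant_eigvec_2x2(a, b, d):
    """Dominant eigenpair of the real symmetric matrix [[a, b], [b, d]]."""
    lam = 0.5 * (a + d) + math.hypot(0.5 * (a - d), b)
    # Both (lam - d, b) and (b, lam - a) are eigenvectors for lam;
    # take the longer one for numerical stability.
    v = (lam - d, b) if abs(lam - d) >= abs(lam - a) else (b, lam - a)
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:  # fully degenerate matrix: any vector is an eigenvector
        return lam, (1.0, 0.0)
    return lam, (v[0] / norm, v[1] / norm)

def coherent_mismatch(eta, theta, mu1=1.0):
    """c for rho = eta |0><0| + (1 - eta) rho_err on one qubit.

    rho_err has eigenvector (cos theta, sin theta) with eigenvalue mu1 and
    the orthogonal eigenvector with eigenvalue 1 - mu1 (assumed model).
    """
    ct, st = math.cos(theta), math.sin(theta)
    w1, w2 = (1 - eta) * mu1, (1 - eta) * (1 - mu1)
    a = eta + w1 * ct * ct + w2 * st * st
    d = w1 * st * st + w2 * ct * ct
    b = (w1 - w2) * ct * st
    _, psi = dominant_eigvec_2x2(a, b, d)
    return 1.0 - psi[0] ** 2  # infidelity with the ideal state (1, 0)

for theta in (0.0, math.pi / 8, math.pi / 4, math.pi / 2):
    print(round(theta, 3), coherent_mismatch(0.8, theta))
```

In this sketch the mismatch vanishes at $\theta = 0$ and $\theta = \pi/2$, where the error state commutes with the ideal state, and is non-zero in between.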

B. Error suppression
Two recent works [23,24] have introduced an approach, named error suppression by derangements (ESD) and virtual distillation (VD), which can suppress errors exponentially when preparing $n$ copies of a noisy quantum state. The core idea behind the approach is that it prepares $n$ identical copies of a noisy computational quantum state $\rho$ and uses the copies to 'verify each other' by applying a derangement operation (generalisation of the SWAP operation that permutes the $n$ registers). This filters out all error contributions that break global permutation symmetry among the copies, and hence allows for exponential suppression when increasing $n$.
While ref. [24] mostly focuses on the $n = 2$ scenario and proposes a resource-efficient variant that does not require an ancilla qubit when $n = 2$, ref. [23] presents explicit constructions of the approach for $n \ge 2$. A possible implementation is illustrated in Fig. 2 which uses a controlled-derangement operation and allows one to measure expectation values of the form $\mathrm{Tr}[\rho^n O]/\mathrm{Tr}[\rho^n]$ with respect to an observable $O$. In this regard ref. [23] notes that for $n > 2$ a large number of possible derangement patterns exist, while a qubit-efficient one was proposed in the follow-up work [33].
FIG. 2. Quantum circuit of a possible implementation of the ESD/VD error suppression approach [23,24] (figure adapted from [23]). $n$ copies of the noisy quantum state $\rho$ are prepared and entangled via a controlled-derangement operator $D_n$, a generalisation of the SWAP operation that permutes the $n$ quantum registers. The probability $\mathrm{prob}_0$ when measuring the ancilla qubit is proportional to the expectation value $\mathrm{Tr}[\rho^n O]$ of the observable in the 'virtually distilled' state $\rho_n$. For large $n$, the approach ultimately allows one to measure expectation values in the pure state $|\psi\rangle$ as the dominant eigenvector of $\rho$ from Eq. (2). A qubit-efficient construction was proposed in [33].
When increasing the number of copies $n$, the 'virtual' quantum state $\rho_n := \rho^n/\mathrm{Tr}[\rho^n]$ approaches the dominant eigenvector from Definition 1 in exponential order. Since the dominant eigenvector $|\psi\rangle$ is generally different from the ideal computational state $|\psi_{\mathrm{id}}\rangle$ via Definition 1, the coherent mismatch limits the ultimate precision of the ESD and VD approaches. Ref. [23] defined the coherent mismatch $c$ (see Definition 1) to quantify this discrepancy.
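The exponential approach to the dominant eigenvector can be illustrated directly in the eigenbasis of $\rho$, where $\rho^n/\mathrm{Tr}[\rho^n]$ has eigenvalues $\lambda_k^n/\sum_j \lambda_j^n$. The sketch below is a classical calculation on an assumed spectrum, not a simulation of the derangement circuit; the function name and the eigenvalues are illustrative choices.

```python
def distilled_weight(spectrum, n):
    """Weight of the dominant eigenvector in rho^n / Tr[rho^n].

    `spectrum` lists the eigenvalues of rho; in the eigenbasis the n-th
    power simply raises each eigenvalue to the power n.
    """
    powers = [lam ** n for lam in spectrum]
    return max(powers) / sum(powers)

spectrum = [0.6, 0.2, 0.1, 0.1]  # assumed eigenvalues of a noisy state
for n in range(1, 6):
    # The remaining weight outside the dominant eigenvector decays
    # roughly as (lambda_2 / lambda)^n.
    print(n, 1.0 - distilled_weight(spectrum, n))
```

Note that this only quantifies how fast $\rho_n$ approaches $|\psi\rangle$; it says nothing about how far $|\psi\rangle$ is from $|\psi_{\mathrm{id}}\rangle$, which is the separate question addressed by the coherent mismatch.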
Similarly, the 'noise floor' was defined in ref. [24] to express the discrepancy between the 'virtual' quantum state $\rho_n$ and the ideal computational state $\rho_{\mathrm{id}}$ in the limit of a large number of copies via the trace distance $T(\rho_n, \rho_{\mathrm{id}})$. We prove in Appendix B that this noise floor is equivalent to the coherent mismatch up to a square root as
$$\lim_{n\to\infty} T(\rho_n, \rho_{\mathrm{id}}) = \sqrt{c},$$
which confirms that indeed the notions of the coherent mismatch and noise floor are equivalent: ultimately they both express the infidelity between the pure states $|\psi_{\mathrm{id}}\rangle$ and $|\psi\rangle$. Both works used a perturbative expansion of the dominant eigenvector $|\psi\rangle$ to approximate this infidelity. While such perturbative series may be accurate in the limit of very low noise $\eta \to 1$ in Eq. (4), they are not applicable to the practically relevant scenario when quantum states accumulate a large amount of noise. Furthermore, we establish in Remark 2 that the perturbative series diverges in the worst-case scenario region. It is thus the aim of the present work to derive upper bounds and approximations of the coherent mismatch that are generally applicable in any scenario. As such, our bounds in Sec. III are saturated by extremal worst-case quantum states. We will use these bounds to argue about the efficacy (number of copies, entropies etc.) of the error suppression technique in complete generality in Sec. III, which is beyond the scope of perturbation theory.
Ref. [24] argued that the coherent mismatch is zero if the error channel maps only to orthogonal states. Indeed such special density matrices are an instance of the general class when $\rho_{\mathrm{err}}$ and $\rho_{\mathrm{id}}$ commute as discussed above. Interestingly, we show that the worst-case scenario quantum states, which maximise the coherent mismatch in Theorem 2, have eigenvectors that are all orthogonal to the ideal state except for the dominant error eigenvector. This highlights that, somewhat counter-intuitively, the orthogonal error models proposed in ref. [24] produce quantum states (with $c = 0$) that are actually close in state space to the worst-case quantum states (with almost all eigenvectors orthogonal to the ideal state) that maximise $c$.
Ref. [23] noted that the coherent mismatch is necessarily zero when noise density matrices $\rho_{\mathrm{err}}$ commute with the ideal state, and gave the example of single-qubit systems undergoing depolarising noise. Ref. [24] numerically simulated this kind of scenario via non-entangling (random) circuits undergoing depolarising noise and found that the noise floor is indeed zero. Indeed, local depolarising noise in single-qubit systems maps to errors $\rho_{\mathrm{err}} = \mathrm{Id}/d$ that commute with the ideal, unentangled state and one trivially finds that $c = 0$, regardless of whether the circuits are random or not. In contrast, ref. [24] demonstrated that the noise floor $\sqrt{c}$ is indeed non-zero and significant even for relatively deep, random entangling circuits. Results in Sec. IV can be applied to such random circuits and confirm the numerical observations that the coherent mismatch is non-zero and decreases when increasing the depth of the circuit.
Ref. [23] additionally observed numerical scaling results of the coherent mismatch in terms of the number of gates and number of qubits in noisy quantum circuits. We confirm these scaling results in Sec. IV using general upper bounds. Before stating the main results, let us first motivate the practical relevance of the present work.

II. MOTIVATION

A. Ultimate precision in error suppression
The previously introduced ESD and VD approaches allow one to estimate the expectation value $\langle\psi|O|\psi\rangle$ for sufficiently large $n$. This expectation value can be biased due to the coherent mismatch of the state $|\psi\rangle$ and will generally deviate from the ideal expectation value $\langle\psi_{\mathrm{id}}|O|\psi_{\mathrm{id}}\rangle$.
While we define and compute the coherent mismatch in terms of distance measures on the quantum states, one can indeed relate it to the more practical question of how much error the discrepancy between $|\psi_{\mathrm{id}}\rangle$ and $|\psi\rangle$ introduces into the measurement of an observable $\langle\psi|O|\psi\rangle$. Ref. [24] proposed that the trace distance generally upper bounds these observable measurement errors as
$$|\langle\psi_{\mathrm{id}}|O|\psi_{\mathrm{id}}\rangle - \langle\psi|O|\psi\rangle| \le 2\,\|O\|_\infty\, T(|\psi_{\mathrm{id}}\rangle, |\psi\rangle) = 2\,\|O\|_\infty \sqrt{c}, \tag{5}$$
where $\|O\|_\infty$ is the absolute largest eigenvalue of the observable; refer to Appendix B for a proof. The equality on the right relates the trace distance to the coherent mismatch $c$. While this trace-distance measure is a general upper bound, it was already noted in ref. [24] that this bound is very pessimistic in practically relevant scenarios. We demonstrate this in Fig. 1 (blue): we randomly generate $10^4$ quantum states and normalised observables (i.e., $\|O\|_\infty = 1$) for randomly selected dimensions between $2 \le d \le 100$ and compute the actual error in the observable measurements as $|\langle\psi_{\mathrm{id}}|O|\psi_{\mathrm{id}}\rangle - \langle\psi|O|\psi\rangle|$. While in Fig. 1 (blue) some random states get relatively close, most of the randomly generated states are orders of magnitude below the upper bound.
To support the observation of ref. [24] with a rigorous statement, we determine an alternative bound in Appendix B for the specific but pivotal case when the aim is to prepare eigenstates of the observable. Note that the majority of quantum algorithms that target early quantum devices actually aim to prepare eigenstates of certain Hamiltonian operators as $O \equiv H$, see e.g., the review articles [20-22]. Remarkably, we show in Appendix B that if the quantum device prepares an eigenstate of the observable then the error in estimating the ideal expectation value is upper bounded as
$$|\langle\psi_{\mathrm{id}}|O|\psi_{\mathrm{id}}\rangle - \langle\psi|O|\psi\rangle| \le 2\,\|O\|_\infty\, c,$$
which is a quadratically smaller (in $c$) bound than the one in Eq. (5). We demonstrate in Fig. 1 (orange) that the measurement errors in case of eigenstates are indeed orders of magnitude below the pessimistic bounds (blue) and are generally upper bounded by the orange line. Furthermore, in Fig. 6 we illustrate that in practical applications, such as the variational quantum eigensolver (VQE), even approximate ground states produce errors significantly below the general bound.
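The difference between the general bound $2\sqrt{c}$ and the eigenstate bound $2c$ (both for $\|O\|_\infty = 1$, as quoted in the caption of Fig. 1) can be seen in a single-qubit sketch. Here the ideal state is assumed to be $(1, 0)^T$, the dominant eigenvector is $(\sqrt{1-c}, \sqrt{c})^T$, and the Pauli matrices $Z$ and $X$ serve as examples of an observable for which the ideal state is, respectively is not, an eigenstate. All names and values are illustrative assumptions.

```python
import math

def expectation(obs, psi):
    """<psi|O|psi> for a real symmetric 2x2 observable and a real state."""
    (o00, o01), (o10, o11) = obs
    x, y = psi
    return o00 * x * x + (o01 + o10) * x * y + o11 * y * y

def observable_error(obs, c):
    """|<psi_id|O|psi_id> - <psi|O|psi>| for psi_id = (1, 0) and mismatch c."""
    psi_id = (1.0, 0.0)
    psi = (math.sqrt(1.0 - c), math.sqrt(c))
    return abs(expectation(obs, psi_id) - expectation(obs, psi))

Z = ((1.0, 0.0), (0.0, -1.0))  # psi_id is an eigenstate of Z
X = ((0.0, 1.0), (1.0, 0.0))   # psi_id is not an eigenstate of X

c = 1e-3  # assumed coherent mismatch
print(observable_error(Z, c), 2 * c)             # eigenstate case vs. 2c
print(observable_error(X, c), 2 * math.sqrt(c))  # generic case vs. 2 sqrt(c)
```

In this toy case the $Z$ measurement error equals $2c$, while the $X$ measurement error is $2\sqrt{c(1-c)}$, which approaches the general bound $2\sqrt{c}$ for small $c$.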
The above bounds all depend on the actual value of c, and it is thus the aim of the present work to comprehensively determine the coherent mismatch.

B. Related problems in mathematics
Let us now relate the present work to important themes in mathematics. In particular, it is a well-known problem in mathematics to generally bound eigenvalues of a sum of two Hermitian matrices. The problem was first proposed by Weyl in 1912 [34]: given two Hermitian matrices $A$ and $B$ with eigenvalues $\alpha_k$ and $\beta_k$, how does one determine the eigenvalues $s_k$ of the sum of the two matrices $S = A + B$? Weyl's partial solution to this problem determines the possible range that the eigenvalues of $S$ can take via the inequalities
$$s_{i+j-1} \le \alpha_i + \beta_j \le s_{i+j-d},$$
where $d$ is the dimension of the matrices, the indices are restricted such that all subscripts lie between $1$ and $d$, and the eigenvalues are arranged in descending order. A typical application of these inequalities is to bound the possible eigenvalues of the sum as $s_k \le \alpha_k + \beta_{\max}$ with $\beta_{\max} \equiv \beta_1$. These partial results can be proven by min-max methods, which can already be a considerable task.
Following a series of major breakthroughs in mathematics, this problem has only been solved relatively recently to a full extent using honeycomb structures [35-38]. The final resolution specifies a set of inequalities in terms of the eigenvalues $\alpha_k$, $\beta_k$, $s_k$. We refer the interested reader to the excellent article [39].
This highlights the complex and difficult nature of predicting the eigensystem of the sum of two matrices. While bounds on eigenvalues have been completely solved by the application of the honeycomb structures, much less is known about the eigenvectors of the sum of two matrices. It is the aim of the present work to determine general bounds on the dominant eigenvector of the sum of two matrices as introduced in Definition 1.
The current problem is, however, special: while we do not make any assumption about the matrix $\rho_{\mathrm{err}}$, our matrix $\rho_{\mathrm{id}}$ is a rank-1 projector and thus its eigenvalues are $\alpha_k = 0$ for all $k \ge 2$. Due to this special structure, Weyl's inequalities are significantly simplified in the present scenario, and this allows us to obtain the following straightforward bounds.
Remark 1. Straightforwardly applying Weyl's inequalities generally guarantees that $\lambda_2 < \lambda$, and thus the dominant eigenvector corresponds to $|\psi_{\mathrm{id}}\rangle$, as long as $\delta < 1$ due to the following bounds. In particular, applying Weyl's inequalities to Definition 1 suffices to generally bound the two largest eigenvalues $\lambda$ and $\lambda_2$ of the noisy density matrix from Eq. (2) (or similarly any other eigenvalues) as
$$\eta \le \lambda \le \eta\,(1+\delta), \qquad \lambda_2 \le \eta\,\delta.$$
Here $\eta$ and $\delta$ were defined in Definition 1.
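As a sanity check of this remark, the sketch below evaluates the spectrum of a single-qubit state $\eta|0\rangle\langle 0| + (1-\eta)\rho_{\mathrm{err}}$ and compares it against the bounds $\lambda \le \eta(1+\delta)$ and $\lambda_2 \le \eta\delta$ that Weyl's inequalities give for this decomposition. The single-qubit model, the function name and the parameter values are illustrative assumptions; $\mu_1 \ge 1/2$ is required so that $\mu_1$ is indeed the largest eigenvalue of the error state.

```python
import math

def qubit_spectrum(eta, mu1, theta):
    """Eigenvalues (descending) of eta |0><0| + (1 - eta) rho_err.

    rho_err has eigenvector (cos theta, sin theta) with eigenvalue mu1 and
    the orthogonal eigenvector with eigenvalue 1 - mu1 (assumed model).
    """
    ct, st = math.cos(theta), math.sin(theta)
    w1, w2 = (1 - eta) * mu1, (1 - eta) * (1 - mu1)
    a = eta + w1 * ct * ct + w2 * st * st
    d = w1 * st * st + w2 * ct * ct
    b = (w1 - w2) * ct * st
    mean, gap = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return mean + gap, mean - gap

eta, mu1 = 0.7, 0.8               # assumed values with mu1 >= 1/2
delta = (1 / eta - 1) * mu1       # ratio of eigenvalues from Definition 1
for theta in (0.0, 0.4, 0.8, 1.2, math.pi / 2):
    lam, lam2 = qubit_spectrum(eta, mu1, theta)
    assert lam <= eta * (1 + delta) + 1e-12  # bound on the dominant eigenvalue
    assert lam2 <= eta * delta + 1e-12       # bound on the second eigenvalue
    print(round(theta, 2), round(lam, 4), round(lam2, 4))
```

In this sketch the bound on $\lambda$ is reached at $\theta = 0$, i.e., when the dominant eigenvector of the error state coincides with the ideal state.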
Although this work considers relatively special matrices, it is a considerable task to go beyond eigenvalues and to determine eigenvectors of a sum of two matrices, i.e., as relevant for the coherent mismatch. Let us highlight how the present problem crucially deviates from the previously discussed case of eigenvalues.
The Weyl inequality in the above remark is saturated when the two matrices have the same dominant eigenvectors, leading to an extremal shift in the dominant eigenvalue. This however implies that $\rho_{\mathrm{err}}$ and $\rho_{\mathrm{id}}$ commute, thus leading to a coherent mismatch that is zero, i.e., no shift in the dominant eigenvector. On the other hand, in Sec. III A 3 we determine extremal states that maximise the coherent mismatch and their structure is indeed in stark contrast to the case of the eigenvalues.
It is worth noting that the present work makes connections to and uses results from other topics in mathematics: analytical results are used for computing eigenvalues and eigenvectors of arrowhead matrices in Sec. III A 1 and new bounds are established in Sec. III B 1 for the matrix norm of commutators, which improve upon known general results in the considered specific scenarios. Let us now derive our results.

A. General upper bounds and extremal states
Let us first exploit that the present work considers a relatively special structure since the matrix $\rho_{\mathrm{id}}$ is a rank-1 projector: we now introduce a special decomposition of the matrix $\rho$ which will allow us to compute $c$ analytically and thus to construct extremal, worst-case scenario quantum states, i.e., families of quantum states that are guaranteed to saturate upper bounds on $c$.
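Statement 1. Any density matrix $\rho$ can be brought by a suitable unitary transformation into the arrowhead form
$$\tilde\rho := U \rho\, U^\dagger = \begin{pmatrix} F & C_2 & C_3 & \cdots & C_d \\ C_2 & D_2 & & & \\ C_3 & & D_3 & & \\ \vdots & & & \ddots & \\ C_d & & & & D_d \end{pmatrix}.$$
We have applied a unitary transformation $\tilde\rho := U \rho\, U^\dagger$, the entry $F$ is the fidelity from Definition 1, the index runs over $k \in \{2, \dots, d\}$ with $d$ denoting the dimension, and all other matrix entries are zero.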
Refer to Appendix C for a proof. These so-called arrowhead matrices have unique properties and have been investigated in the literature extensively. For example, certain matrix algorithms use arrowhead matrices to speed up computations [40] and further applications include, e.g., the description of radiationless transitions in isolated molecules [41] or of oscillators vibrationally coupled with a Fermi liquid [42]. Let us mention two remarkable properties of these special matrices.
First, Cauchy's interlacing theorem guarantees that the entries $D_k$ satisfy the general interlacing inequalities with the eigenvalues $\lambda_k$ from Eq. (2) as
$$\lambda \ge D_2 \ge \lambda_2 \ge D_3 \ge \dots \ge D_d \ge \lambda_d;$$
refer to, e.g., ref. [43] for more details.
Second, if one knows the explicit representation of the arrowhead matrix, i.e., knowing the matrix entries $D_k$, $C_k$ and $F$, then the eigenvalues $\lambda$ and $\lambda_k$ can be obtained as roots of the secular function [43]
$$f(x) = F - x - \sum_{k=2}^{d} \frac{C_k^2}{D_k - x}.$$
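The root-finding step can be sketched numerically. The code below assumes real entries $C_k \neq 0$, non-negative $F$ and $D_k$, and the standard form $f(x) = F - x + \sum_k C_k^2/(x - D_k)$ of the secular function of an arrowhead matrix; it locates the dominant eigenvalue by bisection above $\max_k D_k$, where $f$ is strictly decreasing. The function names and the example entries are illustrative assumptions.

```python
def secular(x, F, C, D):
    """Secular function of the arrowhead matrix with entries F, C_k, D_k."""
    return F - x + sum(c * c / (x - d) for c, d in zip(C, D))

def dominant_eigenvalue(F, C, D, tol=1e-14):
    """Largest root of the secular function, found by bisection.

    Above max(D) the function decreases from +infinity to -infinity
    (assuming all C_k are non-zero), so it has exactly one root there.
    """
    lo = max(D)
    hi = F + max(D) + sum(abs(c) for c in C)  # crude upper bound on the spectrum
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:  # interval cannot be resolved any further
            break
        if secular(mid, F, C, D) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid (or at mid)
    return 0.5 * (lo + hi)

# Assumed example entries with unit trace: F + sum(D) = 1.
F, C, D = 0.7, [0.1, 0.05], [0.2, 0.1]
lam = dominant_eigenvalue(F, C, D)
print(lam, secular(lam, F, C, D))
```

The interlacing inequalities guarantee that the dominant eigenvalue lies above the largest $D_k$, which is why that value is a valid left end of the bisection interval.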

Analytically solving the coherent mismatch
The most important consequence of the previously introduced arrowhead structure is that, given the knowledge of the decomposition of the density matrix ρ into the arrowhead form, we can analytically solve its eigenvectors and obtain an analytical expression for the coherent mismatch.
Statement 2. We can analytically compute the coherent mismatch in terms of the dominant eigenvalue $\lambda$ from Eq. (2) and in terms of the arrowhead matrix entries as
$$c = 1 - \left[ 1 + \sum_{k=2}^{d} \frac{C_k^2}{(\lambda - D_k)^2} \right]^{-1}.$$
Refer to Appendix D for a proof. The above formula allows us to analytically compute the coherent mismatch if the arrowhead form of the density matrix is known. Even though we do not necessarily know such a decomposition explicitly for arbitrary quantum states $\rho$, the above formula is a very important ingredient for our following derivations and allows us to derive general upper and lower bounds on the coherent mismatch. Before stating these results, let us briefly remark on the striking resemblance of the above equation to perturbation theory.
Remark 2. Using first-order perturbation theory in order to approximate the dominant eigenvector (refer to, e.g., Eq. (5.1.44) in [44] and to Eq. (10.2) in [45]) enables us to estimate the coherent mismatch as
$$c \approx \sum_{k=2}^{d} \frac{C_k^2}{(F - D_k)^2}.$$
This approximation is formally similar to the exact analytical formula of the coherent mismatch from Statement 2, but note that here we need to divide by the factor $(F - D_k)^2$ and not by $(\lambda - D_k)^2$. This approximation breaks down in the region when quantum states accumulate a large amount of noise and $F \approx D_k$.
Refer to Appendix E for a proof. It is interesting to note the connection to first-order perturbation theory, which also confirms that, indeed, the above expression is accurate when the noise in the state is vanishingly small (via $\eta \to 1$ in Eq. (4)), and thus we obtain $F \approx \lambda$ with $F \gg D_2$.
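The breakdown of the perturbative estimate can be made explicit in the smallest non-trivial case, a 2×2 arrowhead matrix with entries $F$, $C_2$, $D_2$. The sketch below assumes the exact expression $c = 1 - [1 + C_2^2/(\lambda - D_2)^2]^{-1}$, which follows from normalising the eigenvector $(1, C_2/(\lambda - D_2))^T$ of an arrowhead matrix, and the first-order estimate $C_2^2/(F - D_2)^2$. The function names and the numerical entries are illustrative assumptions.

```python
import math

def mismatch_exact(F, C, D):
    """Exact coherent mismatch of the 2x2 arrowhead matrix [[F, C], [C, D]]."""
    lam = 0.5 * (F + D) + math.hypot(0.5 * (F - D), C)  # dominant eigenvalue
    return 1.0 - 1.0 / (1.0 + C * C / (lam - D) ** 2)

def mismatch_first_order(F, C, D):
    """First-order perturbative estimate: divides by (F - D)^2 instead."""
    return C * C / (F - D) ** 2

# Low-noise example: F is well separated from D, the estimates agree closely.
print(mismatch_exact(0.9, 0.1, 0.1), mismatch_first_order(0.9, 0.1, 0.1))

# High-noise example: F is close to D and the perturbative estimate
# exceeds 1, which is not a valid infidelity.
print(mismatch_exact(0.52, 0.1, 0.48), mismatch_first_order(0.52, 0.1, 0.48))
```

The exact expression stays within $[0, 1/2]$ in both cases since $\lambda - D_2$ remains bounded away from zero even when $F - D_2$ is small.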

Upper bound via extremal states
We will now use the above introduced arrowhead decomposition of density matrices and derive a family of quantum states that maximise the coherent mismatch. We analytically solve this optimisation problem in Appendix F and find that the maximum of the coherent mismatch is attained only by extremal density matrices of the following form: in the arrowhead representation of these states the only non-zero off-diagonal component is given by $C_2$ (all other off-diagonal components are zero as $C_k = 0$ for $k > 2$) while the diagonal entries $F$ and $D_k$ can be arbitrary.
Due to this simplified structure, we can analytically compute the coherent mismatch which then serves as a general upper bound.
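Theorem 1. The coherent mismatch is generally upper bounded as
$$c \le \frac{1 - \sqrt{1 - \delta^2}}{2},$$
where $\delta$ was defined in Definition 1. This upper bound is saturated by an infinite number of worst-case error density matrices $\rho_{\mathrm{err}}$ whose dominant eigenvector $|\chi\rangle$ has a non-zero overlap with the ideal state $|\psi_{\mathrm{id}}\rangle$ as
$$|\chi\rangle = \sqrt{1-\alpha}\,|\psi_{\mathrm{id}}\rangle + \sqrt{\alpha}\,|\phi_2\rangle,$$
and all other eigenvectors of $\rho_{\mathrm{err}}$ are orthogonal to the ideal state $|\psi_{\mathrm{id}}\rangle$. The coherent mismatch is maximised when $\alpha = (1+\delta)/2$, and note that the two basis vectors are orthogonal, $\langle\phi_2|\psi_{\mathrm{id}}\rangle = 0$.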
The above theorem establishes that the worst kind of error density matrices $\rho_{\mathrm{err}}$ are the ones in which only the dominant eigenvector has a non-zero overlap with the ideal state $\rho_{\mathrm{id}}$ while all other eigenvectors are orthogonal to the ideal state. Only these kinds of errors can saturate the general upper bound on $c$; in stark contrast, quantum circuits in near-term quantum devices typically produce error density matrices whose eigenvectors are highly unlikely to be orthogonal to the ideal state. It thus stands to reason that the extremal error density matrices are highly unlikely to appear in practice, and thus practically relevant noisy quantum states are expected to be significantly below this bound.
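The structure of the extremal states can be explored numerically in the two-dimensional subspace spanned by $|\psi_{\mathrm{id}}\rangle$ and $|\phi_2\rangle$. The sketch below assumes the parametrisation $|\chi\rangle = \sqrt{1-\alpha}\,|\psi_{\mathrm{id}}\rangle + \sqrt{\alpha}\,|\phi_2\rangle$ and the closed form $(1 - \sqrt{1-\delta^2})/2$ for the upper bound, which is what one obtains by solving this two-dimensional problem analytically; it scans $\alpha$ on a grid and compares the largest mismatch found with that closed form. Function names and the value of $\delta$ are illustrative assumptions.

```python
import math

def mismatch(delta, alpha):
    """c for |psi_id><psi_id| + delta |chi><chi| (overall factor eta dropped),
    with |chi> = sqrt(1 - alpha) |psi_id> + sqrt(alpha) |phi_2>."""
    a = 1.0 + delta * (1.0 - alpha)
    d = delta * alpha
    b = delta * math.sqrt(alpha * (1.0 - alpha))
    lam = 0.5 * (a + d) + math.hypot(0.5 * (a - d), b)
    # The dominant eigenvector is proportional to (lam - d, b).
    return b * b / ((lam - d) ** 2 + b * b)

def upper_bound(delta):
    """Assumed closed form of the worst-case coherent mismatch."""
    return 0.5 * (1.0 - math.sqrt(1.0 - delta * delta))

delta = 0.5  # assumed ratio of eigenvalues
alphas = [k / 1000 for k in range(1001)]
best = max(alphas, key=lambda al: mismatch(delta, al))
print(best, mismatch(delta, best), upper_bound(delta))
```

For this choice of $\delta$ the scan peaks near $\alpha = (1+\delta)/2 = 0.75$, and no grid point exceeds the closed-form bound.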
An important implication of the above theorem for practical applications is that the error bound depends on the dominant eigenvalue $\mu_1$ of the noise state $\rho_{\mathrm{err}}$ (since $\delta$ is proportional to $\mu_1$). This eigenvalue depends exponentially on the Rényi entropy as $\mu_1 = e^{-H_\infty}$, which generally lower bounds all other Rényi entropies as $H_\infty \le \dots \le H_2 \le H_1$. We are thus guaranteed that the coherent mismatch decreases exponentially with Rényi entropies of the error density matrix eigenvalues. Similar exponential scaling results were obtained in ref. [23] for the ESD approach and it was noted that near-term quantum hardware is expected to produce large-entropy quantum states. As such, a significant advantage of the present upper bound is that the parameter $\delta$ depends only on spectral properties of the quantum state, i.e., eigenvalues and Rényi entropies, which may be estimated in experiments [46-52].
Fig. 3 shows the coherent mismatch in case of $5 \times 10^4$ randomly generated quantum states. Orange rectangles (blue dots) in Fig. 3 correspond to quantum states whose dimension $d$ was generated uniformly randomly in the range $2 \le d \le 8$ ($2 \le d \le 1024$). Indeed, saturating the upper bound (dashed black line) is significantly less likely in larger dimensions (blue dots are significantly below the upper bound). This is expected since the extremal quantum states occupy a rapidly decreasing portion of the full volume of state space. Refer to Appendix L for more details.

Limiting scenarios
We have found in the previous section that the error states ρ err that saturate the error bound depend on the parameter δ which quantifies the ratio of the eigenvalues (ideal state vs. dominant eigenvalue of the error state ρ err , see Definition 1).
In the limiting scenario when the contribution of the error density matrix ρ err is much smaller than that of the ideal state we obtain the limit δ → 0. In this limit the dominant eigenvector of the extremal error state ρ err is an equal superposition (|ψ id⟩ + |φ 2⟩)/√2 due to Theorem 1, where |φ 2⟩ is an arbitrary error state that is orthogonal to |ψ id⟩. This also informs us that the extremal quantum states in the practically relevant regime (i.e., for small δ) have dominant error vectors of the form |χ⟩ ≈ (|ψ id⟩ + |φ 2⟩)/√2. On the other hand, when the contribution of the error state is as strong as that of the ideal state via δ → 1, the worst-case error vector is almost orthogonal to the ideal state via some small ω ≪ 1. Surprisingly, we find that the global worst-case error, i.e., when c = 1/2, can only be saturated by the quantum state in the limit when ω → 0 (one must compute the limit only after computing c) as To illustrate this, in the second equation above we have computed the matrix representation of the quantum state in the 2-dimensional subspace spanned by the orthonormal vectors |ψ id⟩ and |φ 2⟩ to leading order in ω. Indeed, the dominant eigenvector of this density matrix is the vector (1, 1)^T/√2 (up to an error O(ω)) and this vector has a fidelity 1/2 + O(ω) to the ideal computational state (1, 0)^T. The limit of the coherent mismatch lim ω→0 c = 1/2 is thus well-defined; note, however, that the state itself becomes trivially proportional to the identity matrix in this limit (commuting case).
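The limit ω → 0 can be checked numerically in the two-dimensional subspace. The sketch below assumes equal weights of the ideal and error states and the parametrisation |χ⟩ = ω|ψ id⟩ + √(1 − ω²)|φ 2⟩ of the error vector; this parametrisation and the function name are assumptions made for illustration and need not coincide with the one used in the derivation above.

```python
import numpy as np

def coherent_mismatch_2d(omega):
    """Coherent mismatch c = 1 - |<psi_id|psi>|^2 of the toy state
    rho = (|psi_id><psi_id| + |chi><chi|) / 2, where
    |chi> = omega |psi_id> + sqrt(1 - omega^2) |phi_2>  (assumed parametrisation)."""
    psi_id = np.array([1.0, 0.0])
    chi = np.array([omega, np.sqrt(1.0 - omega ** 2)])
    rho = 0.5 * (np.outer(psi_id, psi_id) + np.outer(chi, chi))
    _, vecs = np.linalg.eigh(rho)      # eigenvalues are returned in ascending order
    dominant = vecs[:, -1]             # eigenvector of the largest eigenvalue
    return 1.0 - abs(psi_id @ dominant) ** 2

for omega in (0.5, 0.1, 1e-3):
    print(omega, coherent_mismatch_2d(omega))
```

Under this parametrisation the mismatch approaches 1/2 from below as ω decreases, while at ω = 0 exactly the state is proportional to the identity and its dominant eigenvector is no longer unique.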
Interestingly, here we find exactly the opposite behaviour when compared to the case of eigenvalues in Weyl's inequalities in Sec. II B. Recall that for the sum of two matrices ρ = (|ψ id⟩⟨ψ id| + ρ err )/2 the extremal shift of the eigenvalues (Weyl inequalities) is saturated when the dominant eigenvector of ρ err is actually |ψ id⟩. In stark contrast, we have found above that the extremal coherent mismatch (extremal shift in the dominant eigenvector) is saturated only in the limit when the dominant eigenvector of ρ err is orthogonal to |ψ id⟩.

Application to error suppression
Let us finally remark on the implications of the above results for the performance of the ESD and VD approaches. Recall that ref. [23] established general scaling results on how many copies n are required to reach a precision E in suppressing the noise when measuring expectation values in the dominant eigenvector.
Let us now assume that the aim is to suppress this error E to the level of error caused by the coherent mismatch (assuming normalised observables, ‖O‖ ∞ = 1). Consistent with Theorem 1, we assume that the noisy quantum state is of the form of Eq. (4) as ηρ id + (1 − η)ρ err and that the quantum states are considerably noisy, with 1 − η being sufficiently large as relevant in practice, i.e., η ≤ 2/3 in general and η ≤ 4/5 when we aim to prepare eigenstates. These two conditions correspond to circuit error rates ξ > 0.41 and ξ > 0.22, respectively, which is reasonable to assume in practice. If we set our target precision to be the general trace distance bound from Sec. II A as E = 2√c, then we obtain the following result in Appendix G: we need at least 3 copies to reach the target precision with the worst-case extremal states. Interestingly, ref. [33] found in numerical simulations of noisy derangement circuits that, for the considered circuits, at least 3 copies were required to reach a noise floor determined by the coherent mismatch and by the noise in the controlled-SWAP operations.
On the other hand, if our aim is to prepare eigenstates as discussed in Sec. II A, then the coherent mismatch is guaranteed to cause a quadratically smaller error. We thus set the target precision to E = 2c and find in Appendix G that we need at least 4 copies to reach the noise floor in the practically relevant region where states are considerably noisy (via circuit error rates ξ > 0.22). This confirms prior numerical simulations (Fig. 4 in ref. [23]).
While we have derived these results for the extremal quantum states, it stands to reason that in more realistic scenarios one may need significantly more copies to reach the precision limited by the coherent mismatch. Furthermore, these arguments establish that as long as the quantum device is limited to preparing only a small number of copies (e.g., 2, 3 or 4 depending on hardware constraints) of the noisy quantum state, the error introduced by the coherent mismatch is guaranteed to be smaller than the error caused by having too few copies (insufficient suppression).
[Figure caption fragment, cf. Fig. 4: ...4). Another significant advantage is that this upper bound comes with a similarly scaling lower bound: for small c ≤ 10^−3 all randomly (uniformly with respect to the Haar measure) generated states (blue dots and orange rectangles) nearly saturate the upper bound (dashed black lines) due to the asymptotically coinciding lower and upper bounds.]

B. Lower and upper bounds via commutators
While the previously derived bounds are tight as they are saturated by the extremal states, they can be generally very pessimistic since the extremal states are very unlikely to be relevant in practice.This is nicely illustrated in Fig. 3 where most randomly generated quantum states are significantly below this bound (dashed black lines) especially as the dimensionality grows (orange vs. blue).In fact, the previous bound can be arbitrarily pessimistic, since generally there is no lower bound of c in terms of δ: when δ is non-zero then c can still be zero when ρ err and ρ id commute.This leads to our next point: to derive general upper and lower bounds in terms of the commutator.These bounds will in turn be independent of the non-unique decomposition in Eq. ( 4) and will also allow us to derive scaling results due to the asymptotically coinciding lower and upper bounds.

Expressing the commutator norm
As we discussed above, if the error matrix ρ err commutes with the ideal state ρ id then the coherent mismatch must vanish. Similarly, we would expect that if the commutator is 'large' then the coherent mismatch should also be large. In the following we introduce a measure of how large the commutator is. For this purpose we will use a suitable matrix norm ‖·‖ which we will aim to upper bound.
Interestingly, it has long been an open problem in mathematics to upper bound the Hilbert-Schmidt (Frobenius) norm of the commutator of two matrices; it was solved for general matrices only very recently, refer to refs. [25][26][27][28][29][30][31] for more details. In particular, it was found that the norm of the commutator of two generic matrices A and B is upper bounded as
‖[A, B]‖ HS ≤ √2 ‖A‖ HS ‖B‖ HS . (11)
As opposed to generic matrices, in the present case we aim to express the norm of the commutator of two density matrices [ρ id , ρ]. Although we make no assumption about ρ (except that it is a density matrix), ρ id is a special matrix, i.e., a projector, since it represents a pure quantum state. This property allows us to express the commutator norm more explicitly.
Statement 3. We analytically solve for the eigenvalues and eigenvectors of both the matrix C from Statement 1 and the commutator [ρ id , ρ]. We establish that both matrices have only two non-zero eigenvalues as Spec(C) = {±σ} and Spec([ρ id , ρ]) = {±iσ}. It follows that their matrix norms are equivalent for all 0 ≤ p ≤ ∞. The eigenvalue can be computed as σ² = ⟨ψ id|ρ²|ψ id⟩ − F² =: Var[ρ], which expresses a generalised variance of the density matrix, and F = ⟨ψ id|ρ|ψ id⟩ is the fidelity.
Refer to Appendix H for a proof. Interestingly, we can directly relate the off-diagonal entries of the arrowhead matrix (as determined by C) from Statement 1 to the commutator [ρ id , ρ]. Furthermore, the above result establishes that the commutator norm is exactly given by a generalised uncertainty Var[ρ], which is a notion widely used in quantum theory to express the variance of measurement statistics of an observable in, e.g., quantum metrology [53] and beyond [44]. In the present case the observable is the operator ρ and the state is the ideal computational state |ψ id⟩. It is also interesting to note that the squared commutator norm σ² is proportional to the quantum Fisher information [54] of the quantum state |ψ id⟩ in a unitary parametrisation generated by the Hamiltonian H ≡ ρ.
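The spectrum of the commutator can be verified numerically with a short sketch. It assumes that the generalised variance takes the form σ² = ⟨ψ id|ρ²|ψ id⟩ − F² with F = ⟨ψ id|ρ|ψ id⟩; the random-state helpers and the dimension d = 6 are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(d, rng):
    """Random full-rank density matrix built from a complex Gaussian matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T               # positive semidefinite by construction
    return rho / np.trace(rho).real

def random_pure_state(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def commutator_sigma(rho, psi):
    """sigma from the generalised variance <rho^2> - F^2 evaluated in psi."""
    fidelity = np.vdot(psi, rho @ psi).real
    second_moment = np.vdot(psi, rho @ rho @ psi).real
    return np.sqrt(max(second_moment - fidelity ** 2, 0.0))

d = 6
rho, psi = random_density_matrix(d, rng), random_pure_state(d, rng)
proj = np.outer(psi, psi.conj())           # rho_id as a rank-1 projector
comm = proj @ rho - rho @ proj             # anti-Hermitian commutator
spectrum = np.linalg.eigvalsh(1j * comm)   # i*[rho_id, rho] is Hermitian
sigma = commutator_sigma(rho, psi)

print("eigenvalues of i[rho_id, rho]:", np.round(spectrum, 12))
print("sigma from the variance:", sigma)
```

Multiplying the commutator by i makes it Hermitian, so a standard Hermitian eigensolver can be used; the two non-zero eigenvalues then appear as −σ and +σ, and the remaining d − 2 eigenvalues vanish up to rounding.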
Let us now illustrate how using the above expressions yields improved bounds when compared to the general bounds considered in the literature. As such, it is straightforward to show that and this bound is indeed considerably tighter than the prior general result in Eq. (11) since for states of low purity (i.e., Tr ρ² ≪ 1) we find that λ ≪ ‖ρ‖ HS . Furthermore, assuming the decomposition from Eq. (4) we obtain the general bound σ ≤ ηδ/2, where δ was defined in Definition 1 and ηδ was the extremal shift in the Weyl inequalities in Remark 1.

Upper bound via commutator norm
We are now prepared to derive a general upper bound of the coherent mismatch based on the previously obtained norms of the commutator.
Theorem 2. Let us define the metric ∆ := σ r /(1 − Q) that depends only on two parameters: the relative commutator norm σ r := σ/λ, where the commutator norm σ was defined in Statement 3, and the ratio of the two dominant eigenvalues Q := λ 2 /λ. For any fixed ∆ there exist an infinite number of worst-case scenario states that saturate the upper bound of the coherent mismatch as These extremal states ρ have eigenvectors |ψ k⟩ that are orthogonal to the ideal state |ψ id⟩, except for the two dominant eigenvectors |ψ⟩, |ψ 2⟩ that correspond to the two dominant eigenvalues λ, λ 2 .
The above upper bound is saturated by extremal states similar to the ones in Theorem 1.The crucial difference, however, is that this upper bound is completely independent of the (not necessarily unique) decomposition into an ideal and noisy quantum states from Eq. ( 4).The present bound can thus be applied to more general scenarios too (note that the definition of the extremal states above is independent of ρ err ).
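The role of the extremal states can be illustrated with a small numerical sketch. It uses an elementary relation that follows from the law of total variance applied to the spectrum of ρ: ∆² ≥ c(1 − c), with equality when only the two dominant eigenvectors overlap with |ψ id⟩. This is an illustrative check under the assumption σ² = ⟨ψ id|ρ²|ψ id⟩ − F², not a restatement of Eq. (12), and the function names and parameter values are hypothetical.

```python
import numpy as np

def mismatch_and_delta(rho, psi_id):
    """Return (c, Delta): coherent mismatch and Delta = sigma / (lambda - lambda_2)."""
    vals, vecs = np.linalg.eigh(rho)           # ascending eigenvalues
    lam, lam2 = vals[-1], vals[-2]
    c = 1.0 - abs(np.vdot(vecs[:, -1], psi_id)) ** 2
    fidelity = np.vdot(psi_id, rho @ psi_id).real
    variance = np.vdot(psi_id, rho @ rho @ psi_id).real - fidelity ** 2
    sigma = np.sqrt(max(variance, 0.0))
    return c, sigma / (lam - lam2)             # equals sigma_r / (1 - Q)

def two_level_state(theta, lam=0.6, lam2=0.3, d=4):
    """State whose two dominant eigenvectors are rotated by theta away from
    e_0 and e_1; all other eigenvectors are orthogonal to psi_id = e_0."""
    rest = np.full(d - 2, (1.0 - lam - lam2) / (d - 2))
    basis = np.eye(d)
    basis[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]]
    return basis @ np.diag(np.concatenate(([lam, lam2], rest))) @ basis.T

psi_id = np.eye(4)[0]
c, delta = mismatch_and_delta(two_level_state(0.3), psi_id)
print("c =", c, " Delta^2 =", delta ** 2, " c(1-c) =", c * (1 - c))
```

For the two-level construction the two printed quantities ∆² and c(1 − c) agree, whereas for generic states with additional overlapping eigenvectors ∆² is strictly larger, which is the sense in which such states lie below the extremal curve.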
Fig. 4 shows the coherent mismatch as a function of the metric ∆ for 5×10 4 randomly generated density matrices in various dimensions (blue dots and orange rectangles).The upper bound (dashed black lines) is significantly more likely to be saturated by random states in lower dimensions (orange rectangles) since the extremal states occupy a negligible volume of the increasingly higher dimensional state space.We can identify two distinct regions in the plots.
First, for large ∆ ≥ 0.2 most of the randomly generated states are significantly below the bound, similarly to Fig. 3. Note also that the metric ∆ can in principle be larger than 1/2 and in such a scenario Eq. (12) is not defined. For this reason Fig. 4 shows the general bound c ≤ 1/2 in this region. We note, however, that this region is not relevant in practice since quantum circuits in near-term quantum devices typically produce errors that result in relatively small commutator norms, ∆ ≪ 1, as discussed in Sec. IV.
Second, in the practically more relevant region where c ≤ 10 −3 is sufficiently small, one can observe that all the randomly generated states nearly saturate the upper bound.The reason for this behaviour will be clarified in the next section where we derive a general lower bound on c and show that it approaches the upper bound as c decreases -thus tightly confining the possible values that c can take up.Let us now introduce this lower bound.

Lower bound via commutator norm and application to error suppression
Using the same technique as in Theorem 2 we can derive a directly analogous lower bound for the coherent mismatch.
Lemma 1. Let us define the metric ∆ m := σ r /(1 − Q m ) that depends only on two parameters: the relative commutator norm σ r := σ/λ and the ratio Q m := λ m /λ where λ m is the smallest non-zero eigenvalue of ρ. For any fixed ∆ m there exist an infinite number of best-case scenario states that saturate the lower bound of the coherent mismatch as The dominant eigenvector |ψ⟩ of the extremal state ρ and its eigenvector |ψ m⟩ that corresponds to the smallest non-zero eigenvalue λ m have non-zero overlaps with the ideal computational state |ψ id⟩. All other eigenvectors |ψ k⟩ of ρ with k ∈ {2, 3, . . . , m − 1, m + 1, . . . , d} are orthogonal to the ideal state |ψ id⟩.
Refer to Appendix J for a proof. The above Lemma guarantees that the coherent mismatch is always at least as large as the above lower bound for a fixed ∆ m . Note that the upper bounds in Theorem 2 are similarly determined by σ r ²: the most important consequence is that for a sufficiently small coherent mismatch c → 0, the possible values that c can take are tightly confined by the upper and lower bounds. This is illustrated in Fig. 4: all randomly generated states with small c ≤ 10^−3 nearly saturate the upper bound.
To substantiate this observation let us compute the ratio of the lower and upper bounds as Let us now consider 3 different scenarios in which the above ratio approaches 1 and thus the lower and upper bounds coincide. First, the ratio approaches 1 when the suppression factor is very small, Q ≪ 1. Such a small suppression factor guarantees high efficacy of the ESD/VD approach as established in ref. [23], but it may not be reasonable to expect vanishingly small suppression factors for realistic noisy circuits with a large number of gates, refer to Sec. IV. On the other hand, even a realistic Q ≈ 1/2 would result in approximately a factor of 2 ratio between the lower and upper bounds, which is already reasonably tight.
Second, the approximation in Eq. 14 depends on the difference between the largest λ 2 and smallest λ m 'error' probabilities (eigenvalues of ρ from Eq. ( 2)).Indeed, Q need not vanish in order for the ratio in Eq. ( 14) to approach 1: it is sufficient that the smallest and largest 'error' probabilities are close via λ 2 ≈ λ m .This is naturally the case for the extremal, rank-1 error states ρ err from Sec. III A 4 for which λ 2 ≡ λ m and we are thus guaranteed that the bounds coincide and are simultaneously saturated.
Third, one can generally expect that the above difference between the largest and smallest error probabilities is determined by the entropy of the error probability distribution. In particular, ref. [23] introduced the error probability vector p := (λ 2 /(1 − λ), λ 3 /(1 − λ), . . . , λ d /(1 − λ))^T and established that the efficacy of the ESD/VD approach depends on the Rényi entropies H n (p) of this probability vector. Indeed, the difference λ 2 − λ m ≤ e^{−H ∞ (p)} generally decays exponentially with the entropy and, regardless of the value of Q, the difference of the eigenvalues is negligibly small for high-entropy probability distributions. One can thus generally expect that for high-entropy experimental states the possible values of the coherent mismatch are tightly confined by the lower and upper bounds.

IV. APPLICATION TO QUANTUM CIRCUITS

A. Approximating commutators in noisy quantum circuits
Let us now consider noisy quantum circuits that prepare quantum states ρ via mappings Φ c ρ 0 as discussed in Sec.I A. Since the commutator norm σ has a special significance (see Sec. III B) our aim in the following is to approximate the commutator norm for these quantum circuits.
First, let us consider the limiting global worst-case scenario in which the ideal unitary computation is followed by a global error channel that acts with probability ε. This is a special case of Eq. (1) in which all gates are perfect, except for the last one. The commutator norm in this case is generally upper bounded as σ² ≤ ε²/4 and the bound is saturated when the mapping prepares the extremal states from Sec. III A 4.
Let us now consider the error channel from Eq. (1) and assume that every gate has an identical error probability ε. Let us make another simplification for ease of notation and focus on the case when K = 1 for all k, as in the case of dephasing noise. While these assumptions greatly simplify the following derivations, we remark that the present results can be generalised straightforwardly as discussed in Appendix K 2.
The considered error model maps the density matrix to an incoherent superposition (mixture) of 2^ν pure states (where ν is the number of gates) which correspond to individual error events. For example, one such pure state represents the event where an error happens during the execution of the k-th gate but all other gates are noiseless; this occurs with probability ε(1 − ε)^{ν−1}, refer to Appendix K for more details. In general we find that there are overall (ν choose l) different events where l errors happen and each of these has probability ε^l (1 − ε)^{ν−l}. As such, we can approximate η from Eq. (4) via the probability that no error happens as
η ≈ (1 − ε)^ν ≈ e^{−ξ}, (15)
where we have introduced the usual circuit error rate ξ := εν to denote the expected number of errors in the full circuit. Indeed, for a sufficiently large number ν of gates the probability that no error happens decays exponentially with ξ.
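The counting of error events above can be sketched in a few lines. Here eps denotes the per-gate error probability and nu the number of gates; the specific values (200 gates, 0.5% error per gate) are hypothetical and chosen only for illustration.

```python
from math import comb, exp

def error_event_distribution(nu, eps):
    """Probability that exactly l of the nu gates fail, for l = 0..nu.
    There are comb(nu, l) such events, each with probability eps^l (1-eps)^(nu-l)."""
    return [comb(nu, l) * eps ** l * (1.0 - eps) ** (nu - l) for l in range(nu + 1)]

nu, eps = 200, 0.005        # hypothetical circuit: 200 gates, 0.5% error per gate
xi = nu * eps               # circuit error rate (expected number of errors)
dist = error_event_distribution(nu, eps)
eta_approx = dist[0]        # probability that no error happens, (1 - eps)^nu

print("number of distinct events:", sum(comb(nu, l) for l in range(nu + 1)) == 2 ** nu)
print("total probability:", sum(dist))
print("no-error probability:", eta_approx, " exp(-xi):", exp(-xi))
```

The event counts add up to 2^ν, the probabilities add up to one, and the no-error probability is close to exp(−ξ) already for a moderate number of gates.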
We compute the norm (from Statement 3) of the commutator [ρ id , ρ] in Appendix K assuming the above error model and obtain the expression Here the index set I indexes all distinct error events and there are exponentially many |I| = 2^ν − 1 of them. Here, p k are probabilities of the individual error events, while L kl are real numbers that depend on the scalar products between the different erroneous states and are thus generally upper bounded as The diagonal terms L kk in the above sum are strictly non-negative and we can obtain a general upper bound by analytically evaluating the summation as In contrast, the off-diagonal terms in the summation in Eq. (16) depend on the relative phase between the state vectors of the erroneous quantum states. We can generally upper bound the summation in Eq. (16) and obtain the completely general upper bound σ² ≤ (1 − η)², which is approximated by ξ² for small error rates. This bound is indeed pessimistic: even the global worst-case scenario discussed above has a guaranteed bound σ² ≤ ξ²/4, which is smaller by a factor of 4.
[Figure caption fragment, cf. Fig. 5: ...19 up to a constant, where ξ is the expected number of errors in the circuit and ν is the number of gates. For very small ξ ≪ 1, the bound (black solid lines) is smaller than the worst-case scenario (dashed black lines) approximately by a factor of ν. σ is maximal when ξ ≈ 1/2 and its maximum is at σ max = const × ξ/ν, which decreases when increasing the number of gates. Our approximate upper bound (solid black lines) may break down for very large error rates ξ ≳ 10. Remarkably, in the practically most important regime ξ ≤ 5 the same kind of scaling can be observed for a large variety of circuits (even for highly deterministic ones) as shown in Fig. 7.]
In order to be able to establish a more meaningful upper bound, we now consider a rather artificial assumption: we assume that the off-diagonal terms L kl with k ≠ l in Eq. (16) are random variables with mean 0 and some variance s kl . This is equivalent to assuming that the complex phases (relative to the ideal state |ψ id⟩) of the 2^ν − 1 erroneous pure states uniformly cover the complex plane. We stress that this assumption is not equivalent to the non-entangling random circuits undergoing single-qubit depolarising noise considered in ref. [24]. Those circuits map to noise ρ err = Id/d that commutes with the ideal state and indeed one trivially finds that σ = c = 0. In contrast, ref. [24] demonstrated that relatively deep entangling random circuits result in a coherent mismatch that is non-zero and comparable to that of non-random circuits.
The above point can be illustrated via the following analogy: suppose that we sum up n random real numbers (drawn from a distribution of mean 0 and variance s). The sum of these numbers is highly unlikely to be 0. In fact, the result is another random number that is upper bounded with high probability by some multiple of the square root of the total variance, which we can compute as √(ns). In analogy to this observation, we compute the total variance in Appendix K and approximately upper bound the summation from Eq. (16) as where f was defined in Eq. (17) as the general upper bound on the diagonal entries. Interestingly, we thus find that, assuming randomly distributed off-diagonal entries, the total sum is larger than the upper bound of the diagonal entries only by a constant multiplicative factor. Let us now analyse this upper bound.
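The random-sum analogy can be reproduced with a short Monte Carlo sketch. Gaussian summands are assumed purely for convenience (the analogy only requires mean 0 and variance s), and the function name, the number of trials and the seed are hypothetical choices.

```python
import numpy as np

def sum_spread(n, s, trials=2000, seed=0):
    """Empirical standard deviation of the sum of n zero-mean random reals
    with variance s each, estimated over independent trials."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, np.sqrt(s), size=(trials, n))
    return samples.sum(axis=1).std()

n, s = 1000, 0.25
print("empirical spread:", sum_spread(n, s), " sqrt(n s):", np.sqrt(n * s))
```

The spread of the sum grows as √(ns) rather than linearly in n, which is the mechanism behind the reduced contribution of the off-diagonal terms under the random-phase assumption.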

B. Analysing the approximate bound
Let us now analyse in detail the upper bound function f from Eqs. (17)-(18). In particular, in Appendix K 1 we obtain the approximation up to a negligible multiplicative error (that vanishes for large ν) that we neglect for ease of notation. This approximation is plotted in Fig. 5 as a function of the circuit error rate ξ. In the plot one can recognise the following 3 distinct regions.
(a) When the circuit error rate is small, ξ ≪ 1, we find that the upper bound increases in quadratic order as const × ξ²/ν. We can compare this expression to the global worst-case scenario scaling ξ²/4 and deduce that the present bound decreases inversely proportionally with the number ν of gates (at a fixed error rate ξ). This is illustrated in Fig. 5 where the function f(ξ) (solid black lines) is indeed significantly below the global worst-case bound (dashed black lines), approximately by a factor ν up to the constant factor from Eq. (18).
(b) The maximum of the function f(ξ) is at ξ ≈ 1/2 and this position is independent of the constant multiplicative factor from Eq. (18). It is also interesting to note that the global maximum of the function is proportional to ε = ξ/ν. This informs us that the maximum of the bound decreases inversely proportionally when increasing the number of gates, similarly to (a). In fact, one can generally state that the upper bound in Eq. (18) scales as σ² = O(1/ν) for any fixed ξ.
(c) The function f(ξ) starts to decrease in the third region where ξ > 1/2 and decreases in exponential order asymptotically for ξ ≫ 1. On the other hand, we observe that in the region where ξ ≫ 1 our approximation breaks down: in some instances we numerically observe a different scaling in this regime, especially when the circuits are highly deterministic. We have performed additional simulations to illustrate this point: in Fig. 7 the commutator norm decreases more slowly for highly deterministic circuits (constant rotation angles in the quantum gates) in the region ξ > 1/2. Nevertheless, this region is not particularly relevant in practice for the following reason. In the context of the ESD/VD approach the number of circuit repetitions required to suppress shot noise scales exponentially with the circuit error rate via Eq. (15). In fact, it generally holds for error mitigation techniques that their costs grow exponentially and one thus needs to guarantee a bounded ξ. For example, assuming a quadratic (standard shot noise) scaling of the measurement costs, the overhead at ξ = 5 is approximately a factor of 2.2 × 10^4, which is certainly prohibitive in practice [23,55].
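The quoted overhead can be reproduced with a one-line estimate. The sketch assumes that the measurement cost scales as 1/η² with η ≈ exp(−ξ), i.e., as exp(2ξ); this functional form and the function name are assumptions made for illustration.

```python
from math import exp

def sampling_overhead(xi):
    """Overhead factor exp(2*xi): a quadratic (shot-noise) cost in 1/eta,
    with eta ~ exp(-xi) the probability that no error happens."""
    return exp(2.0 * xi)

for xi in (0.5, 1.0, 2.0, 5.0):
    print(f"xi = {xi}: overhead ~ {sampling_overhead(xi):.3g}")
```

Under this assumption the overhead stays modest for ξ of order one and reaches roughly 2.2 × 10^4 at ξ = 5.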
On the other hand, we remarkably find that our bounds hold surprisingly well in all scenarios in the practically most important region when ξ ≤ 5.In particular, these bounds seem to hold remarkably well even for highly deterministic circuits in Fig. 7, such as circuits with constant rotation angles -despite that we assumed randomly distributed phases for our approximate bounds.Furthermore, even error models that are beyond the scope of Eq. ( 1), such as damping in Fig. 5, seem to result in exactly the same kind of scaling.The numerical data seem to be independent of the number of qubits too (compare blue, red and black in Fig. 5 and in Fig. 7) as long as the number of gates is fixed, which is consistent with the theoretical bounds.Most remarkably, up until the point ξ ≈ 1 each of the large variety of circuits simulated in this work resulted in exactly the same type of scaling with respect to ξ and ν up to only a small (relative to ν) global multiplication factor.
These observations are supported in Appendix K 2 where extensions of our bound to more general error models are discussed: the form of the upper bound function in Eq. (18) is expected to be the same even if one allows higher rank Kraus maps as in Eq. (1) or when one allows different error probabilities for different gates via ε k . Interestingly, if a fraction of the gates commutes with the error Kraus maps then our bound function f(ξ) still holds up to a minor re-scaling of its argument ξ (via a multiplication with a constant). Let us now apply our results to bounding the coherent mismatch.
Let us consider the upper bound for the coherent mismatch via Theorem 2 that depends on the commutator norm. Let us assume that the commutator norm σ is bounded via Eq. (18) and we then obtain where we have used that the probability of the ideal state η is upper bounded as η ≥ η via Eq. (15). Let us now remark on 3 important consequences of the above approximate bound and how it confirms prior numerical observations.
(a) Eq. (21) establishes that the coherent mismatch scales as c = O(νε²) when assuming a fixed Q, where Q was defined in Theorem 2 as the ratio of the two largest eigenvalues. This scaling is consistent with previous numerical observations: it was numerically observed in ref. [23] (ref. [24]) that if one increases the per-gate error probability ε in a fixed quantum circuit then the coherent mismatch (noise floor) grows quadratically (linearly) with ε.
(b) Eq. (21) establishes a scaling c = O(ν) when increasing the number ν of gates at a fixed per-gate error rate. This is consistent with the observation of ref. [23] that increasing the number of gates in a circuit of fixed per-gate error probability increases the coherent mismatch proportionally as c = O(ν), while ref. [24] similarly observed in numerical random-circuit simulations that the noise floor √c slightly increases when increasing ν. As noted in the above section, the scaling results in this work were derived assuming sufficiently complex quantum circuits, but these results appear to hold remarkably well even for highly deterministic circuits as long as the circuit error rate does not significantly exceed ξ ≈ 5.
The crucial implication of this scaling for practical applications is the following. Consider a computational task that is defined for N qubits. A quantum circuit of depth a(N) then requires overall ν = O(N a(N)) gates to implement the computation. This ensures that the coherent mismatch decreases even for constant depth as c = O(ξ² N^−1) when the size of the computation (via N) is increased at a constant circuit error rate ξ. In practice one needs to keep ξ at least bounded to ensure a bounded sampling cost, as discussed in the previous section.
(c) Another important consequence of these scaling results is the following. Recall that the probability that no error happens decays exponentially with the circuit error rate as η ≈ e^{−ξ}. This is approximately constant for a fixed value of ξ. In stark contrast, we have found that the coherent mismatch depends on the number of gates and scales as c = O(ξ²/ν). Let us now compare the fidelity F that decreases due to incoherent errors and the coherent fidelity 1 − c that decreases due to the coherent mismatch in the dominant eigenvector. The fidelity can be approximated as F ≈ η ≈ e^{−ξ} and decays exponentially due to incoherent errors, while the fidelity 1 − c due to the coherent mismatch decays as 1 − c = 1 − O(ξ²/ν). The ratio of these two fidelities can then be approximated as Indeed the above ratio increases exponentially when increasing ξ within a finite range, e.g., when ξ < 10 and when the number ν of gates is sufficiently large. This is consistent with numerical observations of ref. [23]: increasing the number of gates in a sufficiently complex circuit decreases the incoherent fidelity (F) exponentially faster than it decreases the coherent fidelity (1 − c). Very importantly, this ensures that the coherent mismatch of the dominant eigenvector (which cannot be suppressed) causes an exponentially smaller error when compared to the incoherent decay of the fidelity F. Here the latter can indeed be suppressed exponentially by increasing the number of copies in the ESD/VD approach.

V. DISCUSSION AND CONCLUSION
The present work considered the fundamental question: given a noisy quantum state, how well does its dominant eigenvector |ψ approximate a corresponding ideal, noise-free computation |ψ id ?While it is of fundamental importance to understand how noise affects quantum systems, this particular question has crucial practical relevance.The recently introduced ESD/VD error suppression techniques are ultimately limited by the coherent mismatch.
This work has established general upper bounds and scaling results for the coherent mismatch and presented a comprehensive analysis of its implications in practically relevant scenarios.As such, it was established that the coherent mismatch is indeed negligibly small for sufficiently complex noisy quantum circuits, typically used in variational quantum algorithms and other near-term quantum algorithms [20][21][22].It is interesting to note that since variational quantum algorithms rely on optimising a cost function, this optimisation can be expected to anyway minimise the effect of the coherent mismatch.Let us briefly summarise the most important results.
(a) The bound based on the noise floor √ c in ref. [24] was improved and quadratically smaller bounds are obtained for the pivotal case of preparing eigenstates -see Sec.II A.
(b) A general upper bound for the coherent mismatch was obtained in Sec.III A 3 by explicitly constructing worst-case scenario extremal quantum states that saturate it (for this we analytically computed the coherent mismatch in Sec.III A 2 using our arrowhead decomposition obtained in Sec.III A 1).The present problem is closely related to an important problem in mathematics: bounding the eigenvalues of a sum of two matrices (Weyl inequalities).While those bounds are well-known to be saturated by identical dominant eigenvectors, it was shown in Sec.III A 4 that bounds obtained in this work are in stark contrast saturated by the close-to-orthogonal dominant eigenvectors of the extremal quantum states.
(c) In the ESD/VD approach, even for extremal quantum states, one needs at least 3 to 4 copies of the noisy state to suppress errors to the noise floor set by the coherent mismatch, see Sec. III A 5. The coherent mismatch is thus guaranteed to be negligible in practical applications where the quantum device is limited in its ability to prepare a large number of copies.
(d) Another closely related problem in mathematics is upper bounding the matrix norm of the commutator of two matrices. We obtained considerably tighter bounds than prior results in the specific case of the matrix norm of the commutator between two density matrices, see Sec. III B 1. Interestingly, the commutator norm is given by the generalised quantum-mechanical variance of the density matrix, a quantity that is also proportional to the quantum Fisher information.
(e) General upper and lower bounds were obtained in Sec. III B 2 and Sec. III B 3 for the coherent mismatch in terms of the commutator norm from (d). It was established that in the practically important region the upper and lower bounds are close to each other and thus tightly confine the possible values of c, while the bounds asymptotically coincide. It was also shown that the coherent mismatch generally decays exponentially with Rényi entropies of the error probabilities; indeed, similar scaling results were obtained in ref. [23] for the efficacy of the ESD/VD approach and it was noted that near-term quantum devices are expected to produce high-entropy errors.
(f) We finally applied the above general results to the specific but pivotal case of noisy quantum circuits in Sec.IV.The resulting approximate bounds confirmed scaling results of ref. [23]: the coherent mismatch in sufficiently complex noisy circuits is decreased inversely proportionally when increasing the size of the computation (by increasing the number of qubits at a fixed error rate).Furthermore, in the practically important regions, the incoherent deterioration of a quantum state is exponentially more severe than the drift in the dominant eigenvector.This establishes that the coherent mismatch is indeed negligible in relevant applications of the ESD/VD approach.
Results obtained in this work pave the way towards developing advanced error mitigation techniques that will be crucial for the successful exploitation of noisy quantum devices. A number of apparent questions will be worth investigating in the future, such as developing twirling techniques (and generalisations thereof) that potentially decrease the coherent mismatch without affecting the ideal part of the computation. In particular, one could obtain a series of quantum circuits $\Phi_c^{(l)}$ whose unitary component $U_c$ is identical for every $l$ while the noise component is different. The average of such channels is thus guaranteed to increase the entropy of errors, resulting in a smaller coherent mismatch.
Another open question is related to similar themes in mathematics: analogously to the Weyl inequalities for the eigenvalues, is it possible to generalise the present results to obtain a series of upper and lower bounds for infidelities in all eigenvectors (not just the dominant one)? Answering this question will be highly non-trivial, since the generalisation to arbitrary matrices will require going beyond the analytical expressions obtained for $c$ and $\sigma$, which assumed that $\rho_{id}$ is rank-1 and thus has only a single dominant component.
Let us finally remark that arguments presented in this work naturally generalise to infinite-dimensional quantum states ρ as general trace-class operators.
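One can straightforwardly show that the trace distance upper bounds measurement errors with respect to any bounded observable $O$ as
$$\big|\langle\psi_{id}|O|\psi_{id}\rangle - \langle\psi|O|\psi\rangle\big| = \Big|\sum_k d_k \langle\chi_k|O|\chi_k\rangle\Big| \le \|O\|_\infty \sum_k |d_k| = 2\sqrt{c}\,\|O\|_\infty,$$
where $d_k$ and $|\chi_k\rangle$ are eigenvalues and eigenvectors of the difference $|\psi_{id}\rangle\langle\psi_{id}| - |\psi\rangle\langle\psi|$ of the two density matrices.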
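The quantity $2\sqrt{c}\,\|O\|_\infty$ thus upper bounds the measurement error of any bounded observable. Interestingly, if the ideal computational state approximates an eigenvector of the measurement operator, we find the following. Let us write the dominant eigenvector as a linear combination of two vectors
$$|\psi\rangle = \sqrt{1-c}\,|\psi_{id}\rangle + \sqrt{c}\,|\psi_\perp\rangle$$
with real, non-negative coefficients (since we are free to choose the global phase of a state vector). It follows that the measurement of an observable yields
$$\langle\psi|O|\psi\rangle = (1-c)\,\langle\psi_{id}|O|\psi_{id}\rangle + c\,\langle\psi_\perp|O|\psi_\perp\rangle + 2\sqrt{c(1-c)}\,\mathrm{Re}\,\langle\psi_\perp|O|\psi_{id}\rangle.$$
In the special case when $O|\psi_{id}\rangle = E|\psi_{id}\rangle$ for some real $E$, we obtain $\langle\psi_\perp|O|\psi_{id}\rangle = 0$ and, finally, the measurement error of the observable is
$$\big|\langle\psi|O|\psi\rangle - \langle\psi_{id}|O|\psi_{id}\rangle\big| = c\,\big|\langle\psi_\perp|O|\psi_\perp\rangle - E\big| \le 2c\,\|O\|_\infty.$$

Proof. We compute the matrix representation of the operator $\rho$ by choosing an orthonormal basis that defines the unitary transformation $U$ such that $U\rho U^\dagger = \tilde\rho$. Let us choose the leading basis vector as $|\psi_{id}\rangle$ and thus $U|\psi_{id}\rangle =: |\tilde\psi_{id}\rangle = (1, 0, \dots, 0)^T$. We can choose the rest of the basis vectors $|\phi_k\rangle$ arbitrarily as long as $\langle\psi_{id}|\phi_k\rangle = 0$ for all $k \in \{2, 3, \dots, d\}$. We define $|\phi_k\rangle$ such that they are eigenvectors of $P\rho P$, where $P = \mathrm{Id} - |\psi_{id}\rangle\langle\psi_{id}|$ projects onto the orthogonal subspace. Furthermore, we are free to choose the global phase of the basis vectors, and we note that this global phase has no effect on the diagonal entries since
$$D_k := \langle\phi_k|\rho|\phi_k\rangle.$$
Here $D_k$ are non-negative since $\rho$ is by definition positive semi-definite. We can implicitly define the global phase of the vectors $|\phi_k\rangle$ such that the off-diagonal entries are real and non-negative as
$$C_k := \langle\psi_{id}|\rho|\phi_k\rangle \ge 0.$$
We have thus established a matrix representation of $\rho$ such that $D_k, C_k \in \mathbb{R}$ and $D_k, C_k \ge 0$, and $\rho$ is diagonal in the subspace orthogonal to $|\psi_{id}\rangle$. We can finally explicitly write the arrowhead matrix using the above established orthonormal basis $\{\psi_{id}, \phi_2, \phi_3, \dots, \phi_d\}$ that defines the unitary transformation $U$ such that
$$\tilde\rho = U\rho U^\dagger = \begin{pmatrix} F & C_2 & C_3 & \cdots & C_d \\ C_2 & D_2 & & & \\ C_3 & & D_3 & & \\ \vdots & & & \ddots & \\ C_d & & & & D_d \end{pmatrix}.$$

Proof. If we explicitly know the arrowhead matrix, then its eigenvectors can be computed analytically [43], refer also to Eq. (5) in [40]. Recall that we introduced the orthonormal basis $\{\tilde\psi_{id}, \tilde\phi_2, \tilde\phi_3, \dots, \tilde\phi_d\}$ in Appendix C and used it to represent $\rho$ as an arrowhead matrix. This corresponds to a unitary transformation $U\rho U^\dagger = \tilde\rho$, where $\tilde\rho$ is the arrowhead matrix from Statement 1. Using these notations we can write the dominant eigenvector of $\rho$ from Definition 1 up to this unitary transformation as
$$|\tilde\psi\rangle = U|\psi\rangle = \frac{1}{\sqrt{1+\Xi}}\Big(1, \frac{C_2}{\lambda - D_2}, \dots, \frac{C_d}{\lambda - D_d}\Big)^T, \qquad \Xi := \sum_{k=2}^{d} \frac{C_k^2}{(\lambda - D_k)^2},$$
where $\lambda$ is the dominant eigenvalue from Definition 1. We can apply this explicit formula to compute the coherent mismatch from Definition 1 as
$$c = 1 - |\langle\tilde\psi_{id}|\tilde\psi\rangle|^2 = 1 - (1+\Xi)^{-1},$$
where we have used that $|\tilde\psi_{id}\rangle = (1, 0, \dots, 0)^T$.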
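The arrowhead representation lends itself to a short numerical illustration. The following is a minimal sketch, not the authors' code: it assumes the standard secular equation of a real symmetric arrowhead matrix, $\lambda - F - \sum_k C_k^2/(\lambda - D_k) = 0$, and finds its largest root by bisection. The function names and the example entries are hypothetical, and the sketch assumes the generic case in which the largest diagonal entry $D_k$ has a non-zero coupling $C_k$.

```python
import math

def dominant_eigenvalue(F, C, D, tol=1e-13):
    """Largest root of g(lam) = lam - F - sum_k C_k^2 / (lam - D_k), found by bisection."""
    def g(lam):
        return lam - F - sum(c * c / (lam - d) for c, d in zip(C, D))
    lo = max([F] + list(D))  # the dominant eigenvalue is not below any diagonal entry
    hi = lo + math.sqrt(sum(c * c for c in C)) + tol  # at most the off-diagonal norm above it
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coherent_mismatch(F, C, D):
    """Return (c, lam) with c = 1 - 1/(1 + Xi) and Xi = sum_k C_k^2 / (lam - D_k)^2."""
    lam = dominant_eigenvalue(F, C, D)
    xi = sum((c / (lam - d)) ** 2 for c, d in zip(C, D))
    return xi / (1.0 + xi), lam

# Hypothetical 3-dimensional example with unit trace (F + D_2 + D_3 = 1).
c, lam = coherent_mismatch(0.8, [0.05, 0.02], [0.15, 0.05])
print("dominant eigenvalue:", lam)
print("coherent mismatch:  ", c)
```

Bisection is used here only because it needs no linear-algebra library; any eigensolver applied to the full matrix should give the same dominant eigenvalue.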
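Appendix E: Proof of Remark 2
Proof. The first-order perturbation correction to the dominant eigenvector can be computed via Statement 1 using the arrowhead decomposition
$$\tilde\rho = F\,|\tilde\psi_{id}\rangle\langle\tilde\psi_{id}| + D + C,$$
where $|\tilde\psi_{id}\rangle = (1, 0, \dots, 0)^T$ and $D$ is diagonal. Let us now treat $C$ as a perturbation of the diagonal matrix $F|\tilde\psi_{id}\rangle\langle\tilde\psi_{id}| + D$ and use the usual perturbative expansion, see e.g., Eq. (5.1.44) in [44]. We can thus compute the first-order correction to the dominant eigenvector (and recall that $|\tilde\psi\rangle := U|\psi\rangle$) as
$$|\tilde\psi\rangle \approx |\tilde\psi_{id}\rangle + \sum_{k=2}^{d} \frac{C_k}{F - D_k}\,|\tilde\phi_k\rangle. \qquad \text{(E1)}$$
The normalised first-order eigenvector is obtained by dividing this vector by its norm $\big[1 + \sum_{k=2}^{d} C_k^2/(F - D_k)^2\big]^{1/2}$.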
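Computing the coherent mismatch $c$ from the above first-order perturbation we obtain, to leading order,
$$c \approx \sum_{k=2}^{d} \frac{C_k^2}{(F - D_k)^2}.$$
Let us remark that using Eq. (10.2) from [45], one can obtain the more accurate first-order approximation assuming explicit knowledge of the eigenvalues [...].

Lemma 2. The coherent mismatch $c$ of density matrices is generally bounded via their arrowhead-matrix representations with non-negative entries as
$$1 - \Big[1 + \frac{\|C\|^2}{(\lambda - D_m)^2}\Big]^{-1} \le c \le 1 - \Big[1 + \frac{\|C\|^2}{(\lambda - D_2)^2}\Big]^{-1}, \qquad \text{(F1)}$$
where $\|C\|^2 = \sum_k C_k^2$ and $D_m$ is the smallest non-zero diagonal entry of the arrowhead matrix. The upper bound is saturated by any density matrix $\rho$ that can be mapped to an arrowhead matrix of the form
$$A_{\max} = \begin{pmatrix} F & C_2 & & & \\ C_2 & D_2 & & & \\ & & D_3 & & \\ & & & \ddots & \\ & & & & D_d \end{pmatrix},$$
where the only non-zero off-diagonal entry $C_2$ is next to $D_2$. Furthermore, $D_k$ with $k > 2$ are eigenvalues of the arrowhead matrix. The lower bound is saturated by analogous matrices, but the non-trivial 2-dimensional subspace contains the smallest non-zero diagonal entry $D_m > 0$, i.e., the only non-zero off-diagonal entry $C_m$ couples $F$ to $D_m$.

Recall that the coherent mismatch can be expressed as $c = 1 - (1+\Xi)^{-1}$, where we have used the notation $\Xi := \sum_{k=2}^{d} C_k^2/(\lambda - D_k)^2$. We can upper bound $\Xi$ by using the interlacing property $\lambda \ge D_2 \ge D_k$ with
$$\Xi = \sum_{k=2}^{d} \frac{C_k^2}{(\lambda - D_k)^2} \le \frac{\|C\|^2}{(\lambda - D_2)^2},$$
where we have introduced the $(d-1)$-dimensional vector $C := (C_2, C_3, \dots, C_d)^T$. The upper bound is saturated by arrowhead matrices of the form
$$A_{\max} = \begin{pmatrix} F & C^T & \\ C & D_2\,\mathrm{Id}_\nu & \\ & & \mathrm{diag}(D_{\nu+2}, \dots, D_d) \end{pmatrix},$$
where we used the notation $C := (1, 1, 1, \dots, 1)\,\|C\|/\sqrt{\nu}$ and $\nu$ is the dimension of the identity matrix $\mathrm{Id}_\nu$. It is straightforward to show that these matrices saturate the upper bound just by computing the coherent mismatch as $\Xi = \sum_{k=2}^{d} C_k^2/(\lambda - D_2)^2$, which coincides with the upper bound above. The above matrix has non-zero off-diagonal entries in the upper left corner in a $(\nu+1)$-dimensional subspace. Here $\nu$ represents the degeneracy of the eigenvalues of the matrix $A_{\max}$, and the upper bound in Eq. (F1) is saturated by any such matrix with any $\nu \le d - 1$. For example, setting $\nu = 1$ assumes no degeneracy of the eigenvalues. Let us now express this non-trivial subspace explicitly as
$$\begin{pmatrix} F & q & q & \cdots & q \\ q & D_2 & & & \\ q & & D_2 & & \\ \vdots & & & \ddots & \\ q & & & & D_2 \end{pmatrix},$$
where $q := \|C\|/\sqrt{\nu}$. The last step is that we show that the above matrix is unitarily equivalent to the matrix
$$\begin{pmatrix} F & q\,\mathbf{1}^T \\ q\,\mathbf{1} & D_2\,\mathrm{Id}_\nu \end{pmatrix} \cong \begin{pmatrix} F & \|C\| & \\ \|C\| & D_2 & \\ & & D_2\,\mathrm{Id}_{\nu-1} \end{pmatrix},$$
where only a 2-dimensional sub-block has non-zero off-diagonal entries, $\mathbf{1} = (1, \dots, 1)^T$, and $\cong$ represents that the two matrices are unitarily equivalent. Note that we map density matrices to arrowhead matrices by applying a suitable unitary transformation. Let us consider the following example. Instead of mapping $\rho$ to the matrix on the left-hand side above by applying $U_1$, we map $\rho$ to the matrix on the right-hand side by applying $U_2 U_1$, where $U_2$ maps between the above two matrices due to their unitary equivalence. We will prefer to map density matrices to the arrowhead matrices on the right-hand side, although the two forms are equivalent and result in the same coherent mismatch.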
The most straightforward way to show the unitary equivalence of the above two matrices is by recognising that both are arrowhead matrices and their eigenvalues are roots of the same secular function from Eq. (8). As the two matrices share the same eigenvalues, there exists a unitary transformation that transforms one into the other. It follows therefore that the upper bound on the coherent mismatch from Eq. (F1) is saturated by any density matrix that can be mapped to an arrowhead matrix of the form
$$A_{\max} = \begin{pmatrix} F & C_2 & & & \\ C_2 & D_2 & & & \\ & & D_3 & & \\ & & & \ddots & \\ & & & & D_d \end{pmatrix}. \qquad \text{(F2)}$$

Lower bound
We consider arrowhead matrices with arbitrary non-negative entries $F, C_k, D_k \ge 0$ with $2 \le k \le d$, which contain the density matrices from Statement 1. The lower bound on $\Xi$ can be obtained as
$$\Xi = \sum_{k=2}^{d} \frac{C_k^2}{(\lambda - D_k)^2} \ge \frac{\|C\|^2}{(\lambda - D_d)^2},$$
where $D_d$ is the smallest diagonal entry. Arrowhead matrices that saturate the lower bound can be constructed by following a very similar argument to the one presented above. We find that any density matrix that can be mapped to an arrowhead matrix of the following form saturates the lower bound:
$$\begin{pmatrix} F & & & & C_d \\ & D_2 & & & \\ & & \ddots & & \\ & & & D_{d-1} & \\ C_d & & & & D_d \end{pmatrix}.$$
One additional remark is that the non-trivial 2-dimensional sub-block needs to be positive semi-definite, as the matrix represents a density matrix. It therefore follows that only arrowhead matrices with $C_d \le \sqrt{D_d F}$ can represent valid density matrices. We can exclude trivial cases such as $D_d = 0$, where necessarily $C_d = 0$, and tighten the lower bound in the following way. Density matrices that are mapped to arrowhead matrices satisfy a lower bound on the coherent mismatch via
$$\Xi \ge \frac{\|C\|^2}{(\lambda - D_m)^2},$$
where $D_m$ is the smallest non-zero diagonal entry. This lower bound is saturated by density matrices that can be mapped to arrowhead matrices $A_{\min}$ whose only non-zero off-diagonal entry $C_m$ couples $F$ to $D_m$, i.e., whose non-trivial 2-dimensional sub-block is
$$M_{\min} = \begin{pmatrix} F & C_m \\ C_m & D_m \end{pmatrix}. \qquad \text{(F3)}$$

Let us now prove Theorem 1.
Proof.

Explicit construction of extremal density matrices

It was shown above that the upper bound of the coherent mismatch is saturated by density matrices that can be mapped to arrowhead matrices of the form of Eq. (F2). We now aim to explicitly construct density matrices (positive semi-definite and unit trace) that map to the extremal arrowhead matrices in Eq. (F2) and thereby maximise the coherent mismatch. Let us now derive the explicit form of these states in terms of the decomposition in Eq. (4) as a weighted sum of the ideal state and an error state, $\eta\rho_{id} + (1-\eta)\rho_{err}$.
Let us denote the non-trivial 2-dimensional block of the density matrix in Eq. (F2) as $M$, which in the arrowhead representation $\tilde M := U M U^\dagger$ yields
$$\tilde M = \begin{pmatrix} F & C_2 \\ C_2 & D_2 \end{pmatrix}.$$
Let us also assume that $M$ is rank-2 (if it is rank-1 then it represents purely a coherent error and we trivially find that $c = 1 - F$), which guarantees that the decomposition in Eq. (4) exists. In this case we can uniquely find an optimal $\eta$ from Sec. A for which the difference matrix $M - \eta|\psi_{id}\rangle\langle\psi_{id}|$ is rank-1. We can thus obtain the following expression for $M$ as a sum of two rank-one matrices:
$$M = \eta\,|\psi_{id}\rangle\langle\psi_{id}| + (1-\eta)\,\mu_1\,|\chi\rangle\langle\chi|,$$
where $\eta$ is the weight in $\eta\rho_{id} + (1-\eta)\rho_{err}$ and $\mu_1$ is the largest eigenvalue of the error density matrix as defined in Definition 1. Here the pure state $|\chi\rangle$ can generally be expressed as a linear combination of the first two basis vectors (recall that $|\tilde\psi_{id}\rangle = (1, 0, 0, \dots, 0)^T$ and $|\tilde\phi_2\rangle = (0, 1, 0, \dots, 0)^T$) as [...] for some $\alpha \ge 0$.
We finally obtain the error density matrices that saturate the upper bound of the coherent mismatch as [...], where $R$ satisfies $\langle\psi_{id}|R|\psi_{id}\rangle = \langle\chi|R|\chi\rangle = 0$, we can arbitrarily choose the probability distribution $\{D_k : 3 \le k \le d\}$, and we can choose the dominant eigenvalue of $\rho_{err}$ arbitrarily in the range $1/d \le \mu_1 \le 1$ as long as [...]. As such, any density matrix of the form $\eta\rho_{id} + (1-\eta)\rho_{err}$ saturates the corresponding upper bound on the coherent mismatch.

Obtaining the upper bound
The upper bound in Eq. (F1) depends on parameters that one can only obtain from the arrowhead representation of a quantum state, i.e., $D_2$ and $C_2$. Let us now derive an alternative upper bound on the coherent mismatch that depends on parameters of the decomposition $\eta\rho_{id} + (1-\eta)\rho_{err}$. We compute this upper bound by exactly computing the coherent mismatch for the extremal density matrices in Eq. (F13). By representing $M$ in the arrowhead basis, we obtain the $2 \times 2$ block from Eq. (F13) as [...].
We can now compute the coherent mismatch $c$ analytically as a function of $\delta_1$, $\delta_2$ and $\alpha$, and maximise $c$ with respect to $\alpha$. For this reason, we first express the coherent mismatch analytically (by computing the first component of the eigenvector) as [...]. We can find the first-order optimality condition by differentiating $c$ with respect to $\alpha$ as [...]. We can uniquely solve this equation in terms of $\delta := \delta_2/\delta_1$ from Definition 1 as [...]. We finally obtain the coherent mismatch for the above worst-case scenario density matrices as [...]. The above expression is a general upper bound for the coherent mismatch which is saturated by any density matrix that can be mapped to an arrowhead matrix of the form assumed above.

Remark:
Let us remark that one could use the upper bound that explicitly depends on $\alpha$, [...], but here we aimed to derive an upper bound that is independent of $\alpha$.

Appendix G: Number of copies for error suppression
The suppression factor $Q$ was introduced in ref. [23], which we adapt to the notations used in this work as $Q := \lambda_2/\lambda$ (the present work uses a different convention to denote the eigenvalues of $\rho$ in Eq. (2) when compared to ref. [23]).
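Indeed, there exist two non-zero solutions $\sigma = \pm\|C\|_{HS}/\sqrt{2}$. Since there are only two non-zero eigenvalues, we can compute the infinity norm as $\|C\|_\infty = \|C\|_{HS}/\sqrt{2}$. In fact, we can compute any $p$-norm of the matrix $C$ as
$$\|C\|_p = 2^{1/p}\,\|C\|_\infty = 2^{1/p - 1/2}\,\|C\|_{HS}.$$

Eigenvectors of $C$: Let us introduce the vector $|\phi\rangle$, which can be defined via the decomposition of the $C$ matrix into its first row and column vectors as
$$C = |\tilde\psi_{id}\rangle\langle\phi| + |\phi\rangle\langle\tilde\psi_{id}|, \qquad |\phi\rangle = (0, C_2, C_3, \dots, C_d)^T.$$
Using results of [40] we can analytically compute the eigenvectors of $C$ using the eigenvalues $\sigma = \pm\|C\|_{HS}/\sqrt{2}$ from statement 1 as
$$|v_\pm\rangle = \frac{1}{\sqrt{2}}\Big(|\tilde\psi_{id}\rangle \pm \frac{|\phi\rangle}{\|\phi\|}\Big),$$
where $\|\phi\| = \sigma = \|C\|_{HS}/\sqrt{2}$. Indeed we can confirm that the eigenvalue equation is satisfied as
$$C|v_\pm\rangle = \frac{1}{\sqrt{2}}\big(\pm\|\phi\|\,|\tilde\psi_{id}\rangle + |\phi\rangle\big) = \pm\|\phi\|\,|v_\pm\rangle.$$

Eigenvalues and norm of the commutator: We can similarly write the commutator as
$$[\tilde\rho_{id}, \tilde\rho] = |\tilde\psi_{id}\rangle\langle\phi| - |\phi\rangle\langle\tilde\psi_{id}|,$$
and its eigenvectors are
$$|w_\pm\rangle = \frac{1}{\sqrt{2}}\Big(|\tilde\psi_{id}\rangle \pm i\,\frac{|\phi\rangle}{\|\phi\|}\Big),$$
and indeed its eigenvalues are $\pm i\|C\|_{HS}/\sqrt{2} = \pm i\|\phi\| = \pm i\sigma$ via the eigenvalue equation $[\tilde\rho_{id}, \tilde\rho]\,|w_\pm\rangle = \pm i\|\phi\|\,|w_\pm\rangle$.

Equivalence of norms: We have shown in the previous statement that $C$ and the commutator share the same singular values. It immediately follows, denoting the singular value by $\sigma = \|C\|_{HS}/\sqrt{2}$, that the norms are equivalent as
$$\|[\rho_{id}, \rho]\|_p = \|C\|_p = 2^{1/p}\,\sigma.$$

Lemma 4. The Hilbert-Schmidt norm of the commutator can be computed exactly as
$$\|[\rho_{id}, \rho]\|_{HS}^2 = 2\big(\langle\psi_{id}|\rho^2|\psi_{id}\rangle - F^2\big) = 2\sigma^2.$$
Proof. The Hilbert-Schmidt norm is computed via the trace
$$\|[\rho_{id}, \rho]\|_{HS}^2 = \mathrm{Tr}\big([\rho_{id}, \rho]^\dagger [\rho_{id}, \rho]\big) = 2\,\mathrm{Tr}[\rho_{id}\rho^2] - 2\,\mathrm{Tr}[\rho_{id}\,\rho\,\rho_{id}\,\rho];$$
here we can simplify the expressions using that $\rho_{id}\,\rho\,\rho_{id} = F\rho_{id}$, and we can also use the cyclic reordering property of the trace, so we obtain
$$\|[\rho_{id}, \rho]\|_{HS}^2 = 2\,\langle\psi_{id}|\rho^2|\psi_{id}\rangle - 2F^2 = 2\sigma^2,$$
where $\sigma$ is the common, only non-zero singular value shared with $C$, and we have also used that $F = \langle\psi_{id}|\rho|\psi_{id}\rangle$. This is indeed a quantum-mechanical variance, and using elementary statistics
$$\mathrm{Var}[\rho] = \langle\rho^2\rangle - \langle\rho\rangle^2,$$
where $\langle X\rangle := \langle\psi_{id}|X|\psi_{id}\rangle$. Our final result is that we can compute the singular value via the above variance as $\sigma^2 = \mathrm{Var}[\rho]$. The norm $\|[\rho_{id}, \rho]\|_{HS}^2 = 2\,\mathrm{Var}[\rho]$ of the commutator is thus given by the variance.

Appendix I: Proof of Theorem 2: upper bound in terms of the commutator
Recall that density matrices that can be mapped to arrowhead matrices of the form of Eq. (F2) saturate the upper bound of the coherent mismatch in Eq. (F1). Here the 2-dimensional sub-block is a 2-dimensional arrowhead matrix and we can write the corresponding coherent mismatch using Statement 2 as
$$c = 1 - (1 + \Xi)^{-1},$$
where the term $\Xi$ yields the simplified expression
$$\Xi = \frac{C_2^2}{(\lambda - D_2)^2} = \frac{\sigma^2}{(\lambda - D_2)^2},$$
where we have used that $\sigma \equiv C_2$. Let us analytically express $D_2$ in terms of the eigenvalues $\lambda$ and $\lambda'$ of the 2-dimensional matrix, and in terms of $C_2$. We can analytically solve these eigenvalues as
$$\lambda, \lambda' = \frac{F + D_2}{2} \pm \sqrt{\frac{(F - D_2)^2}{4} + C_2^2}$$
and express $F$ and $D_2$ in terms of the eigenvalues $\lambda$ and $\lambda'$ as
$$F, D_2 = \frac{\lambda + \lambda'}{2} \pm \frac{1}{2}\sqrt{(\lambda - \lambda')^2 - 4\sigma^2}.$$
We can now express $\Xi$ either in terms of $(\lambda, F)$ or in terms of $(\lambda, \lambda')$, and we now substitute
$$\Xi = \frac{(\lambda - F)^2}{\sigma^2}, \qquad \Xi = \frac{4\sigma^2}{\big[(\lambda - \lambda') + \sqrt{(\lambda - \lambda')^2 - 4\sigma^2}\,\big]^2}.$$
From the first equation we can see that $\Xi$ ultimately depends on the ratio of the gap $\lambda - F$ and the commutator $\sigma$. Similarly, the second equation only depends on $\sigma$ and on the gap between the two eigenvalues $\lambda - \lambda'$. Let us remark that $\lambda'$ is the second largest eigenvalue of the density matrix, $\lambda_2 \equiv \lambda'$, if the diagonal entries of the extremal arrowhead matrix from Eq. (F2) are such that $\lambda' \ge D_3$. This condition is generally satisfied when $D_2 \ge D_3 + C_2^2/(F - D_3)$. Nevertheless, without loss of generality we can assume in the following that $\lambda_2 \equiv \lambda'$, in which case the resulting upper bound will only be saturated by arrowhead matrices $A_{\max}$ which satisfy $D_2 \ge D_3 + C_2^2/(F - D_3)$. We can simplify our expression for $\Xi(\lambda, \lambda_2)$ by introducing the factor $\Delta := \sigma/(\lambda - \lambda_2)$. This allows us to directly express $\Xi$ in terms of $\Delta$ via the above equation as
$$\Xi(\Delta) = \frac{4\Delta^2}{\big(1 + \sqrt{1 - 4\Delta^2}\,\big)^2}.$$
It is immediately clear that the term under the square root is non-negative when $1 - 4\Delta^2 \ge 0$, and thus our bound holds when $1/2 \ge \Delta$. Let us finally write $\Delta$ in terms of $\lambda$ and $Q := \lambda_2/\lambda$ and in terms of $\sigma_r := \sigma/\lambda$ as
$$\Delta = \frac{\sigma_r}{1 - Q}.$$
We can finally simplify the expression for $c$ and find the surprisingly simple formula
$$c = 1 - [1 + \Xi(\Delta)]^{-1} = \frac{1 - \sqrt{1 - 4\Delta^2}}{2}.$$
We can expand this for small $\Delta$ and find that indeed the coherent mismatch scales quadratically with the commutator norm as
$$c = \Delta^2 + \Delta^4 + O(\Delta^6).$$
The above equations express the coherent mismatch for the extremal states and thus guarantee a general upper bound for $c$. This bound is saturated by density matrices that can be mapped to arrowhead matrices of the form of Eq. (F2) with the additional constraint $D_2 \ge D_3 + C_2^2/(F - D_3)$ that ensures that the smaller eigenvalue $\lambda'$ of the two-dimensional matrix block $M_{\max}$ is the second largest eigenvalue of the arrowhead matrix.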
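The closed form of the bound can be checked numerically on the 2-dimensional extremal block. The sketch below is an illustration under stated assumptions, not the authors' code: it takes a real symmetric block with entries $F \ge D_2$ and off-diagonal $C_2 = \sigma$, computes the coherent mismatch from the dominant eigenvector, and compares it with $(1 - \sqrt{1 - 4\Delta^2})/2$ and with the series $\Delta^2 + \Delta^4$, where $\Delta = \sigma/(\lambda - \lambda')$. The function name and the example entries are hypothetical.

```python
import math

def extremal_block(F, C2, D2):
    """Coherent mismatch of the block [[F, C2], [C2, D2]], assuming F >= D2 >= 0."""
    half_gap = 0.5 * (F - D2)
    root = math.hypot(half_gap, C2)
    lam = 0.5 * (F + D2) + root       # dominant eigenvalue of the block
    lam_prime = 0.5 * (F + D2) - root # second eigenvalue of the block
    xi = (C2 / (lam - D2)) ** 2       # Xi = sigma^2 / (lam - D2)^2 with sigma = C2
    c_exact = xi / (1.0 + xi)         # c = 1 - 1/(1 + Xi)
    delta = C2 / (lam - lam_prime)    # relative commutator norm
    c_formula = 0.5 * (1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * delta ** 2)))
    c_series = delta ** 2 + delta ** 4
    return c_exact, c_formula, c_series, delta

c_exact, c_formula, c_series, delta = extremal_block(0.9, 0.03, 0.1)
print("Delta:", delta)
print("c from eigenvector:", c_exact)
print("c from closed form:", c_formula)
print("c from series:     ", c_series)
```

The first two values should agree to rounding error, while the series differs from them by a term of order $\Delta^6$.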
Appendix J: Proof of Lemma 1: lower bound in terms of the commutator
Proof. Recall that density matrices that can be mapped to arrowhead matrices of the form of Eq. (F3) [...]. The above equations express the coherent mismatch for the extremal states and thus guarantee a general lower bound for $c$. We note that the eigenvalues of the 2-dimensional matrix $M_{\min}$ are guaranteed to be the largest and the smallest non-zero eigenvalues of the density matrix due to the interlacing property. It follows that the above lower bound is saturated by any density matrix that can be mapped to an arrowhead matrix of the form of $A_{\min}$ above.
Appendix K: Commutators in noisy quantum circuits

The noise model

Here we assume a noise channel that maps to a noisy state via $\nu$ noisy gates as introduced in Eq. (1). This noisy quantum circuit is in the form of a product of noisy quantum gates, where every gate can be written in terms of the Kraus map from Eq. (1) with error terms $M_{jk}\,\rho\,M_{jk}^\dagger$.
In the following we focus on the special case of $K = 1$ for ease of notation. A prominent example is the dephasing noise channel, in which case $M_k = Z_k U_k$, where $Z_k$ is a Pauli $Z$ operator that acts on the same qubit(s) as the unitary $U_k$. This family of noise models can be understood via the analogy to flipping $\nu$ coins: every coin has a probability $\epsilon$ to yield heads (error event via $M_k$) and probability $1 - \epsilon$ to yield tails (no error via $U_k$). The probability that no error happens throughout the entire circuit (all tails) is then $(1 - \epsilon)^\nu$. This allows us to write the resulting density matrix in the following form:
$$\rho = p_0\,\rho_{id} + \epsilon(1-\epsilon)^{\nu-1}\rho_1 + \epsilon(1-\epsilon)^{\nu-1}\rho_2 + \dots + \epsilon^2(1-\epsilon)^{\nu-2}\rho_{12} + \dots + \epsilon^\nu\rho_{1234\dots\nu}.$$
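The coin-flipping picture can be made concrete by enumerating every error pattern of a small circuit. This is a minimal sketch under the stated model (independent errors with the same probability $\epsilon$ at each of $\nu$ gates); the function name and the example parameters are hypothetical.

```python
import itertools
import math

def event_probabilities(nu, eps):
    """Probability of every error pattern of nu gates (True marks a gate with an error)."""
    probs = {}
    for pattern in itertools.product((False, True), repeat=nu):
        k = sum(pattern)  # number of gates at which an error occurred
        probs[pattern] = eps ** k * (1.0 - eps) ** (nu - k)
    return probs

nu, eps = 10, 0.02
probs = event_probabilities(nu, eps)
p0 = probs[(False,) * nu]  # weight of the ideal state: no error anywhere
xi = nu * eps              # expected number of errors in the circuit
print("number of error events:", len(probs) - 1)
print("total probability:", sum(probs.values()))
print("p0 =", p0, "vs exp(-xi) =", math.exp(-xi))
```

The pattern with no errors carries the weight $(1-\epsilon)^\nu$ of the ideal state, and the remaining $2^\nu - 1$ patterns are the individual error events.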
Here every term represents a pure state, for example $\rho_1 = |E_1\rangle\langle E_1|$ is a pure state in which an error occurred at gate 1, and therefore its state vector can be expressed as [...]. We thus conclude that the commutator from Statement 3 can be computed via
$$\sigma^2 = \big\|\rho\,|\psi_{id}\rangle\big\|^2 - \langle\psi_{id}|\rho|\psi_{id}\rangle^2.$$
Let us express the vector norm by introducing the index set $I = \{1, 2, \dots, 2^\nu - 1\}$ whose elements $k \in I$ index the individual error events. The vector norm is then given by a sum over these events as [...]. Similarly, we can express the overlap via the summation [...].

The first term $(1-\epsilon)^{2\nu} =: \eta^2$ in the solution is identical to the square of the exponential decay of the incoherent fidelity, i.e., the probability that no error happens in Eq. (15). We have approximated this probability for large $\nu$ and for varying circuit error rates $\xi := \epsilon\nu$ as $(1-\epsilon)^{2\nu} = (1 - \xi/\nu)^{2\nu} \approx e^{-2\xi}$. The second term in the solution can be simplified as
$$\Big[\frac{2\epsilon(\epsilon - 1) + 1}{(\epsilon - 1)^2}\Big]^\nu - 1 = \exp\Big[\nu \ln\frac{2\epsilon(\epsilon - 1) + 1}{(\epsilon - 1)^2}\Big] - 1 = e^{\nu(\epsilon^2 + 2\epsilon^3 + \dots)} - 1 = e^{\xi^2/\nu + 2\xi^3/\nu^2 + \dots} - 1 \approx e^{\xi^2/\nu} - 1,$$
where we have used that for bounded $\xi$ and large $\nu$ all higher-order terms can be neglected, keeping only the leading term $\xi^2/\nu$. Combining the two approximations we finally obtain the approximate upper bound
$$f \approx e^{-2\xi}\big(e^{\xi^2/\nu} - 1\big) \approx e^{-2\xi}\,\xi^2/\nu,$$
an approximation whose additive error scales with $(\xi^2/\nu)^2 \ll 1$.

The function $f(\xi)$ can be divided into 3 distinct regions. For $\xi \ll 1$ it grows quadratically for fixed $\nu$ as $f(\xi) \approx \xi^2/\nu$, or equivalently it grows linearly for fixed $\epsilon$ as $f(\xi) \approx \epsilon\xi$. The function then reaches its maximum $f(\xi_{\max})$ at around $\xi \approx 1/2$ due to the expansion [...]. It is also interesting to note that the global maximum of the function is completely independent of the other two variables and depends only on $\epsilon = \xi/\nu$. Similarly, the position of the global maximum is approximately constant, as it is approximately independent of all three variables. In the third region, where $\xi \gg 1$, the function decreases exponentially, but our approximation breaks down in this regime.
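The quality of the approximation $f \approx e^{-2\xi}\xi^2/\nu$ can be gauged with a few lines of code. The sketch below assumes that the unapproximated function is the product of the two terms quoted above, $(1-\epsilon)^{2\nu}$ and $[(2\epsilon(\epsilon-1)+1)/(\epsilon-1)^2]^\nu - 1$. The function names are hypothetical, and the scan over the number of gates at a fixed per-gate error rate illustrates the statement that the maximum sits near $\xi \approx 1/2$.

```python
import math

def f_exact(nu, eps):
    """Product of the two terms: (1-eps)^(2 nu) * (r^nu - 1)."""
    eta_sq = (1.0 - eps) ** (2 * nu)
    r = (2.0 * eps * (eps - 1.0) + 1.0) / (eps - 1.0) ** 2
    return eta_sq * (r ** nu - 1.0)

def f_approx(nu, xi):
    """Leading-order approximation exp(-2 xi) * xi^2 / nu."""
    return math.exp(-2.0 * xi) * xi ** 2 / nu

def xi_at_maximum(eps, nu_max):
    """Circuit error rate xi = eps * nu that maximises f_exact at a fixed per-gate error eps."""
    best_nu = max(range(1, nu_max + 1), key=lambda nu: f_exact(nu, eps))
    return best_nu * eps

nu = 200
for xi in (0.1, 0.5, 1.0, 2.0):
    exact, approx = f_exact(nu, xi / nu), f_approx(nu, xi)
    print(f"xi={xi}: exact={exact:.3e} approx={approx:.3e}")
print("xi at maximum for fixed eps:", xi_at_maximum(0.005, 1000))
```

For a fixed number of gates the approximate function $e^{-2\xi}\xi^2/\nu$ instead peaks at $\xi = 1$, so which variable is held fixed matters when reading off the position of the maximum.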

Extension to more general Kraus maps
The above formulas straightforwardly generalise to higher Kraus rank in the following way. For example, the single-qubit depolarising channel corresponds to $K_k = 3$, and in this case there will be $\nu' = 3\nu$ different single error events that can occur with probabilities $\epsilon' = \epsilon/3$. The circuit error rate $\xi$ is invariant under this transformation, as $\xi = \epsilon\nu = \epsilon'\nu'$, and the upper bound $f(\xi) \approx [\xi e^{-\xi}]^2/\nu' \to f(\xi)/3$ is only different by a global constant factor. We can similarly generalise this model to other Kraus maps.
Another simplification we have made is that we have assumed in Eq. (1) that all gates have identical error probabilities $\epsilon$. We can extend these Kraus maps such that all gates $\Phi_k$ have possibly different error probabilities $\epsilon_k$. In this case our previous bounds straightforwardly apply by upper bounding $\epsilon_k \le \max_k \epsilon_k$ and using the largest error probability in the bound. Our results thus still hold via the upper bound function $f(\xi') \approx [\xi' e^{-\xi'}]^2/\nu$, where we set $\xi' := \nu \max_k \epsilon_k$. Indeed, one can straightforwardly tighten these bounds by assuming some average error rate $\epsilon_{mean}$. It is interesting to note that the probabilities of $k$ errors happening in the circuit are still expected to be Poisson distributed via the Le Cam theorem, even if we allow different per-gate error probabilities $\epsilon_k$ for every gate, assuming the limit of a large number of gates (and bounded $\xi$) as discussed in Sec. IV in [57].
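The Poisson statement for unequal error rates can be illustrated numerically. The sketch below computes the exact distribution of the number of errors for independent gates with different $\epsilon_k$ (a Poisson-binomial distribution) and compares it with a Poisson distribution of the same mean; Le Cam's inequality bounds the summed absolute difference by $2\sum_k \epsilon_k^2$. The function names and the chosen error rates are hypothetical.

```python
import math

def poisson_binomial(probs):
    """Exact distribution of the number of errors for independent per-gate probabilities."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            new[k] += w * (1.0 - p)  # this gate is error-free
            new[k + 1] += w * p      # this gate suffers an error
        dist = new
    return dist

def poisson_pmf(lam, k):
    """Poisson probability, evaluated in log space to avoid overflow for large k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def le_cam_distance(probs):
    """Summed absolute difference between the exact and the Poisson distribution."""
    lam = sum(probs)
    exact = poisson_binomial(probs)
    pois = [poisson_pmf(lam, k) for k in range(len(exact))]
    tail = max(0.0, 1.0 - sum(pois))  # Poisson weight beyond the number of gates
    return sum(abs(a - b) for a, b in zip(exact, pois)) + tail

nu = 200
eps_k = [0.002 + 0.006 * k / (nu - 1) for k in range(nu)]  # error rates between 0.2% and 0.8%
print("expected number of errors:", sum(eps_k))
print("distance to Poisson:", le_cam_distance(eps_k))
print("Le Cam bound:       ", 2.0 * sum(p * p for p in eps_k))
```

The bound shrinks as the per-gate error rates decrease at a fixed expected number of errors, which is the regime of many gates and bounded $\xi$ referred to above.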
Furthermore, if a fraction $\kappa$ of the error Kraus operators $M_k$ from Eq. (1) (assuming Kraus rank $K = 1$ for ease of notation) commutes with the corresponding ideal unitary gates $U_k$, then we can simplify the error model in the following way. A fraction $1 - \kappa$ of the gates are noise-free while a fraction $\kappa$ of the gates undergo a higher error rate $2\epsilon$. Thus our upper bounds still apply via the modifications $\nu' = (1-\kappa)\nu$, and we can use the general upper bound on the probabilities $\epsilon' \le 2\epsilon$ (assuming a small fraction $\kappa$). This modifies our upper bounds as $f(\xi') \approx [\xi' e^{-\xi'}]^2/\nu' \to 4(1-\kappa)\,f[2(1-\kappa)\xi]$, which is generally a rescaling of the function by a multiplication of the argument $\xi$ and a multiplication by a global constant. Indeed we observe in Fig. 7 that the numerical data is slightly shifted to the right when compared to the upper bounds (solid lines). This discrepancy could be explained by the fact that a fraction of the gates in the simulated circuits actually commute with the noise Kraus operators.
FIG. 1. An important application of the present work is that it allows one to determine the ultimate precision of the ESD/VD error suppression technique [23, 24]. The trace distance (blue line) used in ref. [24] is given by the square root of the coherent mismatch $c$ from [23] and generally upper bounds the error $|\langle\psi_{id}|O|\psi_{id}\rangle - \langle\psi|O|\psi\rangle| \le 2\sqrt{c}$ in estimating expectation values (with $\|O\|_\infty = 1$) with the ideal computational state $|\psi_{id}\rangle$ vs. the dominant eigenvector $|\psi\rangle$. Most randomly generated quantum states (blue dots) are significantly below this pessimistic general bound (blue line), which was already noted in ref. [24]. We show that this error bound is quadratically smaller, namely $2c$ (orange line), in the specific but pivotal case of preparing eigenstates (orange rectangles), the aim of most near-term quantum algorithms. Refer to Sec. II A.

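1. Arrowhead matrices
Statement 1. The quantum state $\rho$ in Definition 1 is unitarily equivalent to a real, symmetric, non-negative arrowhead matrix and can be decomposed into the sum of matrices $\tilde\rho = F\,|\tilde\psi_{id}\rangle\langle\tilde\psi_{id}| + D + C$ as
$$\tilde\rho = \begin{pmatrix} F & C_2 & C_3 & \cdots & C_d \\ C_2 & D_2 & & & \\ C_3 & & D_3 & & \\ \vdots & & & \ddots & \\ C_d & & & & D_d \end{pmatrix}.$$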

FIG. 3. Coherent mismatch $c$ in randomly generated states and its upper bound (dashed black lines) as a function of the ratio $\delta$ of the largest error eigenvalue vs. the ideal state's contribution $\eta$ from Definition 1. (a) linear-linear scale and (b) log-log scale. The dominant eigenvector $|\psi\rangle$ of a noisy quantum state $\rho = \eta|\psi_{id}\rangle\langle\psi_{id}| + (1-\eta)\rho_{err}$ is generally different from the ideal computational state, as characterised by the coherent mismatch (infidelity) $c = 1 - |\langle\psi_{id}|\psi\rangle|^2$. The general upper bound on $c$ from Theorem 1 (dashed black line) is saturated by the extremal quantum states $\rho_{err}$, which are highly unlikely to appear in practical scenarios, and thus experimental quantum states are expected to be significantly below this bound. Here, $\delta$, $\rho_{err}$ and $c$ are defined in Definition 1. Randomly (uniformly with respect to the Haar measure) generated quantum states of large dimensions (blue dots) are significantly less likely to saturate the bounds than quantum states in smaller dimensions (orange rectangles). We present better lower and upper bounds in Sec. III B, see also Fig. 4.

FIG. 4. Coherent mismatch $c$ in randomly generated states and its upper bound (dashed black lines) as a function of the relative commutator norm $\Delta$ (which is proportional to $\|[\rho_{id}, \rho]\|_\infty$). (a) linear-linear scale and (b) log-log scale. This bound is independent of the not necessarily unique decomposition in Eq. (4). Another significant advantage is that this upper bound comes with a similarly scaling lower bound: for small $c \le 10^{-3}$ all randomly (uniformly with respect to the Haar measure) generated states (blue dots and orange rectangles) nearly saturate the upper bound (dashed black lines) due to the asymptotically coinciding lower and upper bounds.

FIG. 5. Commutator norm $\sigma^2$ in simulated circuits and its upper bound (black solid lines) as a function of the circuit error rate $\xi$. Overall $10^4$ circuits composed of $\nu = 200$ gates were randomly generated as combinations of single-qubit X and Z rotations, and CNOT (left) or XX (right) entangling gates. The gates are followed by depolarising (left) or damping (right) noise. It is established in Sec. IV that for sufficiently complex quantum circuits the commutator norm $\sigma = \|[\rho_{id}, \rho]\|_\infty$ from Statement 3 is upper bounded by the function $f(\xi) \approx e^{-2\xi}\xi^2/\nu$ from Eq. (19) up to a constant, where $\xi$ is the expected number of errors in the circuit and $\nu$ is the number of gates. For very small $\xi \ll 1$, the bound (black solid lines) is smaller than the worst-case scenario (dashed black lines) by approximately a factor of $\nu$. $\sigma$ is maximal when $\xi \approx 1/2$ and its maximum is at $\sigma_{\max} = \mathrm{const} \times \xi/\nu$, which decreases when increasing the number of gates. Our approximate upper bound (solid black lines) may break down for very large error rates $\xi \gg 10$. Remarkably, in the practically most important regime $\xi \le 5$ the same kind of scaling can be observed for a large variety of circuits (even for highly deterministic ones) as shown in Fig. 7.

C. Application to coherent mismatch $c$ and noise floor $\sqrt{c}$

FIG. 6. Example of a variational quantum optimisation using 8 qubits. The ground state of a spin-ring Hamiltonian with nearest-neighbour XX, YY and ZZ couplings and randomly generated on-site frequencies $\omega_k Z$ is searched via a VQE optimisation. The distance from the exact ground-state energy (brown) approaches 0 as the number of iterations is increased. If the errors in the noisy quantum circuit (circuit error rate $\xi \approx 2$) are suppressed via the ESD/VD approach, then one can measure the exact expectation value with respect to the dominant eigenvector of the noisy quantum state; this causes an error (black) when compared to the ideal expectation value in a noiseless circuit. This error is generally upper bounded by the noise floor (red, trace distance), which is very pessimistic, and as the quantum state approaches the ground state the error is guaranteed to be upper bounded by the coherent mismatch (blue, infidelity). The latter bound seems to hold even for approximate ground states (low iteration depth).

Proof.

Upper bound

Let us consider arrowhead matrices with arbitrary non-negative entries $F, C_k, D_k \ge 0$ with $2 \le k \le d$, which contain the density matrices from Statement 1. Recall from Statement 2 that the coherent mismatch can be expressed as $c = 1 - (1+\Xi)^{-1}$, with the diagonal entries satisfying the usual ordering $D_2 \ge D_3 \ge \dots \ge D_d$, and it follows that $D_k$ with $k > 2$ are eigenvalues of the arrowhead matrix. The degenerate case is recovered via $D_2 = D_3 = \dots = D_{\nu+1}$.
(J1) saturate the lower bound of the coherent mismatch in Eq. (F1). Here $D_m$ is the smallest non-zero diagonal entry in the arrowhead matrix. Here the 2-dimensional sub-block
$$M := M_{\min} = \begin{pmatrix} F & C_m \\ C_m & D_m \end{pmatrix}$$
is a 2-dimensional arrowhead matrix and we can compute the corresponding coherent mismatch similarly as for the upper bound in Appendix I. For this reason, let us introduce $\Delta_{\min} := \sigma_r/(1 - Q_m)$, where $\sigma_r := \sigma/\lambda$ and $Q_m := \lambda_m/\lambda$ is the ratio of the smallest and largest eigenvalues. With this, we can compute the analytical expression for the coherent mismatch as
$$c = 1 - [1 + \Xi(\Delta_{\min})]^{-1} = \frac{1 - \sqrt{1 - 4\Delta_{\min}^2}}{2}.$$
Let us remark that we can expand the above expression for small $\Delta_{\min}$ as
$$c = \frac{1 - \sqrt{1 - 4\Delta_{\min}^2}}{2} = \Delta_{\min}^2 + \Delta_{\min}^4 + O(\Delta_{\min}^6).$$