Efficiently improving the performance of noisy quantum computers

Using near-term quantum computers to achieve a quantum advantage requires efficient strategies to improve the performance of the noisy quantum devices presently available. We develop and experimentally validate two efficient error mitigation protocols named "Noiseless Output Extrapolation" and "Pauli Error Cancellation" that can drastically enhance the performance of quantum circuits composed of noisy cycles of gates. By combining popular mitigation strategies such as probabilistic error cancellation and noise amplification with efficient noise reconstruction methods, our protocols can mitigate a wide range of noise processes that do not satisfy the assumptions underlying existing mitigation protocols, including non-local and gate-dependent processes. We test our protocols on a four-qubit superconducting processor at the Advanced Quantum Testbed. We observe significant improvements in the performance of both structured and random circuits, with up to $86\%$ improvement in variation distance over the unmitigated outputs. Our experiments demonstrate the effectiveness of our protocols, as well as their practicality for current hardware platforms.


Introduction
The last few years have seen unprecedented advances in quantum technologies, particularly in the development of Noisy Intermediate-Scale Quantum (NISQ) devices. These devices can already outperform their classical counterparts in specific tasks [2], but suffer from significant noise that corrupts their outputs. As fault tolerance is not expected to become available in the immediate future [41], understanding how to optimize the performance of NISQ devices is of paramount importance.
In recent years, much effort has been devoted to finding alternatives to fault tolerance that are feasible in the near term. This paved the way for the development of a rich set of error mitigation (EM) protocols [3, 13, 14, 20, 24-26, 28-31, 43, 46]. Unlike fault-tolerant protocols, EM protocols do not attempt to correct errors that occur in noisy circuits. Instead, by actively amplifying noise in a controlled way and post-processing the outputs, they extrapolate correct outputs from the noisy ones. EM protocols require a high number of samples and are generally inefficient in the circuit size [45]. On the other hand, they typically have little overhead in qubits and gates, which makes them practical for today's devices.
Despite the promising results obtained in a large number of experimental demonstrations [3,11,13,24,26,28,42,43,51], performing EM on large circuits remains a challenging task. Some of the leading proposals [13,46] require reconstructing the noise afflicting multi-qubit operations using Gate-Set Tomography (GST) [5,6,21]. Since GST is inefficient in the number of qubits, the noise reconstruction is typically performed by analysing individual one- and two-qubit gates while ignoring the rest of the system. This approach severely limits the effectiveness of EM for large circuits, where noise processes involving more than two qubits (such as crosstalk) typically play a major role [22,23,35]. For this reason, so far the EM protocols that rely on GST have been demonstrated on circuits containing up to two qubits [42,51], where this limitation is irrelevant. Alternative protocols [3,8,20,24,26,30,46] that do not rely on noise reconstruction have been applied to larger circuits [28], but their effectiveness has been formally proven only for specific noise models, e.g. for depolarizing noise [3,20,30]. Overall, further work is required before EM protocols can become of use for circuits of interesting sizes.

Table 1: Runtime and bias for our EM protocols, calculated as functions of the desired standard deviation $\sigma$ of the results, the circuit depth $m$, and the error rate of each cycle. The biases of PEC and NOX take the form $\delta_{\rm PEC} + \delta_{\rm rec}$ and $\delta_{\rm NOX} + \delta_{\rm rec}$, respectively. (For simplicity, in this table we assume that every cycle has an error rate equal to $n\varepsilon$, where $n$ is the number of qubits in the input circuit and $\varepsilon$ is a constant; see theorems 1 and 2 in section 3 for a generalisation.) The quantity $\delta_{\rm rec}$ depends on the accuracy of the noise reconstruction, with $\delta_{\rm rec} = 0$ if the noise is known exactly (cf. lemmas 1 and 2). In comparison, an unmitigated implementation of the input circuit has runtime $1/\sigma^2$ and bias $O(mn\varepsilon)$.
Recently, a number of protocols have been developed that can characterize multi-qubit noise processes more efficiently than GST [18,19,22], under a set of realistic (and verifiable) assumptions on the noise. Among these is "Cycle Error Reconstruction" (CER), which can accurately reconstruct Pauli channels even when they act on more than two qubits [23]. Leveraging CER, in this paper we develop a novel approach to EM that targets "cycles" of gates (i.e., groups of gates applied in parallel to disjoint subsets of qubits [48]) rather than individual gates. This approach enables us to design two new EM protocols to efficiently mitigate noise that involves an arbitrary number of qubits, potentially up to an entire register. By targeting the noise afflicting full cycles of gates rather than individual gates, our protocols are fully robust to complex error processes (such as crosstalk, correlated errors, and non-depolarizing noise) that are known to affect present devices [22,23,35], and that may compromise the effectiveness of existing EM protocols.

Our protocols in summary
Our protocols (which we name "Pauli Error Cancellation", or PEC, and "Noiseless Output Extrapolation", or NOX) take as input a quantum circuit and an operator $O$, and return an estimator of the expectation value of $O$ at the end of a noiseless implementation of the input circuit. To compute this estimator they undertake two different approaches. In particular, PEC is built around quasi-probabilistic error cancellation, one of the most popular techniques in EM [13,26,31,46], while NOX requires amplifying the noise afflicting individual cycles. Despite this difference, they require performing the same fundamental tasks: characterizing the noise in the input circuit with CER, implementing a set of noisy circuits, and finally post-processing their results.
The estimators returned by NOX and PEC have a residual bias that depends on the noise levels of the input circuit, as well as on the accuracy of the noise reconstruction (Table 1). Crucially, if backed by an accurate noise reconstruction, this bias is quadratic in the error rate of the input (unmitigated) circuit. This is a significant improvement over the unmitigated estimators, whose biases are linear in the error rates.
While being able to mitigate a broader class of noise processes than other existing EM protocols, NOX and PEC require a runtime similar to that of the available protocols. In particular, if the input circuit is afflicted by moderate noise (i.e., in the notation of Table 1, if $\varepsilon < 1/mn$), they remain efficient, meaning that their runtimes scale polynomially with the desired statistical accuracy of the results.

Our experiments in summary
We demonstrate our EM protocols on four fixed-frequency superconducting transmon qubits (labeled Q4, Q5, Q6, and Q7) on an eight-qubit quantum processor (AQT@LBNL Trailblazer8-v5.c2; see Fig. 1a) at the Advanced Quantum Testbed at Lawrence Berkeley National Lab [1]. Single-qubit gates are implemented using resonant Rabi-driven $X_{\pi/2}$ gates and virtual phase shifts between pulses [36]. The native two-qubit gate on the device is a controlled-Z (cZ) implemented via off-resonant drives between neighboring qubits [38]. We apply PEC and NOX to circuits of different size and nature. We implement each circuit multiple times, with and without EM, and calculate the average variation distance (Eq. 17) between ideal and experimental probability distributions of the outputs. We observe significant improvements in variation distance under EM for every circuit (Fig. 1b).
The cZ gates are the noisiest components in our circuits. However, the CER data show that when a cZ gate is performed in parallel with idling spectator qubits, the idling qubits experience higher levels of noise than the qubits being entangled (see Appendix). In this scenario, performing mitigation at the level of individual gates (e.g. with the gate-centred EM protocols that rely on GST [13,46]) would not lead to significant improvements in the outputs, and could in fact amplify the noise acting on the idling qubits. On the contrary, by targeting cycles rather than gates, NOX and PEC are able to provide the visible improvements reported in Fig. 1b.

This paper is organized as follows. In section 2 we define our notation, list the assumptions made throughout the paper and provide a brief overview of CER. In section 3 we describe our protocols and state their main properties. In section 4 we describe our experiments.

Notation and assumptions
We denote unitary gates with capital Latin letters and Completely Positive Trace-Preserving (CPTP) maps with calligraphic letters. We write $\mathcal{U} = \{U_l\}_{l=1}^{L}$ to indicate that $\mathcal{U}$ has Kraus operators $U_l$. We use $\circ$ to indicate the composition of maps, e.g. $(\mathcal{U} \circ \mathcal{V})(\rho) = \mathcal{U}(\mathcal{V}(\rho))$.
Figure 1: (a) The eight-qubit quantum processor. In this work, we used the four transmon qubits (green) with independent drive lines (blue) out of a total of eight qubits arranged in a ring geometry; the other four qubits are inactive on the device. The qubits are coupled to nearest neighbors via coupling resonators (CR, purple), and are dispersively measured via independent readout resonators (red) coupled to a multiplexed readout bus (MRB, cyan). (b) Average improvements in variation distance obtained in successful implementations of the NOX and PEC protocols. In our experiments we apply PEC and NOX to four-qubit random circuits of varying depth $m$ as well as to structured circuits, such as circuits to prepare W states with $n = 2, 3, 4$ qubits (Eq. 15) and to estimate a parameter $\kappa$ through the quantum phase estimation algorithm. We compute the variation distance (VD, Eq. 17) between the ideal and experimental probability distributions of the outputs, and we quantify the improvement in VD under EM as $1 - D_{\rm EM}/D_{\rm unm}$, where $D_{\rm EM}$ ($D_{\rm unm}$) is the VD for the mitigated (unmitigated) outputs. We observe drastic improvements in VD for both NOX and PEC, ranging from 32% to as high as 86%.

To prove our results we make the following two assumptions:
A1. We assume that the noise is Markovian and time-stationary, i.e., that a noisy implementation of an operation $\mathcal{U}$ can be written as $\mathcal{D}_U \mathcal{U}$, where $\mathcal{D}_U$ is a (potentially operation-dependent) CPTP map that is fixed in time.
A2. We assume that the cycles of one-qubit gates suffer gate-independent noise, i.e., $\mathcal{D}_U = \mathcal{D}$ for all the cycles of one-qubit gates $U$.
In addition to A1 and A2, we assume that every noise process $\mathcal{D}_U$ in our circuits is a Pauli channel, i.e., that it maps an $n$-qubit state $\rho$ into

$$\mathcal{D}_U(\rho) = \sum_{k=0}^{4^n-1} \epsilon_k^{(U)} P_k \rho P_k. \qquad (1)$$

Here, the "Pauli errors" $P_k \in \{I, X, Y, Z\}^{\otimes n}$ are $n$-qubit Pauli operators and the "Pauli error rates" $\epsilon_k^{(U)}$ are probabilities (we set $P_0 = I^{\otimes n}$ for convenience). Although not every noise process is a Pauli channel, under the assumptions A1 and A2 every process can be efficiently transformed into Pauli channels via Randomized Compiling [23,48], available on True-Q [4].
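As a concrete illustration of a Pauli channel, the following minimal sketch applies a channel with a few nonzero Pauli error rates to a two-qubit density matrix. The function names and the error rates are hypothetical and chosen only for illustration; they are not taken from our experiments or from True-Q.

```python
import numpy as np

# Single-qubit Pauli matrices.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_operator(label):
    """Build the n-qubit Pauli operator for a label such as 'ZI'."""
    op = np.array([[1.0 + 0j]])
    for char in label:
        op = np.kron(op, PAULIS[char])
    return op

def apply_pauli_channel(rho, error_rates):
    """Return sum_k eps_k P_k rho P_k for a dict {label: eps_k}."""
    assert abs(sum(error_rates.values()) - 1.0) < 1e-12, "rates must sum to 1"
    out = np.zeros_like(rho)
    for label, eps in error_rates.items():
        P = pauli_operator(label)
        out += eps * (P @ rho @ P.conj().T)
    return out

# Hypothetical two-qubit example: input state |00><00|.
rho_in = np.zeros((4, 4), dtype=complex)
rho_in[0, 0] = 1.0
rates = {"II": 0.95, "ZI": 0.03, "IX": 0.02}  # illustrative values only
rho_out = apply_pauli_channel(rho_in, rates)
```

In this example the ZI error leaves $|00\rangle$ unchanged, while the IX error flips the second qubit, so only the latter changes the measured populations.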

Cycle Error Reconstruction
CER (available on the software True-Q [4]) is a protocol to efficiently characterise noisy cycles with high accuracy. In more detail, let $\mathcal{D}_H \mathcal{H}$ be a noisy implementation of an $n$-qubit Clifford cycle $H$ with Pauli noise $\mathcal{D}_H$. In its simplest form, CER takes as input the cycle $H$ and a positive integer $K \leq n$. After characterizing the cycle's noise via Cycle Benchmarking [15] and post-processing the results of these benchmarking circuits [23], it estimates the Pauli error rates associated with all the errors of weight up to $K$, that is, with all the errors that affect up to $K$ qubits simultaneously.
The accuracy of these estimates depends on the nature of the state-preparation and measurement (SPAM) errors afflicting the device in use. If no assumption is made on the SPAM errors, the estimates are averages over small subsets of error rates that typically contain up to two elements. On the contrary, if state-preparation errors are negligible compared to measurement errors or vice versa (which is the case for many of today's platforms [17,32] and is routinely assumed in related works [7,18,33]), CER can estimate all the error rates individually.
Note that the total number of weight-$K$ errors grows as $n^K$, hence CER is not efficient in $K$. However, at fixed $K$, CER scales polynomially in $n$. This means that CER can efficiently perform an accurate characterisation of the noisy cycle of interest, provided that low-weight errors encompass the majority of the probability distribution. This is often the case on state-of-the-art devices, where high-weight errors ($K \geq 3$) occur with negligible probability (as an example, see the CER data in Fig. 10 or the CER data reported in Ref. [23]).
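To make the scaling concrete, the short sketch below counts the non-identity $n$-qubit Pauli errors of weight at most $K$, using the combinatorial identity $\sum_{w=1}^{K} \binom{n}{w} 3^w$ (choose the $w$ affected qubits, then one of three non-identity Paulis on each). The function name is hypothetical and the sketch is independent of any CER implementation.

```python
from math import comb

def num_pauli_errors_up_to_weight(n, K):
    """Number of non-identity n-qubit Paulis acting nontrivially on <= K qubits."""
    return sum(comb(n, w) * 3**w for w in range(1, K + 1))

# For a four-qubit cycle: low-weight errors versus all possible errors.
low_weight = num_pauli_errors_up_to_weight(4, 2)   # 66
all_errors = num_pauli_errors_up_to_weight(4, 4)   # 255 = 4**4 - 1
```

For $n = 4$ the restriction to $K = 2$ already removes roughly three quarters of the error rates to be estimated, and the saving grows quickly with $n$.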

Our EM protocols
In this section we describe our EM protocols and their overheads and biases. Without loss of generality, we consider input circuits that alternate cycles of one-qubit gates and cycles of Clifford two-qubit gates, implementing operations of the form

$$\mathcal{C} = \mathcal{E}_{m+1} \circ \mathcal{H}_m \circ \mathcal{E}_m \circ \cdots \circ \mathcal{H}_1 \circ \mathcal{E}_1. \qquad (2)$$

Here, $\mathcal{E}_j$ (respectively $\mathcal{H}_j$) is the operation implemented by the $j$th cycle of one-qubit gates (respectively by the $j$th cycle of two-qubit gates). Under the assumptions A1 and A2 (section 2.1), a noisy implementation of the input circuit performs the map

$$\widetilde{\mathcal{C}} = \mathcal{E}_{m+1} \circ \mathcal{D}_{H_m} \mathcal{H}_m \circ \mathcal{E}_m \circ \cdots \circ \mathcal{D}_{H_1} \mathcal{H}_1 \circ \mathcal{E}_1,$$

where we have recompiled the gate-independent noise afflicting the cycles of one-qubit gates into that afflicting the cycles of two-qubit gates and obtained the Pauli channels $\mathcal{D}_{H_j}$ via Randomized Compiling. Motivated by the above equation, for convenience we will refer to the cycles of one-qubit gates as "noiseless under compilation" (or simply as "noiseless") and to all the other cycles as "noisy".

Pauli Error Cancellation
We begin by explaining the main ideas behind quasi-probabilistic error cancellation, one of the primary ingredients employed by our PEC protocol. Quasi-probabilistic error cancellation is a strategy to compute an unbiased estimator by sampling from a distribution of biased estimators [31,46]. Formally, let $\mathcal{U}$ be a desired, noiseless operation, and let $\{\widetilde{\mathcal{U}}_l\}_{l=1}^{L}$ be a set of noisy operations that can be implemented experimentally. The task of quasi-probabilistic error cancellation is to calculate a set of probabilities $q_l$, a set of signs $s_l \in \{-1, +1\}$ and a number $C_{\rm tot} > 0$ (called the "cost") such that for any state $\rho_{\rm in}$ and operator $O$,

$$\mathrm{Tr}\big[O\,\mathcal{U}(\rho_{\rm in})\big] = C_{\rm tot} \sum_{l=1}^{L} q_l s_l \mathrm{Tr}\big[O\,\widetilde{\mathcal{U}}_l(\rho_{\rm in})\big] + \delta,$$

where $\delta \approx 0$ represents a residual bias and captures the effectiveness of the EM protocol. All the EM protocols based on quasi-probabilistic error cancellation guarantee a negligible bias $\delta \approx 0$, provided that the noisy maps $\widetilde{\mathcal{U}}_l$ can be accurately characterised.
We can now present our PEC protocol, which is formally described in section I of the Supplementary Material. PEC takes as input the circuit $\mathcal{C}$ (Eq. 2), the Pauli error rates $\{\epsilon_k^{(H_j)}\}$ of all the noisy cycles $H_j$ in $\mathcal{C}$ (which are computed in advance with CER), an $n$-qubit state $\rho_{\rm in}$, an operator $O$ such that the spectral norm $\|O\|_\infty \sim 1$, and a number $\sigma \in (0, 1)$ representing the desired standard deviation of the results. It uses quasi-probabilistic error cancellation to suppress the noise afflicting the noisy cycles $H_j$, and eventually it returns an estimator $E_{\rm PEC}(O)$ of $\mathrm{Tr}[O\,\mathcal{C}(\rho_{\rm in})]$. To calculate $E_{\rm PEC}(O)$, PEC requires running $N = (C_{\rm tot}/\sigma)^2$ circuits in total, where the cost $C_{\rm tot}$ is determined by the Pauli error rates of the noisy cycles. Each of these circuits is obtained by appending randomly chosen Pauli gates to the noiseless cycles. Specifically, every circuit in PEC contains random Pauli cycles $P_j \in \{I, X, Y, Z\}^{\otimes n}$, each chosen at random with a probability determined by the Pauli error rates $\epsilon_k^{(H_j)}$. Together with the $N$ circuits, PEC also initialises a list of signs $s_1, \ldots, s_N$, where $s_k = 1$ if circuit $k$ contains an even number of random Pauli cycles $P_j$ that are different from the identity and $s_k = -1$ otherwise. After initializing circuits and signs, PEC applies Randomized Compiling to every circuit, runs the circuits and stores the results $r_1, \ldots, r_N$. Finally, it computes the estimator $E_{\rm PEC}(O)$ as

$$E_{\rm PEC}(O) = \frac{C_{\rm tot}}{N} \sum_{k=1}^{N} s_k r_k.$$

Theorem 1 states the standard deviation and bias of $E_{\rm PEC}(O)$, under the simplifying assumption that the Pauli error rates of every noisy cycle are known exactly.

Assuming perfect knowledge of the Pauli error rates is unrealistic for two reasons (more details in section 2). Firstly, estimating all the $4^n$ error rates of an $n$-qubit cycle is impractical even for few-qubit cycles, so we can only learn a few of them (e.g. the largest ones). Secondly, the estimates returned by CER are subject to statistical fluctuations. To relax this assumption, we show that our PEC protocol is robust to inaccuracies in the estimates of the Pauli error rates, provided that they are suitably small. Formally:

Lemma 1. (Proof in section III of Supplementary Material). Let $\epsilon_l^{(H_j)}$ be the Pauli error rates of the noisy cycle $H_j$ and let $\hat{\epsilon}_l^{(H_j)}$ be the estimates computed with CER. Under the assumptions A1 and A2, the estimator $E_{\rm PEC}(O)$ returned by our PEC protocol has bias $\delta'_{\rm PEC} = \delta_{\rm PEC} + \delta_{\rm rec}$, where $\delta_{\rm PEC}$ is the bias in theorem 1 and $\delta_{\rm rec}$ depends on the inaccuracy of the estimated error rates.

To better quantify the bias of PEC, let us assume for simplicity that the error probability is the same for every noisy cycle, i.e., $1 - \epsilon_0^{(H_j)} = \varepsilon$ for all $j \in \{1, \ldots, m\}$. In this case, the bias $\delta_{\rm PEC}$ in theorem 1 grows quadratically in $\varepsilon$ as $\delta_{\rm PEC} = O(m\varepsilon^2)$. Hence, if the Pauli error rates are known perfectly, PEC can successfully improve the performance of circuits with depth $m \lesssim \varepsilon^{-2}$. More generally, we can assume a fixed relative precision $\beta$ of the Pauli error rates. CER inherently provides Pauli error rates with multiplicative precision, and the relative uncertainty $\beta$ can be brought closer to zero by improving the quality of the characterization data set (e.g. by increasing the number of CER circuits and the number of shots). Overall, performing an accurate characterization is vital since a high relative uncertainty on the Pauli error rates might negatively impact the residual bias $\delta'_{\rm PEC}$.
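The classical post-processing of PEC described in this section (assigning a sign to every circuit and rescaling the signed average of the results by the cost) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the Pauli cycles are represented as plain label strings, and the estimator is taken to be the cost times the signed sample mean, which is the standard form for quasi-probabilistic cancellation.

```python
import numpy as np

def pec_sign(pauli_cycles):
    """+1 if an even number of the random Pauli cycles differ from the identity, else -1."""
    n_nontrivial = sum(1 for label in pauli_cycles if set(label) != {"I"})
    return 1 if n_nontrivial % 2 == 0 else -1

def pec_estimator(results, signs, c_tot):
    """Cost-rescaled signed average of the circuit results r_1, ..., r_N."""
    results = np.asarray(results, dtype=float)
    signs = np.asarray(signs, dtype=float)
    return c_tot * np.mean(signs * results)

# Hypothetical example with two sampled circuits on two qubits.
circuits = [["II", "XI", "IZ"], ["II", "XY", "II"]]
signs = [pec_sign(c) for c in circuits]           # [1, -1]
estimate = pec_estimator([1.0, 0.5], signs, 2.0)  # 2 * mean([1.0, -0.5])
```

Because the cost multiplies the sample mean, statistical fluctuations are amplified by $C_{\rm tot}$, which is why $N = (C_{\rm tot}/\sigma)^2$ circuits are needed to reach a standard deviation of order $\sigma$.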
To conclude the section we analyse the complexity of PEC. To achieve a fixed standard deviation $O(\sigma)$, PEC requires implementing $C_{\rm tot}^2/\sigma^2$ circuits, where $C_{\rm tot}$ typically grows exponentially with $m$. For example, when all the noisy cycles have the same error probability $\varepsilon$, we have $C_{\rm tot} = O((1 - \varepsilon)^{-2m})$. Thus, in general PEC (as well as all the other protocols based on quasi-probabilistic cancellation [45]) is inefficient due to the exponential scaling of the cost with the circuit depth. Nevertheless, if applied to circuits with depth $m \lesssim \varepsilon^{-2}$, PEC remains an efficient and practical solution.
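The scaling above can be explored numerically with the following sketch. It assumes, purely for illustration, that the cost equals $(1 - \varepsilon)^{-2m}$ exactly (the text only states this as an order-of-magnitude scaling), and the function name is hypothetical.

```python
import math

def pec_num_circuits(eps, m, sigma):
    """Illustrative PEC cost and circuit count, assuming C_tot = (1 - eps)**(-2 m)."""
    c_tot = (1.0 - eps) ** (-2 * m)  # assumed exact form of the cost
    n_circuits = math.ceil((c_tot / sigma) ** 2)
    return c_tot, n_circuits

# Hypothetical error rate of 1% per noisy cycle, target sigma of 2%.
for depth in (5, 11, 25, 100):
    cost, n = pec_num_circuits(0.01, depth, 0.02)
    print(f"m = {depth:3d}: C_tot ~ {cost:.2f}, N ~ {n}")
```

Under this assumption the number of circuits grows by the factor $(1 - \varepsilon)^{-4}$ for every additional noisy cycle, so the overhead stays modest at shallow depth and becomes dominant once $m\varepsilon$ is of order one.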

Noiseless Output Extrapolation
NOX relies on the ability to amplify the noise afflicting individual noisy cycles in the input circuit. Specifically, it requires replacing the noisy operations $\mathcal{D}_{H_j} \mathcal{H}_j$ with $\mathcal{D}_{H_j}^{\alpha} \mathcal{H}_j$ for integers $\alpha > 1$. We begin this subsection by explaining how this amplification may be performed, then we describe NOX.
The traditional method to amplify the noise is the so-called "Identity Insertion" [24], which consists of replacing a noisy cycle $H_j$ with $H_j (H_j H_j^{-1})^{\alpha}$. This method is efficient and is used by a number of other EM protocols [3,20,24,28,30], but it is accurate only if two conditions are satisfied. Firstly, $H_j$ and $H_j^{-1}$ must be afflicted by identical noise. Secondly, the noise and the cycle must commute, i.e., $\mathcal{D}_{H_j} \mathcal{H}_j = \mathcal{H}_j \mathcal{D}_{H_j}$. The first condition is trivially satisfied by cycles for which $H_j = H_j^{-1}$, for example, by cycles containing a combination of cZ and cX gates. The second condition is satisfied by specific noise processes, e.g. by the $n$-qubit depolarising channel, but not in general [28]. Importantly, CER allows us to check whether these two conditions are satisfied and to evaluate the accuracy of Identity Insertion before employing it in an experiment.
While we do not attempt to improve Identity Insertion, we propose an alternative method that can correctly amplify arbitrary noise processes. Our method (which we call "Append Errors") takes as input the circuit $\mathcal{C}$, a label $j \in \{1, \ldots, m\}$, the Pauli error rates $\epsilon_k^{(H_j)}$ of the $j$th noisy cycle and the amplification factor $\alpha > 1$. It returns the circuit $\mathcal{C}'(j; k_1, \ldots, k_{\alpha-1})$ (Eq. 10), where each $Q_{k_l} \in \{I, X, Y, Z\}^{\otimes n}$ is an $n$-qubit Pauli operator chosen at random with probability $\epsilon_{k_l}^{(H_j)}$. Averaged over the random choices of the Pauli operators, this corresponds to the operation implemented by the noisy circuit except for the noise on the $j$th noisy cycle, which is amplified by a factor $\alpha$.
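The random step of Append Errors can be sketched as below: $\alpha - 1$ Pauli operators are drawn independently from the distribution defined by the Pauli error rates of the cycle. The function name, the dictionary representation of the error rates and the numerical values are hypothetical; how the sampled Paulis are inserted into the circuit is specified by Eq. 10 and is not reproduced here.

```python
import numpy as np

def sample_appended_paulis(error_rates, alpha, rng):
    """Draw alpha - 1 Pauli labels, each with probability given by its error rate."""
    labels = list(error_rates)
    probs = np.array([error_rates[label] for label in labels], dtype=float)
    indices = rng.choice(len(labels), size=alpha - 1, p=probs)
    return [labels[i] for i in indices]

# Illustrative error rates for a two-qubit cycle (not experimental data).
rates = {"II": 0.95, "ZI": 0.03, "IX": 0.02}
rng = np.random.default_rng(seed=0)
appended = sample_appended_paulis(rates, alpha=3, rng=rng)  # two Pauli labels
```

Since the identity is drawn with probability $\epsilon_0$, most sampled circuits coincide with the input circuit when the noise is weak, and only occasionally contain an additional error.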
We can now move on to presenting NOX. NOX (formally described in section II of Supplementary Material) takes as input the circuit $\mathcal{C}$, an $n$-qubit state $\rho_{\rm in}$, an operator $O$ such that $\|O\|_\infty \sim 1$, a number $\sigma \in (0, 1)$ representing the desired standard deviation of the results, an integer $\alpha > 1$ and a Boolean id_insert $\in$ {True, False}. It requires running $m + 1$ circuits in total. The first of these circuits is identical to the input circuit, while the other $m$ circuits contain one noisy cycle with noise amplified by a factor $\alpha$. If id_insert = True, the noise amplification is performed with Identity Insertion, otherwise it is performed with Append Errors. Each circuit is implemented $m^2/[(\alpha - 1)^2 \sigma^2]$ times and yields a noisy estimator of $E(O)$. We denote with $E_{\rm in}(O)$ the noisy estimator returned by the circuit that is identical to the input one, and by $E_{H_j,\alpha}(O)$ that returned by the circuit with amplified noise on the $j$th noisy cycle. After running all the $m + 1$ circuits, NOX combines these estimators into the quantity $E_{\rm NOX}(O)$. This quantity is yet another estimator of $E(O)$, but it is significantly more accurate than the noisy estimators.
Theorem 2 states its standard deviation and bias, under the assumption that the noise amplification is performed exactly. Overall, while the biases of the noisy estimators $E_{\rm in}(O)$ and $E_{H_j,\alpha}(O)$ grow linearly with the cycles' error rates, the bias of $E_{\rm NOX}(O)$ only grows quadratically. Note that the bias $\delta_{\rm NOX}$ also grows linearly with $\alpha$. Thus, choosing small values of $\alpha$ leads to better performance for NOX.
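The first-order cancellation behind NOX can be illustrated with a toy model. The sketch below assumes a particular linear combination, $E_{\rm in} + \frac{1}{\alpha - 1} \sum_j (E_{\rm in} - E_{H_j,\alpha})$, which removes the bias to first order in the error rates and is consistent with the shot counts quoted above; it is offered as an assumption rather than as the exact expression of the protocol. The noise model (each noisy cycle rescales the expectation value by a fixed factor) and all numbers are hypothetical.

```python
import numpy as np

def nox_estimator(e_in, e_amplified, alpha):
    """First-order extrapolation from the input estimator and the m amplified ones."""
    e_amplified = np.asarray(e_amplified, dtype=float)
    return e_in + np.sum(e_in - e_amplified) / (alpha - 1)

# Toy model: every noisy cycle multiplies the ideal expectation value by f_j,
# and amplifying the noise on cycle j by alpha replaces f_j with f_j**alpha.
ideal = 1.0
fidelities = np.full(10, 0.98)   # hypothetical, m = 10 noisy cycles
alpha = 3
e_in = ideal * np.prod(fidelities)
e_amp = [e_in * f ** (alpha - 1) for f in fidelities]
e_nox = nox_estimator(e_in, e_amp, alpha)

print(f"unmitigated error: {abs(e_in - ideal):.4f}")
print(f"NOX error:         {abs(e_nox - ideal):.4f}")
```

In this toy model the unmitigated error is linear in the per-cycle error rate, whereas the residual error after extrapolation is of second order, mirroring the statement on the biases above.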
Assuming that the noise can be amplified exactly is unrealistic, both for Identity Insertion (since the noise may commute approximately but not exactly) and for Append Errors (since inevitable inaccuracies in the estimation of the Pauli error rates may lead to an imperfect amplification). To relax this assumption we prove the following lemma:

Lemma 2. (Proof in section III of Supplementary Material). Let $\mathcal{D}_{H_j}^{\alpha} \mathcal{H}_j$ be a noisy implementation of $H_j$ with noise amplified exactly by a factor $\alpha$, and let $\mathcal{R}_{H_j} \mathcal{D}_{H_j} \mathcal{H}_j$ be an implementation of $H_j$ with noise amplified imperfectly. Under assumptions A1 and A2, the estimator $E_{\rm NOX}(O)$ returned by our NOX protocol has bias $\delta'_{\rm NOX} = \delta_{\rm NOX} + \delta_{\rm rec}$, where $\delta_{\rm NOX}$ is the bias in theorem 2 and $\delta_{\rm rec}$ depends on the inaccuracy of the noise amplification.

The above lemma is the analogue of lemma 1 for NOX, as it proves that an accurate but imperfect (i.e. a realistic) noise amplification can still guarantee a high performance of our NOX protocol. Overall, if the noise is amplified perfectly, the bias of NOX grows quadratically in the cycles' error rate $\varepsilon$ as $\delta_{\rm NOX} = O(m^2 \varepsilon^2)$, where for simplicity we assume that every cycle has the same error probability $\varepsilon$. If the noise is amplified imperfectly, we can expect $\delta_{\rm rec}$ to grow linearly in $\varepsilon$ as $\delta_{\rm rec} = O(\gamma m \varepsilon)$, where $\gamma$ is proportional to the inaccuracy in the noise amplification. Therefore, performing an accurate noise amplification is vital to ensure that the residual bias $\delta'_{\rm NOX}$ remains as low as possible.
Unlike PEC, NOX requires running a number of circuits that does not depend on the cycles' error rate. In particular, to achieve the desired standard deviation, NOX requires initialising $m + 1$ circuits and running each of them $O(m^2)$ times. Thus, NOX has runtime $O(m^3)$ and is efficient in $m$. This result may seem to contradict Ref. [45], which shows that the EM protocols are fundamentally inefficient in the circuit depth. However, Ref. [45] only considers protocols that have a fixed bias, independent of the circuit depth, while the bias of NOX grows quadratically with $m$.
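The cubic runtime follows directly from the numbers given in this section, as the following sketch shows: $m + 1$ circuits, each implemented $m^2/[(\alpha - 1)^2\sigma^2]$ times. The function name is hypothetical and the example values are chosen only for illustration.

```python
import math

def nox_total_runs(m, alpha, sigma):
    """Total circuit executions: (m + 1) circuits, m^2 / ((alpha-1)^2 sigma^2) runs each."""
    runs_per_circuit = math.ceil(m**2 / ((alpha - 1) ** 2 * sigma**2))
    return (m + 1) * runs_per_circuit

# Doubling the depth increases the total number of runs roughly eightfold.
for depth in (5, 10, 20, 40):
    print(depth, nox_total_runs(depth, alpha=3, sigma=0.02))
```

Note that the count is independent of the error rates of the cycles, in contrast with the cost $C_{\rm tot}$ of PEC.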
We conclude this section by clarifying the differences between NOX and the existing protocols based on noise amplification. NOX can be seen as a noise-aware generalisation of the "Random Identity Insertion Method" (RIIM) presented in Ref. [24], which is built around Identity Insertion. Even though NOX and RIIM undertake similar approaches, crucial differences exist between the two protocols. In particular, RIIM targets individual cX gates afflicted by local depolarising noise. Being a noise-agnostic technique, by construction it is unable to correctly amplify (and therefore to suppress) noise processes that do not commute with the cX gates [28]. On the contrary, NOX targets entire cycles afflicted by a broad class of noise processes, including non-local and non-depolarising processes. By using Randomized Compiling in combination with CER, NOX can evaluate the ability of Identity Insertion to correctly amplify the noise, and potentially use Append Errors to ensure a more precise amplification. This makes NOX more reliable than RIIM as well as more widely applicable.

Our experiments
We begin this section by discussing our strategy for testing NOX and PEC. Next, we present the results of our experiments.

Our testing strategy
We conduct both numerical and experimental testing of PEC and NOX. The numerical testing allows us to evaluate the performance of our protocols in an ideal scenario in which the Pauli error rates of every cycle are known exactly and the assumptions A1 and A2 apply. In every simulation we model the cycles' noise based on the CER data collected in the corresponding experiment, and for simplicity we consider noiseless state preparation and measurements. On the other hand, with the experimental testing we investigate the performance of our protocols in a real-world setting, where the noise is known approximately but not exactly and slight deviations from A1 and A2 are to be expected. For example, non-Markovian errors have been previously observed on the chip that we use for our experiments [27].

Figure 2: Circuit to generate an $n$-qubit W state [10]. The gates $G_t$ are defined in Eq. 16.
We begin every experiment by characterizing the noise with CER. This typically takes around twenty minutes per noisy cycle. Next, we run the input circuit several times, with and without EM, in order to gather statistics for the final estimators. In addition to PEC and NOX we employ standard readout error mitigation (REM) protocols to mitigate measurement errors [7]. The runtimes per repetition are usually within the hour. For example, for the three-qubit quantum phase estimation circuits (which contain $m = 11$ noisy cycles), for $\sigma = 2\%$ we require around one minute to compute an estimator for the unmitigated circuit, around ten minutes for the NOX estimator and around twenty minutes for the PEC estimator. We note that the errors afflicting idling qubits are the dominant type of error in our device (Fig. 10 in Appendix 6.1). As these errors commute with the noisy cycles in our circuits, to amplify the noise in NOX we use Identity Insertion with $\alpha = 3$.

W-state circuits
In our first test we implement our protocols on circuits that generate W states. W states are a special type of multipartite entangled states that play a central role in quantum communication, memories and networks [12,50]. An $n$-qubit W state can be written as an equal superposition of all the weight-one basis states, namely as

$$|W_n\rangle = \frac{1}{\sqrt{n}} \big( |10\cdots0\rangle + |01\cdots0\rangle + \cdots + |00\cdots1\rangle \big). \qquad (15)$$

Fig. 2 shows a circuit to produce $n$-qubit W states in a linear architecture with nearest-neighbour connectivity [10]. This circuit contains $n - 1$ controlled gates implementing the operation $|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes G_t$, with the gates $G_t$ defined in Eq. 16. Each one of these gates is followed by a cX gate. After recompiling the entangling gates into our native set (cZ gates between nearest neighbours), the resulting circuit contains $3(n - 1)$ noisy cycles, each one comprising one cZ gate and two identity gates.

Figure 4: Four-qubit pseudo-random circuits of the type implemented in our third test. Each gate $V_{i,j}$ is a random one-qubit gate.
In our tests we generate W states with $n = 2$, 3 and 4 qubits. By measuring these states in the computational basis we estimate the probability associated with each one of the possible outputs. Formally, we compute an estimate $p_{\rm est}(s)$ of the quantity $\mathrm{Tr}[O_s |W_n\rangle\langle W_n|]$ for every projector $O_s \in \{|s\rangle\langle s| : s \in \{0, 1\}^n\}$. Fig. 5 shows the estimates for the most frequent outputs obtained in the simulations (Fig. 5a) and in the experiment (Fig. 5b). As can be seen, all the estimates returned by NOX and PEC concentrate around (or very close to) their ideal value, whereas the unmitigated estimates are generally inaccurate. To better quantify the improvement provided by our EM protocols we calculate the variation distance between $\{p_{\rm est}(s)\}$ and the ideal probability distribution of the outputs $\{p_{\rm id}(s)\}$. As shown in Figs. 5c and 5d, the variation distances of the mitigated outputs are significantly smaller than those of the unmitigated outputs, with average improvements between 47% and 66% for the experimental outputs (Fig. 1b).
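The figure of merit used throughout this section can be computed as in the sketch below. It assumes the standard total variation distance, $\frac{1}{2}\sum_s |p(s) - q(s)|$, for the variation distance, and uses the improvement $1 - D_{\rm EM}/D_{\rm unm}$ defined in the caption of Fig. 1. The function names are hypothetical, and the two "measured" distributions are made-up numbers for illustration, not experimental data.

```python
import numpy as np

def w_state_probabilities(n):
    """Ideal output distribution of an n-qubit W state in the computational basis."""
    p = np.zeros(2**n)
    for qubit in range(n):
        p[1 << qubit] = 1.0 / n   # bitstrings with exactly one qubit in |1>
    return p

def variation_distance(p, q):
    """Total variation distance between two probability distributions (assumed form)."""
    return 0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum()

def vd_improvement(p_ideal, p_unmitigated, p_mitigated):
    """Relative improvement 1 - D_EM / D_unm."""
    d_unm = variation_distance(p_ideal, p_unmitigated)
    d_em = variation_distance(p_ideal, p_mitigated)
    return 1.0 - d_em / d_unm

p_id = w_state_probabilities(2)          # [0, 0.5, 0.5, 0]
p_unm = [0.10, 0.42, 0.40, 0.08]         # made-up unmitigated estimates
p_em = [0.04, 0.47, 0.46, 0.03]          # made-up mitigated estimates
improvement = vd_improvement(p_id, p_unm, p_em)
```

An improvement of 0 means that mitigation did not change the distance to the ideal distribution, while an improvement of 1 means that the mitigated distribution coincides with the ideal one; negative values indicate that mitigation made the outputs worse.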
Setting $t = 2$ and $|\psi\rangle = |1\rangle$, we estimate the parameter $\kappa$ for a series of gates that perform rotations parametrised by $\kappa$. Decomposed into our native gateset, our QPE circuits contain $n = 3$ qubits and $m = 11$ noisy cycles. As with the W-state circuits, we measure all the qubits (target and ancillae) in the computational basis and reconstruct the probability distribution of the outputs. Fig. 6 shows the variation distances between ideal and estimated probability distributions of the outputs obtained in the simulations (Fig. 6a) and in the experiments (Fig. 6b). In both cases the mitigated outputs are significantly more accurate than the unmitigated ones, with average improvements between 42% and 86% in variation distance for the experimental outputs (Fig. 1b).
The better accuracy of the outputs under EM naturally improves the precision of the QPE algorithm. To see this, by post-processing the estimated probability distributions of the outputs of the ancillae we calculate the probability $q_{\rm est}(\hat{\kappa}|\kappa)$ that the QPE algorithm returns $\hat{\kappa} \in \{0.00, 0.25, 0.50, 0.75\}$ when $\kappa$ is the parameter being estimated. Fig. 6c shows the probabilities $q_{\rm est}(\hat{\kappa}|\kappa)$ calculated in the various experiments (solid bars), along with the ideal probabilities $q_{\rm id}(\hat{\kappa}|\kappa)$ calculated with a noiseless simulation (striped bars). The probabilities $q_{\rm est}(\hat{\kappa}|\kappa)$ obtained in the experiments with PEC and NOX are generally closer to the ideal ones than those obtained in the experiments without EM. To quantify the improvement, we calculate the variation distances between the ideal and estimated probability distributions of the outcomes of QPE for the values of $\kappa$ chosen in our experiments. As shown in Table 2, NOX and PEC drastically improve the precision of the QPE algorithm in all the cases considered.

We repeat the above experiment with $t = 3$ ancillae, setting |ψ⟩ and $\kappa = 0.5$. Decomposed into our native gateset, the resulting QPE circuit contains $n = 4$ qubits and $m = 25$ noisy cycles. As opposed to the experiments with $t = 2$ ancillae, both NOX and PEC return less accurate outputs than the unmitigated circuit and visibly decrease the precision of the QPE algorithm (Fig. 6d). We attribute this unsuccessful result to a series of noise processes (unmodeled non-Markovian errors [27], drift, errors due to an inaccurate amplification of the noise, etc.) that are not mitigated by our protocols. These unmitigated noise processes accumulate along the circuit, resulting in a bias that grows linearly in $m$ and that becomes non-negligible in deep circuits. This failed test shows that when the noise processes that are not encompassed by our assumptions become dominant, they have the potential to disrupt the performance of our EM protocols.

Pseudo-random circuits
In our third test we target pseudo-random circuits of varying depth of the type shown in Fig. 4. These circuits alternate between cycles containing either one or two cZ gates and cycles containing random one-qubit gates. Fig. 7 shows the variation distances between ideal and estimated probability distributions of the outputs obtained numerically (Fig. 7a) and experimentally (Fig. 7b). As in our previous tests, the mitigated outputs are visibly more accurate than the unmitigated ones, with average improvements between 32% and 56% in variation distance for the experimental outputs (Fig. 1b).

Relation between the input parameter σ and the standard deviation of the estimators
In addition to suppressing the bias of the final estimators, our protocols provide guarantees about their statistical fluctuations. In particular, choosing a specific value for the input σ guarantees O(σ) standard deviation for every estimator, at the cost of running a number $N = O(\sigma^{-2})$ of circuits. To verify the relation between the input σ and the standard deviation, we test numerically the performance of NOX on a two-qubit W-state circuit for different values of σ. As shown in Fig. 8a, smaller values of σ lead to estimators that are statistically more accurate, which confirms the expected relation between σ and the standard deviation under ideal experimental conditions.
In a real-world setting, a number of uncontrollable factors (such as drift in the noise afflicting the device in use) may inevitably prompt fluctuations in the estimators, limiting our ability to attain the desired statistical accuracy. To see how this may affect our protocols, we repeat our two-qubit test experimentally (Fig. 8b). We find that for σ ≳ 2% the standard deviation of the results decreases with σ as expected, whereas for σ ≲ 2% the standard deviation remains approximately constant. Due to the inherent fluctuations of the noise afflicting the device in use, implementing NOX with σ < 2% requires running more circuits than with σ = 2%, but it does not improve the statistical accuracy of the estimators. In other words, our EM protocols cannot provide performance guarantees below the noise floor of the device. Overall, the results shown in Fig. 8b highlight the importance of the assumption A1 in the context of EM and call for methods to suppress drifts.
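The scaling between σ, the number of circuits N and the standard deviation of the NOX estimator can be illustrated with a short calculation. The sketch below uses the choice N = m²/((α − 1)²σ²) and the NOX output formula from Box 2, and propagates independent errors through that formula. It assumes, for illustration only, that each noisy estimator has standard deviation 1/√N (unit single-shot standard deviation); the function names are hypothetical.

```python
import math


def nox_shots(m, alpha, sigma):
    """Number of runs per circuit, N = m^2 / ((alpha - 1)^2 sigma^2), rounded up."""
    return math.ceil(m**2 / ((alpha - 1) ** 2 * sigma**2))


def nox_std(m, alpha, per_estimator_std):
    """Propagate independent errors through
    E_NOX = E_in * (alpha - 1 + m) / (alpha - 1) - sum_j E_j / (alpha - 1)."""
    c_in = (alpha - 1 + m) / (alpha - 1)  # coefficient of E_in
    c_j = 1.0 / (alpha - 1)               # coefficient of each amplified estimator
    return per_estimator_std * math.sqrt(c_in**2 + m * c_j**2)


m, alpha = 4, 3
for sigma in (0.08, 0.04, 0.02):
    n = nox_shots(m, alpha, sigma)
    s = 1.0 / math.sqrt(n)  # assumption: unit single-shot standard deviation
    total = nox_std(m, alpha, s)
    print(f"sigma={sigma:.2f}  N={n}  std(E_NOX)={total:.4f}  ratio={total / sigma:.2f}")
```

Under these assumptions the ratio between the propagated standard deviation and σ stays constant as σ is reduced, while N grows as σ⁻², which is the behaviour expected under ideal conditions.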

Conclusions
While fault-tolerance remains a long-term goal, understanding how to improve the performance of the existing noisy quantum computers is of timely importance. By leveraging cutting-edge protocols for noise reconstruction, we have developed PEC and NOX and experimentally tested their effectiveness and practicality on a four-qubit superconducting chip. The results of our tests demonstrate that both of our protocols can significantly enhance the performance of the noisy quantum circuits implemented on existing hardware platforms.
The previous EM protocols based on noise reconstruction are centered around GST [13, 46]. Since GST is inefficient in the number of qubits, these protocols have been tested on circuits containing up to two qubits [42, 51]. Implementation on larger circuits required enhancing the noise-reconstruction process with machine learning tools, at the price of increased complexity and runtime [43]. On the contrary, being robust to all the main noise processes that naturally occur in multi-qubit systems, our protocols provide the tools to increase the performance of platforms with an arbitrary number of qubits, provided that they suffer moderate levels of noise.
Going forward, it is important to study how EM can help bridge the gap between today's noisy devices and tomorrow's fault-tolerant quantum computers (FTQC). Recent works showed how EM can reduce various types of logical errors in FTQCs, such as errors due to insufficient code distances [40] or imperfect magic-state distillation [40, 44]. Due to their ability to suppress multi-qubit errors, we anticipate that our EM protocols may be helpful to suppress multi-qubit physical errors that have a higher weight than the code corrects, and consequently to reduce the errors at the logical level. We leave this point open for future works.
Note added. While editing the final version of this manuscript, we became aware of related work that also employs efficient methods for noise reconstruction to enhance error cancellation [47]. Our protocols were developed independently of, and around the same time as, the one in Ref. [47].
CER data. Fig. 10 shows the data obtained in two different implementations of CER. We note that in both figures, the errors afflicting the idling qubits dominate the probability distributions. This is due to the nature of the cZ gates employed in this work, which utilized off-resonant drives to implement tunable ZZ interaction between two fixed-frequency transmon qubits [38]. These off-resonant drives can induce phase errors on spectator qubits if the driving tones are far off-resonant, or induce partial Rabi driving on spectator qubits (i.e. X- or Y-type errors) if the driving tones are near-resonant. The cZ gates also contain signals (with equal amplitude but opposite phase) which attempt to null any conditional errors on the neighboring spectator qubits, but if the nulling of these crosstalk terms is imperfect (as we see in Fig. 10), then errors on the idling qubit can dominate the CER results.
We do not observe significant fluctuations of the cycles' error rates over periods of several hours. Thus, to minimize the runtime, we avoid taking new CER data in between different repetitions of the same circuit. See the Supplemental Material for Ref. [23] for further details about CER and errors on this device.
RCAL data. To obtain the RCAL data, we implement circuits of two different types. Firstly, we implement circuits with an identity on every qubit. Secondly, we implement circuits with a Pauli-X on every qubit. We use the relative frequency of the outputs 0 and 1 on every qubit to estimate state-dependent measurement errors. Fig. 9 shows RCAL data taken on September 7, 2021. As can be seen, the probability that an outcome 0 is flipped to 1 is below 1% for every qubit, while the probability that an outcome 1 is flipped to 0 is around 2% on average (Fig. 9).
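The estimation of state-dependent measurement errors from the two types of calibration circuits can be sketched as follows. This is an illustrative reconstruction, not the calibration code used on the device: the function name `flip_probabilities`, the counts format (bitstring to number of occurrences), the convention that qubit 0 is the leftmost character, and the example counts are all assumptions.

```python
def flip_probabilities(counts_identity, counts_x, n_qubits):
    """Estimate per-qubit readout flip probabilities from calibration counts.

    counts_identity: counts from circuits with an identity on every qubit
                     (every qubit is expected to read 0).
    counts_x:        counts from circuits with a Pauli-X on every qubit
                     (every qubit is expected to read 1).
    Returns a list of (P(1|0), P(0|1)) pairs, one per qubit.
    Assumption: qubit q corresponds to character q of each bitstring.
    """

    def fraction_of(counts, qubit, bit):
        total = sum(counts.values())
        hits = sum(c for bits, c in counts.items() if bits[qubit] == bit)
        return hits / total

    return [
        (fraction_of(counts_identity, q, "1"), fraction_of(counts_x, q, "0"))
        for q in range(n_qubits)
    ]


# Made-up two-qubit calibration counts (illustrative numbers only).
counts_identity = {"00": 980, "01": 10, "10": 10}
counts_x = {"11": 960, "10": 20, "01": 20}
for q, (p10, p01) in enumerate(flip_probabilities(counts_identity, counts_x, 2)):
    print(f"qubit {q}: P(1|0) = {p10:.3f}, P(0|1) = {p01:.3f}")
```

The probabilities of no error shown in Fig. 9 correspond to 1 − P(1|0) and 1 − P(0|1) in this notation.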
We collect the RCAL data at the beginning of each experiment, and we avoid taking new RCAL data while running the experiment in order to minimise the runtime. We use the RCAL data to mitigate the readout noise via REM [7]. By applying REM we observe a visible improvement of the results for the W-state circuit (Fig. 11a). On the contrary, when we apply REM to circuits with a higher two-qubit gate count (Figs. 11b and 11c), we obtain outputs that are equal to the unmitigated ones within error bars. This highlights the fact that mitigating both cycle and readout errors can be significantly more beneficial than mitigating readout errors alone, especially for circuits that contain a large number of two-qubit gates.
where in the last line we used Using 2 , which proves the theorem for m = 1.
The generalisation to m > 1 follows easily by linearity. Averaged over all the random Pauli operators, for m > 1 the quantity $E_{\rm PEC}(O)$ returned by PEC equals where $s_{l_j} = 1$ if $l_j = 0$ and $s_{l_j} = -1$ otherwise. Following the same arguments as for m = 1 we find where we omitted terms that are of higher order in the Pauli error rates and we defined Using $C_{H_j} \le C_{\rm tot}$, Eq. 28 and the triangle inequality we finally find

II. Formal description of NOX and proof of theorem 2
We now provide a formal description of our NOX protocol, then we show a proof of theorem 2. The formal description of NOX is given in the following box:
Box 2. Noiseless Output Extrapolation (NOX).
Inputs. An n-qubit quantum circuit
We now prove theorem 2.
Proof (Theorem 2). We begin by proving that $E_{\rm NOX}(O)$ has standard deviation $O(\sigma)$. Since all the noisy estimators $E_{\rm in}(O)$ and $E_{H_j,\alpha}(O)$ are calculated independently by running each circuit $N = m^2/[(\alpha-1)^2\sigma^2]$ times, they all have the same standard deviation $\sigma = O(N^{-1/2})$. Using the formula for propagation of errors, we find that the standard deviation of $E_{\rm NOX}(O)$ is
To calculate the bias, let us first define the maps $\delta_{H_j} = D_{H_j} - I$ for j ∈ {1, . . ., m}. Note that $\delta_{H_j} = (1 - \epsilon_0^{(H_j)})(Q_j - I)$ for some Pauli channel $Q_j$. Hence, if the probability of error $1 - \epsilon_0^{(H_j)}$ is sufficiently small (as is the case for the noisy cycles in state-of-the-art platforms) we have ||δ 2 where
For every j, the quantity $A_j$ is a multiple of $E_{H_j,\alpha}(O) - E_{\rm in}(O)$, modulo terms that are second order in $\delta_{H_j}$. Indeed, since $D_{H_j}^{\alpha} = (I + \delta_{H_j})^{\alpha} = I + \alpha\delta_{H_j} + \alpha(\alpha-1)\delta_{H_j}^2/2 + O(\delta_{H_j}^3)$ we have
$$E_{H_j,\alpha}(O) - E_{\rm in}(O) = (\alpha - 1)A_j + B_j + C_j, \tag{36}$$
where $B_j$ is quadratic in the δs and $C_j$ is cubic. Thus, combining Eqs. 33 and 35 we find (1 − ϵ
We end the section by proving the following lemma, which we use to derive Eq. 33.
Proof (Lemma 3). We prove the lemma using induction. Eq. 42 holds trivially for m = 1, since the l.h.s. equals $D_1U_1$ and the r.h.s. equals $U_1 + \delta_1U_1 = D_1U_1$. To complete the induction we now assume that Eq. 42 holds for a given m ≥ 1 and we prove the equality for m + 1. For m + 1 the r.h.s. of Eq. 42 can be rewritten as
$$= \delta_{m+1}U_{m+1}\tilde{\Phi}_m + U_{m+1}\tilde{\Phi}_m \tag{45}$$
$$= D_{m+1}U_{m+1}\tilde{\Phi}_m. \tag{46}$$
Since $D_{m+1}U_{m+1}\tilde{\Phi}_m = \tilde{\Phi}_{m+1}$, the equalities above show that if Eq. 42 holds for a given m, then it also holds for m + 1. This proves the lemma.
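The kind of identity proved by this induction can be checked numerically. The displayed form of Eq. 42 is not reproduced in this text, so the sketch below tests one identity that is consistent with the base case and the inductive step quoted in the proof; treating it as Eq. 42 is an assumption. The maps are represented as random square matrices, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def random_matrix(n, scale, rng):
    return [[scale * rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]


def lemma3_sides(us, deltas):
    """Return (lhs, rhs) of the assumed identity, with D_j = I + delta_j.

    lhs = D_m U_m ... D_1 U_1
    rhs = U_m ... U_1
          + sum_j (U_m ... U_{j+1}) delta_j U_j (D_{j-1} U_{j-1} ... D_1 U_1)
    """
    n, m = len(us[0]), len(us)
    # noisy[j] = D_j U_j ... D_1 U_1, with noisy[0] = I
    noisy = [identity(n)]
    for u, d in zip(us, deltas):
        noisy.append(matmul(matadd(identity(n), d), matmul(u, noisy[-1])))
    # suffix[j] = U_m ... U_{j+1}, with suffix[m] = I
    suffix = [identity(n) for _ in range(m + 1)]
    for j in range(m - 1, -1, -1):
        suffix[j] = matmul(suffix[j + 1], us[j])
    rhs = suffix[0]
    for j in range(1, m + 1):
        term = matmul(suffix[j], matmul(deltas[j - 1], matmul(us[j - 1], noisy[j - 1])))
        rhs = matadd(rhs, term)
    return noisy[m], rhs


rng = random.Random(0)
us = [random_matrix(3, 1.0, rng) for _ in range(5)]
deltas = [random_matrix(3, 0.1, rng) for _ in range(5)]
lhs, rhs = lemma3_sides(us, deltas)
max_diff = max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))
print(f"largest entrywise difference: {max_diff:.2e}")
```

The two sides agree up to floating-point error for non-commuting matrices, which mirrors the telescoping structure used in the inductive step.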

III. Proof of Lemmas 1 and 2
In this section we prove our lemmas 1 and 2. We begin by proving lemma 1.

Figure 1: (a) Micrograph of the superconducting quantum processor at the Advanced Quantum Testbed (reprinted with permission from Ref. [23]). In this work, we used the four transmon qubits (green) with independent drive lines (blue) out of a total of eight qubits arranged in a ring geometry; the other four qubits are inactive on the device. The qubits are coupled to nearest neighbors via coupling resonators (CR, purple), and are dispersively measured via independent readout resonators (red) coupled to a multiplexed readout bus (MRB, cyan). (b) Average improvements in variation distance obtained in successful implementations of NOX and PEC protocols. In our experiments we apply PEC and NOX to four-qubit random circuits of varying depth m as well as to structured circuits, such as circuits to prepare W states with n = 2, 3, 4 qubits (Eq. 15) and to estimate a parameter κ through the quantum phase estimation algorithm. We compute the variation distance (VD, Eq. 17) between the ideal and experimental probability distributions of the outputs, and we quantify the improvement in VD under EM as $1 - D_{\rm EM}/D_{\rm unm}$, where $D_{\rm EM}$ ($D_{\rm unm}$) is the VD for the mitigated (unmitigated) outputs. We observe drastic improvements in VD for both NOX and PEC, ranging from 32% to as high as 86%.

Figure 3: Circuit to perform the QPE algorithm. In our tests we set $U = R_Z(\kappa)$, where $R_Z(\kappa)$ is defined in Eq. 18, and run the algorithm for different values of κ.

Figure 5: Summary of the results obtained for the W-state circuits. Figs. (a) and (b) show the estimated probabilities for the most frequent outputs obtained in the simulations and experiments, respectively. Figs. (c) and (d) show the variation distances between ideal and estimated probability distributions. In every figure the dots correspond to the actual data, the squares represent their means, the bars their standard deviations, and the dashed lines their ideal values. Note that for every n, the numerical estimates for the unmitigated circuit in (a) concentrate around similar values, which are close to the ideal value for n = 2 and below the ideal value for n = 3, 4. Instead, the corresponding experimental estimates in (b) are subject to larger fluctuations, and in some cases they are above the ideal value (see, for example, the estimates for the output 01). This is due to the presence of coherent errors in the experimental implementation of the unmitigated circuit. These coherent errors are tailored by Randomized Compiling and reported by CER in the form of stochastic Pauli errors. As a result, our simulations (which model the noise based on CER data) do not capture the full impact of these coherent errors.

Figure 6: Summary of the results for the QPE circuits. Figs. (a) and (b) show the variation distances between ideal and estimated probability distributions of the outputs obtained in the simulations and in the experiments for our QPE experiments with t = 2 ancillae. The dots correspond to the actual data, the squares represent their means and the bars their standard deviations. Fig. (c) shows the estimated probabilities that the QPE algorithm with t = 2 ancillae returns a given parameter $\kappa_{\rm est}$; the striped bars are calculated with noiseless simulations, the solid bars are calculated by averaging over the experimental outputs. Fig. (d) shows the estimated probabilities that the QPE algorithm with t = 3 returns a given parameter $\kappa_{\rm est}$.

Figure 7: Summary of the results for the pseudo-random circuits. Figs. (a) and (b) show the variation distances between the ideal and estimated probability distributions of the outputs obtained in the simulations and experiments, respectively. The dots correspond to the actual data, the squares represent their means and the bars their standard deviations.

Figure 8: Summary of the results obtained by applying NOX on a two-qubit W-state circuit at different values of σ. Figs. (a) and (b) show the variation distances between ideal and estimated probability distributions of the outputs obtained numerically and experimentally, respectively. The dots correspond to the actual data, the squares represent their means and the bars their standard deviations (which are reported in detail in the white boxes).

Figure 9: Readout calibration (RCAL) estimates obtained on September 7, 2021 by running around 10,000 calibration circuits. The l.h.s. column contains qubit labels, the r.h.s. column contains estimates of the probabilities of no error. Specifically, for i ∈ {0, 1}, P(i|i) represents the conditional probability that a measurement returns i, given that i is the expected outcome.

Figure 10: A map of the Pauli error rates (Eq. 1) for different four-qubit cycles performed by the chip in Fig. 1a, each containing either one or two cZ gates. The errors on the l.h.s. correspond to the Pauli errors afflicting the cycles, the colormap indicates the estimated probabilities for the Pauli errors in the plot, and the gradient across each cell defines the 90% confidence interval of each estimate. All the errors with negligible probabilities are truncated and are not displayed. The first row of subplots shows the weight-one errors acting on the idling qubits. The second row shows the weight-one and weight-two errors on the idling qubits. The third row shows the weight-one and weight-two errors on the non-idling qubits. The fourth and fifth rows show correlated errors afflicting more than two qubits.

Figure 11: Summary of the results obtained in the various experiments by applying NOX+REM, by applying REM alone and by applying no mitigation. As can be seen, REM improved the outputs of our W-state circuit (Fig. 11a), but it did not significantly improve the outputs of our QPE and random circuits (Figs. 11b and 11c).

2.2 Initialise a circuit $C'_{H_j,\alpha}$ as in Eq. 10. Apply the circuit $C'_{H_j,\alpha}$ to $\rho_{\rm in}$ a total of N times with Randomized Compiling. Estimate the expectation value $E_{H_j,\alpha}(O)$ of O at the end of the circuit.
Outputs. The quantity
$$E_{\rm NOX}(O) = E_{\rm in}(O)\,\frac{\alpha-1+m}{\alpha-1} - \sum_{j=1}^{m}\frac{E_{H_j,\alpha}(O)}{\alpha-1}.$$
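The output formula above can be illustrated on a toy noise model. In the sketch below, `nox_estimate` implements the formula for E_NOX(O) as written, while `toy_expectation` is a deliberately simple, hypothetical model in which each noisy cycle j multiplies the ideal expectation value by (1 − ε_j) and amplifying cycle j applies that factor α times. The model and the error rates are assumptions chosen for illustration, not a description of the device noise.

```python
def nox_estimate(e_in, e_amplified, alpha):
    """E_NOX = E_in * (alpha - 1 + m) / (alpha - 1) - sum_j E_{H_j,alpha} / (alpha - 1)."""
    m = len(e_amplified)
    return e_in * (alpha - 1 + m) / (alpha - 1) - sum(e_amplified) / (alpha - 1)


def toy_expectation(ideal, eps, amplified=None, alpha=1):
    """Toy model (assumption): cycle j damps the expectation value by (1 - eps[j]);
    the amplified cycle applies its damping factor alpha times."""
    value = ideal
    for j, e in enumerate(eps):
        power = alpha if j == amplified else 1
        value *= (1 - e) ** power
    return value


ideal = 1.0
eps = [0.01, 0.02, 0.015]  # made-up per-cycle error rates
alpha = 3

e_in = toy_expectation(ideal, eps)
e_amp = [toy_expectation(ideal, eps, amplified=j, alpha=alpha) for j in range(len(eps))]
e_nox = nox_estimate(e_in, e_amp, alpha)

print(f"unmitigated bias: {abs(e_in - ideal):.4f}")
print(f"NOX bias:         {abs(e_nox - ideal):.4f}")
```

In this toy model the first-order contributions of the error rates cancel in E_NOX, so the residual bias is second order in the ε_j and much smaller than the unmitigated one.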

Lemma 3. Let $\Phi_m = \bigcirc_{j=1}^{m} U_j$, $\tilde{\Phi}_m = \bigcirc_{j=1}^{m} D_j U_j$ and $D_j = I + \delta_j$ for all j ∈ {1, . . ., m}. The following equality holds:
in section I of Supplementary Material). Let the Pauli error rates of every noisy cycle $H_j$ be known exactly. Under the assumptions A1 and A2 (section 2.1), the number $E_{\rm PEC}(O)$ returned by our PEC protocol is an estimator of E(O) with standard deviation O(σ) and bias

Table 2: Values of VD

an n-qubit state $\rho_{\rm in}$, an operator O such that the spectral norm $||O||_\infty \sim 1$, a number σ ∈ (0, 1), an integer α > 1 and a Boolean id_insert ∈ {True, False}.
1. Initialise the number $N = m^2/[(\alpha-1)^2\sigma^2]$.
2. If id_insert is True:
2.1 Apply the circuit C to $\rho_{\rm in}$ a total of N times with Randomized Compiling. Estimate the expectation value $E_{\rm in}(O)$ of O at the end of the circuit.
2.2 Initialise a circuit $C'_{H_j,\alpha}$ by replacing $H_j$ with $H_j(H_jH_j^{-1})^{\alpha-1}$ in C. Apply the circuit $C'_{H_j,\alpha}$ to $\rho_{\rm in}$ a total of N times with Randomized Compiling. Estimate the expectation value $E_{H_j,\alpha}(O)$ of O at the end of the circuit.
Apply the circuit C to $\rho_{\rm in}$ a total of N times with Randomized Compiling. Estimate the expectation value $E_{\rm in}(O)$ of O at the end of the circuit.