Fully scalable randomized benchmarking without motion reversal

We introduce binary randomized benchmarking (BiRB), a protocol that streamlines traditional RB by using circuits consisting almost entirely of i.i.d. layers of gates. BiRB reliably and efficiently extracts the average error rate of a Clifford gate set by sending tensor product eigenstates of random Pauli operators through random circuits with i.i.d. layers. Unlike existing RB methods, BiRB does not use motion reversal circuits -- i.e., circuits that implement the identity (or a Pauli) operator -- which simplifies both the method and the theory proving its reliability. Furthermore, this simplicity enables scaling BiRB to many more qubits than the most widely-used RB methods.


I. INTRODUCTION
Randomized benchmarking (RB) is a family of protocols that assess the average performance of a quantum processor's gates by running random circuits. RB experiments are ubiquitous, yet the most widely-used RB protocols have important limitations that are caused by the kind of random circuits they use. Most RB protocols use motion reversal circuits that, if run without errors, implement the identity (or a Pauli) operator [1, 4-6, 9, 10, 18, 22] [Fig. 1(a)]. This makes errors easily visible: each RB circuit, when run perfectly, always outputs a particular bit string, so the observation of any other bit string implies that an error occurred. However, random motion reversal circuits must end with an inversion subroutine that undoes the preceding layers. The inversion subroutine causes challenges for RB theory [1, 6, 7, 21, 23-26] as well as practical problems. In most existing RB techniques, including standard Clifford group RB (CRB) [6] and its streamlined variant direct RB (DRB) [18], the size of the inversion subroutine grows quickly with the number of qubits [27-29] [see Fig. 1(a)], severely limiting their applicability outside of the few-qubit setting [1, 3, 18].
In this work, we demonstrate that motion reversal circuits are not required for reliable RB by introducing binary randomized benchmarking (BiRB). BiRB is an efficient and scalable protocol for estimating the average error rate of a Clifford gate set. BiRB's circuits [Fig. 1(b)] consist of $d$ i.i.d. layers of gates and two layers of single-qubit gates, for state preparation and measurement, and the measurement results are processed to obtain a binary-outcome Pauli measurement result. BiRB works because the average fidelity of highly scrambling random circuits decays exponentially in depth [18, 21, 24] and, for Clifford circuits, this fidelity can be efficiently estimated using random local state preparations and measurements [30, 31]. Our method's local state preparation and measurement enable benchmarking of many more qubits than most existing RB techniques (including CRB and DRB), as shown in Fig. 2. Furthermore, we show that BiRB is more accurate than mirror RB (MRB) [1, 21], which is the only other scalable RB protocol for Clifford gate sets.
FIGURE 1. RB without motion reversal. (a) Standard RB methods use motion reversal circuits, which make errors easily visible but add complexity and limit scalability. (b) BiRB reliably estimates average gate error rates without motion reversal by tracking a single stabilizer of a random product state through a random circuit. (c) Results from CRB, DRB, and BiRB on ibm_hanoi show that BiRB is more scalable than both CRB and DRB. DRB estimates the same error rate as BiRB ($r_\Omega$), and we find that their error rates are consistent, providing evidence for the reliability of BiRB.
FIGURE 2. BiRB and conventional RB on IBM Q. Results from CRB, DRB, and BiRB on ibm_hanoi show that BiRB is more scalable than both CRB and DRB. DRB estimates the same error rate as BiRB ($r_\Omega$), and we find that their error rates are consistent, providing evidence for the reliability of BiRB.
BiRB connects RB and cross entropy benchmarking (XEB) [32-35], another form of randomized benchmark. In contrast to RB, XEB uses random circuits consisting solely of i.i.d. (composite) layers of gates. These layers are typically sampled from a universal gate set, but a scalable form of XEB using Clifford gate sets has also been introduced [35]. While XEB circuits have no overhead from subroutines, in practice XEB decay curves exhibit non-exponential behavior at low depths for some Markovian error models, and therefore measuring a reliable error rate requires circuits with at least $O(n)$ depth [36]. The exact circuit depths required for exponential decay depend on the connectivity and gate set and must be estimated numerically for each distribution of layers benchmarked, which complicates XEB. This issue arises in part because XEB estimates the fidelity of random circuits using the (linear) cross entropy, which is not an accurate fidelity estimator for general Markovian noise models [34, 37]. BiRB shows how to add minimal overhead to circuits of i.i.d. layers to obtain a provably reliable RB protocol.
The remainder of this paper is structured as follows. In Section II we introduce our notation and review the existing results on which our method relies. In Section III we introduce the BiRB protocol. In Section IV we present a theory of BiRB that shows that our method is reliable: it accurately estimates the average error rate of an $n$-qubit circuit layer under assumptions commonly used in RB theory (e.g., Markovian errors). In Section V we demonstrate the reliability of our method with numerical simulations of BiRB on gate sets that experience both stochastic Pauli errors and (coherent) Hamiltonian errors. In Section VI, we demonstrate BiRB on IBM Q processors and validate it against the results of DRB and MRB. We then conclude in Section VII.

A. Definitions
In this section, we introduce our notation. An $n$-qubit layer $L$ is an instruction to perform a particular unitary operation on those $n$ qubits, typically specified in terms of one- and two-qubit gates. We use $U(L) \in \mathrm{SU}(2^n)$ to denote the unitary corresponding to $L$. The layers we use are randomly sampled, and we often treat a layer $L$ as a layer-valued random variable. We use $\Omega: \mathbb{L} \to [0, 1]$ to denote a probability distribution over the set of layers $\mathbb{L}$. We use $L^{-1}$ to denote an instruction to perform the unitary $U(L)^{-1}$. A depth-$d$ circuit is a sequence of $d$ layers, $C = L_d L_{d-1} \cdots L_1$, where we use the convention that the circuit is read right to left.
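For a layer (or circuit) $L$, we use $\mathcal{U}(L)$ to denote the superoperator representation of its perfect implementation, i.e., $\mathcal{U}(L)[\rho] = U(L)\rho U^{\dagger}(L)$. We use $\phi(L)$ to denote the superoperator for an imperfect implementation of $L$, and we assume $\phi(L)$ is a completely positive trace preserving (CPTP) map. A layer $L$'s error map is defined by $\mathcal{E}_L = \phi(L)\,\mathcal{U}^{\dagger}(L)$. The entanglement fidelity (also called the process fidelity) of $\phi(L)$ to $\mathcal{U}(L)$ is defined by
$$F\big(\phi(L), \mathcal{U}(L)\big) = \langle\varphi|\,(\mathbb{I}\otimes\mathcal{E}_L)\big[|\varphi\rangle\langle\varphi|\big]\,|\varphi\rangle = \frac{1}{|\mathbb{P}_n|}\sum_{P\in\mathbb{P}_n}\frac{1}{2^n}\mathrm{Tr}\big(P\,\mathcal{E}_L[P]\big),$$
where $\varphi$ is any maximally entangled state of $2n$ qubits [38], and $\mathbb{P}_n$ is the set of all $n$-qubit Pauli operations with $\pm 1$ global sign. Throughout, we use the term "(in)fidelity" to refer to the entanglement (in)fidelity. The polarization of an error map $\mathcal{E}$, whose fidelity to the identity we write as $F(\mathcal{E})$, is a rescaling of fidelity given by
$$\gamma(\mathcal{E}) = \frac{4^n}{4^n-1}F(\mathcal{E}) - \frac{1}{4^n-1} \tag{4}$$
$$\phantom{\gamma(\mathcal{E})} = \frac{1}{|\mathbb{P}^*_n|}\sum_{P\in\mathbb{P}^*_n}\frac{1}{2^n}\mathrm{Tr}\big(P\,\mathcal{E}[P]\big), \tag{5}$$
where $\mathbb{P}^*_n = \mathbb{P}_n \setminus \{\pm I_n\}$, and $I_n$ denotes the $n$-qubit identity operator. We say that a state $|\psi\rangle$ is stabilized by a Pauli operator $P$ if $P|\psi\rangle = |\psi\rangle$. An $n$-qubit stabilizer state $|\psi\rangle$ is a state that is stabilized by exactly $2^n$ Pauli operators. Equivalently, a stabilizer state is a state that can be prepared from $|0\rangle^{\otimes n}$ using only Clifford gates [27]. The stabilizer group of a stabilizer state $|\psi\rangle$ is $S_\psi = \{P \in \mathbb{P}_n \mid P|\psi\rangle = |\psi\rangle\}$. We use $S^*_\psi$ to denote all non-identity elements of the stabilizer group, i.e., $S^*_\psi = S_\psi \setminus \{I_n\}$.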
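The relationship between fidelity and polarization can be checked numerically on a small example. The following is a minimal sketch, assuming a global depolarizing error map on two qubits and dense matrices (so it is only practical for a few qubits). The helper names `pauli_strings`, `depolarizing`, and `fidelity_and_polarization` are hypothetical and do not come from the paper. Only unsigned Pauli operators are enumerated, because $P$ and $-P$ contribute identical terms to the Pauli sums.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices (unsigned).
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_strings(n):
    """Yield (label, matrix) for all 4**n unsigned n-qubit Pauli operators."""
    for labels in itertools.product("IXYZ", repeat=n):
        mat = np.eye(1, dtype=complex)
        for label in labels:
            mat = np.kron(mat, PAULIS[label])
        yield "".join(labels), mat


def depolarizing(n, gamma):
    """Global depolarizing map with polarization gamma (illustrative error map)."""
    dim = 2 ** n

    def channel(op):
        return gamma * op + (1 - gamma) * np.trace(op) * np.eye(dim) / dim

    return channel


def fidelity_and_polarization(channel, n):
    """Entanglement fidelity and polarization computed from Pauli sums."""
    dim = 2 ** n
    # 2^{-n} Tr(P E[P]) for every unsigned Pauli P.
    terms = {lab: np.trace(P @ channel(P)).real / dim for lab, P in pauli_strings(n)}
    fidelity = sum(terms.values()) / 4 ** n
    identity = "I" * n
    polarization = sum(v for k, v in terms.items() if k != identity) / (4 ** n - 1)
    return fidelity, polarization


n_qubits = 2
F, gamma = fidelity_and_polarization(depolarizing(n_qubits, 0.9), n_qubits)
# The two quantities are related by gamma = (4^n F - 1) / (4^n - 1).
rescaled = (4 ** n_qubits * F - 1) / (4 ** n_qubits - 1)
```

For this error map the polarization returned by the Pauli sum equals the depolarizing parameter, and it agrees with the rescaled fidelity.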

B. Ω-distributed random circuits
BiRB uses Ω-distributed random circuits [1, 18, 21, 24], which we now review. Ω-distributed random circuits consist of $n$-qubit layers of gates sampled from a distribution $\Omega$ over a layer set $\mathbb{L}$. In this work, we restrict $\mathbb{L}$ to contain only Clifford gates. These circuit layers can be chosen to consist of a processor's native gates, or simple combinations thereof, thus eliminating the need for complicated compilation.
Ω-distributed random circuits are also used in DRB and MRB [1, 18, 21, 24]. DRB and MRB are reliable if Ω satisfies certain conditions, and these same conditions are required for BiRB to be reliable. We require that the circuits generated by layers sampled from Ω are highly scrambling, meaning that for all Pauli operators $P, P' \neq I_n$, there exist constants $k \ll 1/\varepsilon$ and $\delta \ll 1$ such that
$$\mathbb{E}_{L_1,\ldots,L_k}\, F\big[\mathcal{P}'\,\mathcal{U}(L_k\cdots L_1)\,\mathcal{P}\,\mathcal{U}(L_k\cdots L_1)^{-1}\big] \leq \delta + \frac{1}{4^n}. \tag{6}$$
Here, $\mathcal{P}[\rho] = P\rho P$ and $\mathcal{P}'[\rho] = P'\rho P'$ are Pauli superoperators, $L_1, \ldots, L_k$ are Ω-distributed random layers, and $\varepsilon$ is the expected infidelity of an Ω-distributed random layer [21, 24]. Informally, this condition means that an error is locally randomized (i.e., its basis is randomized over the X, Y, and Z bases) and delocalized across multiple qubits before a second error is likely to have occurred. While we require that $k \ll 1/\varepsilon$ for our theory, this condition on $k$ can be relaxed when $n \gg 1$, because errors that occur on spatially separated qubits in close succession cannot cancel at all (see Refs. [21, 24] for details).

C. The RB error rate
BiRB's output is an error rate $r_\Omega$ that quantifies the error in random $n$-qubit layers sampled from Ω. BiRB's $r_\Omega$ closely approximates an independent, physically motivated error rate $\epsilon_\Omega$, which is closely related to the average layer infidelity; it was introduced in Refs. [21, 39] and is reviewed here. $\epsilon_\Omega$ is defined by the rate of decay of the fidelity of Ω-distributed random circuits. The expected fidelity of depth-$d$ Ω-distributed random circuits $C_d$ is given by
$$\bar{F}_d = \mathbb{E}_{C_d}\, F\big(\phi(C_d), \mathcal{U}(C_d)\big).$$
The scrambling requirements on Ω (see Section II B) ensure that $\bar{F}_d$ [7] decays exponentially, i.e., $\bar{F}_d \approx A p_{\mathrm{rc}}^d + B$ for constants $A$, $B$, and $p_{\mathrm{rc}}$. The average error rate of layers sampled from Ω is then defined as [21, 39]
$$\epsilon_\Omega = \frac{4^n-1}{4^n}\,(1 - p_{\mathrm{rc}}). \tag{8}$$
This rescaling of $p_{\mathrm{rc}}$ is used because $p_{\mathrm{rc}}$ corresponds to the effective polarization of a random layer in an Ω-distributed random circuit (i.e., the polarization in a depolarizing channel that would give the same fidelity decay), so $\epsilon_\Omega$ is the effective average infidelity of a layer sampled from Ω. When stochastic Pauli errors are the dominant source of error, $\epsilon_\Omega$ is approximately equal to the average layer infidelity, but this is not true more generally because gate infidelity is not "gauge-invariant"; see Refs. [21, 23, 25, 39] for details.
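The interpretation of the rescaling as an effective infidelity can be made explicit with a short worked equation. As an illustrative assumption, take each layer's error to be a global depolarizing channel $\mathcal{D}_p$ with polarization $p$ (the symbol $\mathcal{D}_p$ is introduced here only for this example). Inverting the definition of polarization gives the fidelity, and the infidelity then coincides with the rescaled decay rate:

```latex
\begin{align}
F(\mathcal{D}_p) &= \frac{1 + (4^n - 1)\,p}{4^n} = p + \frac{1-p}{4^n}, \\
1 - F(\mathcal{D}_p) &= \frac{4^n - 1}{4^n}\,(1 - p).
\end{align}
```

For example, with $n = 2$ and $p = 0.98$, the infidelity is $\tfrac{15}{16}\times 0.02 = 0.01875$.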

D. Direct fidelity estimation
Our protocol can be interpreted as an application of direct fidelity estimation (DFE) [30, 31] to varied-depth random Clifford circuits, so we now review DFE for the special case of Clifford circuits. Consider a Clifford circuit $C$ and an imperfect implementation of that circuit $\phi(C) = \mathcal{U}(C)\mathcal{E}_C$, where $\mathcal{E}_C$ denotes the overall error map of the circuit. Using Eq. (5), the polarization of $\mathcal{E}_C$ can be written as
$$\gamma(\mathcal{E}_C) = \frac{1}{|\mathbb{P}^*_n|}\sum_{s\in\mathbb{P}^*_n}\frac{1}{2^n}\mathrm{Tr}\big(s\,\mathcal{E}_C[s]\big) = \frac{1}{|\mathbb{P}^*_n|}\sum_{s\in\mathbb{P}^*_n}\frac{1}{2^n}\mathrm{Tr}\big(s'\,\phi(C)[s]\big), \tag{12}$$
where $s' = U(C)\,s\,U(C)^{\dagger}$ is a Pauli operator that can be efficiently computed classically [27], because $C$ is a Clifford circuit. Eq. (12) implies that the polarization of $\mathcal{E}_C$ can be efficiently estimated as follows: (1) sample Pauli operators uniformly from $\mathbb{P}^*_n$, (2) for each sampled Pauli operator $s$, apply $\phi(C)$ to $s$ and measure the evolved Pauli operator $s'$, and (3) average the measurement results. It is not physically possible to directly apply $\phi(C)$ to a Pauli operator $s$ (Pauli operators are not valid quantum states), but DFE simulates doing so by applying $\phi(C)$ to randomly sampled eigenstates of $s$. BiRB also uses this approach, but, unlike DFE, BiRB is robust to state preparation and measurement (SPAM) error. BiRB separates SPAM error from gate error by applying DFE to variable-depth circuits and extracting gate error from the rate of decay of the polarization, as in cycle benchmarking [40] and Pauli noise learning techniques [41-43].
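The Pauli-tracking identity behind DFE can be illustrated with a small dense-matrix calculation. The sketch below is not the DFE protocol itself: it averages exhaustively over all non-identity Pauli operators instead of sampling them, it applies the noisy map directly to Pauli operators (which is possible only in simulation), and it computes $s'$ by matrix multiplication instead of an efficient stabilizer update. The Bell-state circuit, the depolarizing error map, and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
)


def kron_all(mats):
    """Tensor product of a list of matrices."""
    out = np.eye(1, dtype=complex)
    for m in mats:
        out = np.kron(out, m)
    return out


def depolarizing(gamma, dim):
    """Global depolarizing error map with polarization gamma."""
    def channel(op):
        return gamma * op + (1 - gamma) * np.trace(op) * np.eye(dim) / dim
    return channel


def dfe_polarization(U, error_map, n):
    """Average of 2^{-n} Tr(s' phi(C)[s]) over non-identity Paulis s."""
    dim = 2 ** n
    values = []
    for labels in itertools.product("IXYZ", repeat=n):
        if all(label == "I" for label in labels):
            continue
        s = kron_all([PAULI[label] for label in labels])
        s_evolved = U @ s @ U.conj().T             # s' = U s U^dagger
        noisy_out = U @ error_map(s) @ U.conj().T  # phi(C)[s], with phi(C) = U(C) E_C
        values.append(np.trace(s_evolved @ noisy_out).real / dim)
    return float(np.mean(values))


# Two-qubit Clifford circuit: Hadamard on the first qubit, then CNOT.
U_bell = CNOT @ np.kron(H, np.eye(2))
estimate = dfe_polarization(U_bell, depolarizing(0.93, 4), n=2)
```

In this example the estimate recovers the polarization of the injected depolarizing error, independent of the Clifford circuit used.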

III. THE BINARY RB PROTOCOL
We now introduce BiRB circuits (Section III A) and the BiRB protocol (Section III B).

A. Binary RB circuits
We now state the procedure for constructing BiRB circuits [Fig. 3(b)]. Each BiRB circuit first generates an eigenstate of a random Pauli operator $s$, then applies a depth-$d$ random circuit, and then ends with a measurement of the evolved Pauli operator $s'$. A width-$n$, benchmark depth-$d$, Ω-distributed BiRB circuit is a circuit $C = L_{d+1} L_d \cdots L_1 L_0$ that begins with preparing $|0\rangle^{\otimes n}$ and ends with a computational basis measurement, and has layers sampled as follows:
1. Sample a uniformly random $n$-qubit Pauli operator $s \in \mathbb{P}^*_n$ and a uniformly random state $|\psi(s)\rangle$ from the set of tensor product stabilizer states stabilized by $s$. $L_0$ is a layer of single-qubit gates that prepares $|\psi(s)\rangle$.
2. $L_1, L_2, \ldots, L_d$ are layers sampled from Ω. These layers form the core circuit, which is a depth-$d$ Ω-distributed random circuit.
3. $L_{d+1}$ is a layer of single-qubit gates that transforms
$$s' = U(L_d \cdots L_1)\, s\, U(L_d \cdots L_1)^{-1}$$
into a tensor product of $Z$ and $I$ operators.
The circuit has an associated "target" Pauli operator
$$s_C = U(L_{d+1})\, s'\, U(L_{d+1})^{-1}. \tag{14}$$
If implemented without errors, the bit string $b$ output by $C$ will correspond to a +1 eigenstate of $s_C$, i.e., $s_C|b\rangle = |b\rangle$.
Step (1) can equivalently be formulated as (i) sampling a random unsigned Pauli P, (ii) picking a random tensor product stabilizer state that is an eigenstate of P. Sampling from both +1 and −1 eigenstates ensures accurate fidelity estimation when there are non-unital errors in the circuits (see Section IV).
There is not a unique choice for either the initial layer ($L_0$) or the final layer ($L_{d+1}$) of gates in BiRB circuits. These layers may be chosen deterministically or at random from the set of all possible layers of single-qubit Clifford gates satisfying the criteria above. In our simulations and experiments, we choose to randomize $L_0$, but this is not required for our theory. There will always be a possible final layer satisfying the requirements in step 3. We can construct such a layer as follows: Let $s' = \bigotimes_{i=1}^{n} s'_i$, where $s'_i$ denotes the single-qubit Pauli operator acting on qubit $i$. On qubit $i$, apply a single-qubit Clifford gate that maps $s'_i$ to $\pm Z$ when $s'_i \neq I$, and apply the identity otherwise.
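One possible construction of the final layer and of the resulting target Pauli operator is sketched below. The specific gate choice (Hadamard for $X$, Hadamard after $S^{\dagger}$ for $Y$, identity for $Z$ and $I$) is an assumption made for illustration; any single-qubit Clifford gates with the same action would serve. The function names are hypothetical. Pauli operators are written as unsigned strings with a separately tracked sign.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])

PAULI = {"I": I2, "X": X, "Y": Y, "Z": Z}

# Assumed gate choice: each gate G satisfies G P G^dagger = +Z for its Pauli P.
FINAL_GATE = {
    "I": ("I", I2),
    "Z": ("I", I2),
    "X": ("H", H),
    "Y": ("H*Sdg", H @ S.conj().T),  # apply S^dagger first, then H
}


def gates_map_to_z():
    """Check numerically that the chosen gates rotate X, Y, and Z onto +Z."""
    return all(
        np.allclose(FINAL_GATE[p][1] @ PAULI[p] @ FINAL_GATE[p][1].conj().T, Z)
        for p in "XYZ"
    )


def final_layer(pauli_label):
    """Gate names for a final layer that maps the Pauli string to Z/I factors."""
    return [FINAL_GATE[p][0] for p in pauli_label]


def target_pauli(pauli_label):
    """Unsigned target Pauli: every non-identity factor becomes Z."""
    return "".join("I" if p == "I" else "Z" for p in pauli_label)


def target_eigenvalue(target, bitstring, sign=1):
    """Eigenvalue (+1 or -1) of sign * target on the computational basis state."""
    parity = sum(int(b) for t, b in zip(target, bitstring) if t == "Z") % 2
    return sign * (-1) ** parity


layer = final_layer("XIYZ")
target = target_pauli("XIYZ")
```

Because the assumed gates map each Pauli factor to $+Z$, the sign of the evolved Pauli operator carries over unchanged to the target Pauli operator, which is why `target_eigenvalue` takes the sign as a separate argument.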

B. Binary RB protocol
The BiRB protocol is similar to other RB protocols: run BiRB circuits, compute a figure of merit for the circuits of each benchmark depth, then fit an exponential decay. A BiRB experiment is defined by a layer set $\mathbb{L}$, a sampling distribution Ω, and the usual RB sampling parameters (a set of benchmark depths $d$, the number of circuits $K$ sampled per depth, and the number of times $N$ each circuit is run). Our protocol is the following:
1. For a range of integers $d \geq 0$, sample $K$ Ω-distributed BiRB circuits with benchmark depth $d$, and run each circuit $N \geq 1$ times.
2. For each circuit $C$, estimate the expected value $\langle s_C \rangle$ of the target Pauli observable $s_C$ from the computational basis measurement results. Then, compute the average over all circuits of benchmark depth $d$,
$$\bar{f}_d = \frac{1}{K}\sum_{C}\langle s_C \rangle. \tag{15}$$
3. Fit $\bar{f}_d$ to an exponential,
$$\bar{f}_d = A p^d,$$
where $A$ and $p$ are fit parameters. Then compute [44]
$$r_\Omega = \frac{4^n-1}{4^n}\,(1-p). \tag{17}$$
In Appendix B, we show that the number of circuits per depth (and therefore the total amount of data) required by our protocol to estimate $r_\Omega$ to within a fixed relative uncertainty is independent of $n$. For a fixed total number of shots per depth $KN$, it is statistically optimal to maximize the number of random circuits $K$ and set $N = 1$. However, typically $N > 1$ for practical reasons, e.g., because of the time cost of generating and loading many distinct circuits onto the processor. See Refs. [45-47] for more detailed statistical analyses of RB protocols.
Note that if L is chosen to be the set of all n-qubit Clifford gates and Ω is the uniform distribution, then we obtain a version of standard RB (i.e., RB of the Clifford group) without an inversion gate.See Appendix A for further discussion of this variant of BiRB, whose reliability can be proven using the unitary 2-design twirling theory that underpins the theory of standard RB [6,7].
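The data analysis in steps 2 and 3 of the protocol can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used for the paper's results: the exponential is fitted by linear regression on $\log \bar{f}_d$ (which requires every $\bar{f}_d > 0$ and weights the data differently from a direct nonlinear fit), the target Pauli operator is given as an unsigned string of `Z` and `I` with a separate sign, and all function names are hypothetical.

```python
import numpy as np


def pauli_expectation(counts, target, sign=1):
    """Estimate <s_C> from a dict mapping measured bit strings to counts."""
    shots = sum(counts.values())
    total = 0
    for bits, count in counts.items():
        # Parity of the measured bits on the qubits where the target acts as Z.
        parity = sum(int(b) for t, b in zip(target, bits) if t == "Z") % 2
        total += sign * (-1) ** parity * count
    return total / shots


def fit_birb_decay(depths, f_bar):
    """Fit f_bar ~ A * p**d via a straight-line fit to log(f_bar)."""
    slope, intercept = np.polyfit(depths, np.log(f_bar), 1)
    return float(np.exp(intercept)), float(np.exp(slope))


def birb_error_rate(p, n):
    """r = (4^n - 1) / 4^n * (1 - p)."""
    return (4 ** n - 1) / 4 ** n * (1 - p)


# Synthetic, noise-free decay used only to exercise the functions.
depths = np.array([0, 1, 2, 4, 8, 16])
f_bar = 0.95 * 0.97 ** depths
A_fit, p_fit = fit_birb_decay(depths, f_bar)
r_fit = birb_error_rate(p_fit, n=2)
single_circuit = pauli_expectation({"00": 60, "11": 30, "01": 10}, "ZZ")
```

With real data, each $\bar{f}_d$ would be the mean of `pauli_expectation` over the $K$ circuits at that depth, and a bootstrap over circuits is one way to attach uncertainties to the fitted rate.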

IV. THEORY OF BINARY RB
We now show that the error rate measured by BiRB [$r_\Omega$, Eq. (17)] is a close approximation to the average layer error rate $\epsilon_\Omega$ [Eq. (8)]. In Section IV A we show that BiRB estimates the expected fidelity of depth-$d$ Ω-distributed circuits. In Section IV B, we show that this quantity decays exponentially in $d$, which allows us to conclude that the BiRB error rate is approximately the average layer error rate, i.e., $r_\Omega \approx \epsilon_\Omega$.

A. Relating measurement results to circuit polarizations
We start by showing that $\bar{f}_d$ [Eq. (15)] is approximately equal to the expected polarization [Eq. (4)] of an error map consisting of the composition of (1) the error map of a depth-$d$ Ω-distributed random circuit, and (2) an error map absorbing all state preparation and measurement (SPAM) error. We then argue that the contribution of SPAM errors is approximately depth-independent and can be factored out, so that $\bar{f}_d$ equals the polarization [Eq. (4)] of the error map of a depth-$d$ Ω-distributed random circuit, multiplied by a $d$-independent prefactor.
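We consider a BiRB circuit $C$ with benchmark depth $d$ and gate-dependent error channels on $L_1, L_2, \ldots, L_d$, i.e., $\phi(L) = \mathcal{E}_L\,\mathcal{U}(L)$. We model the error on $L_0$ and state preparation as a gate-independent global depolarizing channel $\mathcal{E}_0$ directly after $L_0$. We model the error on $L_{d+1}$ and readout as a single, gate- and measurement-independent global depolarizing channel $\mathcal{E}_{d+1}$ occurring directly before $L_{d+1}$. Therefore, the superoperator representing the imperfect implementation of the circuit $C$ is given by
$$\phi(C) = \mathcal{U}(L_{d+1})\,\mathcal{E}_{d+1}\,\mathcal{E}_{L_d}\mathcal{U}(L_d)\cdots\mathcal{E}_{L_1}\mathcal{U}(L_1)\,\mathcal{E}_0\,\mathcal{U}(L_0).$$
We first rewrite the error in the circuit in terms of the core circuit's error map. We have
$$\phi(C) = \mathcal{U}(L_{d+1})\,\mathcal{U}(L_d\cdots L_1)\,\mathcal{E}_{\mathrm{tot}}\,\mathcal{U}(L_0),$$
where
$$\mathcal{E}_{L_1,\ldots,L_d} = \mathcal{U}(L_d\cdots L_1)^{-1}\,\mathcal{E}_{L_d}\mathcal{U}(L_d)\cdots\mathcal{E}_{L_1}\mathcal{U}(L_1),$$
and
$$\mathcal{E}_{\mathrm{tot}} = \mathcal{E}_{d+1}\,\mathcal{E}_{L_1,\ldots,L_d}\,\mathcal{E}_0,$$
which uses the fact that global depolarizing channels commute with unitary superoperators.
Now we show that $\bar{f}_d$ is the expected polarization of $\mathcal{E}_{\mathrm{tot}}$. $\bar{f}_d$ is the expectation value of circuit $C$'s target Pauli operator $s_C$ [Eq. (14)] averaged over all benchmark depth-$d$ BiRB circuits, i.e.,
$$\bar{f}_d = \mathbb{E}_{L_1,\ldots,L_d}\,\mathbb{E}_{s\in\mathbb{P}^*_n}\,\mathbb{E}_{\psi(s)}\,\mathrm{Tr}\Big(s_C\,\phi(C)\big[(|0\rangle\langle 0|)^{\otimes n}\big]\Big),$$
where $|\psi(s)\rangle = U(L_0)|0\rangle^{\otimes n}$ is a uniformly random state from the set of all tensor product states stabilized by $s$. Substituting in the expression for $\phi(C)$ and using $s_C = U(L_{d+1})\,U(L_d\cdots L_1)\,s\,U(L_d\cdots L_1)^{-1}\,U(L_{d+1})^{-1}$, we obtain
$$\bar{f}_d = \mathbb{E}_{L_1,\ldots,L_d}\,\mathbb{E}_{s\in\mathbb{P}^*_n}\,\mathbb{E}_{\psi(s)}\,\mathrm{Tr}\Big(s\,\mathcal{E}_{\mathrm{tot}}\big[|\psi(s)\rangle\langle\psi(s)|\big]\Big).$$
We now average over $|\psi(s)\rangle$. To do so, we first expand the initial state $|\psi(s)\rangle$ in terms of its stabilizer group, $|\psi(s)\rangle\langle\psi(s)| = 2^{-n}\sum_{P\in S_{\psi(s)}} P$:
$$\bar{f}_d = \mathbb{E}_{L_1,\ldots,L_d}\,\mathbb{E}_{s\in\mathbb{P}^*_n}\,\mathbb{E}_{\psi(s)}\,\frac{1}{2^n}\Big[\mathrm{Tr}\big(s\,\mathcal{E}_{\mathrm{tot}}[I_n]\big) + \mathrm{Tr}\big(s\,\mathcal{E}_{\mathrm{tot}}[s]\big) + \sum_{P\in S^*_{\psi(s)}\setminus\{s\}}\mathrm{Tr}\big(s\,\mathcal{E}_{\mathrm{tot}}[P]\big)\Big] \tag{24}$$
$$\phantom{\bar{f}_d} = \mathbb{E}_{L_1,\ldots,L_d}\,\mathbb{E}_{s\in\mathbb{P}^*_n}\,\frac{1}{2^n}\mathrm{Tr}\big(s\,\mathcal{E}_{\mathrm{tot}}[s]\big) + \mathbb{E}_{L_1,\ldots,L_d}\,\mathbb{E}_{s\in\mathbb{P}^*_n}\,\mathbb{E}_{\psi(s)}\,\frac{1}{2^n}\sum_{P\in S^*_{\psi(s)}\setminus\{s\}}\mathrm{Tr}\big(s\,\mathcal{E}_{\mathrm{tot}}[P]\big). \tag{25}$$
To get from Eq. (24) to Eq. (25), we use the fact that $\mathbb{E}_{s\in\mathbb{P}^*_n}\mathrm{Tr}(s\,\mathcal{E}_{\mathrm{tot}}[I_n]) = 0$ because we are averaging over signed non-identity Pauli operators. The symmetry properties of the set of all local +1 eigenstates of $s$ guarantee that the last term of Eq. (25) vanishes (see Appendix C), so that Eq. (25) becomes
$$\bar{f}_d = \mathbb{E}_{L_1,\ldots,L_d}\,\mathbb{E}_{s\in\mathbb{P}^*_n}\,\frac{1}{2^n}\mathrm{Tr}\big(s\,\mathcal{E}_{\mathrm{tot}}[s]\big) \tag{26}$$
$$\phantom{\bar{f}_d} = \mathbb{E}_{L_1,\ldots,L_d}\,\gamma(\mathcal{E}_{\mathrm{tot}}). \tag{27}$$
Eq. (27) says that $\bar{f}_d$, which is measured in our protocol, is the expected polarization of $\mathcal{E}_{\mathrm{tot}}$. This error map is the composition of (1) the error map of an Ω-distributed random circuit and (2) the error maps of the state preparation and measurement layers. Because $\mathcal{E}_0$ and $\mathcal{E}_{d+1}$ are (by assumption) global depolarizing channels, we have
$$\gamma(\mathcal{E}_{\mathrm{tot}}) = \gamma(\mathcal{E}_{d+1})\,\gamma(\mathcal{E}_0)\,\gamma(\mathcal{E}_{L_1,\ldots,L_d}). \tag{28}$$
If $\mathcal{E}_0$ and $\mathcal{E}_{d+1}$ are stochastic Pauli channels (but not necessarily global depolarizing channels), or if $\mathcal{E}_{L_1,\ldots,L_d}$ is a stochastic Pauli channel, then Eq. (28) holds approximately. Specifically, Eq. (29) gives Eq. (28) with a correction term that involves a SPAM infidelity $\varepsilon_{\mathrm{SPAM}}$ and the core circuit's infidelity $\varepsilon_{L_1,\ldots,L_d}$, and that is determined by the amount of error cancellation between $\mathcal{E}_0\mathcal{E}_{d+1}$ and $\mathcal{E}_{L_1,\ldots,L_d}$ [1]. At low depths $d$, this correction term is small because $\varepsilon_{L_1,\ldots,L_d}$ is small, and at depths $d \gtrsim k$ [where $k$ is the small constant in Eq. (6)], this term is small because the scrambling condition for Ω-distributed random layers implies that errors in that circuit are randomized and spread over many qubits. Eq. (29) relies on the assumption of stochastic Pauli errors, and randomized compilation theory [49] implies that this can be enforced by (1) choosing Ω so that the distribution of $U(L)$ is invariant under left and right multiplication by Pauli operators, and (2) randomizing $L_{d+1}$ and $L_0$. However, in practice, we find that these conditions on BiRB's circuits are not required, because Ω-distributed circuits rapidly scramble errors. This makes error cancellation negligible after constant depth $k$ [24], implying that Eq. (28) holds to a good approximation for all kinds of small Markovian errors.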
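The difference between exact and approximate factorization of polarizations can be seen in a single-qubit toy calculation. The sketch below uses the fact that a stochastic Pauli channel has a diagonal Pauli transfer matrix, so composing two such channels multiplies the diagonals entry by entry. It illustrates the idea only; it does not reproduce the correction term of Eq. (29), and the error probabilities are arbitrary values chosen for the example.

```python
import numpy as np


def pauli_channel_diag(p_x, p_y, p_z):
    """Pauli-transfer-matrix diagonal (order I, X, Y, Z) of a one-qubit Pauli channel."""
    p_i = 1 - p_x - p_y - p_z
    return np.array([
        1.0,
        p_i + p_x - p_y - p_z,  # X is preserved by I and X errors, flipped by Y and Z
        p_i - p_x + p_y - p_z,
        p_i - p_x - p_y + p_z,
    ])


def polarization(diag):
    """Mean of the non-identity diagonal entries."""
    return float(np.mean(diag[1:]))


def compose(diag_a, diag_b):
    """Composition of two Pauli channels multiplies their diagonals."""
    return diag_a * diag_b


depol = pauli_channel_diag(0.01, 0.01, 0.01)  # depolarizing, polarization 0.96
dephase = pauli_channel_diag(0.0, 0.0, 0.05)  # pure dephasing

# Depolarizing composed with anything: polarizations multiply exactly.
exact_gap = polarization(compose(depol, dephase)) - polarization(depol) * polarization(dephase)

# Two identical dephasing channels: the errors can cancel, so the product rule is only approximate.
approx_gap = polarization(compose(dephase, dephase)) - polarization(dephase) ** 2
```

Here `exact_gap` vanishes up to rounding, while `approx_gap` is small but nonzero (about 0.002 for these values), reflecting the probability that two $Z$ errors cancel.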

B. Deriving the exponential decay model
Our theory so far shows that $\bar{f}_d$ [Eq. (15)] is equal to the polarization of depth-$d$ Ω-distributed random circuits multiplied by a depth-independent prefactor. Recent work [1, 21, 24] has shown that the polarization of Ω-distributed random circuits decays exponentially, from which it follows that $r_\Omega \approx \epsilon_\Omega$, given the scrambling condition [Eq. (6)] that we require of Ω and $\mathbb{L}$. This is because Eq. (6) implies that errors within Ω-distributed random circuits cancel with negligible probability, which implies that the polarization of the BiRB core circuit is closely approximated by the product of the polarizations of its constituent layers (the error in this approximation is $O(d\varepsilon(\delta + k\varepsilon))$, which is negligible for small $\delta$, where $\delta$ is as defined in Eq. (6) [21]). Because the polarizations of Ω-distributed layers approximately multiply, $\bar{f}_d$ decays exponentially, i.e.,
$$\bar{f}_d \approx A p^d \tag{30}$$
for some $A$ and $p$, and $r_\Omega \approx \epsilon_\Omega$.
Here, we give an alternate, complementary proof that $\bar{f}_d$ decays exponentially, which uses the "$\mathscr{L}$ superchannel" framework from Refs. [24, 25] and is similar to the most accurate theories for standard RB [23, 25, 26]. We start by expressing $\bar{f}_d$ [Eq. (26)] in terms of $d$ applications of a linear operator acting on superoperators (i.e., a "superchannel"), which we denote $\mathscr{L}$. When $\mathcal{E}_L = \mathbb{I}$ for all $L \in \mathbb{L}$, $\mathscr{L}$ has two unit eigenvalues ($\lambda_0$, $\lambda_1$) and all other eigenvalues ($\lambda_i$, $i > 1$) have absolute value strictly less than 1 [24]. The following theory requires that the gate errors are sufficiently small that this gap between the unit and non-unit modulus eigenvalues is preserved [50]. For this theory, we do not require that $\mathcal{E}_{d+1}$ and $\mathcal{E}_0$ are global depolarizing channels.
In our theory so far, including our definition of $\mathscr{L}$ [Eq. (20)], we have used a particular representation of the imperfect gate set: the imperfect gates are given by $\{\mathcal{E}_L\,\mathcal{U}(L) \mid L \in \mathbb{L}\}$. However, we can express $\bar{f}_d$ in terms of a different representation of these gates with identical predictions, by performing a gauge transformation [51], i.e., we represent the gates as $\{\mathcal{M}\,\mathcal{E}_L\,\mathcal{U}(L)\,\mathcal{M}^{-1} \mid L \in \mathbb{L}\}$, where $\mathcal{M}$ is an invertible matrix. Below, we re-express the gate set in a particular gauge defined in terms of $\mathscr{L}$. Let $W = E_1 + E_\lambda$, where $E_1$ and $E_\lambda$ are eigenoperators of $\mathscr{L}$ with eigenvalues 1 and $\lambda$, respectively (as defined in Ref. [24], Proposition 3) and where $\lambda$ is the second largest eigenvalue of $\mathscr{L}$. Using the gauge-transformed gate set, $\bar{f}_d$ can be rewritten as in Eq. (33). We assume that $\tilde{\mathcal{E}}_{d+1} = \mathcal{D}_{\mathrm{meas}}$, where $\mathcal{D}_{\mathrm{meas}}$ is a global depolarizing channel (which commutes with all unitary superoperators). Ref. [24] (Proposition 3) shows that $\mathscr{L}(W) = \mathcal{D}_\lambda W$, where $\mathcal{D}_\lambda$ is a global depolarizing channel with polarization $\lambda$. Therefore, $\bar{f}_d$ decays exponentially in depth, at a rate determined by $\lambda$ (the second largest eigenvalue of $\mathscr{L}$). Furthermore, Proposition 4 of Ref. [24] implies that $\lambda$ is the average polarization of Ω-distributed layers computed in a particular gauge that is defined by $\mathscr{L}$.

V. SIMULATIONS
In this section, we present simulations of BiRB that show that it reliably estimates the average layer error rate $\epsilon_\Omega$.

A. BiRB with stochastic and Hamiltonian errors
To demonstrate that BiRB accurately estimates $\epsilon_\Omega$ under broad conditions, we ran simulations of BiRB with varied error models containing stochastic Pauli and Hamiltonian errors. We simulated BiRB on $n = 1$, 2, and 4 qubits with all-to-all connectivity using the layer set consisting of all possible $n$-qubit layers constructed from parallel applications of $X_{\pi/2}$, $Y_{\pi/2}$, and CNOT gates. These layers were sampled so that the expected density of CNOT gates in a layer is $\xi = 1/4$, and each of the two single-qubit gates appears with equal probability.
We simulated BiRB with three types of error models for these gates: (1) Pauli stochastic errors, (2) Hamiltonian errors, and (3) Pauli stochastic and Hamiltonian errors. To generate each error model, we assign each gate random error rates specified using elementary error generators [52]. For each $k$-qubit gate ($k = 1, 2$), we specify a post-gate error of the form $e^{G}$ for each of $\{X_{\pi/2}, Y_{\pi/2}, \mathrm{CNOT}\}$, where
$$G = \sum_{i=1}^{4^k-1} s_i S_i + \sum_{i=1}^{4^k-1} h_i H_i.$$
Here, $S_1, S_2, \ldots, S_{4^k-1}$ denote the $k$-qubit stochastic Pauli error generators, and $H_1, H_2, \ldots, H_{4^k-1}$ denote the $k$-qubit Hamiltonian error generators. For each error model, we sample $s_i$ and $h_i$ at random (see Appendix D for details) to produce a range of expected layer error rates. These models contain no crosstalk errors (but our theory encompasses error models with crosstalk errors) and no state preparation or measurement error. Figure 4 shows the results of these simulations.
Figure 4 compares $\epsilon_\Omega$ to the estimate of the BiRB error rate per qubit in each simulation, separated into the three families of error models. Error bars (1σ) are shown, computed using a standard bootstrap (there are error bars on $\epsilon_\Omega$ as well as on $r_\Omega$ because $\epsilon_\Omega$ is computed by random sampling). We observe that for each error model, $r_\Omega$ approximately equals $\epsilon_\Omega$, as predicted by our theory of BiRB. The statistical uncertainty in $r_\Omega$ (and $\epsilon_\Omega$) is typically much larger in simulations of BiRB experiments on gates with purely Hamiltonian errors, due to higher variance in the performance of circuits of the same depth for this kind of error (as is the case with other RB methods). To quantify any systematic differences between $r_\Omega$ and $\epsilon_\Omega$, we consider the relative error $\delta_{\mathrm{rel}} = (r_\Omega - \epsilon_\Omega)/\epsilon_\Omega$ divided by its standard deviation $\sigma_{\mathrm{rel}}$, which is computed from 1σ uncertainties for $r_\Omega$ and $\epsilon_\Omega$. We see that $r_\Omega$ is typically within 2σ of $\epsilon_\Omega$ for all three classes of error model. The distribution of $\delta_{\mathrm{rel}}$ is similar across all error models, suggesting that BiRB is similarly reliable for all three types of error model. Furthermore, we observe that $r_\Omega$ does not systematically under- or overestimate $\epsilon_\Omega$. This contrasts with the only other method for scalable RB of Clifford gates: MRB. Simulations and theory for MRB both show that MRB systematically underestimates $\epsilon_\Omega$ [1, 21]. Therefore, our results suggest that BiRB is more accurate than MRB (although note that, unlike BiRB, MRB can scalably benchmark non-Clifford gates).
FIGURE 4. The relative error $\delta_{\mathrm{rel}} = (r_\Omega - \epsilon_\Omega)/\epsilon_\Omega$, divided by its standard deviation ($\sigma_{\mathrm{rel}}$), for each randomly sampled error model. For all error models, we find that $r_\Omega$ is approximately equal to $\epsilon_\Omega$, and all discrepancies between $r_\Omega$ and $\epsilon_\Omega$ are consistent with finite sample fluctuations.
FIGURE 5. Simulations of BiRB with measurement errors. We simulated BiRB with five types of error models: (a,f) no error on the measurements, (b,g) bit flip errors on the measurements for all $n$ qubits, (c,h) bit flip errors on the measurements for only a single qubit, (d,i) amplitude damping errors on the measurements for all $n$ qubits, and (e,j) amplitude damping errors on the measurement for only a single qubit. (a-e) The relative error in $r_\Omega$ divided by its uncertainty ($\delta_{\mathrm{rel}}/\sigma_{\delta_{\mathrm{rel}}}$) versus the strength of the measurement error. (f-j) Histograms of $\delta_{\mathrm{rel}}/\sigma_{\delta_{\mathrm{rel}}}$ for each type of measurement error. We observe no evidence that $r_\Omega$ is affected by measurement error, which is consistent with our theory of BiRB and provides further evidence that BiRB is robust to measurement errors.

B. Binary RB with measurement error
The simulations presented above (Section V A) did not include SPAM errors, but SPAM errors are often large in current quantum processors. Like other RB protocols, BiRB is designed to be robust to SPAM errors: the effect of SPAM errors is absorbed into a depth-independent prefactor in the exponential fit (see Section IV). Here, we present simulations that demonstrate the robustness of BiRB in the presence of SPAM errors.
We simulated BiRB on 1, 2, and 4 qubits with single-qubit bit flip and amplitude damping measurement errors. These BiRB simulations used the same layer set and sampling distribution as the simulations presented in Section V A. For these simulations, we simulated BiRB with error models in which the gates have both stochastic Pauli and Hamiltonian errors with rates sampled so that $\epsilon_\Omega$ is approximately the same for every error model (see Appendix D for details). From each set of gate error rates, we construct five error models, each of which has different measurement error. These five error models are: (1) no error on the measurements, (2) bit flip errors on the measurements for all $n$ qubits, (3) bit flip errors on the measurements for only a single qubit, (4) amplitude damping errors on the measurements for all $n$ qubits, and (5) amplitude damping errors on the measurement for only a single qubit. The measurement error rates on each qubit are chosen so that the expected measurement error rate is a constant $p$, which we varied over a range of values (see Appendix D for details).
Figure 5 shows the results of our simulations of BiRB with measurement errors. We see that the BiRB error rate is not systematically affected by bit flip or amplitude damping error. Figure 5(a-e) shows the relative error ($\delta_{\mathrm{rel}}$) in $r_\Omega$, divided by its standard deviation ($\sigma_{\delta_{\mathrm{rel}}}$), for all error models. We observe no systematic change in $\delta_{\mathrm{rel}}/\sigma_{\delta_{\mathrm{rel}}}$ as the strength of measurement error ($p$) is varied. Figure 5(f-j) shows the distribution of $\delta_{\mathrm{rel}}/\sigma_{\delta_{\mathrm{rel}}}$ for all error models with each type of measurement error. We see that the distributions are similar for all types of measurement error. These simulations show no evidence that $r_\Omega$ is affected by measurement error, which is consistent with our theory for BiRB.
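The normalized relative error used in these comparisons can be computed as sketched below. The text does not state how the standard deviation of the relative error is obtained from the individual uncertainties, so this sketch assumes first-order error propagation with independent uncertainties on the two rates. That assumption, the function names, and the input values are all hypothetical.

```python
import math


def relative_error(r, eps):
    """delta_rel = (r - eps) / eps."""
    return (r - eps) / eps


def relative_error_std(r, sigma_r, eps, sigma_eps):
    """First-order propagated standard deviation of delta_rel.

    Assumes the uncertainties on r and eps are independent:
    d(delta)/dr = 1/eps and d(delta)/d(eps) = -r/eps**2.
    """
    return math.sqrt((sigma_r / eps) ** 2 + (r * sigma_eps / eps ** 2) ** 2)


# Made-up numbers for illustration only.
r_omega, sigma_r = 0.021, 0.001
eps_omega, sigma_eps = 0.020, 0.0005

delta = relative_error(r_omega, eps_omega)
sigma_delta = relative_error_std(r_omega, sigma_r, eps_omega, sigma_eps)
z_score = delta / sigma_delta
```

A value of `z_score` with magnitude below about 2 corresponds to the two rates agreeing within 2σ under these assumptions.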

VI. DEMONSTRATIONS ON IBM Q
In this section we present demonstrations of BiRB on 7- and 27-qubit superconducting IBM Q devices. We provide experimental evidence that BiRB works by comparing it to two other RB protocols (DRB and MRB) that are designed to measure the same error rate.

A. Validating binary RB in the few-qubit regime
We ran two experiments comparing BiRB and DRB. We chose to compare the results of BiRB and DRB because (i) DRB is designed to measure the same error rate as BiRB, (ii) BiRB is equivalent to DRB when $n = 1$, and (iii) DRB theory [18, 24] shows that DRB is a highly accurate method for estimating the average error rate ($\epsilon_\Omega$). In these BiRB and DRB experiments, each layer in the core circuit consists of randomly-sampled native CNOT gates (i.e., CNOT gates on connected qubits) and uniformly random single-qubit Clifford gates on all other qubits. We sampled the two-qubit gates using the "edgegrab" sampler from Ref. [2] with an expected two-qubit gate density of $\xi = 1/4$. In our experiment on ibm_perth, we sampled $K = 30$ circuits at exponentially spaced benchmark depths. In our experiment on ibm_hanoi, we also ran CRB on up to $n = 5$ qubits, and we sampled $K = 60$ circuits at exponentially spaced benchmark depths. See Appendix E for further details. The BiRB and DRB error rates [Fig. 6(d)] are consistent with each other on all qubit subsets we tested [53], which is consistent with the theories for both DRB and BiRB. These results demonstrate that BiRB is a reliable method for measuring the average layer error rate. In Fig. 2(b) we also compare the BiRB error rate to an ad hoc heuristic estimate of the average layer error rate obtained by rescaling the results of CRB (see Appendix E for details). The rescaled CRB error rate is systematically higher than both the BiRB and DRB error rates. While rescaling CRB error rates to estimate native gate error rates is common practice [54-57], this is not typically accurate, as these results demonstrate.
Our results demonstrate that BiRB is more scalable than both DRB and CRB. Although DRB is more scalable than CRB [Fig. 2], the initial (i.e., $d = 0$) polarization of DRB circuits [Fig. 6(c)] still drops off rapidly with increasing $n$, which is due to the $O(n^2/\log n)$ gate overhead from the stabilizer state preparation and measurement subroutines [24]. The decrease in initial polarization with $n$ for BiRB circuits is much smaller (note that we expect some decrease in polarization with increased $n$ due to increasing SPAM error).

B. Demonstrating the scalability of binary RB
To demonstrate that BiRB is reliable in the $n \gg 1$ regime, where DRB is infeasible, we ran BiRB and MRB on all 27 qubits of ibmq_kolkata. MRB is designed to measure the same error rate as BiRB ($\epsilon_\Omega$) and it is also scalable. However, the theory of MRB shows that it slightly but systematically underestimates $\epsilon_\Omega$ due to correlations between the random layers used in MRB circuits [1, 21]. MRB circuits consist of (1) a depth-$d/2$ Ω-distributed random circuit, followed by (2) its layer-by-layer inverse, with Pauli frame randomization. MRB theory shows that if the error rates of an Ω-distributed layer and its inverse are uncorrelated, then MRB accurately estimates $\epsilon_\Omega$, but that if these error rates are correlated then MRB slightly underestimates $\epsilon_\Omega$. In real systems, these error rates are typically correlated.
In the BiRB and MRB circuits we ran, each randomly sampled layer in the core circuit has the form $L = L_1 L_2$, where $L_1$ consists of single-qubit gates on all qubits, and $L_2$ consists of parallel CNOT gates on pairs of connected qubits. We sampled the single-qubit gates in $L_1$ uniformly from the single-qubit Clifford gates, and we sampled the two-qubit gates in $L_2$ using the "edgegrab" sampler [2] with an expected two-qubit gate density of $\xi = 1/4$. We ran circuits with exponentially spaced benchmark depths and sampled $K = 60$ circuits of each circuit shape.
Figure 6(b) shows the results of our BiRB and MRB experiments on six sets of qubits. Figure 6(e) compares the MRB and BiRB error rates for all sets of qubits we tested. The MRB and BiRB error rates are consistent on up to 11 qubits. For $n > 11$ qubits, the MRB error rate is systematically lower than the BiRB error rate. This result is consistent with the theory of MRB, which predicts that MRB's $r_\Omega$ systematically underestimates $\epsilon_\Omega$, and the theory of BiRB, which does not predict a systematic under- or overestimate of $\epsilon_\Omega$. MRB theory predicts that MRB's underestimate of $\epsilon_\Omega$ is larger when the error rates of a layer and its inverse are highly correlated [1]. We therefore conjecture that the observed discrepancy between the BiRB and MRB error rates is caused by high variance in the layer error rates in many-qubit circuits, which could occur due to, e.g., large crosstalk errors caused by some two-qubit gates. The largest difference we observe between MRB and BiRB is in the $n = 20$ qubit experiments, where the BiRB error rate is $r_\Omega \approx 41.5\%$ and the MRB error rate is $r_\Omega \approx 35.7\%$. This discrepancy is consistent with BiRB and MRB theory if the variance in the layers' error rates is sufficiently large. For example, a simple error model that leads to the observed BiRB and MRB error rates is one where each layer experiences purely global depolarizing error and half of the layers have 85% polarization whereas the other half of the layers have 32% polarization.
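The arithmetic behind this two-population example can be checked with a short sketch. It rests on two assumptions that are not spelled out in this section: the usual RB conversion $r = (4^n - 1)(1 - p)/4^n$ from a decay constant $p$ to an error rate, and a toy reading of MRB in which a layer and its inverse share the same polarization $\lambda$, so that each mirrored pair contributes $\lambda^2$ and the effective per-layer decay is $\sqrt{\mathbb{E}[\lambda^2]}$. The function names are hypothetical.

```python
import math


def birb_error_rate(polarizations, n_qubits):
    """Error rate implied by the mean per-layer polarization.

    Assumes i.i.d. layers, so the decay constant is the average polarization.
    """
    mean_pol = sum(polarizations) / len(polarizations)
    scale = (4**n_qubits - 1) / 4**n_qubits  # assumed RB convention
    return scale * (1.0 - mean_pol)


def mrb_error_rate(polarizations, n_qubits):
    """Toy MRB error rate for perfectly correlated layer/inverse pairs.

    Assumption: a layer and its inverse have identical polarization lam,
    so a mirrored pair decays by lam**2 and the effective per-layer decay
    constant is sqrt(mean(lam**2)).
    """
    mean_sq = sum(lam**2 for lam in polarizations) / len(polarizations)
    scale = (4**n_qubits - 1) / 4**n_qubits
    return scale * (1.0 - math.sqrt(mean_sq))


# Half of the layers have 85% polarization, the other half 32%.
layer_polarizations = [0.85, 0.32]
r_birb = birb_error_rate(layer_polarizations, n_qubits=20)
r_mrb = mrb_error_rate(layer_polarizations, n_qubits=20)
print(f"BiRB: {r_birb:.3f}  MRB: {r_mrb:.3f}")
```

Under these assumptions the sketch gives about 41.5% for BiRB and about 35.8% for MRB, close to the observed values, and the gap closes as the two polarizations approach each other.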

VII. DISCUSSION
In this paper, we introduced BiRB, a highly streamlined RB protocol for Clifford gate sets. Unlike most RB protocols, BiRB does not use motion reversal circuits. Instead, BiRB works by tracking a single random Pauli operator through each random circuit, using ideas first developed for DFE [30,31] and later leveraged by Pauli noise learning methods [40,41,43]. This enables BiRB to scale to many more qubits than most RB methods. Many-qubit BiRB allows for benchmarking of large many-qubit layer sets when individually characterizing all those layers is infeasible, and it is able to accurately capture crosstalk. We have presented a theory for BiRB that proves that BiRB reliably estimates the average error rate of random layers under common assumptions used in RB theory (e.g., Markovian errors), and we have supported this theory with simulations and experimental demonstrations. Our results on IBM Q processors demonstrate BiRB error rates consistent with DRB error rates on up to 6 qubits, and they show that BiRB scales well beyond the limits of DRB and standard CRB.
BiRB enables RB on many more qubits than most existing RB protocols, but it also has advantages in the few-qubit setting. For example, simultaneous few-qubit RB experiments are widely used to quantify crosstalk errors [58], but simultaneous CRB and DRB on $n > 1$ qubits are complicated in practice by scheduling problems that arise due to the variable depths of compiled subroutines [54]. In contrast, simultaneous BiRB experiments are simple to run because the state preparation and measurement layers in BiRB circuits are each just a single layer of single-qubit gates. Finally, because BiRB does not rely on motion reversal, we anticipate that BiRB can be adapted to benchmark operations that are not intended to be unitary. In particular, in subsequent work we will show that BiRB can be adapted to benchmark gate sets containing mid-circuit measurements, a computational primitive that is essential for quantum error correction.

CODE AVAILABILITY
Circuit sampling and data analysis code for BiRB is available in pyGSTi [59]. Data and code for the simulations and IBM Q demonstrations in this work are available upon reasonable request.
Clifford group BiRB circuit is given by Here, we will use $S_{C_0}$ to denote the stabilizer group of $U(C_0)|0\rangle^{\otimes n}$. The expected polarization of a benchmark depth $d$ circuit is Eq. (A7) follows from applying the definition of $s'_C$ [Eq. (A1)] to Eq. (A6), and Eq. (A8) follows from Eq. (A7) and the cyclic property of the trace. Furthermore, we have where Therefore, $f_d$ decays exponentially in circuit depth, at a rate determined by the fidelity of E. This implies that the Clifford group BiRB error rate is the same as the (standard) CRB error rate.
where $S_1, S_2, \ldots, S_{4^k-1}$ denote the $k$-qubit stochastic Pauli error generators, and $H_1, H_2, \ldots, H_{4^k-1}$ denote the $k$-qubit Hamiltonian error generators. For each error model, we sample $s_i$ and $h_i$ at random to produce a range of expected layer error rates. To generate error models, we start with an overall error parameter $p$ that determines the expected gate error rates in the model. We generate models with 150 evenly-spaced values of $p \in [0, 0.01875]$ for the single-qubit models and 150 evenly-spaced values of $p \in [0, 0.0750]$ for the 2- and 4-qubit models. We use $p$ to determine the expected rates of stochastic and Hamiltonian errors. In the stochastic Pauli error models, we set $h = 0$ and $s = 1.2p$. In the Hamiltonian error models, we set $s = 0$ and $h = 8p$ for $n = 4$ qubit models, and we set $s = 0$ and $h = 6p$ for $n = 1, 2$ qubit models. In the combined stochastic Pauli and Hamiltonian error models, we generate $s \in [0, p]$ at random and set $h = 2p - s$. These sampling parameters are chosen to produce models with a similar range of per-qubit error rates across all error model types and all values of $n$.
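The mapping from the overall parameter $p$ to the stochastic and Hamiltonian strengths $(s, h)$ can be sketched as follows. This is an illustrative sketch only: the function names `p_grid` and `sample_error_strengths` are hypothetical, the grid is assumed to include both endpoints, and $s$ is assumed to be drawn uniformly from $[0, p]$, since the text only says "at random".

```python
import random


def p_grid(n_qubits, num=150):
    """Evenly spaced values of the overall error parameter p.

    Assumption: both endpoints of the interval are included.
    """
    p_max = 0.01875 if n_qubits == 1 else 0.0750
    return [p_max * i / (num - 1) for i in range(num)]


def sample_error_strengths(model_type, p, n_qubits, rng=random):
    """Return (s, h) for one error model, given the overall parameter p."""
    if model_type == "stochastic":
        return 1.2 * p, 0.0
    if model_type == "hamiltonian":
        h = 8 * p if n_qubits == 4 else 6 * p
        return 0.0, h
    if model_type == "combined":
        s = rng.uniform(0.0, p)  # assumption: uniform on [0, p]
        return s, 2 * p - s
    raise ValueError(f"unknown model type: {model_type}")


rng = random.Random(0)
for p in p_grid(n_qubits=2)[:3]:
    print(p, sample_error_strengths("combined", p, 2, rng))
```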
We include qubit-dependent Hamiltonian errors and stochastic Pauli errors on each gate, with Hamiltonian error rates sampled in the range $[0, \chi h]$, and stochastic Pauli error rates sampled in the range $[0, \chi s]$, where $\chi = 0.1$ if $n = 2, 4$ and $k = 1$ (i.e., single-qubit gate error rates are sampled so that their expected error rate is $1/10$ the error rate of 2-qubit gates) and $\chi = 1$ otherwise. The stochastic and Hamiltonian errors are each split randomly across the $4^k - 1$ error generators.
For each error model, we run $K = 100$ BiRB circuits at each depth $d \in \{0\} \cup \{2^j \mid 0 \le j \le 8\}$. Each layer in the core circuit consists of randomly-sampled CNOT gates and uniformly random gates from the set $\{X_{\pi/2}, Y_{\pi/2}, I\}$ on all other qubits. We sampled the two-qubit gates using the "edgegrab" sampler from Ref. [2] with an expected two-qubit gate density of $\xi = 1/2$.
We also approximate the average layer error rate $\epsilon_\Omega$ via sampling. We sample $K = 100$ $\Omega$-distributed random circuits at each depth $d \in \{0\} \cup \{2^j \mid 0 \le j \le 8\}$ (using the same layer sampling as described above) and determine their polarization, then fit the resulting data to an exponential to obtain an estimate of $\epsilon_\Omega$.
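The depth schedule and the exponential fit described above can be illustrated with a minimal sketch. It is not the fitting routine used for the results in this work: it assumes a decay of the form $f_d = A p^d$, fits it by ordinary least squares on $\log f_d$ (which requires strictly positive polarizations), and converts $p$ to an error rate using the assumed convention $(4^n - 1)(1 - p)/4^n$. All function names are hypothetical.

```python
import math


def benchmark_depths(j_min=0, j_max=8):
    """Depths {0} U {2^j : j_min <= j <= j_max}."""
    return [0] + [2**j for j in range(j_min, j_max + 1)]


def fit_exponential_decay(depths, polarizations):
    """Fit f_d = A * p**d by linear least squares on log(f_d).

    Simplification: noisy data with non-positive polarizations would need
    a nonlinear fit instead.
    """
    xs = list(depths)
    ys = [math.log(f) for f in polarizations]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), math.exp(slope)  # (A, p)


def error_rate_from_decay(p, n_qubits):
    """Assumed RB convention for converting a decay constant to an error rate."""
    return (4**n_qubits - 1) / 4**n_qubits * (1.0 - p)


depths = benchmark_depths()
# Synthetic, noise-free average polarizations with A = 0.95 and p = 0.98.
synthetic = [0.95 * 0.98**d for d in depths]
A_fit, p_fit = fit_exponential_decay(depths, synthetic)
print(depths, A_fit, p_fit, error_rate_from_decay(p_fit, n_qubits=2))
```

In this picture the $d = 0$ intercept $A$ absorbs state preparation and measurement error, and only the decay constant $p$ enters the error rate.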

Binary RB with measurement errors
We simulated BiRB on $n = 1, 2, 4$ qubits with single-qubit bit flip and amplitude damping measurement error. These BiRB circuits used layers constructed from the gates $\{X_{\pi/2}, Y_{\pi/2}, \mathrm{CNOT}\}$. In these simulations, we simulated BiRB with error models in which the gates have both stochastic Pauli and Hamiltonian errors. We generated 30 models with Hamiltonian and stochastic errors. Each error model had randomly-chosen error rates sampled so that the expected stochastic error rate was p/2 and the expected Hamiltonian error rate was √p/2, and we set p = 0.015n. In our $n = 2, 4$ qubit simulations, we sampled the errors on single-qubit gates so that their expected error rates were approximately $1/10$ of the expected two-qubit gate error rate.
From each set of gate error rates, we construct five error models, each of which has different measurement error. These five error models are: (1) no error on the measurements, (2) bit flip errors on the measurements for all $n$ qubits, (3) bit flip errors on the measurements for only a single qubit, (4) amplitude damping errors on the measurements for all $n$ qubits, and (5) amplitude damping errors on the measurement for only a single qubit. We define our measurement error using the single-qubit elementary error generators $S_X$, $S_Y$, and $A_{X,Y}$ defined in Ref. [52], and an error strength parameter $p_m$. In our bit flip error models, we add the error $E = e^{p_m S_X}$ immediately before the measurement. In our amplitude damping error models, we add the error $E = e^{p_m (S_X + S_Y + A_{X,Y})}$ immediately before measurement. In error models with measurement error on a single qubit, we generate error models with 60 evenly-spaced values of $p_m \in [0.0001, 0.09]$. In error models with measurement error on all qubits, we sample a uniform random $p_m \in [0, 2p/n]$ independently for each qubit, for 60 evenly-spaced values of $p \in [0.0001, 0.09]$. For each error model, we run $K = 100$ BiRB circuits at each depth $d \in \{0\} \cup \{2^j \mid 1 \le j \le 8\}$ using the same gate set and layer sampling distribution as in Appendix D 1. We approximate $\epsilon_\Omega$ via sampling using the method described in Appendix D 1.
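To give a sense of scale for the bit flip models, the sketch below converts the strength $p_m$ into a flip probability. It assumes the stochastic generator acts as $S_X[\rho] = X\rho X - \rho$, which is our reading of the formalism of Ref. [52] and is not restated in this appendix. Under that assumption $S_X^2 = -2 S_X$, so $e^{p_m S_X}$ is a bit flip channel with flip probability $q = (1 - e^{-2 p_m})/2$. The code checks this closed form against a Taylor series acting on the populations of $|0\rangle\langle 0|$; the function names are hypothetical.

```python
import math


def bit_flip_probability(p_m):
    """Flip probability of exp(p_m * S_X), assuming S_X[rho] = X rho X - rho."""
    return 0.5 * (1.0 - math.exp(-2.0 * p_m))


def flipped_population_series(p_m, terms=30):
    """Population of |1> after applying exp(p_m * S_X) to |0><0|.

    S_X maps populations (a, b) to (b - a, a - b); the exponential is
    summed term by term as a Taylor series.
    """
    term = (1.0, 0.0)   # k-th Taylor term, starting from the k = 0 term
    total = [1.0, 0.0]
    for k in range(1, terms):
        a, b = term
        term = ((b - a) * p_m / k, (a - b) * p_m / k)
        total[0] += term[0]
        total[1] += term[1]
    return total[1]


def evenly_spaced(lo, hi, num):
    """num evenly spaced values from lo to hi, endpoints included (assumed)."""
    return [lo + (hi - lo) * i / (num - 1) for i in range(num)]


p_m_values = evenly_spaced(0.0001, 0.09, 60)
for p_m in (p_m_values[0], p_m_values[-1]):
    print(p_m, bit_flip_probability(p_m), flipped_population_series(p_m))
```

For small $p_m$ the flip probability is approximately $p_m$ itself, so the single-qubit sweep covers readout flip probabilities from roughly $10^{-4}$ up to several percent.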

FIGURE 4. Simulations of BiRB for gates with stochastic and Hamiltonian errors. We simulated BiRB on 1, 2, and 4 qubits with randomly-sampled error models. These error models consist of randomly-sampled (a,d) stochastic Pauli errors, (c,f) Hamiltonian errors, (b,e) stochastic and Hamiltonian errors. (a-c): We compare the estimated BiRB error rate $r_\Omega$ to $\epsilon_\Omega$. Error bars are $1\sigma$ and are calculated using a standard bootstrap (there are error bars on $\epsilon_\Omega$, as well as $r_\Omega$, as $\epsilon_\Omega$ is estimated via sampling). (d-f): The relative error $\delta_{\mathrm{rel}} = (r_\Omega - \epsilon_\Omega)/\epsilon_\Omega$, divided by its standard deviation ($\sigma_{\mathrm{rel}}$), for each randomly sampled error model. For all error models, we find that $r_\Omega$ is approximately equal to $\epsilon_\Omega$, and all discrepancies between $r_\Omega$ and $\epsilon_\Omega$ are consistent with finite sample fluctuations.


FIGURE 6. BiRB on IBM Q processors. (a) The results of DRB and BiRB on 1-6 qubits on ibm_perth. (b) The results of MRB and BiRB on 1-27 qubits on ibmq_kolkata. (c) DRB's initial (i.e., $d = 0$) polarization decreases rapidly as a function of the number of qubits ($n$), due to the random stabilizer state preparation and measurement subroutines in DRB circuits whose size grows quickly with $n$ [18]. In contrast, the initial polarization in BiRB decreases slowly with increasing $n$. (d) The BiRB and DRB error rates ($r_\Omega$) are consistent on all qubit subsets. As DRB is a robust technique that is designed to measure the same error rate as BiRB, this provides evidence of BiRB's reliability. (e) The BiRB and MRB error rates are consistent up to $n = 11$ qubits, but the BiRB error rate is systematically higher than the MRB error rate for $n > 11$. This is consistent with the theories of BiRB and MRB: MRB theory [1,21] predicts that MRB's $r_\Omega$ slightly underestimates $\epsilon_\Omega$ (the average layer error rate) whereas BiRB theory predicts that BiRB's $r_\Omega$ accurately estimates $\epsilon_\Omega$.

], ibmq_kolkata [Tables III and IV], ibm_hanoi [Tables V and VI].
TABLE I. BiRB and DRB on IBM Perth. The RB error rates from every BiRB and DRB experiment we ran on ibm_perth.

TABLE II. IBM Perth calibration data. Calibration data from ibm_perth from the time of our BiRB demonstrations.

TABLE IV. IBMQ Kolkata calibration data. Calibration data from ibmq_kolkata from the time of our BiRB demonstrations.

TABLE VI. IBM Hanoi calibration data. Calibration data from ibm_hanoi from the time of our BiRB demonstrations.