Coherence in logical quantum channels

We study the effectiveness of quantum error correction against coherent noise. Coherent errors (for example, unitary noise) can interfere constructively, so that in some cases the average infidelity of a quantum circuit subjected to coherent errors may increase quadratically with the circuit size; in contrast, when errors are incoherent (for example, depolarizing noise), the average infidelity increases at worst linearly with circuit size. We consider the performance of quantum stabilizer codes against a noise model in which a unitary rotation is applied to each qubit, where the axes and angles of rotation are nearly the same for all qubits. In particular, we show that for the toric code subject to such independent coherent noise, and for minimal-weight decoding, the logical channel after error correction becomes increasingly incoherent as the length of the code increases, provided the noise strength decays inversely with the code distance. A similar conclusion holds for weakly correlated coherent noise. Our methods can also be used for analyzing the performance of other codes and fault-tolerant protocols against coherent noise. However, our result does not show that the coherence of the logical channel is suppressed in the more physically relevant case where the noise strength is held constant as the code block grows, and we recount the difficulties that prevented us from extending the result to that case. Nevertheless, our work supports the idea that fault-tolerant quantum computing schemes will work effectively against coherent noise, providing encouraging news for quantum hardware builders who worry about the damaging effects of control errors and coherent interactions with the environment.

Although there is no rigorous proof, much evidence supports the widely held belief that an ideal noiseless quantum computer would be able to solve problems that are intractable for classical digital computers. But in the real world, quantum computers are noisy. We therefore expect that quantum error correction will be needed to overcome the noise and reliably operate a large-scale quantum computer that can solve hard problems. Fortunately, the accuracy threshold theorem for quantum computation establishes that quantum computing is scalable, assuming that the noise is neither too strong nor too strongly correlated [1,2,3,4,5]. Until we try it on a real device, though, we won't know for sure whether realistic noise is sufficiently benign for quantum error correction to work effectively. A general noise channel acting on n qubits is extremely complex when n is large, so it will not be practical to fully characterize the noise in a complex quantum device using any feasible experimental protocol. A commonly used metric for the performance of single-qubit and two-qubit quantum gates is the "average infidelity" r = 1 − F, where F is the fidelity of the output from the gate relative to the output of an ideal gate, averaged uniformly over all possible input states. This quantity r has the great virtue that it can be feasibly measured using randomized benchmarking [6,7], but as a characterization of the noise strength it has shortcomings. Assuming an uncorrelated noise model, threshold theorems guarantee scalability if a different metric, the diamond distance D, is less than a critical value. Here D denotes the deviation of the noisy gate from the ideal gate as measured by the diamond norm. For an incoherent noise channel like a Pauli channel, the diamond distance D is proportional to the average infidelity r; in contrast, for a highly coherent channel, D scales like the square root of r. 
If we know only r, and have no information about the coherence of the noise, we cannot estimate D accurately, and therefore cannot easily make sound predictions about how effectively any error-correcting code will combat the noise [8,9,10]. The situation is even worse for correlated noise models.
Our purpose in this paper is to study further how well quantum error correction performs against coherent noise models. To make our analysis manageable, we will make some simplifying assumptions. For one, we will not actually consider quantum computation, but instead will focus on the easier task of operating a quantum memory. We envision encoding a quantum state in the memory using a quantum code; after the encoding step the memory is subjected to noise, and then the quantum state is decoded. As a further simplification, we will assume that the encoding and decoding are noiseless. Therefore, the performance of the code against the noise is captured by a logical channel, the result of composing the encoding channel, noise channel, and decoding channel.
We will be interested in what happens to a quantum state which is stored in the memory for a long time, and undergoes many rounds of error correction; that is, we want to characterize the effect of applying the logical channel many times in succession. For this purpose, we will need to understand the coherence properties of the logical channel. If the logical channel is incoherent, then the diamond distance of the decoded state from the ideal state grows linearly with the number of channel repetitions, while for a highly coherent logical channel, it can grow quadratically. Our main conclusion is that, even if the physical noise acting on the quantum memory is highly coherent, the coherence of the logical channel becomes strongly suppressed as the block length of the quantum error-correcting code increases, assuming that the noise is sufficiently weak and sufficiently weakly correlated.
Although we can analyze the logical channel only in a simplified setting, and only for particular code families, we believe that the lessons learned apply more broadly. We expect, for example, that randomized benchmarking applied to logical gates will accurately characterize logical noise even when the physical noise is highly coherent, at least for large code blocks. This also suggests that for concatenated coding schemes, in which the "physical" qubits of a higher-level code are themselves the logical qubits of a lower-level code, the average infidelity of the lower-level code should be a good predictor for the performance of the higher-level code.
Our main conclusion is not unanticipated [4], as the suppression of coherence in the logical channel has an intuitive explanation. To decode, one measures the error syndrome, and then applies a recovery operation conditioned on the syndrome. For a large code, many different syndromes are possible, and only the errors which are projected onto the same syndrome value can interfere constructively, while errors projected onto different syndrome values add stochastically. The stochastic average over many syndrome sectors suppresses coherence, leaving only small residual coherent effects arising from summing coherently over errors which are projected onto a given syndrome sector. That said, carefully analyzing the residual coherence in the logical channel involves daunting combinatorics. It turns out that further cancellations occur, resulting in even stronger suppression of logical coherence than might be naively expected.
This discussion about averaging over all syndrome sectors highlights an important issue. We will consider the logical channel obtained by averaging over error syndromes, and then study the coherence of the resulting channel. One could make a case for an alternative procedure: define a metric that characterizes coherence, evaluate that metric for the logical channel conditioned on each syndrome, and then average the value of the metric over syndromes by weighting each syndrome with its probability. To argue in favor of this alternative procedure one might note that the experimentalist who executes the error correction protocol could know the syndrome she measures in each run of the protocol, and might be interested in the properties of the logical channel conditioned on that knowledge [11]. Our view is that properties of logical channels conditioned on the syndrome are potentially of interest for near-term experiments using relatively small codes, particularly because it might be feasible to postselect by retaining favorable syndromes and rejecting unfavorable ones. In future experiments using larger codes, though, syndrome histories will be quite complex, and it will be impractical to make useful inferences about the logical channel conditioned on syndrome information. For long computations using large codes, properties of the logical channel averaged over syndromes will most likely provide more usable guidance regarding the features of the protected quantum computation.
We should also note that methods have been proposed to suppress the coherence of physical noise. One such method is randomized compiling, which, under certain assumptions, can transform any single-qubit noise channel into an incoherent depolarizing channel [12].
The assumptions include a Markovian noise model, and gate independence of the noise for the "easy" gates in the scheme. These assumptions may hold to a good approximation for some realistic cases, but they will not hold exactly. We may then ask how the residual coherence is affected by error correction, an issue that can be addressed using the methods in this paper.
Here we investigate the coherence of the logical channel in the case where the physical noise is fully coherent unitary noise, a problem that has been previously studied [13,14,15]. Our work improves on these past results in that we consider a family of codes with an accuracy threshold (toric codes without boundaries) and prove bounds on the logical coherence which apply in the limit of a large code block. By specializing to a particular code family, we also find better bounds on the logical coherence for finite code length. Other authors have obtained numerical results for sufficiently small codes in the case where all physical qubits are rotated about a fixed axis [16,17,18], including analyses of logical channels conditioned on particular error syndromes [11]. We focus instead on investigating asymptotic properties for large codes, using analytic methods.
In our analysis we make extensive use of the chi-matrix formalism for describing quantum channels. The chi matrix arises when the action of a channel on an input density operator is expanded in terms of Pauli operators (tensor products of 2 × 2 Pauli matrices) acting on the density operator from the left and from the right. A channel can be expressed as the sum of an "incoherent part" in which the Pauli operators on left and right are equal, and a "coherent part" in which the Pauli operators on left and right are distinct. Our main task will be to infer, in the case of stabilizer codes, how the logical chi matrix which describes the logical channel after error correction is related to the physical chi matrix which describes the noise acting on physical qubits.
Specifically, we study the logical channel for the toric code on an L × L lattice where L is large, and where error correction is carried out using minimal-weight decoding. We estimate the coherent component of the logical chi matrix up to order L + 2k in the rotation angle θ, where k is any L-independent constant, and relate this coherent component to the incoherent component of the logical channel. Our main theorem states that the strength of the coherent part of the logical channel is bounded above by the strength of the incoherent part times a factor of 1/θ. (Here θ is the rotation angle applied to each of the physical qubits; our result also holds for rotation angles and axes that vary somewhat from qubit to qubit.) From this statement, we may infer that when the logical channel is applied m times in succession the average infidelity grows linearly with m. (There is a small contribution to the infidelity that grows quadratically with m, but this contribution is highly suppressed by a factor that scales as $L^{-L}$.) Stated differently, our result says that after m applications of the logical channel, the accumulated distance from the identity channel, as measured by the diamond norm, grows linearly with m, apart from a correction which is negligible for large L. We emphasize that to reach this conclusion we assumed that the rotation angle θ scales with the block size as 1/L. Therefore, unfortunately, we are not able to make a definitive statement about the coherence of the logical channel in the more physically relevant case where L becomes large with θ fixed; the required combinatorial analysis exceeded our ability.
A related conclusion holds for a broad class of correlated noise models. We provide a detailed analysis of correlated noise for the simpler case of the quantum repetition code, under the assumption that the noise Hamiltonian commutes with the Pauli operator X acting on each qubit, so that the repetition code provides effective protection against the noise model. In a model in which the rotations acting on pairs of qubits are strongly correlated, we find as expected that the correlations significantly enhance the probability of an uncorrectable logical error. However, the correlations enhance the coherent and incoherent parts of the logical chi matrix by comparable factors. Therefore, our conclusion that the coherence of the logical channel is heavily suppressed in the limit of large code length continues to apply despite the strong pairwise correlations in the noise.
The rest of this paper is organized as follows. Sec. 2 is a self-contained review of quantum channels, emphasizing metrics for characterizing coherence and relations among them. In particular, we prove a relationship between the chi matrix and the Pauli transfer matrix which had not been previously discussed to our knowledge. In Sec. 3 we compute the logical channel for the repetition code assuming independent unitary noise, finding that the coherence of the logical channel becomes strongly suppressed as the code length increases. Then in Sec. 4 we analyze the repetition code again, this time using the chi-matrix formalism; we find that this analysis can be extended more easily to other stabilizer codes and other noise models. We consider the performance of the repetition code against two-body correlated noise in Sec. 5, again concluding that the logical noise becomes incoherent in the limit of large code length.
The heart of the paper is Sec. 6, where we build on lessons learned from the analysis of the repetition code to prove our main result, which asserts that, for an independent unitary noise model, the coherence of the logical channel is strongly suppressed by the toric code when the code block is large, assuming that the noise strength scales like 1/L. The proof mainly consists of a combinatoric analysis which allows us to bound the coherent and incoherent components of the logical chi matrix. We have divided the proof into a series of lemmas; Fig. 1 indicates how these lemmas fit together to build our main theorem. The subsections of Sec. 6 provide further guidance concerning the proof strategy, and various details are contained in the appendices. Furthermore, our analysis of two-body correlated noise in the repetition code can be extended to the toric code assuming the noise is sufficiently weak for error correction to succeed with high probability; we therefore conclude that the coherence of the logical channel is highly suppressed even in the case of strongly correlated two-body noise.
Sec. 7 contains our conclusions. There we recount some of the obstacles that prevented us from extending our main theorem to the more physically relevant case where the noise strength is a constant independent of L.
Figure 1: A guide to the proof of Theorem 3 (coherence is suppressed in the toric code).

Pauli transfer matrix
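We will use the Pauli transfer matrix representation to describe channels acting on n qubits. For this purpose we expand the density operator ρ in the Pauli operator basis $\{\sigma_i\}$:
$$\rho = \frac1d\sum_i \rho_i\,\sigma_i,$$
where
$$\rho_i = \mathrm{Tr}(\sigma_i\rho)$$
and $\sigma_0 = \mathrm{id}$, so that the identity component of ρ is $\rho_0\,\mathrm{id}/d$. Here $d = 2^n$ is the Hilbert-space dimension, and id denotes the d × d identity matrix. Note that $\mathrm{Tr}(\rho) = \rho_0$. A linear map $\mathcal N$ acting on density operators defines a $d^2 \times d^2$ matrix (the Pauli transfer matrix associated with $\mathcal N$) according to
$$N_{ij} = \frac1d\,\mathrm{Tr}\left(\sigma_i\,\mathcal N(\sigma_j)\right),\qquad\text{so that}\qquad \mathcal N(\rho)_i = \sum_j N_{ij}\,\rho_j.$$
This matrix is real if $\mathcal N$ maps Hermitian operators to Hermitian operators. If the map $\mathcal N$ is trace preserving, then $\sum_i N_{0i}\rho_i = \rho_0$ for every ρ; hence $N_{0i} = \delta_{0i}$. If the map $\mathcal N$ is unital (that is, $\mathcal N(\mathrm{id}) = \mathrm{id}$), then $\sum_j N_{ij}\delta_{j0} = \delta_{i0}$; hence $N_{i0} = \delta_{i0}$. Thus the matrix representing a trace-preserving map $\mathcal N$ may be expressed as
$$N = \begin{pmatrix} 1 & 0 \\ N_n & N_u \end{pmatrix}.$$
We say that the $(d^2 - 1) \times (d^2 - 1)$ matrix $N_u$ is the unital part of $\mathcal N$ and that the length-$(d^2 - 1)$ vector $N_n$ is its nonunital part. Altogether the trace-preserving map $\mathcal N$ is specified by $d^2(d^2 - 1)$ parameters. For a unitary map $\mathcal N(\rho) = U\rho U^\dagger$, we have $N_n = 0$ and (for i ≠ 0)
$$U\sigma_iU^\dagger = \sum_{j\neq0}(N_u)_{ji}\,\sigma_j,$$
where
$$\sum_{k\neq0}(N_u)_{ki}(N_u)_{kj} = \frac1d\,\mathrm{Tr}\left(U\sigma_iU^\dagger\,U\sigma_jU^\dagger\right) = \frac1d\,\mathrm{Tr}(\sigma_i\sigma_j) = \delta_{ij};$$
hence $N_u$ is an orthogonal matrix. The matrix representing $\mathcal N$ is diagonal if and only if the map is a convex sum of Pauli operators,
$$\mathcal N(\rho) = \sum_k p_k\,\sigma_k\rho\,\sigma_k,\qquad p_k \ge 0,\quad \sum_k p_k = 1,$$
in which case the diagonal entries are
$$N_{ii} = \sum_k \xi_{ik}\,p_k,$$
where $\sigma_i\sigma_j = \xi_{ij}\,\sigma_j\sigma_i$; that is, $\xi_{ij}$ is the sign ±1 determined by whether the Pauli operators $\sigma_i$ and $\sigma_j$ commute or anticommute.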
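The definitions above can be checked numerically. The following Python sketch (assuming NumPy; the helper names `pauli_basis` and `ptm` are our own hypothetical choices, not from the paper) builds the Pauli transfer matrix $N_{ij} = \mathrm{Tr}(\sigma_i\mathcal N(\sigma_j))/d$ from a list of Kraus operators, ordering each qubit's basis as I, X, Y, Z so that index 0 is the identity. For a rotation about the x-axis it should reproduce $N_n = 0$ and an orthogonal $N_u$.

```python
import itertools
from functools import reduce

import numpy as np

# Single-qubit Hermitian Pauli matrices in the order I, X, Y, Z (index 0 = identity).
PAULIS_1Q = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_basis(n):
    """All 4**n tensor products of single-qubit Paulis; index 0 is sigma_0 = id."""
    return [reduce(np.kron, ops) for ops in itertools.product(PAULIS_1Q, repeat=n)]


def ptm(kraus, n):
    """Pauli transfer matrix N_ij = Tr(sigma_i N(sigma_j)) / d for a list of Kraus operators."""
    d = 2 ** n
    basis = pauli_basis(n)
    images = [sum(K @ s @ K.conj().T for K in kraus) for s in basis]  # N(sigma_j)
    return np.array([[np.trace(si @ out).real / d for out in images] for si in basis])


# Rotation of one qubit by theta about the x-axis.
theta = 0.3
U = np.cos(theta / 2) * PAULIS_1Q[0] - 1j * np.sin(theta / 2) * PAULIS_1Q[1]
N = ptm([U], 1)
N_n, N_u = N[1:, 0], N[1:, 1:]  # nonunital and unital parts
print(np.round(N, 3))
```

For a Pauli channel such as `ptm([np.sqrt(0.9) * PAULIS_1Q[0], np.sqrt(0.1) * PAULIS_1Q[3]], 1)` (Z errors with probability 0.1), the same function returns the diagonal matrix diag(1, 0.8, 0.8, 1), as the formula $N_{ii} = \sum_k \xi_{ik}p_k$ predicts.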

Average infidelity
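The fidelity F of a channel $\mathcal N$ acting on a pure state $|\psi\rangle$ is defined by
$$F(\psi) = \langle\psi|\,\mathcal N(|\psi\rangle\langle\psi|)\,|\psi\rangle,$$
and 1 − F is called the infidelity. The average infidelity r of $\mathcal N$ is
$$r(\mathcal N) = \int dU\left[1 - \mathrm{Tr}\left(\rho\,U^\dagger\mathcal N(U\rho U^\dagger)U\right)\right],$$
where the integral is with respect to the normalized invariant Haar measure on the unitary group, and ρ is any pure state. Equivalently, r is the infidelity of the averaged channel
$$\bar{\mathcal N}(\rho) = \int dU\,U^\dagger\mathcal N(U\rho U^\dagger)\,U.$$
We may just as well define r as the infidelity of $\mathcal N$ averaged over a unitary 2-design. Hence r can be measured in randomized benchmarking experiments, in which U is chosen by sampling uniformly from the Clifford group, which is a unitary 2-design. The d × d unitary matrix U defines an orthogonal $(d^2 - 1) \times (d^2 - 1)$ matrix O, the unital part of the conjugation map $\rho \mapsto U\rho U^\dagger$, according to
$$U\sigma_jU^\dagger = \sum_{i\neq0}O_{ij}\,\sigma_i\qquad (j \neq 0),$$
and the inverse conjugation $\rho\mapsto U^\dagger\rho U$ has unital part $O^T$, where $O^T$ denotes the transpose of O; therefore the conjugated channel $U^\dagger\mathcal N(U\cdot U^\dagger)U$ has unital part $O^TN_uO$ and nonunital part $O^TN_n$. For the purpose of computing these averages, the uniform average of U over the unitary group may be replaced by a uniform average of O over the orthogonal group. The nonunital part of $\mathcal N$ averages to zero, and the average of the unital part can be evaluated using
$$\int dO\;O_{ki}\,O_{lj} = \frac{\delta_{kl}\,\delta_{ij}}{d^2-1},$$
which yields
$$\overline{\left(O^TN_uO\right)}_{ij} = \frac{\mathrm{Tr}(N_u)}{d^2-1}\,\delta_{ij}.$$
Hence, the averaged channel is a depolarizing channel of the form
$$\bar{\mathcal N}(\rho) = p\,\rho + (1-p)\,\frac{\mathrm{id}}{d},$$
where
$$p = \frac{1}{d^2-1}\,\mathrm{Tr}(N_u).\tag{17}$$
Note that if this averaged channel is applied m times in succession, we obtain
$$\bar{\mathcal N}^m(\rho) = p^m\rho + (1-p^m)\,\frac{\mathrm{id}}{d};\tag{18}$$
thus p is called the benchmarking parameter because it determines the rate of exponential decay of fidelity in benchmarking experiments. The average infidelity r is given by
$$r = 1 - \langle\psi|\,\bar{\mathcal N}(|\psi\rangle\langle\psi|)\,|\psi\rangle = \frac{d-1}{d}(1-p) = \frac{\mathrm{Tr}\left(I_{d^2-1} - N_u\right)}{d(d+1)}\tag{19}$$
for any pure state $|\psi\rangle$. Here $I_{d^2-1}$ denotes the $(d^2 - 1) \times (d^2 - 1)$ identity matrix. Because $N_{00} = 1$, we may also express the infidelity as
$$r = \frac{\mathrm{Tr}\left(I_{d^2} - N\right)}{d(d+1)},$$
where $I_{d^2}$ denotes the $d^2 \times d^2$ identity.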
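A minimal numerical check of Eqs.(17) and (19), assuming NumPy and a randomly generated two-qubit channel (the helper `random_kraus` is hypothetical and only produces test data). The infidelity computed from the trace of the Pauli transfer matrix is compared with the standard relation $F_{\rm avg} = (dF_e + 1)/(d+1)$ between the average fidelity and the entanglement fidelity $F_e = \sum_k|\mathrm{Tr}\,K_k|^2/d^2$, an independent route to the same Haar average.

```python
import itertools
from functools import reduce

import numpy as np

PAULIS_1Q = [np.eye(2), np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])]


def ptm(kraus, n):
    """Pauli transfer matrix N_ij = Tr(sigma_i N(sigma_j)) / d (identity first)."""
    d = 2 ** n
    basis = [reduce(np.kron, ops) for ops in itertools.product(PAULIS_1Q, repeat=n)]
    images = [sum(K @ s @ K.conj().T for K in kraus) for s in basis]
    return np.array([[np.trace(si @ out).real / d for out in images] for si in basis])


def random_kraus(d, num_kraus, seed=0):
    """Kraus operators of a random CPTP map, cut from a random isometry (test data only)."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * num_kraus, d)) + 1j * rng.normal(size=(d * num_kraus, d))
    V, _ = np.linalg.qr(G)  # (d * num_kraus) x d matrix with orthonormal columns
    return [V[k * d:(k + 1) * d] for k in range(num_kraus)]


def benchmarking_parameter(N, d):
    return np.trace(N[1:, 1:]) / (d * d - 1)  # Eq.(17)


def infidelity_from_ptm(N, d):
    return (d * d - np.trace(N)) / (d * (d + 1))  # Tr(I - N) / (d(d+1)), using N_00 = 1


def infidelity_from_kraus(kraus, d):
    """Independent check through the entanglement fidelity F_e = sum_k |Tr K_k|^2 / d^2."""
    f_e = sum(abs(np.trace(K)) ** 2 for K in kraus) / d ** 2
    return 1 - (d * f_e + 1) / (d + 1)


n, d = 2, 4
kraus = random_kraus(d, num_kraus=3, seed=1)
N = ptm(kraus, n)
p = benchmarking_parameter(N, d)
r = infidelity_from_ptm(N, d)
print(r, infidelity_from_kraus(kraus, d), (d - 1) * (1 - p) / d)
```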

Depolarizing channel
We have seen that if $\mathcal N_p$ is the depolarizing channel with benchmarking parameter p, then $(\mathcal N_p)^m = \mathcal N_{p^m}$. Using the relation $r = \frac{d-1}{d}(1 - p)$, we can express the infidelity $r_m$ of $(\mathcal N_p)^m$ in terms of the infidelity r of $\mathcal N_p$, finding
$$r_m = \frac{d-1}{d}\left[1 - \left(1 - \frac{d}{d-1}\,r\right)^m\right] = mr - \binom m2\frac{d}{d-1}\,r^2 + \dots$$
If mr is small, the infidelity accumulates linearly with m, the number of times the channel is applied. A similar remark applies to more general Pauli channels. We say that a channel with this property is incoherent. The interpretation is that (up to a constant factor), the infidelity r may be regarded as a probability of error. If the channel is applied m times, where mr is small, any one of the m instances of the channel could be faulty, so that the total probability of error is mr + higher-order terms.
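To see the linear accumulation concretely, the NumPy sketch below raises the depolarizing transfer matrix diag(1, p, p, p) to the m-th power and compares the resulting infidelity with mr; the value $r = 10^{-4}$ is an arbitrary illustrative choice.

```python
import numpy as np


def depolarizing_ptm(p, d=2):
    """PTM of N_p(rho) = p rho + (1 - p) id/d, which is diag(1, p, ..., p)."""
    return np.diag([1.0] + [p] * (d * d - 1))


def infidelity(N, d=2):
    return (d * d - np.trace(N)) / (d * (d + 1))


d, r = 2, 1e-4
p = 1 - r * d / (d - 1)  # invert r = (d - 1)(1 - p) / d
for m in (1, 10, 100):
    r_m = infidelity(np.linalg.matrix_power(depolarizing_ptm(p, d), m), d)
    print(m, r_m, m * r)  # r_m stays close to m * r while m * r is small
```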

Qubit rotation
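In contrast, consider the case of a unitary rotation of a single qubit about the x-axis which rotates the Bloch sphere by θ. For this channel the Pauli transfer matrix (in the basis I, X, Y, Z) is
$$N(\theta) = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&\cos\theta&-\sin\theta\\0&0&\sin\theta&\cos\theta\end{pmatrix};$$
therefore, the infidelity is
$$r = \frac{\mathrm{Tr}\left(I_4 - N(\theta)\right)}{6} = \frac{1-\cos\theta}{3} = \frac23\sin^2\frac\theta2.$$
Applying this channel m times, we obtain $N(\theta)^m = N(m\theta)$, a rotation by an angle m times larger. Therefore,
$$r_m = \frac{1-\cos m\theta}{3} \approx m^2\,r.$$
Here, for $m^2r$ small, the infidelity accumulates quadratically with m; it is the rotation angle, rather than the error probability, that increases linearly. We say that a channel like this one, for which the infidelity increases faster than linearly with m, is coherent.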
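The same computation for the rotation channel, again a NumPy sketch with an arbitrarily chosen angle, shows the ratio $r_m/r$ growing like $m^2$ rather than m.

```python
import numpy as np


def rotation_ptm(theta):
    """PTM (basis I, X, Y, Z) of a rotation of the Bloch sphere by theta about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, c, -s], [0, 0, s, c]])


def infidelity(N, d=2):
    return (d * d - np.trace(N)) / (d * (d + 1))


theta = 1e-3
r = infidelity(rotation_ptm(theta))  # equals (1 - cos theta) / 3
for m in (1, 10, 100):
    r_m = infidelity(np.linalg.matrix_power(rotation_ptm(theta), m))
    print(m, r_m / r)  # close to m**2, not m
```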

Rotation/Dephasing channels
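The distinction between a coherent and incoherent channel is not always clear-cut, and we will need measures that quantify the degree of coherence. As an example, consider the case where a qubit either dephases in the x-basis (with probability $q_D$) or is rotated by angle θ about the x-axis (with probability $q_R$):
$$\mathcal N(\rho) = (1 - q_D - q_R)\,\rho + q_D\,\sigma_X\rho\,\sigma_X + q_R\,e^{-i\theta\sigma_X/2}\rho\,e^{i\theta\sigma_X/2}.$$
The Pauli transfer matrix is
$$N = \begin{pmatrix} I & 0\\ 0 & M\end{pmatrix},\tag{27}$$
where I is the 2 × 2 identity, and M is the 2 × 2 matrix
$$M = \begin{pmatrix}1-\epsilon & -\delta\\ \delta & 1-\epsilon\end{pmatrix},\qquad \epsilon = 2q_D + q_R(1-\cos\theta),\quad \delta = q_R\sin\theta.\tag{28}$$
The infidelity is
$$r = \frac\epsilon3.$$
The eigenvalues of M are $1 - \epsilon \pm i\delta$, and therefore the infidelity of $\mathcal N^m$ is
$$r_m = \frac13\left[1 - \mathrm{Re}\,(1-\epsilon+i\delta)^m\right] = \frac13\left[m\epsilon + \frac12m(m-1)\left(\delta^2-\epsilon^2\right) + \dots\right].\tag{32}$$
Here the degree of coherence depends on the relative value of ε and δ. In the case of a unitary rotation, we have $\epsilon = O(\delta^2)$, which means that the term growing quadratically with m can dominate. On the other hand, for ε ≥ δ, there is no quadratically growing term at all. A generalization of this channel will be useful in Sec. 3. Instead of a single rotation by θ occurring with probability $q_R$, we may consider an ensemble of possible rotations, where a rotation by $\theta_a$ occurs with probability $q_a$. In that case $r_m$ is still given by Eq.(32), but now
$$\epsilon = 2q_D + \sum_a q_a\left(1-\cos\theta_a\right),\qquad \delta = \sum_a q_a\sin\theta_a.$$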
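The sketch below (NumPy assumed; function names are illustrative) builds the block M of Eq.(28) and compares the exact infidelity $r_m = (2 - \mathrm{Tr}\,M^m)/6$ with the eigenvalue form and with the second-order expansion in Eq.(32), for a pure rotation and for pure dephasing.

```python
import numpy as np


def rot_deph_block(q_d, q_r, theta):
    """epsilon, delta and the 2x2 block M of Eq.(28) for the rotation/dephasing channel."""
    eps = 2 * q_d + q_r * (1 - np.cos(theta))
    delta = q_r * np.sin(theta)
    return eps, delta, np.array([[1 - eps, -delta], [delta, 1 - eps]])


def r_m_exact(M, m):
    """Infidelity of N^m for N = diag(I_2, M), i.e. (2 - Tr M^m) / 6."""
    return (2 - np.trace(np.linalg.matrix_power(M, m))) / 6


def r_m_series(eps, delta, m):
    """Second-order expansion in Eq.(32)."""
    return (m * eps + 0.5 * m * (m - 1) * (delta ** 2 - eps ** 2)) / 3


# Pure rotation (q_d = 0, q_r = 1): epsilon = O(delta^2), so the m^2 term dominates.
eps, delta, M = rot_deph_block(0.0, 1.0, 1e-2)
# Pure dephasing (q_r = 0): delta = 0, so there is no quadratic growth.
eps_d, delta_d, M_d = rot_deph_block(0.01, 0.0, 0.0)
for m in (1, 5, 20):
    print(m, r_m_exact(M, m), r_m_exact(M_d, m))
```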

Unitarity and the coherence angle
We have seen that $N_u$ is an orthogonal matrix if (and only if) the channel $\mathcal N$ is unitary. Hence a deviation from orthogonality of $N_u$ indicates a deviation from unitarity of $\mathcal N$. With that in mind, following [10] we define the unitarity $u(\mathcal N)$ of the channel $\mathcal N$ as
$$u(\mathcal N) = \frac{1}{d^2-1}\,\mathrm{Tr}\left(N_u^TN_u\right),$$
which is 1 for unitary channels and strictly less than 1 for nonunitary channels. For a fixed value of the infidelity r, the unitarity achieves its minimum for the depolarizing channel [19], where $u = p^2$. The unitarity u and the benchmarking parameter p together provide a useful characterization of the coherence of a channel. We will be primarily interested in the case where the infidelity r is small, so that the diagonal elements $\{N_{ii}\}$ of the Pauli transfer matrix are close to one, and it makes sense to expand in the small quantity $1 - N_{ii}$. Writing
$$(N_u)_{ii} = 1 - \epsilon_i,$$
we see that
$$p = 1 - \frac{1}{d^2-1}\sum_i\epsilon_i,\qquad u = \frac{1}{d^2-1}\left[\sum_i(1-\epsilon_i)^2 + \sum_{i\neq j}(N_u)_{ij}^2\right].\tag{37}$$
Expanding the square root of u we find
$$\sqrt u = 1 - \frac{1}{d^2-1}\sum_i\epsilon_i + \frac{1}{2(d^2-1)}\sum_{i\neq j}(N_u)_{ij}^2 + \dots,$$
where the ellipsis indicates terms that are fourth order in the off-diagonal entries $(N_u)_{ij}$ and terms that are quadratic order in $(1 - (N_u)_{ii})$. The coherence angle Θ is defined as
$$\Theta = \arccos\frac{p}{\sqrt u},$$
which, for p and u close to one, can be expressed as
$$\Theta^2 \approx 2\left(\sqrt u - p\right) \approx \frac{1}{d^2-1}\sum_{i\neq j}(N_u)_{ij}^2.\tag{40}$$
Apart from a normalization factor, and neglecting the higher-order terms, $\Theta^2$ is the sum of squares of all off-diagonal terms in $N_u$. It quantifies the coherence in the channel. For the qubit rotation channel Eq.(23), the coherence angle is related to the rotation angle θ by
$$\cos\Theta = \frac{1+2\cos\theta}{3},\qquad\text{so that}\qquad \Theta \approx \sqrt{2/3}\;\theta\quad\text{for small }\theta.$$
For the dephasing/rotation qubit channel in Eq.(28), our truncated power series expansion used to derive Eq.(40) is justified if ε is negligible compared to δ, in which case we find
$$\Theta \approx \sqrt{2/3}\;\delta.$$
For the depolarizing channel, $u = p^2$ and hence Θ = 0. In [20], Carignan-Dugas et al. derived an upper bound, Eq.(43), on $r_m$, the infidelity when a unital channel $\mathcal N$ is applied m times in succession, in terms of the infidelity r and coherence angle Θ of $\mathcal N$, up to terms of higher order in r and $\Theta^2$. In this sense (for unital channels), the coherence angle controls the quadratic growth of $r_m$ as a function of m, when r and $\Theta^2$ are small.
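A short numerical illustration, assuming NumPy and the definition $\Theta = \arccos(p/\sqrt u)$ used above (function names are hypothetical): for a pure rotation the coherence angle is close to $\sqrt{2/3}\,\theta$, for a rotation applied with probability 1/2 it is close to $\sqrt{2/3}\,\delta$, and for a depolarizing channel it vanishes.

```python
import numpy as np


def unitarity(N):
    Nu = N[1:, 1:]
    return np.trace(Nu.T @ Nu) / Nu.shape[0]


def benchmarking_parameter(N):
    Nu = N[1:, 1:]
    return np.trace(Nu) / Nu.shape[0]


def coherence_angle(N):
    """Theta = arccos(p / sqrt(u)); clipped to guard against rounding just above 1."""
    ratio = benchmarking_parameter(N) / np.sqrt(unitarity(N))
    return np.arccos(np.clip(ratio, -1.0, 1.0))


def rot_deph_ptm(q_d, q_r, theta):
    """PTM of the rotation/dephasing channel of Eqs.(27)-(28)."""
    eps = 2 * q_d + q_r * (1 - np.cos(theta))
    delta = q_r * np.sin(theta)
    N = np.eye(4)
    N[2:, 2:] = [[1 - eps, -delta], [delta, 1 - eps]]
    return N, eps, delta


theta = 1e-2
N_rot, _, _ = rot_deph_ptm(0.0, 1.0, theta)        # pure rotation
N_mix, eps, delta = rot_deph_ptm(0.0, 0.5, theta)  # rotation with probability 1/2
N_dep = np.diag([1.0, 0.9, 0.9, 0.9])              # depolarizing: u = p**2
print(coherence_angle(N_rot) / theta, np.sqrt(2 / 3))
print(coherence_angle(N_mix) / delta, coherence_angle(N_dep))
```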

Diamond distance
In some versions of the quantum accuracy threshold theorem, the strength of Markovian noise is characterized by the deviation of a noisy gate from the corresponding ideal gate in the diamond norm [21]. This diamond-norm deviation is useful for quantifying the damage inflicted when the noisy gate acts on qubits which are entangled with other qubits in a quantum computer. The diamond norm $\|\mathcal E\|_\diamond$ of a linear map $\mathcal E$ is defined as the $L_1$ norm of the extended map $\mathcal E\otimes\mathcal I$:
$$\|\mathcal E\|_\diamond = \max_\rho\left\|(\mathcal E\otimes\mathcal I)(\rho)\right\|_1.$$
If $\mathcal E$ acts on Hilbert space $\mathcal H$ with dimension d, then $\mathcal I$ denotes the identity acting on another Hilbert space $\mathcal H'$ with dimension d; the maximum is over all density operators on $\mathcal H\otimes\mathcal H'$. A measure of noise strength for a noisy channel $\mathcal N$ is the diamond distance of $\mathcal N$ from the identity channel,
$$D(\mathcal N) = \frac12\left\|\mathcal N - \mathcal I\right\|_\diamond.$$
If $\mathcal N$ is applied m times in succession, we have
$$D(\mathcal N^m) \le m\,D(\mathcal N).$$
Upper and lower bounds on the diamond distance, Eq.(47), can be expressed in terms of the benchmarking parameter $p(\mathcal N) = 1 - r(\mathcal N)d/(d-1)$ and the unitarity $u(\mathcal N)$, through an auxiliary quantity f defined in [9]. For the depolarizing channel, we have $u = p^2$ and $f = 1 - p = rd/(d-1)$; the diamond distance scales linearly with the infidelity r. But for a unitary channel, we have u = 1 and $f = 2(1-p)$; then the diamond distance scales like $\sqrt r$. Combining Eq.(37), which expresses p and u in terms of Pauli transfer matrix elements, with Eq.(47) provides upper and lower bounds on the diamond distance written in terms of Pauli transfer matrix elements. We will be mostly interested in the upper bound on the diamond distance for a logical channel with a fixed number of encoded qubits; therefore the unfavorable scaling of the upper bound with the dimension d need not cause us great concern.
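Computing the diamond norm in general requires a semidefinite program, which we do not attempt here. As a hedged illustration of the contrast between $D \propto r$ and $D \propto \sqrt r$, the NumPy sketch below assumes the normalization $D(\mathcal N) = \frac12\|\mathcal N - \mathcal I\|_\diamond$ and evaluates the trace distance between the Choi states of $\mathcal N$ and of the identity channel. That quantity is only a lower bound on D in general, but it coincides with D for Pauli channels and for the single-qubit rotation used here.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0])


def choi_trace_distance(kraus, d):
    """(1/2) || ((N - I) x I)(Phi) ||_1 for the maximally entangled state Phi.

    A lower bound on D(N) = (1/2)||N - I||_diamond; tight for the two examples below.
    """
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)  # |Phi> = sum_i |i>|i> / sqrt(d)
    Phi = np.outer(phi, phi.conj())
    out = sum(np.kron(K, np.eye(d)) @ Phi @ np.kron(K, np.eye(d)).conj().T for K in kraus)
    return 0.5 * np.abs(np.linalg.eigvalsh(out - Phi)).sum()


def infidelity(kraus, d):
    f_e = sum(abs(np.trace(K)) ** 2 for K in kraus) / d ** 2  # entanglement fidelity
    return d * (1 - f_e) / (d + 1)


theta, q = 1e-2, 1e-3
rot = [np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X]        # unitary rotation
dep = [np.sqrt(1 - q) * I2] + [np.sqrt(q / 3) * P for P in (X, Y, Z)]  # depolarizing
print(choi_trace_distance(rot, 2) / np.sqrt(infidelity(rot, 2)))  # sqrt(3/2): D ~ sqrt(r)
print(choi_trace_distance(dep, 2) / infidelity(dep, 2))           # 3/2: D ~ r
```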

Coherence in the chi-matrix representation
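The Pauli transfer matrix representation is useful for proving the preceding relationships between channel components, the growth of average infidelity, and the dependence of the diamond distance from identity on the average infidelity. When we analyze error correction, we will make use of a different representation of the noise channel. Any channel $\mathcal N$ has an expansion in terms of Pauli operators. Consider a completely positive map $\mathcal N$ with Kraus operators $\{K_\alpha\}$, and expand each $K_\alpha$ as
$$K_\alpha = \sum_i c_{\alpha i}\,\sigma_i,$$
where all Pauli operators $\{\sigma_i\}$ are chosen to be Hermitian, and the $\{c_{\alpha i}\}$ are complex numbers. Then
$$\mathcal N(\rho) = \sum_\alpha K_\alpha\rho K_\alpha^\dagger = \sum_{i,j}\chi_{ij}\,\sigma_i\rho\,\sigma_j,\tag{52}$$
where
$$\chi_{ij} = \sum_\alpha c_{\alpha i}\,c_{\alpha j}^*.$$
This is called the chi-matrix representation of the channel. The map $\mathcal N$ is trace preserving if
$$\sum_{i,j}\chi_{ij}\,\sigma_j\sigma_i = \mathrm{id},$$
and unital if
$$\sum_{i,j}\chi_{ij}\,\sigma_i\sigma_j = \mathrm{id}.$$
Note that $\sigma_i\sigma_k\sigma_j = \pm\sigma_k$ if and only if i = j; for i ≠ j, $\sigma_i\sigma_k\sigma_j$ is proportional to a Pauli operator other than $\sigma_k$. Therefore, in the Pauli transfer matrix language, the terms in Eq.(52) with i = j contribute to the diagonal entries in $N_{ab}$, while the terms with i ≠ j contribute to the off-diagonal entries.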
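The expansion in Eq.(52) can be verified directly. In the NumPy sketch below (`chi_matrix` and `apply_chi` are hypothetical helper names), the coefficients $c_{\alpha i} = \mathrm{Tr}(\sigma_iK_\alpha)/d$ are extracted from the Kraus operators of an amplitude-damping channel, chosen only because it is neither unital nor a Pauli channel, and the chi-matrix form of the channel is compared with the Kraus form. A single-qubit rotation about the x-axis is included as a second example.

```python
import itertools
from functools import reduce

import numpy as np

PAULIS_1Q = [np.eye(2), np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])]


def pauli_basis(n):
    return [reduce(np.kron, ops) for ops in itertools.product(PAULIS_1Q, repeat=n)]


def chi_matrix(kraus, n):
    """chi_ij = sum_alpha c_{alpha i} c_{alpha j}^*, with c_{alpha i} = Tr(sigma_i K_alpha) / d."""
    d = 2 ** n
    basis = pauli_basis(n)
    C = np.array([[np.trace(s @ K) / d for s in basis] for K in kraus])  # C[alpha, i]
    return C.T @ C.conj()


def apply_chi(chi, rho, n):
    """N(rho) = sum_ij chi_ij sigma_i rho sigma_j, as in Eq.(52)."""
    basis = pauli_basis(n)
    return sum(chi[i, j] * si @ rho @ sj
               for i, si in enumerate(basis) for j, sj in enumerate(basis))


# Amplitude damping with an illustrative damping rate.
gamma = 0.2
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]), np.array([[0, np.sqrt(gamma)], [0, 0]])]
chi = chi_matrix(kraus, 1)
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
print(np.allclose(apply_chi(chi, rho, 1), sum(K @ rho @ K.conj().T for K in kraus)))

# Rotation U_X(theta): chi_{0X} should equal (i/2) sin(theta).
theta = 0.3
U = np.cos(theta / 2) * PAULIS_1Q[0] - 1j * np.sin(theta / 2) * PAULIS_1Q[1]
chi_U = chi_matrix([U], 1)
```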
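To be more concrete, consider the single-qubit rotation about the x-axis $U_X(\theta) = \exp(-i\theta\sigma_X/2)$, for which
$$U_X(\theta) = \cos\frac\theta2\,\sigma_0 - i\sin\frac\theta2\,\sigma_X;$$
hence
$$\chi_{00} = \cos^2\frac\theta2,\qquad \chi_{XX} = \sin^2\frac\theta2,\qquad \chi_{0X} = \chi_{X0}^* = \frac i2\sin\theta.$$
More generally, for the channel with Pauli transfer matrix as in Eq.(28), we have
$$\chi_{00} = 1 - \frac\epsilon2,\qquad \chi_{XX} = \frac\epsilon2,\qquad \chi_{0X} = \chi_{X0}^* = \frac{i\delta}{2},$$
with all other entries equal to zero. There is a simple general relationship between the off-diagonal entries of the Pauli transfer matrix $N_{ab}$ and the chi matrix $\chi_{ij}$, namely
Lemma 1. The off-diagonal elements of the Pauli transfer matrix $N_{ab}$ and the chi matrix $\chi_{ij}$ are related by
$$\sum_{a,b\,|\,a\neq b}N_{ab}^2 = d^2\sum_{i,j\,|\,i\neq j}|\chi_{ij}|^2,$$
where $d = 2^n$ is the Hilbert space dimension.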
Because of this identity, we may quantify the coherence of a channel using the off-diagonal entries in either N ab or χ ij . The case d = 2 is explained explicitly in Appendix A.
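Proof. To prove the claim, note that, for any Hermitian Pauli operators $\sigma_i$, $\sigma_j$, $\sigma_a$, we have
$$\sigma_i\sigma_a\sigma_j = \eta^{ab}_{ij}\,\sigma_b$$
for some Hermitian Pauli operator $\sigma_b$ and some phase $\eta^{ab}_{ij}$. By taking Hermitian adjoints of both sides, we also have
$$\sigma_j\sigma_a\sigma_i = \left(\eta^{ab}_{ij}\right)^*\sigma_b.$$
The phase is $\eta^{ab}_{ij} = \pm1$ if $\sigma_i\sigma_a\sigma_j$ is Hermitian, and it is $\eta^{ab}_{ij} = \pm i$ if $\sigma_i\sigma_a\sigma_j$ is anti-Hermitian. Furthermore, for each fixed i ≠ j, as $\sigma_a$ ranges over the $d^2$ Hermitian Pauli operators, $\sigma_i\sigma_a\sigma_j$ is Hermitian for $d^2/2$ choices of $\sigma_a$, and anti-Hermitian for the remaining $d^2/2$ choices. (If $\sigma_i$ and $\sigma_j$ commute, then $\sigma_i\sigma_a\sigma_j$ is Hermitian if and only if $\sigma_a$ commutes with $\sigma_j\sigma_i$. If $\sigma_i$ and $\sigma_j$ anticommute, then $\sigma_i\sigma_a\sigma_j$ is Hermitian if and only if $\sigma_a$ anticommutes with $\sigma_j\sigma_i$.) The entries in the Pauli transfer matrix are (for a ≠ b)
$$N_{ba} = \frac1d\,\mathrm{Tr}\left(\sigma_b\,\mathcal N(\sigma_a)\right) = {\sum_{i,j}}'\;\eta^{ab}_{ij}\,\chi_{ij},$$
where the primed sum is restricted to {i, j} such that $\sigma_i\sigma_a\sigma_j \propto \sigma_b$. Suppose now that, for fixed i ≠ j, we collect all the terms in $\sum_{a\neq b}N_{ab}^2$ which are quadratic in $\{\chi_{ij},\chi_{ji}\}$. Because $\sigma_i\sigma_a\sigma_j$ is Hermitian for half the choices of $\sigma_a$ and anti-Hermitian for half the choices, we have
$$\sum_a\left(\eta^{ab}_{ij}\chi_{ij} + \eta^{ab*}_{ij}\chi_{ji}\right)^2 = \frac{d^2}{2}\left(2\,\mathrm{Re}\,\chi_{ij}\right)^2 + \frac{d^2}{2}\left(2\,\mathrm{Im}\,\chi_{ij}\right)^2 = 2d^2\,|\chi_{ij}|^2,$$
where we have used $\chi_{ij} = \chi_{ji}^*$, which is required by complete positivity. Summing over unordered pairs {i, j} with i ≠ j reproduces the right-hand side of Lemma 1, provided that the remaining cross terms cancel.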
To complete the proof of the claim, we must verify that all the multilinear terms of the form $\chi_{ij}\chi_{kl}$ (where {i, j} and {k, l} are disjoint) cancel in the sum $\sum_{a\neq b}N_{ab}^2$. Such a cross term has the form
$$\eta^{ab}_{ij}\,\eta^{ab}_{kl}\,\chi_{ij}\,\chi_{kl}.\tag{65}$$
We will consider all such terms with i, j, k, l fixed, as we vary $\sigma_a$ and $\sigma_b$ over the possible Hermitian Pauli operators. Multiplying both sides of $\sigma_i\sigma_a\sigma_j = \eta^{ab}_{ij}\sigma_b$ on the left by a Hermitian Pauli operator $\sigma_c$ we obtain
$$\sigma_c\sigma_i\sigma_a\sigma_j = \eta^{ab}_{ij}\,\sigma_c\sigma_b.$$
Given a standard sign choice for the $d^2$ Hermitian Pauli operators, we may write
$$\sigma_c\sigma_a = \phi^a_{ca}\,\sigma_{a'},\qquad \sigma_c\sigma_b = \phi^b_{cb}\,\sigma_{b'};$$
here, e.g., $\phi^a_{ca}$ is a phase, which is ±1 if $\sigma_a$ and $\sigma_c$ commute and ±i if $\sigma_a$ and $\sigma_c$ anticommute. We also have
$$\sigma_c\sigma_i = \xi_{ic}\,\sigma_i\sigma_c;$$
here $\xi_{ic} = \pm1$ is a sign indicating whether $\sigma_c$ and $\sigma_i$ commute or anticommute. Therefore
$$\sigma_i\sigma_{a'}\sigma_j = \phi^{a*}_{ca}\,\phi^b_{cb}\,\xi_{ic}\,\eta^{ab}_{ij}\,\sigma_{b'},$$
and the corresponding cross term arising from $N_{a'b'}^2$ is
$$\left(\phi^{a*}_{ca}\,\phi^b_{cb}\right)^2\xi_{ic}\,\xi_{kc}\,\eta^{ab}_{ij}\,\eta^{ab}_{kl}\,\chi_{ij}\,\chi_{kl}.$$
Now suppose that either $\sigma_c$ commutes with both $\sigma_a$ and $\sigma_b$, or anticommutes with both; in either case $\left(\phi^{a*}_{ca}\phi^b_{cb}\right)^2 = 1$. As we vary $\sigma_c$ over the $d^2/2$ Pauli operators with this property, the sign $\xi_{ic}\xi_{kc}$ has the value +1 for the $d^2/4$ choices of $\sigma_c$ such that $\sigma_c$ commutes with both $\sigma_i$ and $\sigma_k$ or anticommutes with both, while $\xi_{ic}\xi_{kc}$ has the value −1 for the $d^2/4$ choices of $\sigma_c$ such that $\sigma_c$ commutes with one of $\sigma_i$ and $\sigma_k$ and anticommutes with the other. Therefore, as we vary a and b over these $d^2/2$ possible choices for $\sigma_c$, with i, j, k, l fixed, the cross terms cancel.
Alternatively, suppose that $\sigma_c$ commutes with one of $\sigma_a$ and $\sigma_b$ and anticommutes with the other; then $\left(\phi^{a*}_{ca}\phi^b_{cb}\right)^2 = -1$. Again, as we vary a and b over the $d^2/2$ possible choices for $\sigma_c$, with i, j, k, l fixed, $\xi_{ic}\xi_{kc} = +1$ for half of the terms and $\xi_{ic}\xi_{kc} = -1$ for the other half; therefore the cross terms cancel. This completes the proof.
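Lemma 1 is also easy to test numerically. The NumPy sketch below (helper names hypothetical; random channels serve purely as test data) compares the two sides of the lemma for random one- and two-qubit channels, which in general are neither unital nor Pauli channels.

```python
import itertools
from functools import reduce

import numpy as np

PAULIS_1Q = [np.eye(2), np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])]


def pauli_basis(n):
    return [reduce(np.kron, ops) for ops in itertools.product(PAULIS_1Q, repeat=n)]


def ptm(kraus, n):
    d, basis = 2 ** n, pauli_basis(n)
    images = [sum(K @ s @ K.conj().T for K in kraus) for s in basis]
    return np.array([[np.trace(si @ out).real / d for out in images] for si in basis])


def chi_matrix(kraus, n):
    d, basis = 2 ** n, pauli_basis(n)
    C = np.array([[np.trace(s @ K) / d for s in basis] for K in kraus])
    return C.T @ C.conj()


def random_kraus(d, num_kraus, seed):
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * num_kraus, d)) + 1j * rng.normal(size=(d * num_kraus, d))
    V, _ = np.linalg.qr(G)  # isometry, so sum_k K_k^dagger K_k = id
    return [V[k * d:(k + 1) * d] for k in range(num_kraus)]


def off_diagonal(A):
    """Copy of A with its diagonal set to zero."""
    return A - np.diag(np.diag(A))


def lemma1_sides(kraus, n):
    """Return (sum_{a != b} N_ab^2, d^2 sum_{i != j} |chi_ij|^2)."""
    d = 2 ** n
    N, chi = ptm(kraus, n), chi_matrix(kraus, n)
    return np.sum(off_diagonal(N) ** 2), d ** 2 * np.sum(np.abs(off_diagonal(chi)) ** 2)


for n in (1, 2):
    print(n, lemma1_sides(random_kraus(2 ** n, 3, seed=n), n))
```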

Logical channel for the repetition code
From now on we will use the streamlined notation $I \equiv \sigma_0$, $X \equiv \sigma_X$, $Y \equiv \sigma_Y$, $Z \equiv \sigma_Z$ for single-qubit Pauli operators. Consider the repetition code, which protects one logical qubit against bit flip (X) errors, but provides no protection against phase (Z) errors. Let us analyze how well this code protects against coherent errors, in which each physical qubit in the code block rotates about the x-axis. Similar calculations were carried out in [13,14]. Understanding this example will prepare us for an analysis of more general stabilizer codes. To be as concrete as possible, we will start with the simplest interesting case, the 3-qubit repetition code spanned by $|000\rangle$ and $|111\rangle$. Our goal is to determine the logical channel that results when rotation errors applied to the physical qubits are followed by error correction. We will assume for now that the same rotation is applied to each of the three qubits; this will be generalized later.
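Suppose that each physical qubit is subjected to the unitary rotation
$$U_X(\theta) = e^{-i\theta X/2} = c\,I - is\,X,\qquad c = \cos\frac\theta2,\quad s = \sin\frac\theta2;$$
thus the product unitary map applied to the three physical qubits is
$$U_X(\theta)^{\otimes3} = c^3\,III - ic^2s\,(XII + IXI + IIX) - cs^2\,(XXI + XIX + IXX) + is^3\,XXX.\tag{74}$$
To perform error correction we measure the operators ZZI and IZZ to obtain two syndrome bits. If the syndrome is trivial (both measurements yield +1), no further action is required. If the syndrome is nontrivial, X is applied to one of the three qubits, returning the state to the code space. Thus the terms in the expansion Eq.(74) with weight 0 or 1 (where the weight is the number of X's) are error corrected to the logical operator $\bar I = III$, while terms with weight 2 or 3 are error corrected to the logical operator $\bar X = XXX$. We conclude that the logical channel $\mathcal N_L$ is a convex combination of two unitary transformations,
$$\mathcal N_L(\rho) = (c^6 + s^6)\,U_X(\theta_0)\,\rho\,U_X(\theta_0)^\dagger + 3c^2s^2\,U_X(\theta_1)\,\rho\,U_X(\theta_1)^\dagger,$$
where
$$\cos\frac{\theta_0}{2} = \frac{c^3}{\sqrt{c^6+s^6}},\qquad \sin\frac{\theta_0}{2} = -\frac{s^3}{\sqrt{c^6+s^6}},\qquad \theta_1 = \theta.$$
A logical rotation by $\theta_0$ is applied when the syndrome is trivial (the weight-0 term together with the weight-3 term), and a logical rotation by $\theta_1$ is applied when the syndrome is nontrivial (a weight-1 term together with its weight-2 partner). The logical channel has the form specified in Eq.(28), where
$$\epsilon = (c^6+s^6)(1-\cos\theta_0) + 3c^2s^2(1-\cos\theta_1),\qquad \delta = (c^6+s^6)\sin\theta_0 + 3c^2s^2\sin\theta_1.$$
These expressions for ε and δ can be simplified using trigonometric identities. In terms of $t = s/c = \tan(\theta/2)$, we have
$$\tan\frac{\theta_0}{2} = -t^3,\qquad c^2 = \frac{1}{1+t^2},\qquad s^2 = \frac{t^2}{1+t^2};\tag{78}$$
therefore we find
$$\epsilon = 2s^6 + 6c^2s^4 = \frac{2t^4(t^2+3)}{(1+t^2)^3},\qquad \delta = 4c^3s^3 = \frac{4t^3}{(1+t^2)^3}.$$
Expanding to leading order for small θ, we have
$$\epsilon \approx \frac38\,\theta^4,\qquad \delta \approx \frac12\,\theta^3.\tag{80}$$
Here, because ε is higher order in θ than δ, Eq.(40) applies, and therefore the coherence angle is
$$\Theta \approx \sqrt{2/3}\;\delta \approx \frac{\theta^3}{\sqrt6}.$$
From Eq.(32), we see that if this logical channel $\mathcal N_L$ is applied m times, the infidelity becomes
$$r_m \approx \frac13\left[m\epsilon + \frac12m^2\delta^2\right] \approx \frac18\,m\,\theta^4 + \frac1{24}\,m^2\theta^6.\tag{82}$$
Note that the term quadratic in m actually matches the upper bound in Eq.(43). Eq.(82) reveals that the coherence of the logical channel is somewhat suppressed, as it takes a number of repetitions $m = O(\theta^{-2})$ for the quadratically growing contribution to r to "catch up" with the dominant linear term. Now let's do a similar analysis for the length-n repetition code (where n is odd), which corrects up to (n−1)/2 bit-flip errors. 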
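Before turning to general n, the three-qubit calculation can be reproduced in a few lines. Grouping the terms of Eq.(74) by syndrome gives the logical Kraus operator $c^3\bar I + is^3\bar X$ for the trivial syndrome and $-ics\,(c\bar I - is\bar X)$ for each of the three nontrivial syndromes. The NumPy sketch below (function names hypothetical, angles arbitrary) computes the logical Pauli transfer matrix from these operators and reads off ε and δ from the block M of Eq.(28).

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0])
PAULIS = [I2, X, Y, Z]


def ptm_1q(kraus):
    """Single-qubit PTM N_ij = Tr(sigma_i N(sigma_j)) / 2."""
    images = [sum(K @ s @ K.conj().T for K in kraus) for s in PAULIS]
    return np.array([[np.trace(si @ out).real / 2 for out in images] for si in PAULIS])


def logical_kraus_3qubit(theta):
    """Logical Kraus operators of the 3-qubit code after syndrome measurement and recovery."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    trivial = c ** 3 * I2 + 1j * s ** 3 * X           # weight-0 and weight-3 terms
    nontrivial = -1j * c * s * (c * I2 - 1j * s * X)  # weight-1 term plus its weight-2 partner
    return [trivial] + [nontrivial] * 3               # one for each nontrivial syndrome


theta = 0.05
N = ptm_1q(logical_kraus_3qubit(theta))
eps, delta = 1 - N[2, 2], N[3, 2]  # M = [[1 - eps, -delta], [delta, 1 - eps]]
print(eps, 3 * theta ** 4 / 8)
print(delta, theta ** 3 / 2)
```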
For the length-$n$ code, the logical channel is a convex combination of $(n+1)/2$ unitary rotations, where $w$, ranging from 0 to $(n-1)/2$, indicates the weight of a correctable $X$ error occurring in the expansion of $(cI - isX)^{\otimes n}$. When the $(n-1)$-bit syndrome is measured, syndromes pointing to a weight-$w$ error occur with total probability
$$p_w = \binom{n}{w}\left(c^{2(n-w)}s^{2w} + c^{2w}s^{2(n-w)}\right),$$
and the logical rotation angle conditioned on a weight-$w$ syndrome is given by
$$\tan(\theta_w/2) = (-1)^{(n-2w-1)/2}\,t^{\,n-2w}.$$
Summing over the weight of the syndrome yields the parameters $\epsilon$ and $\delta$ of the logical channel. In Appendix B we use Stirling's approximation to evaluate the sum in the expression for $\epsilon$. Applying Stirling's approximation to our expression for $\delta$ as well, we have proven

Theorem 1. Consider the length-$n$ repetition code, which protects against bit flip ($X$) errors, subject to the independent unitary noise map $U = \left(\cos(\theta/2)\, I - i\sin(\theta/2)\, X\right)^{\otimes n}$, where $\sin^2(\theta/2) < 1/2$. Let $\mathcal N_L(\rho) = \mathcal R(U\rho U^\dagger)$ be the logical map, where $\rho$ is a code state and $\mathcal R$ decodes using majority voting. Then $\mathcal N_L$ has Pauli transfer matrix $N$ of the form given in Eq.(27) and Eq.(28), with parameters $\epsilon$ and $\delta$ that, for large $n$, are given by Stirling's approximation to the syndrome-weight sums above.

Therefore, using Eq.(32) and approximations that are well justified (according to Theorem 1) when $n$ is large and $\sin^2(\theta/2) < 1/2$, we can estimate the infidelity when the logical channel is applied $m$ times in succession. The scaling of the infidelity $r = O(\theta^{n+1})$ arises because a bit flip error must have weight at least $w = (n+1)/2$ to cause a logical error. The scaling $O(\theta^{2n})$ of the term quadratic in $m$ indicates that the coherence of the logical channel is suppressed when $\theta$ is small. It takes $m \approx \sqrt{2\pi n}/\theta^{\,n-1}$ successive applications of the logical channel $\mathcal N_L$ for the quadratic term in $r_m$ to become comparable to the linear term. This suppression arises because larger logical rotations occur only with smaller probability; for example, a logical rotation by $\theta$ occurs with probability $O(\theta^{n-1})$.
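The syndrome-weight sums can be evaluated exactly for moderate $n$. The sketch below assumes, as our own reading of Eq.(28), that $\epsilon = \sum_w p_w(1-\cos\theta_w)$ and $\delta = \sum_w p_w \sin\theta_w$. These are the diagonal deficit and the off-diagonal entry of the $(Y, Z)$ block of the logical Pauli transfer matrix for a mixture of $x$-rotations, and the overall sign of $\delta$ is convention-dependent. The function name `repetition_logical_channel` is hypothetical. The products are accumulated directly, avoiding the cancellation error in $1-\cos\theta_w$ for tiny angles.

```python
from math import comb, cos, sin, atan2

def repetition_logical_channel(n: int, theta: float):
    """Syndrome-weight decomposition of the logical channel for the length-n
    repetition code under U_X(theta)^{(x)n} with majority-vote decoding (n odd).

    Returns (eps, delta, rows), where rows lists (w, p_w, theta_w).
    ASSUMPTION: eps = sum_w p_w (1 - cos theta_w), delta = sum_w p_w sin theta_w.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    c, s = cos(theta / 2), sin(theta / 2)
    t = (n - 1) // 2
    eps = delta = 0.0
    rows = []
    for w in range(t + 1):
        a_id = c ** (n - w) * s ** w        # |amplitude| of the weight-w correctable error
        a_x = c ** w * s ** (n - w)         # |amplitude| of its weight-(n - w) complement
        sign = (-1) ** ((n - 2 * w - 1) // 2)
        p_w = comb(n, w) * (a_id ** 2 + a_x ** 2)
        theta_w = 2 * atan2(sign * a_x, a_id)   # tan(theta_w / 2) = sign * t**(n - 2w)
        rows.append((w, p_w, theta_w))
        # p_w (1 - cos theta_w) and p_w sin theta_w, written without cancellation error.
        eps += 2 * comb(n, w) * a_x ** 2
        delta += 2 * comb(n, w) * sign * a_id * a_x
    return eps, delta, rows
```

For small $\theta$ the ratio $\theta\delta/\epsilon$ computed this way approaches $(n+1)/n$, an $n$-independent constant, consistent with the discussion that follows.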
Keeping only the leading-order terms in Eq.(86), we obtain the leading-order behavior of $\epsilon$ and $\delta$ for general $n$, generalizing Eq.(80). We derived the relationship for $\theta\delta/\epsilon$ using an identity for alternating sums of binomial coefficients, which can be proved by induction. For drawing the conclusion that $\theta\delta/\epsilon$ is bounded above by an $n$-independent constant, the oscillating minus sign in this identity is important: if not for the oscillating sign, the sum would be $2^{n-1}$, hence larger than Eq.(91) by a factor that scales like $\sqrt n$. This would mean that the average infidelity $r_m$ in Eq.(32) would have a large quadratic component relative to the linear component as the code length $n$ becomes large. In other words, the logical noise channel would have significant coherence.
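The alternating identity is presumably of the standard partial-sum type. Assuming it is $\sum_{w=0}^{t}(-1)^w\binom{n}{w} = (-1)^t\binom{n-1}{t}$, which follows from Pascal's rule $\binom{n}{w} = \binom{n-1}{w} + \binom{n-1}{w-1}$ by telescoping, the sketch below checks it for odd $n$ with $t = (n-1)/2$. It also compares the result with the unsigned sum $2^{n-1}$. The ratio grows like $\sqrt{\pi(n-1)/2}$, i.e. like $\sqrt n$. Function names are illustrative.

```python
from math import comb, pi, sqrt

def alternating_partial_sum(n: int, j: int) -> int:
    """sum_{w=0}^{j} (-1)^w C(n, w), computed exactly with integers."""
    return sum((-1) ** w * comb(n, w) for w in range(j + 1))

def unsigned_to_signed_ratio(n: int) -> float:
    """2^(n-1) (the sum without signs, n odd) divided by |alternating sum| = C(n-1, t)."""
    t = (n - 1) // 2
    return 2 ** (n - 1) / abs(alternating_partial_sum(n, t))

# Check the assumed identity sum_{w<=t} (-1)^w C(n,w) = (-1)^t C(n-1,t) for odd n.
for n in range(1, 42, 2):
    t = (n - 1) // 2
    assert alternating_partial_sum(n, t) == (-1) ** t * comb(n - 1, t)

# The ratio grows like sqrt(pi * (n - 1) / 2); this prints a value close to 1.
print(unsigned_to_signed_ratio(101) / sqrt(pi * 50))
```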

Repetition code revisited
In this section we will compute the logical channel for the repetition code using a different method than in Sec. 3. This new method can be extended more easily to general stabilizer codes.

Stabilizer formalism
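We now briefly review the structure of stabilizer codes, as this will be used in our analysis. Let $\{g_\alpha,\ \alpha = 1, 2, \dots, n-k\}$ denote the $n-k$ stabilizer generators for an $[[n,k,d]]$ stabilizer code. These generators are mutually commuting Hermitian Pauli operators such that $g_\alpha^2 = I$. The syndrome $s(\sigma_i)$ of a Pauli operator $\sigma_i$ is a length-$(n-k)$ binary vector such that
$$\sigma_i\, g_\alpha = (-1)^{s(\sigma_i)_\alpha}\, g_\alpha\, \sigma_i;$$
that is, $s(\sigma_i)_\alpha = 0$ if $\sigma_i$ commutes with $g_\alpha$ and $s(\sigma_i)_\alpha = 1$ if it anticommutes. Note that the syndrome of a product of Pauli operators is additive: $s(\sigma_i\sigma_j)_\alpha = s(\sigma_i)_\alpha + s(\sigma_j)_\alpha$, where the addition is modulo 2.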
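A common concrete representation, which is our choice and not the paper's, stores an $n$-qubit Pauli operator (phase dropped) as a pair of bit vectors $(x, z)$. Two Paulis anticommute exactly when the symplectic product $x\cdot z' + z\cdot x'$ is odd. The syndrome bit $s(\sigma)_\alpha$ is that product with $g_\alpha$, and additivity modulo 2 follows from linearity. The helper names below are hypothetical.

```python
import random
from typing import List, Tuple

# ASSUMPTION: binary symplectic representation; phases are dropped.
Pauli = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (x bits, z bits)

def from_label(label: str) -> Pauli:
    """'XIZ' -> ((1,0,0), (0,0,1)); Y contributes to both x and z."""
    x = tuple(1 if ch in "XY" else 0 for ch in label)
    z = tuple(1 if ch in "ZY" else 0 for ch in label)
    return x, z

def multiply(p: Pauli, q: Pauli) -> Pauli:
    """Product up to a phase: bitwise XOR of the x and z parts."""
    return (tuple(a ^ b for a, b in zip(p[0], q[0])),
            tuple(a ^ b for a, b in zip(p[1], q[1])))

def anticommutes(p: Pauli, q: Pauli) -> int:
    """Symplectic inner product mod 2: 1 if p and q anticommute, else 0."""
    return (sum(a & b for a, b in zip(p[0], q[1]))
            + sum(a & b for a, b in zip(p[1], q[0]))) % 2

def syndrome(p: Pauli, generators: List[Pauli]) -> Tuple[int, ...]:
    """s(p)_alpha = 1 exactly when p anticommutes with generator g_alpha."""
    return tuple(anticommutes(p, g) for g in generators)

def check_additivity(generators: List[Pauli], n: int, trials: int = 200, seed: int = 0) -> bool:
    """Randomized check that s(pq) = s(p) + s(q) mod 2."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = from_label("".join(rng.choice("IXYZ") for _ in range(n)))
        q = from_label("".join(rng.choice("IXYZ") for _ in range(n)))
        lhs = syndrome(multiply(p, q), generators)
        rhs = tuple((a + b) % 2 for a, b in zip(syndrome(p, generators), syndrome(q, generators)))
        if lhs != rhs:
            return False
    return True

# 3-qubit repetition code: generators ZZI and IZZ.
REP3 = [from_label("ZZI"), from_label("IZZ")]
```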
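The code space is the simultaneous eigenspace, with eigenvalue $+1$, of all the stabilizer generators. If $|\psi\rangle$ is a pure state in the code space, then
$$g_\alpha\,\sigma_i|\psi\rangle = (-1)^{s(\sigma_i)_\alpha}\,\sigma_i\,g_\alpha|\psi\rangle = (-1)^{s(\sigma_i)_\alpha}\,\sigma_i|\psi\rangle.$$
Therefore, the syndrome of $\sigma_i$ can be identified by measuring all of the stabilizer generators.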
Hence we may say that $s(\sigma_i)$ is the syndrome of the state $\sigma_i|\psi\rangle$. A Pauli operator that commutes with the stabilizer generators preserves the code space and is said to be logical. We may define a complete set of orthogonal projectors $\{\Pi_s\}$ on the $n$-qubit Hilbert space, where $\Pi_s$ projects onto the subspace with syndrome $s$. Then
$$\sum_s \Pi_s = I, \qquad \Pi_s\,\Pi_{s'} = \delta_{s,s'}\,\Pi_s.$$
An encoded density operator $\bar\rho$ (one supported on the code space) has the property $\Pi_0\,\bar\rho\,\Pi_0 = \bar\rho$, where $s = 0$ denotes the trivial syndrome.
To construct the error recovery map $\mathcal R$, we first perform an orthogonal measurement to identify the syndrome $s$. Then, for each syndrome $s$, a particular Pauli operator $E_s^\dagger$ is applied, which returns the measured state to the code space; therefore,
$$\mathcal R(\rho) = \sum_s E_s^\dagger\,\Pi_s\,\rho\,\Pi_s\,E_s.$$
One says that $E_s$ is the standard error associated with the syndrome $s$. In the case of minimal-weight decoding, $E_s$ is chosen to be a minimal-weight Pauli operator with syndrome $s$. By the weight $w(\sigma)$ of the $n$-qubit Pauli operator $\sigma$, we mean the number of qubits to which a nontrivial Pauli matrix $X$, $Y$, or $Z$ is applied, while $I$ is applied to the remaining $n-w$ qubits. By summing over all values of the syndrome $s$ to construct the error recovery channel, we are averaging over all the possible outcomes of the syndrome measurement, with each syndrome weighted by its probability. We discussed in the introduction how to justify performing this average when computing the logical channel.
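For small codes the map $s \mapsto E_s$ can be tabulated by brute force. The approach is to enumerate operators in order of increasing weight and keep the first one seen for each syndrome. The sketch below does this for $X$-type errors on the length-$n$ repetition code with generators $Z_aZ_{a+1}$, an assumption matching the bit-flip setting of this paper. The helper names are hypothetical, and ties between equal-weight operators are broken by enumeration order.

```python
from itertools import combinations
from typing import Dict, Tuple

def rep_syndrome(support: Tuple[int, ...], n: int) -> Tuple[int, ...]:
    """Syndrome of the X error on `support` for generators Z_a Z_{a+1}, a = 0..n-2."""
    bits = [0] * n
    for q in support:
        bits[q] = 1
    return tuple(bits[a] ^ bits[a + 1] for a in range(n - 1))

def standard_errors(n: int) -> Dict[Tuple[int, ...], Tuple[int, ...]]:
    """Map each syndrome to a minimal-weight X error (its support), by brute force."""
    table: Dict[Tuple[int, ...], Tuple[int, ...]] = {}
    for w in range(n + 1):                       # increasing weight, so the first hit is minimal
        for support in combinations(range(n), w):
            table.setdefault(rep_syndrome(support, n), support)
    return table

def corrected_to_logical_x(support: Tuple[int, ...], n: int) -> bool:
    """Apply E_s^dagger for the measured syndrome; True if the residual is the logical X."""
    e = standard_errors(n)[rep_syndrome(support, n)]
    residual = set(support) ^ set(e)             # X-type operators multiply by symmetric difference
    return len(residual) == n                    # residual is either empty (I) or all ones (X...X)
```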

Recovery in the chi-matrix representation
For any such noise channel $\mathcal N$ acting on an encoded density operator $\bar\rho$, we would like to find the error-corrected map $\mathcal R\circ\mathcal N(\bar\rho)$. Using the chi representation of the noise channel, it evidently suffices to compute $\mathcal R(\sigma_i\bar\sigma_k\sigma_j)$ for each pair of physical Pauli operators $\sigma_i, \sigma_j$ and each logical Pauli operator $\bar\sigma_k$. Because the syndrome is additive, we have
$$P_t\,\Pi_s = \Pi_{s+t}\,P_t$$
if $P_t$ is any physical Pauli operator with syndrome $t$, and therefore
$$\mathcal R(\sigma_i\bar\sigma_k\sigma_j) = 0 \quad\text{unless}\quad s(\sigma_i) = s(\sigma_j).$$
That is, only the terms for which $\sigma_i$ and $\sigma_j$ have the same syndrome survive when the error recovery map is applied. This property will be crucial in our analysis of the logical channel. Now let's understand the action of $\mathcal R$ in more detail. An $[[n,k,d]]$ stabilizer code has $4^k$ logical Pauli operators. The physical Pauli operator $L$ representing a logical Pauli operator is not unique, because $L$ and $LG$ act in the same way on the code space, where $G$ is any element of the stabilizer group. But let us by convention choose standard physical operators $\{L_a,\ a = 0, 1, 2, \dots, 4^k-1\}$ representing each of the logical Pauli operators. Since we have also assigned a standard error operator $E_s$ to each syndrome $s$, any Hermitian Pauli operator has a unique decomposition of the form
$$\sigma(s,a,x) = \eta_{sax}\,E_s\,L_a\,G_x,$$
where $G_x$ is an element of the stabilizer group and $\eta_{sax}$ is a phase. Since there are $2^{n-k}$ stabilizer group elements (up to phases), $2^{n-k}$ distinct syndromes, and $2^{2k}$ logical Pauli operators, this decomposition accounts for all $4^n$ physical Pauli operators. We conclude that if $\bar\rho$ is an encoded density operator, then
$$\mathcal R\big(\sigma(s,a,x)\,\bar\rho\,\sigma(s',a',x')\big) = \delta_{s,s'}\,\eta_{sax}\,\eta^*_{s'a'x'}\,L_a\,\bar\rho\,L_{a'}^\dagger,$$
where we have used the property that $\sigma(s',a',x')$ is Hermitian. In the logical channel, the terms with $L_a = L_{a'}$ are incoherent: they contribute to the diagonal elements of the logical Pauli transfer matrix. The terms with $L_a \neq L_{a'}$ are coherent: they contribute to the off-diagonal elements.
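The counting argument can be checked by brute force for the 3-qubit repetition code, viewed as an $[[3,1]]$ stabilizer code with generators $ZZI$ and $IZZ$. The sketch assumes standard errors $\{III, XII, IXI, IIX\}$ and logical representatives $\bar X = XXX$, $\bar Z = ZII$, all hypothetical choices made for illustration. Every one of the $4^3 = 64$ Pauli operators (phases dropped) should then map to a distinct triple $(s, a, G_x)$ with $G_x$ in the 4-element stabilizer group.

```python
from itertools import product

def parse(label):
    """Pauli label -> (x bits, z bits), phase dropped."""
    return (tuple(int(ch in "XY") for ch in label), tuple(int(ch in "ZY") for ch in label))

def mul(p, q):
    """Product up to phase."""
    return (tuple(a ^ b for a, b in zip(p[0], q[0])), tuple(a ^ b for a, b in zip(p[1], q[1])))

def anti(p, q):
    """1 if p and q anticommute, else 0 (symplectic product)."""
    return (sum(a & b for a, b in zip(p[0], q[1])) + sum(a & b for a, b in zip(p[1], q[0]))) % 2

# ASSUMED conventions for the 3-qubit repetition code.
GENS = [parse("ZZI"), parse("IZZ")]
STABILIZER = {parse(g) for g in ("III", "ZZI", "IZZ", "ZIZ")}
STANDARD = {(0, 0): parse("III"), (1, 0): parse("XII"), (1, 1): parse("IXI"), (0, 1): parse("IIX")}
XBAR, ZBAR = parse("XXX"), parse("ZII")

def decompose(p):
    """Return (s, a, G) with p = E_s L_a G up to phase; a = (a_x, a_z) labels L_a = Xbar^a_x Zbar^a_z."""
    s = tuple(anti(p, g) for g in GENS)
    q = mul(STANDARD[s], p)                 # E_s^dagger p has trivial syndrome, so it is logical
    a = (anti(q, ZBAR), anti(q, XBAR))      # Xbar anticommutes with Zbar and vice versa
    logical = parse("III")
    if a[0]:
        logical = mul(logical, XBAR)
    if a[1]:
        logical = mul(logical, ZBAR)
    return s, a, mul(logical, q)            # leftover factor G

def check_decomposition():
    """True if all 4^3 Paulis decompose into distinct triples with G in the stabilizer group."""
    triples = set()
    for letters in product("IXYZ", repeat=3):
        s, a, g = decompose(parse("".join(letters)))
        if g not in STABILIZER:
            return False
        triples.add((s, a, g))
    return len(triples) == 4 ** 3
```

Note that $YII$ and $XII$ share a syndrome but differ by the logical $\bar Z = ZII$. For this code, whose distance against $Z$ errors is 1, the choice of standard error for full Pauli noise is therefore not innocuous.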
When the noise channel $\mathcal N$ is weak, the dominant terms in the chi-matrix expansion Eq.(52) are those in which $\sigma_i$ and $\sigma_j$ have minimal weight, and we have also seen that the only terms that survive when the recovery map is applied are those such that $\sigma_i\sigma_j$ is a logical operator (has trivial syndrome). Now let's suppose that the code distance is $d$ and that minimal-weight decoding is performed. This means that we choose $E_s$ such that $L_a = I$ (up to multiplication by an element of the stabilizer) whenever $\sigma(s,a,x)$ has weight no larger than $(d-1)/2$, assuming $d$ is odd.
To get a contribution to the incoherent part of the logical channel, we will need both σ i and σ j to have weight at least (d + 1)/2, so that the total weight must be at least d + 1. In that case it is possible for both σ i and σ j to be error corrected to a nontrivial logical operator. But there are also weight-d contributions to the coherent part of the logical channel, arising from the terms in which w(σ i ) + w(σ j ) = d, where w(σ) denotes the weight of Pauli operator σ. In that case, one of the two Pauli operators has weight less than or equal to (d − 1)/2, hence is error corrected to the identity, while the other has weight greater than or equal to (d + 1)/2, hence is corrected to a nontrivial logical operator L. The resulting term in the logical channel is either Lρ orρL (up to a phase), depending on whether σ i or σ j has higher weight.
If we choose the standard errors $\{E_s\}$ differently, then the action of the recovery operator may be modified. But it is evident from Eq.(101) that if we make the replacement $E_s \to E_s' = \phi_s E_s G_y$, where $G_y$ is an element of the stabilizer and $\phi_s$ is a phase, then $\mathcal R(\sigma\bar\rho\sigma')$ is not changed. In particular, when we perform minimal-weight decoding, there may be more than one minimal-weight Pauli operator with syndrome $s$, so that the choice of $E_s$ is ambiguous. However, as long as any two minimal-weight Pauli operators $E_s$ and $E_s'$ with syndrome $s$ have the property that $E_s^\dagger E_s'$ is an element of the code stabilizer, the logical channel will not depend on how the minimal-weight standard errors are chosen. This will certainly be the case if the code distance is $d$ and the standard errors have weight no larger than $(d-1)/2$, since then $E_s^\dagger E_s'$ has weight at most $d-1$ and cannot be a nontrivial logical operator.

Analysis of repetition code using the chi-matrix formalism
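To illustrate this method, we return to the length-3 repetition code, where the noise channel is as in Eq.(74). We write out the chi-matrix expansion Eq.(52) of $\mathcal N(\rho)$, and then apply the recovery operator $\mathcal R$ to find the logical channel $\mathcal N_L = \mathcal R\circ\mathcal N$. The task of applying $\mathcal R$ is simplified by the observation that, if the state $\rho$ is supported on the code space, then $\mathcal R$ annihilates all terms in which $\sigma_i\sigma_j$ is not logical. We may write
$$\mathcal N = \mathcal N_{\mathrm{null}} + \mathcal N_{\mathrm{incoh}} + \mathcal N_{\mathrm{coh}},$$
where $\mathcal N_{\mathrm{null}}$ is the sum of terms such that $\sigma_i\sigma_j$ is not logical (hence $\mathcal R\circ\mathcal N_{\mathrm{null}} = 0$ acting on encoded density operators), $\mathcal N_{\mathrm{incoh}}$ is the sum of terms such that $\sigma_i\sigma_j$ is the logical identity, and $\mathcal N_{\mathrm{coh}}$ is the sum of terms such that $\sigma_i\sigma_j$ is a nontrivial logical operator. Then $\mathcal R\circ\mathcal N_{\mathrm{incoh}}$ is the incoherent part of $\mathcal N_L$ and $\mathcal R\circ\mathcal N_{\mathrm{coh}}$ is its coherent part. Explicitly,
$$\mathcal N_{\mathrm{incoh}}(\rho) = c^6\rho + c^4s^2\,(XII\rho XII + IXI\rho IXI + IIX\rho IIX) + c^2s^4\,(XXI\rho XXI + XIX\rho XIX + IXX\rho IXX) + s^6\,XXX\rho XXX \tag{103}$$
and
$$\mathcal N_{\mathrm{coh}}(\rho) = ic^3s^3\,\big(-III\rho XXX + XII\rho IXX + IXI\rho XIX + IIX\rho XXI - XXI\rho IIX - XIX\rho IXI - IXX\rho XII + XXX\rho III\big). \tag{104}$$
The code has two syndrome bits, given by the measured values of $ZZI$ and $IZZ$, and for a minimal-weight decoder we choose the standard errors to be
$$E_{00} = III,\quad E_{10} = XII,\quad E_{11} = IXI,\quad E_{01} = IIX$$
(the first syndrome bit records $ZZI$ and the second $IZZ$), while the nontrivial logical operator is $\bar X = XXX$. Each of the Pauli operators in Eq.(103) and Eq.(104) can be expressed as a product of a standard error and a logical operator which is either $\bar I = III$ or $\bar X$, so the logical map becomes
$$\mathcal N_{L,\mathrm{incoh}}(\bar\rho) = \mathcal R\circ\mathcal N_{\mathrm{incoh}}(\bar\rho) = (c^6 + 3c^4s^2)\,\bar\rho + (3c^2s^4 + s^6)\,\bar X\bar\rho\bar X,$$
$$\mathcal N_{L,\mathrm{coh}}(\bar\rho) = \mathcal R\circ\mathcal N_{\mathrm{coh}}(\bar\rho) = 2ic^3s^3\,(\bar\rho\bar X - \bar X\bar\rho).$$
To compare with our previous calculation of the logical channel, we note that
$$\mathcal N_{L,\mathrm{coh}}(\bar Y) = 4c^3s^3\,\bar Z, \qquad \mathcal N_{L,\mathrm{coh}}(\bar Z) = -4c^3s^3\,\bar Y,$$
and $\mathcal N_{L,\mathrm{coh}}(\bar I) = \mathcal N_{L,\mathrm{coh}}(\bar X) = 0$, while the incoherent part multiplies $\bar Y$ and $\bar Z$ by $1 - 2(3c^2s^4 + s^6)$. In the notation of Eq.(28), these results determine the parameters $\epsilon$ and $\delta$ of the logical channel, in agreement with the result found in Eq.(79).

Now consider the length-$n$ repetition code, for $n$ odd, where the noise is the product unitary transformation $U_X(\theta)^{\otimes n}$. The incoherent part $\mathcal N_{L,\mathrm{incoh}}$ of the logical channel arises from the diagonal terms $\{\sigma_i\rho\sigma_i\}$ in the chi-matrix expansion of $\mathcal N(\rho)$. Here $\sigma_i$ can be any one of the $2^n$ Pauli operators contained in $\{I, \sigma_X\}^{\otimes n}$.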
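The code can correct $t = (n-1)/2$ $\sigma_X$ errors, so $\sigma_i$ is error corrected to $\bar I$ if its weight $w(\sigma_i)$ is $t$ or less, and is error corrected to $\bar X$ if its weight is $t+1$ or more. Therefore, if $\bar\rho$ is an encoded density operator, then
$$\mathcal N_{L,\mathrm{incoh}}(\bar\rho) = \sum_{w=0}^{t}\binom{n}{w}c^{2(n-w)}s^{2w}\,\bar\rho + \sum_{w=t+1}^{n}\binom{n}{w}c^{2(n-w)}s^{2w}\,\bar X\bar\rho\bar X,$$
where the binomial coefficient $\binom{n}{w}$ counts the number of weight-$w$ (or weight-$(n-w)$) operators. Using this expression, we recover $\epsilon$ as in Eq.(89). The coherent part $\mathcal N_{L,\mathrm{coh}}$ of the logical channel arises from the terms in the Pauli operator expansion of $\mathcal N(\rho)$ such that $\sigma_i\sigma_j = \bar X$. There are $2^n$ such terms: $\sigma_i$ can be any operator among $\{I, X\}^{\otimes n}$, and $\sigma_j$ is then the complementary operator with $X$ and $I$ interchanged.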
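If $\sigma_i$ has weight $\le t$, and so is error corrected to $\bar I$, then $\sigma_j$ has weight $\ge t+1$, and so is error corrected to $\bar X$. We obtain
$$\mathcal N_{L,\mathrm{coh}}(\bar\rho) = c^n s^n\, i^{\,n}\sum_{w=0}^{t}(-1)^w\binom{n}{w}\,\big(\bar\rho\bar X - \bar X\bar\rho\big).$$
Therefore, using the alternating binomial identity $\sum_{w=0}^{t}(-1)^w\binom{n}{w} = (-1)^t\binom{n-1}{t}$, we find $\mathcal N_{L,\mathrm{coh}}(\bar\rho) = i\binom{n-1}{t}c^n s^n\,(\bar\rho\bar X - \bar X\bar\rho)$, in agreement with Eq.(89).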
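The chi-matrix bookkeeping of this subsection can be automated. The sketch below is our own construction, not code from the paper. It enumerates $X$-type Pauli pairs $(\sigma_i, \sigma_j)$ for the length-$n$ repetition code with $\chi_{ij} = a_i a_j^*$ and $a_\sigma = c^{n-w}(-is)^w$. It keeps only pairs with equal syndrome and accumulates the coefficient of $L_a\bar\rho L_b$ after majority-vote correction. For $n = 3$ it reproduces the coefficients $c^6 + 3c^4s^2$ and $3c^2s^4 + s^6$. In our ordering the coherent entries come out as $\pm 2ic^3s^3$, i.e. a coherent part $2ic^3s^3(\bar\rho\bar X - \bar X\bar\rho)$.

```python
from itertools import product
from math import cos, sin

def logical_chi(n: int, theta: float):
    """Logical chi matrix {(a, b): coefficient of L_a rho L_b}, a, b in {'I', 'X'},
    for the length-n repetition code (n odd) under U_X(theta)^{(x)n} and majority voting."""
    c, s = cos(theta / 2), sin(theta / 2)
    t = (n - 1) // 2
    strings = list(product((0, 1), repeat=n))             # X-type Paulis as bit strings
    amp = {e: c ** (n - sum(e)) * (-1j * s) ** sum(e) for e in strings}
    chi = {(a, b): 0j for a in "IX" for b in "IX"}
    for ei in strings:
        for ej in strings:
            # sigma_i sigma_j must have trivial syndrome: ej equals ei or its complement.
            diff_weight = sum(a ^ b for a, b in zip(ei, ej))
            if diff_weight not in (0, n):
                continue
            # After removing the standard error, weight <= t leaves I and weight > t leaves X.
            li = "I" if sum(ei) <= t else "X"
            lj = "I" if sum(ej) <= t else "X"
            chi[(li, lj)] += amp[ei] * amp[ej].conjugate()
    return chi
```

The enumeration costs $4^n$ steps, so it is only a cross-check for small $n$; for $n = 5$ it also reproduces the closed form $i\binom{n-1}{t}c^ns^n = 6ic^5s^5$ for the coherent entry.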

Inhomogeneous x-axis rotations
Now let's consider the logical channel obtained by decoding the length-$n$ repetition code in the case where the rotation angle varies from qubit to qubit. That is, the unitary noise channel is
$$U = \bigotimes_{\alpha=1}^{n}\left(c_\alpha I - i s_\alpha X_\alpha\right),$$
where $c_\alpha = \cos(\theta_\alpha/2)$ and $s_\alpha = \sin(\theta_\alpha/2)$. As in our previous derivation for the case where all angles are equal, we can calculate the incoherent and coherent parts of the logical channel by expanding this tensor product and isolating the terms in $\mathcal N(\rho)$ of the form $\sigma_i\rho\sigma_j$ where $\sigma_i\sigma_j$ is either a trivial logical operator (for the incoherent part) or a nontrivial logical operator (for the coherent part). The only difference from the previous calculation is that, while previously all terms of the same weight in the expansion of $U$ occurred with equal amplitudes, now operators of the same weight may have different amplitudes. Still, the derivation goes through in much the same way as before. Let $S$ denote a subset of the $n$ qubits, let $|S|$ denote the size of $S$, and let $\bar S$ denote the subset complementary to $S$. Extending our previous argument to the case of unequal angles yields
$$\epsilon = 2\sum_{S,\,|S|\ge t+1}\ \prod_{\alpha\in S}s_\alpha^2\prod_{\bar\alpha\in\bar S}c_{\bar\alpha}^2,$$
and an expression for $\delta$ proportional to $\prod_\alpha s_\alpha c_\alpha$ times a sum over subsets. Note that the sum in the expression for $\delta$ does not depend on the angles. To leading order in the small $\{s_\alpha\}$, we find
$$\epsilon = 2\sum_{S,\,|S|=(n+1)/2}\ \prod_{\alpha\in S}s_\alpha^2 + \cdots,$$
together with the leading-order expression for $\delta$, where we have used a binomial identity to evaluate the angle-independent sum. As before we find $\epsilon = O(s^{n+1})$ and $\delta = O(s^n)$. Furthermore, the expression for $\delta$ is very simple: it is the same as our previous formula, but with $s^n$ replaced by $\prod_\alpha s_\alpha$. The formula for $\epsilon$ depends in a more complicated way on the set of angles $\{\theta_\alpha\}$. But we can show that for fixed $\delta$, $\epsilon$ is minimized when all the $s_\alpha$ are equal. Therefore, we have a lower bound on $\epsilon$, namely
$$\epsilon \ge 2\binom{n}{(n+1)/2}\,\bar s^{\,n+1} + \cdots,$$
where the ellipsis indicates terms higher order in $s$, and we have defined $\bar s = \big(\prod_\alpha s_\alpha\big)^{1/n}$. Correspondingly, we obtain an upper bound on $\delta$ in terms of $\epsilon$. Therefore, for inhomogeneous as well as homogeneous rotations, we conclude that the coherent part of the logical channel is suppressed.
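A numerical check of the inhomogeneous case, under our own conventions, works as follows. Enumerate the correctable subsets $S$ ($|S| \le t$) and pair each with its complement. The logical Kraus operator left by that syndrome is $a_S\bar I + a_{\bar S}\bar X$, so $p_S(1-\cos\theta_S) = 2|a_{\bar S}|^2$ and $p_S\sin\theta_S = -2\,\mathrm{Im}(a_S^* a_{\bar S})$. In our sign convention the coherent sum evaluates in closed form to $\delta = 2\binom{n-1}{t}\prod_\alpha s_\alpha c_\alpha$. That closed form is our own derivation using the alternating binomial identity, not the paper's equation. It reduces to "$s^n \to \prod_\alpha s_\alpha$" at leading order. Function names are hypothetical.

```python
from itertools import combinations
from math import comb, cos, sin, prod

def inhomogeneous_eps_delta(thetas):
    """(eps, delta) for the length-n repetition code under prod_alpha U_X(theta_alpha),
    majority-vote decoding, n odd. Same assumed definitions as for equal angles:
    eps = sum p_S (1 - cos theta_S), delta = sum p_S sin theta_S over correctable S."""
    n = len(thetas)
    t = (n - 1) // 2
    c = [cos(th / 2) for th in thetas]
    s = [sin(th / 2) for th in thetas]
    eps = delta = 0.0
    for w in range(t + 1):
        for S in combinations(range(n), w):
            inside = set(S)
            # Amplitudes of X(S) and of its complement X(S-bar) in the product expansion.
            a_id = prod(-1j * s[q] if q in inside else c[q] for q in range(n))
            a_x = prod(c[q] if q in inside else -1j * s[q] for q in range(n))
            # After correcting with X(S), the logical Kraus operator is a_id * I + a_x * X.
            eps += 2 * abs(a_x) ** 2
            delta += -2 * (a_id.conjugate() * a_x).imag
    return eps, delta

def closed_form_delta(thetas):
    """Our own closed form: delta = 2 C(n-1, t) prod_alpha s_alpha c_alpha."""
    n = len(thetas)
    return 2 * comb(n - 1, (n - 1) // 2) * prod(sin(th / 2) * cos(th / 2) for th in thetas)
```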
In fact, the case where all rotation angles are equal is the worst case, where Eq.(125) is saturated. Now let's prove that $\epsilon$ is minimized (for fixed $\delta$) when all $\{s_\alpha\}$ are equal.

Lemma 2.
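Consider minimizing the function
$$f_m(x_1, x_2, \dots, x_n) = \sum_{S,\,|S|=m}\ \prod_{\alpha\in S}x_\alpha$$
subject to the constraint $\prod_{\alpha=1}^{n}x_\alpha = c > 0$, where all $x_\alpha$ are nonnegative. Here $S$ denotes a subset of the $n$ variables, and $|S|$ is the size of $S$. The minimum occurs for $x_1 = x_2 = \cdots = x_n = c^{1/n}$.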
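Proof. Note that $f_m$ is a symmetric function, invariant under permutations of its $n$ arguments, and can be decomposed as
$$f_m(x_1,\dots,x_n) = x_1x_2\,f_{m-2}(x_3,\dots,x_n) + (x_1+x_2)\,f_{m-1}(x_3,\dots,x_n) + f_m(x_3,\dots,x_n).$$
Using the constraint we write $x_1 = c/(x_2x_3\cdots x_n)$ and regard $f_m$ as a function of the $n-1$ independent variables $x_2, x_3, \dots, x_n$; then, since $\partial x_1/\partial x_2 = -x_1/x_2$,
$$\frac{\partial f_m}{\partial x_2} = \left(1 - \frac{x_1}{x_2}\right)f_{m-1}(x_3,\dots,x_n).$$
Therefore, setting the gradient of $f_m$ equal to zero, we find $(x_2 - x_1)\,f_{m-1}(x_3,\dots,x_n) = 0$. The constraint requires that all $x_\alpha$ are positive; therefore $f_{m-1}(x_3,\dots,x_n)$ is positive and we find that $x_1 = x_2$. From the symmetry of $f_m$, we conclude that $x_1 = x_\alpha = c^{1/n}$ for $\alpha = 2, 3, \dots, n$ when $f_m$ is stationary. This is the unique stationary point of $f_m(x_1, x_2, \dots, x_n)$ when all $x_\alpha$ are positive; furthermore $f_m$ is smooth and bounded below. Therefore it must be the minimum of $f_m$.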
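Lemma 2 can also be checked numerically. The sketch takes $f_m$ to be the elementary symmetric polynomial of degree $m$, which is our reading of the definition. For the bound on $\epsilon$ one would set $x_\alpha = s_\alpha^2$ and $m = (n+1)/2$. By the AM-GM inequality applied to the $\binom{n}{m}$ products, $f_m(x) \ge \binom{n}{m}c^{m/n}$ whenever $\prod_\alpha x_\alpha = c$, with equality at $x_\alpha = c^{1/n}$. The sketch samples random points on the constraint surface and confirms that none falls below the symmetric value. Function names are illustrative.

```python
import random
from itertools import combinations
from math import comb, prod

def f(m, xs):
    """Elementary symmetric polynomial: sum over |S| = m of prod_{alpha in S} x_alpha."""
    return sum(prod(xs[i] for i in S) for S in combinations(range(len(xs)), m))

def min_ratio(n, m, c, trials=2000, seed=0):
    """Smallest f(m, x) / f(m, symmetric point) over random x > 0 with prod(x) = c."""
    rng = random.Random(seed)
    symmetric = comb(n, m) * c ** (m / n)     # value of f at x_alpha = c**(1/n)
    best = float("inf")
    for _ in range(trials):
        xs = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
        scale = (c / prod(xs)) ** (1 / n)       # rescale onto the constraint surface
        xs = [scale * x for x in xs]
        best = min(best, f(m, xs) / symmetric)
    return best
```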

Correlated unitary noise
Now let's consider unitary noise acting on $n$ qubits that does not factorize into a product of single-qubit unitaries. Since we still wish to consider noise that can be corrected by the repetition code, assume that the $n$-qubit unitary $U$ has an expansion in terms of $X$-type Pauli operators:
$$U = \sum_S \psi(S)\,X(S),$$
where $S$ denotes a subset of the $n$ qubits and $X(S) = \bigotimes_{\alpha\in S}X_\alpha$ is the $X$-type operator supported on $S$. ($X_\alpha$ means $X$ acting on the $\alpha$th qubit, and it is implicit that $I$ acts on qubit $\alpha$ for $\alpha\notin S$.) Unitarity of $U$ implies
$$\sum_S|\psi(S)|^2 = 1, \qquad \sum_{S'}\psi(S')^*\,\psi(S'+S) = 0,$$
where in the second condition $S$ is a nonempty set and $S + S' = (S\cup S')\setminus(S\cap S')$ is the symmetric difference of $S$ and $S'$. To make the analysis of the noise more tractable, let's also suppose the noise is invariant under permutations of the $n$ qubits. In that case $\psi(S) = \psi(|S|)$; that is, the amplitude $\psi$ depends only on the weight $w = |S|$ of the error operator $X(S)$. A tensor product of $n$ identical unitary $X$ rotations, $U = (cI - isX)^{\otimes n}$, is the special case where
$$\psi(w) = c^{\,n-w}(-is)^w,$$
an exponential function of the weight $w$.
The symmetric unitary transformation may also be expressed as $U = e^{-iH}$, where $H$ is a symmetric $n$-qubit Hamiltonian of the form
$$H = \sum_{w=0}^{n}h_w\sum_{S,\,|S|=w}X(S).$$
We are assuming that there is no geometric locality constraint on the interactions among the qubits: the strength of a weight-$w$ term in the Hamiltonian depends only on the weight, not on which set $S$ of $w$ qubits is interacting. Since $h_w$ is the coefficient of a sum of $\binom{n}{w}$ terms, it is implicit that $h_w$ decays as a function of $w$. It is natural to assume that $\binom{n}{w}h_w = O(n)$, as only in that case do we expect (for $h_w$ sufficiently small) the probability of a logical error to drop rapidly as $n$ gets large. For example, if $h_2 = O(1)$, then each qubit has $O(1)$ coupling strength with $n-1$ other qubits, so the strength of the noise acting on each qubit grows linearly in $n$, and error correction fails for $n$ sufficiently large. We will elaborate on this point in the discussion below of two-body correlated noise. In a more realistic noise model, the higher-weight terms in the Hamiltonian would have $O(1)$ strength (independent of system size), but would decay sufficiently rapidly as the qubits separate that the effective single-qubit noise strength is also $O(1)$ [22,23]. The structure of the noise correlations is determined by how $h_w$ falls off as the weight $w$ increases. In particular, if $\binom{n}{w-1}h_w = O(h_1^w)$, then $\psi(w)$ in Eq.(131) is a sum of $O(h_1^w)$ terms; in that case the parameters of the logical channel will be $\epsilon = O(h_1^{n+1})$ and $\delta = O(h_1^n)$, so the coherent and incoherent parts of the logical channel qualitatively resemble what we found for uncorrelated noise. On the other hand, in the extreme case where $h_n \neq 0$ and $h_w = 0$ for $0 \le w \le n-1$, the code provides no protection against logical errors and there is no suppression of coherence. Instead we find $\delta = O(h_n)$ and $\epsilon = O(h_n^2)$, so that $\epsilon = O(\delta^2)$, just as in Eq.(23).
To be concrete, consider the 3-qubit repetition code and the noise Hamiltonian
$$H = h_1\,(X_1 + X_2 + X_3) + h_2\,(X_1X_2 + X_2X_3 + X_1X_3) + h_3\,X_1X_2X_3.$$
The unitary noise has the expansion
$$U = e^{-iH} = (1+\cdots)\,I + (-ih_1+\cdots)(X_1+X_2+X_3) + (-h_1^2 - ih_2+\cdots)(X_1X_2+X_2X_3+X_1X_3) + (ih_1^3 - 3h_1h_2 - ih_3+\cdots)\,X_1X_2X_3,$$
where only the leading terms are shown in the coefficient of each Pauli operator. Repeating the analysis of the logical channel as in Sec. 4.3, but now using this modified unitary noise operator, we find (showing only the leading terms), which yields, where $\tilde\chi$ denotes the logical chi matrix after error correction. (We don't find any contribution to the coherent part of the logical channel depending only on $h_2$, because the $h_2$ term in the Hamiltonian has even $X$ parity, while the logical operator $\bar X$ has odd parity.) Now whether coherence is suppressed hinges on the strength of the $h_3$ term in the Hamiltonian. In particular, if $h_3$ is large compared to $h_1^2$ and $h_2$, then highly correlated noise dominates, and coherence of the logical channel is unsuppressed.
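Because every term of the symmetric Hamiltonian is an $X$-type operator, $H$ and $U = e^{-iH}$ are diagonal in the $X$ eigenbasis. The amplitudes $\psi(S)$ therefore follow exactly from a Walsh-Hadamard transform of $e^{-iH(x)}$ over $x \in \{\pm 1\}^n$, with no matrix exponential needed. The sketch below uses that trick, which is our choice and not the paper's method, to compute the logical chi entries for the repetition code. For $n = 3$ our own leading-order expansion gives a coherent entry $\approx i(2h_1^3 + h_3)$, in which the $h_1h_2$ terms cancel and no term depends only on $h_2$. The incoherent entry is $\approx 3(h_1^4 + h_2^2)$. The labels `chi_IX` and `chi_XX` are our conventions.

```python
from cmath import exp
from itertools import combinations, product
from math import prod

def amplitudes(n, h):
    """psi(S) for U = exp(-i H), H = sum_w h[w] * sum_{|S|=w} X(S).

    Every term is X-type, so U is diagonal in the X eigenbasis: U(x) = exp(-i H(x))
    with x in {+1,-1}^n, and psi(S) = 2^-n sum_x U(x) prod_{q in S} x_q (Walsh transform).
    """
    points = list(product((1, -1), repeat=n))
    diag = {}
    for x in points:
        energy = sum(hw * sum(prod(x[q] for q in S) for S in combinations(range(n), w))
                     for w, hw in h.items())
        diag[x] = exp(-1j * energy)
    return {S: sum(diag[x] * prod(x[q] for q in range(n) if S[q]) for x in points) / 2 ** n
            for S in product((0, 1), repeat=n)}

def logical_chi(n, h):
    """Logical chi entries after majority voting (n odd): returns (chi_IX, chi_XX).
    chi_IX is our label for the coefficient of I rho X."""
    psi = amplitudes(n, h)
    t = (n - 1) // 2
    chi_xx = sum(abs(a) ** 2 for S, a in psi.items() if sum(S) > t)
    chi_ix = sum(a * psi[tuple(1 - b for b in S)].conjugate()
                 for S, a in psi.items() if sum(S) <= t)
    return chi_ix, chi_xx
```

The same functions apply to the length-$n$ code with only one- and two-body terms considered next, at a cost of $4^n$ operations.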
As another instructive example, consider the length-$n$ repetition code, where the Hamiltonian contains only single-qubit and two-qubit terms. We will compute the coherent and incoherent parts of the logical channel following the same reasoning as in Sec. 4.3. Again, we'll need to sum over all the possible values of the syndrome weight, which we'll now denote by $k$. For each value of $k$, we'll find a contribution to the chi matrix for the error-corrected logical channel, with logical operators acting on the encoded density operator $\rho$ from the left and from the right. Each such operator can be obtained in many ways as a product of one-body and two-body terms in the Hamiltonian, and we'll have to do some combinatorics to sum up those contributions. By computing the logical chi matrix, and comparing its coherent and incoherent parts, we can prove the following:

Theorem 2. Consider the bit flip code with $n$ qubits, and let the noise model be given by the $n$-qubit unitary map
$$U = \exp\Big(-ih_1\sum_\alpha X_\alpha - ih_2\sum_{\alpha<\beta}X_\alpha X_\beta\Big).$$
After error correction, the logical noise channel satisfies a bound relating the coherent and incoherent components, Eq.(141), where $\tilde\chi$ denotes the logical chi matrix. Eq.(141) holds for any odd $n$ and for any $h_1$, but we have made the approximation $nh_2 \ll 1$, neglecting a multiplicative $(1 + O(nh_2))$ correction on the right-hand side.
Theorem 2 implies that, even for this correlated unitary noise model, the coherence of the logical noise channel is heavily suppressed for large n. In fact, the ratio of the coherent to incoherent components of the logical noise channel is similar to what we found for the uncorrelated case, where h 1 ≈ θ/2; compare Eq.(90).
Proof. To prove Theorem 2, we'll first compute the coherent component of the logical channel, then the incoherent component, and finally compare the two to obtain Eq.(141).
The unitary operator $U = e^{-iH}$ can be expressed as
$$U = \prod_\alpha\left(c_1 - is_1X_\alpha\right)\prod_{\alpha<\beta}\left(c_2 - is_2X_\alpha X_\beta\right) = c_1^{\,n}\,c_2^{\,n(n-1)/2}\prod_\alpha\left(1 - it_1X_\alpha\right)\prod_{\alpha<\beta}\left(1 - it_2X_\alpha X_\beta\right),$$
where $s_1 = \sin h_1$, $c_1 = \cos h_1$, $t_1 = \tan h_1$, and likewise for $h_2$. In our computations we will suppress the prefactor $c_1^n c_2^{n(n-1)/2}$, which is implicit in all formulas, and we will expand $U$ in a collisionless approximation. That is, we will neglect terms in the expansion in which operators such as $X_i$ and $X_iX_j$, or $X_kX_i$ and $X_iX_j$, act on a qubit in common. The terms we are neglecting are systematically suppressed by powers of $nh_2$ compared to the terms we are keeping. More precisely, these corrections can be absorbed into a multiplicative renormalization of $h_1$ and $h_2$ by a factor $(1 + O(nh_2))$.

Coherent component
Let us look first at the coherent component $\tilde\chi_{XI}$ of the logical chi matrix. For each syndrome of weight $k$, the physical error contributing to this logical component consists of an uncorrectable $X$ error of weight $n-k$ on the left of $\rho$ and a correctable $X$ error of weight $k$ on the right, where $k$ ranges from 0 to $(n-1)/2$. The operators on the left and right are supported on disjoint sets of qubits. When we write these operators as products of one-body and two-body terms, we will need to count the number of ways of dividing a set of $2p$ $X$ errors into distinct combinations of $p$ two-body terms. We denote this number by $\kappa_p$, where
$$\kappa_p = \frac{(2p)!}{2^p\,p!}.$$
Let us count the terms with $k_L$ factors of $t_2$ on the left and $k_R$ factors of $t_2$ on the right. In addition, there will be some number $w$ of factors of $t_1$ on the right and $n - 2k_L - 2k_R - w$ factors of $t_1$ on the left to fill out the coherent term. First we choose the $2k_L$ qubits on the left where the $t_2$ terms act; these qubits can be chosen in $\binom{n}{2k_L}$ ways. Once these $2k_L$ qubits have been chosen, there are $\kappa_{k_L}$ ways to divide them into pairs where the two-body terms act. Next, we choose the $2k_R$ qubits on the right where the $t_2$ terms act. Because the operators on the left and right are supported on disjoint sets of qubits, these $2k_R$ qubits can be chosen in $\binom{n-2k_L}{2k_R}$ ways. Once these $2k_R$ qubits have been chosen, there are $\kappa_{k_R}$ ways to divide them into pairs where the two-body terms act. Of the remaining $n - 2k_L - 2k_R$ qubits where no two-body terms act, we choose $w$ qubits on the right where the one-body terms act; these can be chosen in $\binom{n-2k_L-2k_R}{w}$ ways. As usual, this contribution to the logical channel has a phase, which is determined by including a factor of $-i$ for each term in the Hamiltonian that acts from the left and a factor of $i$ for each term that acts from the right.
By combining all these factors, we find a contribution to $\tilde\chi_{XI}$, Eq.(144). Next we sum over $w$, taking care to note the $w$-dependent phase in Eq.(144). Fortunately, this sum can be evaluated explicitly using an identity satisfied by binomial coefficients, just as we saw in Sec. 3. The sum ranges from $w = 0$ to $w = (n-1)/2 - 2k_R$, since the correctable error on the right has weight $2k_R + w \le (n-1)/2$; this yields Eq.(145). To complete the evaluation of $\tilde\chi_{XI}$, it remains to sum over $k_L$ and $k_R$ in Eq.(146), where the summand, Eq.(147), follows from Eq.(144) and Eq.(145). In the sum Eq.(146), $2k_R$ can be any nonnegative integer less than or equal to $(n-1)/2$, and $2(k_R + k_L)$ can be any nonnegative integer less than or equal to $n-1$.
Our goal is to compare this coherent component with the incoherent component, which can also be expressed as a sum. Instead of performing an unrestricted sum over $k_L$ and $k_R$, we will consider the sum over $k_L$ with $k_L + k_R = q$ fixed. This collects all the terms in $\tilde\chi_{XI}$ of order $t_2^q$. Then we will follow a similar path to compute the incoherent component $\tilde\chi_{XX}$ to order $t_2^q$, so that we can compare the coherent and incoherent components order by order.
Let us isolate the parts of $\Omega(k_L, q-k_L)$ that depend on $q$ only (not on $k_R$), and let us introduce the shorthand $m = (n-1)/2$, finding Eq.(148), where we have used Eq.(143). Now we need to sum $k_R$ from $k_R = 0$ to $k_R = q$, and then sum $q$ from $q = 0$ to $q = (n-1)/2$. We observe that, due to the oscillating sign $(-1)^{k_R}$, the sum over $k_R$ vanishes when $q$ is odd. This cancellation occurs because if we replace $k_R$ by $q - k_R$, the summand remains the same except for a change in phase $(-1)^q$. What's happening is that for each term contributing to $\tilde\chi_{XI}$ with $l$ factors of $it_2$ on the right and $q-l$ factors of $-it_2$ on the left, there is a corresponding term with $q-l$ factors of $it_2$ on the right and $l$ factors of $-it_2$ on the left. These two terms have equal magnitude but opposite sign if $q$ is odd. Similar cancellations occur in the computation of the incoherent component $\tilde\chi_{XX}$.
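The pairing count $\kappa_p$ that entered the counting of the coherent component is the number of perfect matchings of $2p$ labelled qubits, $(2p)!/(2^p\,p!) = (2p-1)!!$. The sketch below enumerates pairings recursively and checks this closed form for small $p$; the helper names are hypothetical.

```python
from math import factorial

def pairings(items):
    """Yield every way to split an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

def kappa(p):
    """Closed form (2p)! / (2^p p!) = (2p - 1)!!."""
    return factorial(2 * p) // (2 ** p * factorial(p))
```

Pairing the first remaining qubit with each possible partner and recursing avoids counting the same set of pairs twice, which is why no division by symmetry factors is needed in the enumeration.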

Incoherent component
Now we can use similar reasoning to compute the incoherent component $\tilde\chi_{XX}$ of the logical channel. In this case, though, we will not perform a sum over all syndromes; instead we will keep only the contribution of lowest order in $t_1$ and $t_2$, arising from the syndrome of highest weight. This will suffice for deriving the lower bound Eq.(141), because the contributions to $\tilde\chi_{XX}$ higher order in $t_1$ and $t_2$ are nonnegative. Furthermore, keeping only the lowest-order term is a good approximation when $t_1$ and $t_2$ are sufficiently small.
For $n$ odd, this leading-order contribution arises from terms with $X$ acting $(n+1)/2$ times from both the left and the right. In a term with $k_L$ factors of $t_2$ on the left and $k_R$ factors of $t_2$ on the right, there will also be $(n+1)/2 - 2k_L$ factors of $t_1$ on the left and $(n+1)/2 - 2k_R$ factors of $t_1$ on the right. Summing over $k_L$ and $k_R$, and arguing as in our discussion of the coherent contribution, we obtain $\tilde\chi_{XX}$ as a sum over $k_L$ and $k_R$ of terms $\Delta(k_L, k_R)$, plus nonnegative higher-order corrections; here we have defined $m = (n-1)/2$. We can again introduce $q = k_L + k_R$ and isolate the portion of $\Delta(q-k_R, k_R)$ that depends only on $q$, as in Eq.(151); here $k_R$ is to be summed from 0 to $q$, followed by a sum over $q$ from 0 to $(n+1)/2$. As for the coherent component, the sum over $k_R$ with $q$ fixed vanishes when $q$ is odd, due to the oscillating minus sign $(-1)^{k_R}$.

Comparing the coherent and incoherent components
Now we are ready to compare $\tilde\chi_{XI}$ and $\tilde\chi_{XX}$. In both cases there is a sum over $k_R$ to perform for each even value of $q$, and by inspecting Eq.(148) and Eq.(151) we see that the $k_R$-dependent factors in $\Omega(q, k_R)$ and $\Delta(q, k_R)$ are nearly the same; the factor in $\Delta$ is obtained from the factor in $\Omega$ if we replace $m$ by $m+1$. Because this factor grows rapidly with $m$, the factor in $\Delta$ is larger than the factor in $\Omega$ for each value of $q$ and $k_R$, but that by itself does not suffice for comparing $\tilde\chi_{XI}$ and $\tilde\chi_{XX}$, due to the alternating sign $(-1)^{k_R}$ in the sum over $k_R$. To compare the coherent and incoherent logical noise components properly, we must perform the sum over $k_R$. We will make use of the generalized hypergeometric function ${}_3F_2$, defined by
$${}_3F_2(a, b, c; d, e; z) = \sum_{k=0}^{\infty}\frac{(a)_k\,(b)_k\,(c)_k}{(d)_k\,(e)_k}\,\frac{z^k}{k!}, \tag{152}$$
where $(a)_k = a(a+1)\cdots(a+k-1)$ denotes the Pochhammer symbol, or rising factorial. If $a$ is a nonpositive integer, the sum over $k$ in Eq.(152) terminates: instead of running from 0 to $\infty$, it runs from 0 to $-a$. The same is true if $b$ or $c$ is a nonpositive integer. Using this definition of ${}_3F_2$, we can write the sum over $k_R$ of $\Omega$ or $\Delta$ in terms of ${}_3F_2$. We will have to distinguish the two cases $2q < m$ and $2q \ge m$, although we will see at the end that the final expressions coincide for the two cases. Take the second term in Eq.(148). Supposing that $2q < m$, we can write it as a ${}_3F_2$ evaluated at unit argument, Eq.(155). Then we can apply Dixon's identity for the hypergeometric function ${}_3F_2$ [24], which reads
$${}_3F_2(a, b, c; 1+a-b, 1+a-c; 1) = \frac{\Gamma(1+\tfrac a2)\,\Gamma(1+a-b)\,\Gamma(1+a-c)\,\Gamma(1+\tfrac a2-b-c)}{\Gamma(1+a)\,\Gamma(1+\tfrac a2-b)\,\Gamma(1+\tfrac a2-c)\,\Gamma(1+a-b-c)}.$$
Applying this formula to Eq.(155), we get Eq.(157). We need to do something about the first factor on the right-hand side, $(-q/2)!/(-q)!$, because $x! = \Gamma(1+x)$ has poles at each negative integer $x$. However, this ratio can still be defined as a limit; for even $q$,
$$\frac{(-q/2)!}{(-q)!} \equiv \lim_{a\to -q}\frac{\Gamma(1+a/2)}{\Gamma(1+a)} = (-1)^{q/2}\,\frac{q!}{(q/2)!}.$$
We can substitute this into Eq.(157) and simplify the expression, obtaining Eq.(159). Up until now we have assumed $2q < m$. If we instead assume $2q \ge m$, we find that the intermediate steps look different, but we arrive at the same final answer as in Eq.(159). Now we can compute the sum of Eq.(148) as $k_R$ goes from 0 to $q$ using what we found in Eq.(159).
We can also apply our result to perform the sum over $k_R$ for Eq.(151), and then take the ratio of the coherent and incoherent results for each even $q$. Now we can sum over $q$; because all terms are nonnegative and the bound on the ratio holds for every $q$, we conclude that Eq.(141) holds, thus proving Theorem 2.
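As an independent sanity check on Dixon's identity, set $a = b = c = -2N$. The lower parameters become $1 + a - b = 1 + a - c = 1$, and the identity reduces to the classical binomial sum $\sum_k(-1)^k\binom{2N}{k}^3 = (-1)^N(3N)!/(N!)^3$. That reduction uses the same kind of pole ratio $\Gamma(1+a/2)/\Gamma(1+a)$ discussed above, now at $a = -2N$. The sketch below checks both the binomial form and the terminating ${}_3F_2$ series with exact rational arithmetic. It does not reproduce the paper's Eqs.(155)-(159), and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def pochhammer(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1), computed exactly."""
    out = Fraction(1)
    for j in range(k):
        out *= a + j
    return out

def hyp3f2_terminating(a, b, c, d, e, z=1):
    """3F2(a, b, c; d, e; z) for a nonpositive integer a, where the series stops at k = -a."""
    if not (isinstance(a, int) and a <= 0):
        raise ValueError("a must be a nonpositive integer")
    return sum(pochhammer(a, k) * pochhammer(b, k) * pochhammer(c, k)
               / (pochhammer(d, k) * pochhammer(e, k) * factorial(k)) * Fraction(z) ** k
               for k in range(-a + 1))

def dixon_binomial(N):
    """Both sides of sum_k (-1)^k C(2N, k)^3 = (-1)^N (3N)! / (N!)^3."""
    lhs = sum((-1) ** k * comb(2 * N, k) ** 3 for k in range(2 * N + 1))
    rhs = (-1) ** N * (factorial(3 * N) // factorial(N) ** 3)
    return lhs, rhs
```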

Summary
By setting $q = 0$, we can check that the result Eq.(160) matches what we found in Sec. 3 for the uncorrelated case. It is also instructive to consider the expansion of $\tilde\chi_{XI}$ in powers of $t_2$, under the assumption $q \ll m$. From Eq.(160) we obtain the leading terms of this expansion, where the ellipsis indicates $O(q/m)$ corrections.
Restoring the factors of $t_1$ and $t_2$ from Eq.(146), we see that this expansion in $t_2$ generates a multiplicative correction to $\tilde\chi_{XI}$ which exponentiates. Since the sum over $q$ is dominated by terms with $q \sim m^3t_2^2/t_1^4$, this exponential series should be a good approximation for $m^3t_2^2/t_1^4 \ll m$, or $mt_2 \ll t_1^2$, since in that case neglecting the terms higher order in $q/m$ can be justified. Under this condition, the two-body terms in the Hamiltonian Eq.(161) make a small contribution to the total energy, suppressed by $O(t_1)$ compared to the one-body terms. Recall that we also needed $mt_2 \ll 1$ to justify the collisionless approximation used in the proof of Theorem 2; this condition is subsumed by $mt_2 \ll t_1^2$. There is therefore a regime, $1 \ll m^3t_2^2/t_1^4 \ll m$, in which our approximations are reliable, yet the multiplicative corrections to $\tilde\chi_{XI}$ are large. That large corrections occur, even when the two-body terms make a small contribution to the total energy, is not a surprise; we have found, as expected, that the noise correlations can substantially enhance the probability of a logical error. The important point established by Theorem 2 (at least for the simple noise model we have analyzed) is that even when the correlated noise produces large corrections to the logical channel, the corrections occur in both the coherent part and the incoherent part of the channel, so that our conclusion that the coherence is strongly suppressed for large $n$ continues to apply. It is not immediately obvious why the leading power of $m$ in Eq.(163) should be $m^{3q/2}$, because higher powers of $m$ occur in $\Omega(q-k_R, k_R)$ and $\Delta(q-k_R, k_R)$ for each fixed $k_R$ and $q$. It turns out that these higher powers of $m$ all cancel when we do the sum over $k_R$. In Appendix C we explain why these cancellations occur, providing a useful check on our results.

The toric code against coherent noise
We now analyze the logical channel for the two-dimensional toric code on an L × L square lattice, where L is odd. We'll consider uncorrelated unitary noise acting on the 2L 2 qubits, and suppose that error correction is performed using minimal-weight decoding. Our goal is to show that, when the noise is sufficiently weak, the coherence of the logical noise channel is highly suppressed for large L.
Our analysis will draw heavily on the tools we developed in our study of the repetition code. Before proceeding further, we will review some notation.

Figure 2: In blue is a Z-type stabilizer generator for the toric code. There are Z generators on every plaquette in the lattice. In red is an X-type stabilizer generator. There are X generators at every vertex of the lattice.

The Toric Code
We will consider the 2D toric code, which is defined on a square lattice with qubits placed on edges. We choose a square patch of lattice with side length L and identify opposite edges. (The toric code can be constructed on a lattice with boundaries, but for simplicity we choose periodic boundary conditions.) The stabilizer group for the toric code is generated by the X and Z generators shown in Fig. 2. The logical operators of the toric code are topologically non-trivial loops that wrap around the torus. Fig. 3 shows two logical operators.
The toric code is parameterized by the linear dimensions of the lattice; when the side length is $L$, the code distance (the minimum weight of a nontrivial logical operator) is $L$. We will also sometimes refer to $L$ as the code "size." The number of physical qubits in the code block is $2L^2$, and there are two encoded logical qubits. To analyze the logical channel, we must choose a decoding procedure. Decoding the toric code is a well-studied problem and many good algorithms are known [25,26,27]. We will choose minimal-weight decoding, in which the applied recovery operation has the lowest possible weight consistent with the measured error syndrome. This recovery operation can be computed efficiently on a classical computer [28], and corrects the error with a success probability that is exponentially close to 1 when $L$ is large and the noise is sufficiently weak.
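The lattice conventions above can be made concrete with a short sketch. Under an edge-indexing convention of our own choosing (the helpers `h`, `v`, `toric_code`, `logical_z`, and `logical_x` are hypothetical names, not part of the original analysis), it builds the vertex (X-type) and plaquette (Z-type) generators on a periodic $L\times L$ lattice and checks the counts quoted in the text: $2L^2$ qubits, mutually commuting generators, and a pair of weight-$L$ logical operators that commute with every generator but anticommute with each other.

```python
from itertools import product


def h(x, y, L):
    """Index of the horizontal edge from vertex (x, y) to (x + 1, y), periodic."""
    return 2 * ((y % L) * L + (x % L))


def v(x, y, L):
    """Index of the vertical edge from vertex (x, y) to (x, y + 1), periodic."""
    return 2 * ((y % L) * L + (x % L)) + 1


def toric_code(L):
    """Vertex (X-type) and plaquette (Z-type) generators as sets of edge indices."""
    stars = [frozenset({h(x, y, L), h(x - 1, y, L), v(x, y, L), v(x, y - 1, L)})
             for x, y in product(range(L), repeat=2)]
    plaquettes = [frozenset({h(x, y, L), h(x, y + 1, L), v(x, y, L), v(x + 1, y, L)})
                  for x, y in product(range(L), repeat=2)]
    return stars, plaquettes


def commute(x_support, z_support):
    """An X-type and a Z-type Pauli commute iff their supports overlap on an even number of qubits."""
    return len(x_support & z_support) % 2 == 0


def logical_z(L, y0=0):
    """Z on the horizontal edges of row y0: a primal loop wrapping the x direction."""
    return frozenset(h(x, y0, L) for x in range(L))


def logical_x(L, x0=0):
    """X on the horizontal edges crossed by a dual loop wrapping the y direction."""
    return frozenset(h(x0, y, L) for y in range(L))


L = 9
stars, plaquettes = toric_code(L)
qubits = set().union(*stars, *plaquettes)
print(len(qubits))  # 2 * L**2 = 162
```

Because commutation of an X-type and a Z-type Pauli depends only on the parity of their overlap, supports alone suffice for these checks.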

Notation
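We will use the chi matrix to describe the physical noise channel $\mathcal{N}$ acting on the $2L^2$ qubits in the code block:
$$\mathcal{N}(\rho) = \sum_{i,j}\chi_{ij}\,\sigma_i\,\rho\,\sigma_j, \qquad (166)$$
where $\{\sigma_i\}$ is a basis of Pauli operators.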
Definition 1. When we speak of a "noise term" we will mean a component of the chi matrix for the physical noise channel acting on the qubits in the code block. We will find it convenient to use the notation $(\sigma_i\rho\sigma_j)$ for the number $\chi_{ij}$, the coefficient of $\sigma_i\rho\sigma_j$ in the chi-matrix expansion in Eq.(166).
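As a concrete instance of Eq.(166) and of the notation $(\sigma_i\rho\sigma_j)$, the sketch below (NumPy; the helper names are our own) computes the single-qubit chi matrix of the rotation $U_Z(\theta)=\exp(-i\frac{\theta}{2}Z)$ that appears later in this section. For a unitary channel, writing $U=\sum_i u_i\sigma_i$ gives $\chi_{ij}=u_i\bar u_j$; the only nonzero entries are $\chi_{II}=\cos^2\frac{\theta}{2}$, $\chi_{ZZ}=\sin^2\frac{\theta}{2}$, and the coherent entries $\chi_{IZ}=\overline{\chi_{ZI}}=i\sin\frac{\theta}{2}\cos\frac{\theta}{2}$.

```python
import numpy as np

# Single-qubit Pauli basis, in the order I, X, Y, Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def unitary_chi(U):
    """Chi matrix of rho -> U rho U^dagger in the Pauli basis.

    Writing U = sum_i u_i sigma_i with u_i = Tr(sigma_i U) / 2, we get
    U rho U^dagger = sum_ij u_i conj(u_j) sigma_i rho sigma_j
    (Paulis are Hermitian), so chi_ij = u_i * conj(u_j).
    """
    u = np.array([np.trace(P @ U) / 2 for P in PAULIS])
    return np.outer(u, u.conj())


def apply_chi(chi, rho):
    """Evaluate N(rho) = sum_ij chi_ij sigma_i rho sigma_j, as in Eq.(166)."""
    return sum(chi[i, j] * (PAULIS[i] @ rho @ PAULIS[j])
               for i in range(4) for j in range(4))


theta = 0.3
c, s = np.cos(theta / 2), np.sin(theta / 2)
U = c * PAULIS[0] - 1j * s * PAULIS[3]  # U_Z(theta) = exp(-i theta Z / 2)
chi = unitary_chi(U)
print(np.round(chi, 4))
```

The off-diagonal entries $\pm i\sin\frac{\theta}{2}\cos\frac{\theta}{2}$ are the per-qubit factors that multiply together along a logical string in the coherent sums below.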
We may choose the index that labels a Pauli operator to be $(s,a,f)$, where $\sigma(s,a,f) = E_sL_aS_f$; here $s$ denotes the error syndrome, $E_s$ is the standard error associated with the syndrome $s$, $L_a$ is a standard choice for the physical Pauli operator that acts as the logical Pauli operator $\bar{L}_a$, and $S_f$ is an element of the code stabilizer. To compute the logical chi matrix, we sum over the syndrome $s$ and the stabilizer elements, observing that the standard error $E_s$ is removed by the recovery procedure. Hence we find that an element of the logical chi matrix can be expressed in our notation as
$$\bar\chi_{ab} = \sum_{s}\sum_{f,g}\,(E_sL_aS_f\;\rho\;S_gL_bE_s). \qquad (167)$$
We say that the diagonal components $\bar\chi_{ab}$ of the logical chi matrix with $a=b$ are "incoherent" noise terms, and that the off-diagonal terms with $a\neq b$ are "coherent."
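The sum in Eq.(167) can be checked by brute force on a small example. The sketch below is a toy illustration, not the toric-code calculation: it uses the $n$-qubit repetition code that corrects Z errors ($n$ odd), with $U_Z(\theta)$ on every qubit and minimal-weight decoding. Under this noise only Z strings have nonzero amplitude, so the stabilizer sums collapse to $S_f=S_g=I$, and each syndrome can be labeled by its standard error $E_s$ of weight at most $(n-1)/2$. The function name `logical_chi_repetition` is hypothetical.

```python
import itertools

import numpy as np


def logical_chi_repetition(n, theta):
    """Logical chi matrix (basis I, Z) of the n-qubit repetition code that
    corrects Z errors, with U_Z(theta) on every qubit and minimal-weight decoding.

    Only Z strings have nonzero amplitude under this noise, so the stabilizer
    sums in Eq.(167) collapse to S_f = S_g = I. A Z string e (tuple of 0/1)
    has amplitude u(e) = cos(theta/2)^(n - |e|) * (-i sin(theta/2))^|e|.
    """
    assert n % 2 == 1, "use odd n so that minimal-weight decoding is unambiguous"
    c, s = np.cos(theta / 2), np.sin(theta / 2)

    def amp(e):
        w = sum(e)
        return c ** (n - w) * (-1j * s) ** w

    chi = np.zeros((2, 2), dtype=complex)
    for E in itertools.product((0, 1), repeat=n):
        if sum(E) > (n - 1) // 2:
            continue  # each syndrome is visited once, via its standard error E_s
        E_logical = tuple(1 - bit for bit in E)  # E_s times logical Z (Z on every qubit)
        ops = (E, E_logical)  # E_s L_a for a = I, Z
        for a in range(2):
            for b in range(2):
                # Term (E_s L_a rho L_b E_s), coefficient u(E_s L_a) * conj(u(E_s L_b)).
                chi[a, b] += amp(ops[a]) * np.conj(amp(ops[b]))
    return chi


chi3 = logical_chi_repetition(3, 0.2)
print(np.round(chi3, 6))
```

The diagonal entries are the incoherent part and `chi[1, 0]` is the coherent part $\bar\chi_{ZI}$; for $n=3$ one can check by hand that $\bar\chi_{ZI}=-2i(\sin\frac{\theta}{2}\cos\frac{\theta}{2})^3$.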

Coherent and Incoherent Logical Components
We are going to analyze the coherent and incoherent sums separately at first. Using path counting, and assuming the noise is sufficiently weak, we will prove that in both cases the logical chi matrix is dominated by "short logical strings" (logical Pauli operators of relatively low weight), namely those with length $\le L+2k$ for a constant $k$. Then, by summing up the contributions due to these short logical strings, we will derive an inequality relating the coherent and incoherent components of the logical channel. Our argument will use Eq.(167), where we have expressed the logical chi matrix as a sum of terms in the physical chi matrix. In the next several sections we will analyze the sums contributing to the coherent and incoherent components of $\bar\chi_{ab}$. We will make a series of approximations that simplify the sums by neglecting certain terms. In the end we will demonstrate that the two sums are related by a constant factor.

The Coherent Sum
First, consider the coherent sum. The coherent components of the logical noise channel are sums of terms from the physical noise channel, and we want to upper bound their magnitude. Before going further, we make some simplifications. To begin with, we neglect certain coherent logical noise components: we focus on the components $\bar\chi_{ab}$ of the logical noise for which exactly one of the operators $L_a$ and $L_b$ is the identity and the other is either an X or a Z error on one of the two encoded qubits. These components of the logical noise channel can be expressed as a sum over physical noise terms:
$$\bar\chi_{aI} = \sum_{s}\sum_{f,g}\,(E_sL_aS_f\;\rho\;S_gE_s), \qquad (168)$$
where $L_a$ is either an X or a Z logical error on one of the two encoded qubits of the toric code. In Appendix I we prove that we can neglect the coherent terms with nontrivial logical operators on both sides of $\rho$, and in Appendix H we prove that we can neglect Y logical operators and operators that act nontrivially on both encoded qubits. The proof comes down to showing that terms with a nontrivial error on both sides of $\rho$, that act on both encoded qubits, or that apply a Y to one of the logical qubits have high weight relative to the terms we keep.
A further simplification concerns the structure of the noise model. Our result applies to a noise model in which the single-qubit unitary operator acting on each qubit has an axis of rotation and angle of rotation that vary somewhat from qubit to qubit. However, we will prove that the most coherent logical channel is one in which the same unitary operator is applied to each qubit, so we may confine our attention to that case for the purpose of deriving a bound on the relative strength of the coherent and incoherent parts of the logical channel.
We will make use of another way of writing the coherent sum. Each coherent term in the form of Eq.(168) can be unambiguously associated with a logical string.
The product of the Pauli operators acting on the left-hand and right-hand sides is the logical operator $L_aS_fS_g$, which in general consists of a connected logical string wrapping around the code block, accompanied by some number of closed loops. To be concrete, if $L_a$ is a logical X error, then the logical string contains only physical X errors, and the closed loops are either loops of X errors which are disjoint from the logical string, or closed loops of Z errors which may or may not intersect the logical string or the closed loops of X errors (the intersections are the Y errors).
Definition 2. For a given noise term $(E_sL_aS_f\;\rho\;S_gE_s)$, we can extract a connected logical string by removing the topologically trivial loops from $L_aS_fS_g$. Call this logical string $\mathcal{L}$. We define the "connected part" of the noise term as its restriction to the qubits in $\mathcal{L}$. The connected part of $(E_sL_aS_f\;\rho\;S_gE_s)$ is the noise term
$$(E_sL_aS_f|_{\mathcal{L}}\;\rho\;S_gE_s|_{\mathcal{L}}),$$
where the symbol $|_{\mathcal{L}}$ denotes the restriction of an operator to the support of $\mathcal{L}$.
Definition 3. For a noise term $(E_sL_aS_f\;\rho\;S_gE_s)$, the "disconnected part" is the part of the noise term not in the connected part. Once again, we define a connected logical string $\mathcal{L}$ by removing all topologically trivial closed loops from $L_aS_fS_g$. The disconnected part of $(E_sL_aS_f\;\rho\;S_gE_s)$ is
$$(E_sL_aS_f|_{\mathcal{L}^C}\;\rho\;S_gE_s|_{\mathcal{L}^C}),$$
where the symbol $|_{\mathcal{L}^C}$ denotes the restriction of an operator to the qubits in the complement of the support of $\mathcal{L}$.
Furthermore, we will be able to assume that all of the physical single-qubit errors in the connected part are X or Z type. For example, in the case of a logical X-type error, we may neglect terms in which a closed loop of Z errors intersects with the logical string. To justify this assumption, we show in Appendix G that allowing Y errors along the logical string will only make the logical noise channel less coherent.
A coherent term contributing to the logical chi matrix element $\bar\chi_{Z_1I}$, which includes disconnected errors, is illustrated in Fig. 4. The disconnected part includes the identity on the qubits without errors, in addition to the disconnected errors. Z errors acting on the density operator from the left are shown in red, and Z errors acting from the right are shown in blue. Because the errors acting from the left and from the right have the same syndrome $s$, their product has trivial syndrome and is therefore a logical operator times a stabilizer element. The connected logical string crosses the code block near the bottom of the figure. Associated with the syndrome $s$ is the corresponding standard error $E_s$, the Z error of minimal weight with that syndrome. (If the minimal-weight error is not unique, we arbitrarily choose $E_s$ to be one of the errors of minimal weight by convention.) To evaluate the logical chi matrix element $\bar\chi_{Z_1I}$ as in Eq.(167), we need to sum over the syndrome $s$ and the stabilizer elements $S_f$ and $S_g$. To facilitate estimating the sum, it will be helpful to organize it in an appropriate way.
To this end, we introduce the following definition: For each fixed logical string, the sum over all partitions of the logical string will produce the full set of connected terms derived from that logical string. The sum over partitions, for a fixed logical string, is directly analogous to the sum over syndromes we encountered in our analysis of the repetition code in Section 4.3. In the case of the toric code, we compute the coherent part $\bar\chi_{Z_1I}$ of the logical channel by summing over all possible logical strings, and for each choice of logical string we sum over all partitions of the logical string. In addition, for each chosen logical string, we sum over the possible disconnected pieces, the additional closed loops of Z errors which are disjoint from the logical string.
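Schematically, the coherent component of the logical chi matrix is
$$\bar\chi_{Z_1I} = \sum_{\text{logical strings }\mathcal{L}}\ \ \sum_{\text{partitions of }\mathcal{L}} \big(\text{connected term}\big)\times\big(\text{sum over disconnected pieces}\big). \qquad (171)$$
This form will allow us to approximate the coherent sum. Assuming that the noise is sufficiently weak, we will prove that we can truncate the sum over logical strings, including only short strings. Furthermore, most of the short logical strings have a particular shape.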
To complete the argument, we will show that the disconnected sum is approximately the same for each short logical string and for each partition of the logical string.

Counting of Logical Strings
We want to find an upper bound on the magnitude of the coherent component of the logical noise channel. We have already put the sum over physical noise terms into a convenient form by factoring out the disconnected piece of each term. Next, we simplify the sum by restricting the set of connected pieces we need to consider: we neglect the long logical strings in favor of those with length no larger than $L+2k$, where $k$ is an $L$-independent constant. To justify this truncation we will require a strong assumption on how the physical noise strength scales with $L$; namely, the single-qubit rotation angles must scale as $1/L$. In Eq.(171), we wrote the contribution of a given logical string to the coherent logical noise as a product of a connected and a disconnected part, as described in Definitions 2 and 3, with the connected part summed over partitions as defined in Definition 4. The sum over partitions contains $2^{w-1}$ terms for a weight-$w$ logical string (one containing $w$ lattice edges). Suppose that the unitary rotation $U_Z(\theta) = \exp(-i\frac{\theta}{2}Z)$ is applied to each physical qubit in the toric code block. We can upper bound the sum by the number of terms times the magnitude of each term. Then the contribution of each logical string is upper bounded by $2^{w-1}\left(|\sin(\theta/2)|\cos(\theta/2)\right)^w = \frac{1}{2}|\sin\theta|^w$ times the factor from the disconnected part. We will prove in Section 6.7 and Appendix E that the disconnected piece is 1 plus a higher-weight correction that we can neglect for short logical strings.
There is a regime where we can upper bound the number of logical strings as a function of the string's length. Asymptotically, the number $c_w$ of self-avoiding random walks of length $w$ was proven in [29] to satisfy
$$\lim_{w\to\infty} c_w^{1/w} = \mu, \qquad (172)$$
where $\mu \approx 2.64$ for the 2D square lattice. We can start a walk from a fixed point along one edge of the toric code. Logical strings will be the self-avoiding walks that wrap around the torus and end at the starting point. We can use Eq.(172) to show that the contribution to the coherent logical noise from logical strings of length $\ell$ is exponentially decaying with $\ell$ as long as $|\theta| < \arcsin(1/\mu) \approx 0.39$. This statement applies only for logical strings with length much greater than the minimum length $L$, the code distance. We do not have a precise estimate indicating at what length above $L$ the number of logical strings begins to scale like Eq.(172). This means we do not know at what string length the contribution will begin to decay exponentially, and therefore we do not know where to truncate the sum if we wish to use Eq.(172) to bound the terms we are neglecting. In any case, in our subsequent analysis we will truncate the sum over the string length at $L+2k$ for some constant $k$. In this regime the asymptotic estimate Eq.(172) is not helpful, and we will not make use of it. Instead, we will assume that $|\theta|$ is sufficiently small that we can use the following lemma to bound the terms we neglect.
Lemma 3. Suppose that $|\sin\theta| < 1/L$. In Eq.(171) we wrote $\bar\chi_{Z_1I}$ as a sum over logical strings. If we truncate the sum to include only logical strings of length $w \le L+2k$, then the magnitude of the difference between the truncated sum and the complete sum is exponentially small in both $L$ and $k$.
Proof. We begin by fixing a point along one edge of the code block, which can be chosen in $L$ ways. We will count the number of logical strings that wrap around the torus and pass through that fixed point. In Eq.(171), for each logical string in the sum, the contribution to the logical noise is a sum over partitions of the connected part times a disconnected part. We will discuss the sum over partitions in detail in Section 6.6, but for now it is enough to know that the sum over partitions contains $2^{\ell-1}$ terms for each connected logical string of length $\ell$. These terms have different phases, and in general the sum can be complicated. We can obtain a simple bound by multiplying the number of terms by the magnitude of each term, in other words treating all the phases as if they were the same. For each weight-$w$ string, the sum over partitions therefore has magnitude at most $2^{w-1}\left(|\sin(\theta/2)|\cos(\theta/2)\right)^w$. We still have to handle the disconnected piece. In Section 6.7 we will argue that the disconnected sum decreases as the length of the logical string increases. Furthermore, the disconnected part equals 1 up to corrections which are small for logical strings with length $\le L+2k$ for a constant $k$. This means that we can upper bound the coherent logical noise by a sum over string lengths $\ell \le \ell_{\max}$, where $\ell_{\max}$ is the length of the longest $Z_1$ logical string supported on the code. If $|\sin\theta| < 1/L$, the contribution from logical strings of length $\ell$ decreases exponentially with $\ell$.
If we truncate the sum over logical strings to those with weight $w \le L+2k$, the error we make is equal to the total contribution of strings with weight $w > L+2k$. The contribution at weight $w$ is exponentially decreasing with $w$, so we can bound the sum over the long logical strings using a geometric series, where $\alpha = 1/(1-\beta)$. We conclude that the absolute error we make by truncating the series is bounded by Eq.(178), where $\alpha = (1-L|\sin\theta|)^{-1}$. Therefore, the error due to truncation is exponentially small in both $L$ and $k$.
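The truncation step in Lemma 3 reduces to the tail of a geometric series. The sketch below assumes, as our reading of the proof (the intermediate bound is not reproduced here), that the contribution at weight $w$ is at most a constant times $\beta^w$ with $\beta = L|\sin\theta|$, which matches $\alpha=(1-L|\sin\theta|)^{-1}$. The values $L=21$, $k=2$, $\theta=0.02$ are arbitrary illustrative choices, and `geometric_tail` is a hypothetical helper.

```python
import math


def geometric_tail(beta, W):
    """Closed form of sum_{w > W} beta^w for 0 <= beta < 1: alpha * beta^(W + 1)."""
    if not 0 <= beta < 1:
        raise ValueError("need 0 <= beta < 1, i.e. |sin theta| < 1/L")
    alpha = 1.0 / (1.0 - beta)
    return alpha * beta ** (W + 1)


L, k, theta = 21, 2, 0.02
beta = L * abs(math.sin(theta))  # assumed per-step ratio; beta < 1 under Lemma 3
cutoff = L + 2 * k               # keep logical strings with w <= L + 2k
direct = sum(beta ** w for w in range(cutoff + 1, 5000))
print(geometric_tail(beta, cutoff), direct)
```

Since $\beta^{L+2k+1}=\beta^{L+1}\beta^{2k}$, the tail shrinks exponentially both as $L$ grows and as the cutoff constant $k$ grows, which is the form of the lemma's conclusion.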
In Lemma 3 we proved an upper bound on the absolute magnitude of the error due to truncation in the coherent sum. However, so far we have not described any lower bound on the terms that we have kept, arising from the logical strings with length $\le L+2k$. Therefore, we have not yet justified that the error we have neglected is small relative to the coherent noise contributions that we kept. Nevertheless, we will prove in Section 6.10 that the incoherent logical noise component is at least $L\binom{L}{(L+1)/2}\left(\sin\frac{\theta}{2}\right)^{L+1}$; compared to this incoherent component, the contribution Eq.(178) to the coherent component due to strings of length $> L+2k$ is suppressed by a factor $(L\sin\theta)^{2k}$. This means that the error we make in truncating the sum in Lemma 3 is negligible compared to the incoherent component, an observation which will be helpful for showing that the coherence of the logical channel is suppressed. For now, we will restrict our attention to connected logical strings with length $\le L+2k$ for a constant $k$. We will refer to these as "short logical strings."
Definition 5. A "short logical string" is a nontrivial logical Pauli operator with no topologically trivial closed loops and length $\le L+2k$, where $L$ is the code size and $k$ is our chosen cutoff constant.
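The two growth rates that compete in this discussion, the number of self-avoiding walks and the weight of each string, can be compared numerically. The sketch below (helper names are ours) enumerates self-avoiding walks on the infinite square lattice for small $w$, which is only a rough stand-in for logical strings on the torus, and evaluates the per-string bound $2^{w-1}(|\sin\frac{\theta}{2}|\cos\frac{\theta}{2})^w$, which simplifies to $\frac12|\sin\theta|^w$. That identity is why the threshold is stated as $|\theta|<\arcsin(1/\mu)$. The constant $\mu\approx 2.638$ used below is the approximate connective constant of the square lattice.

```python
import math

MU = 2.638  # connective constant of the square lattice (approximate value)


def count_saws(w):
    """Number c_w of self-avoiding walks with w steps on the infinite square
    lattice, starting at the origin (exhaustive depth-first enumeration)."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    visited = {(0, 0)}

    def extend(x, y, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in steps:
            nxt = (x + dx, y + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt[0], nxt[1], remaining - 1)
                visited.remove(nxt)
        return total

    return extend(0, 0, w)


def per_string_bound(w, theta):
    """2^(w-1) (|sin(theta/2)| cos(theta/2))^w, which equals |sin(theta)|^w / 2."""
    return 2 ** (w - 1) * (abs(math.sin(theta / 2)) * math.cos(theta / 2)) ** w


# mu^w * |sin(theta)|^w / 2 decays with w exactly when |sin(theta)| < 1/mu.
theta_max = math.asin(1 / MU)

counts = [count_saws(w) for w in range(1, 9)]
ratios = [b / a for a, b in zip(counts, counts[1:])]
print(counts)
print([round(r, 3) for r in ratios], round(theta_max, 3))
```

At these small lengths the ratios $c_{w+1}/c_w$ are still noticeably above $\mu$, one illustration of why an asymptotic statement like Eq.(172) says little about any particular finite length.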

Sum Over Partitions
In the previous section we restricted our attention to short logical strings, which have length $\le L+2k$, where $L$ is the code size and $k$ is a constant. We can go further by characterizing the shape of a logical string, and arguing that logical strings whose shape meets certain criteria give the dominant contribution to the logical channel.
Definition 6. Among short logical strings, we will speak of those with "typical shape." This means two things. First, supposing that the logical string in question runs left to right across the code block, the steps up and down along the string are by one lattice spacing at a time, and the string contains no backtracking steps that move from right to left. Second, the individual steps up and down are separated from each other by at least $\gamma\sqrt{L}$, where $\gamma$ is a small constant we may choose. This constant $\gamma$ will appear in the error term in many of our subsequent estimates.
In Lemmas 9 and 10 in Appendix D we prove that most short strings have a typical shape. Among short strings with length $\le L+2k$, the fraction of atypical strings relative to the total number of logical strings of the same length is bounded as in Eq.(179). Fig. 5 illustrates a string with typical shape for some small $\gamma$. Short logical strings with typical shape are simple, which makes our analysis easier, particularly when we discuss the sum over partitions. Let's revisit the sum over partitions for a fixed connected logical string. That is, for a given logical string contributing to $\bar\chi_{Z_1I}$, we wish to enumerate all the ways to divide the Z errors along the string into an uncorrectable error acting on the density operator from the left and a correctable error acting from the right. This sum over partitions of a fixed logical string is analogous to the sum we encountered when we summed over syndromes in our analysis of the repetition code. In the case of the repetition code of length $n$, there is just one length-$n$ "logical string" to consider, and summing over syndromes is equivalent to summing over all ways of choosing a (correctable) error acting on the right that has weight at most $(n-1)/2$ (where $n$ is odd).
In the toric code, although the sum over partitions is similar to the sum over syndromes in the repetition code, there is a complication. Definition 7. An "exceptional term" is a partition of a connected logical string L such that the uncorrectable error has lower weight than the correctable error.
In some cases, depending on the geometry of the logical string, we will have some number of exceptional terms. These exceptional terms complicate our analysis of the logical channel. Fortunately, because we need only consider contributions to the logical channel arising from short logical strings when the noise is weak enough, we will be able to fully characterize the exceptional terms and show they are negligible.
How exceptional terms can occur is illustrated in Fig. 6. Here, for the toric code with $L=9$, we consider the logical string of length 15 shown in Fig. 5, and we have chosen a partition such that the uncorrectable error shown in red has weight 7, while the correctable error shown in blue has weight 8. Note that the minimal-weight standard error associated with the error syndrome on the logical string has weight 6; it follows nearly the same path as the correctable error, but achieves a lower weight than the correctable error by taking a "shortcut" across the blue notch on the logical string. Another example of an exceptional term for this same logical string is shown in Fig. 7, where this time the weight of the uncorrectable error is 6 and the minimal-weight error has weight 5. Again, the minimal-weight error takes a shortcut, avoiding the excursions up and down followed by the correctable error.
For all these examples, the correctable error contains the qubits along the logical string that make the furthest excursions up and down. This turns out to be a universal rule, at least among the typical short logical strings: for exceptional terms, the uncorrectable error has no support on the outermost steps along the string. In the next lemma we count the number of exceptional terms and find that, relative to the total number of partitions of a typical short logical string, these exceptional terms are exponentially unlikely in $L$.
Lemma 4. Fix a logical string of length $\ell \le L+2k$, where $k$ is a specified $L$-independent constant, with a typical shape according to Definition 6. This means that if the string runs left to right across the code block, it has steps up and down by one lattice spacing at a time, and the steps are separated by at least $\gamma\sqrt{L}$ for some constant $\gamma$. To keep the fraction of atypical strings in Eq.(179) small, we will choose $\gamma$ to be a sufficiently small constant. Now consider all the ways of partitioning this typical logical string into a correctable error and an uncorrectable error. Then the fraction of exceptional partitions relative to all partitions of this string decays exponentially for fixed $k$ and $\gamma$; that is, exceptional terms are exponentially rare for typical short logical strings and large $L$.
Proof. Take a logical string of length $\ell \le L+2k$ with typical shape. Each step is separated from the others by at least $\gamma\sqrt{L}$ for some $\gamma$. Now consider taking a subset of $(\ell-1)/2$ of the qubits in the logical string. We would expect such a subset to be correctable. If it is not, this partition is exceptional.
Choose a partition of a connected logical string, and let $O_U$ be the uncorrectable error and $O_C$ the correctable error. By definition, $O_U$ and $O_C$ share a syndrome; denote that syndrome by $s$. The decoding algorithm, which in our case is minimal-weight decoding, applies some correction to this syndrome to return it to the code space. Call this correction $E_s$. $E_s$ is by definition a correctable error in the code, and therefore, because we are using minimal-weight decoding, $E_s$ must have lower weight than $O_U$. The fact that we chose our code size $L$ to be odd rules out the case where the two might be equal. Now if the partition we are considering happens to be exceptional, this means by definition that $O_C$ has higher weight than $O_U$, and we have
$$|O_C| > |O_U| > |E_s|. \qquad (181)$$
We will use this condition to bound the number of exceptional terms for a given connected logical string. What does it mean for $O_C$ to have higher weight than $E_s$? For connected logical strings of typical shape as in Definition 6, this happens only if, on some subset or subsets of the logical string, the correctable error $O_C$ contains errors on qubits arranged in a "cap." By this we mean a configuration of errors where the errors form three edges of a rectangle. The minimal-weight decoder will choose the fourth edge of the rectangle as part of the correction $E_s$. This is illustrated in Fig. 6 and Fig. 7. If the connected logical string has length greater than $L$, then it has steps up and down if it crosses the code block left to right.
Figure 6: Now we choose a subset of 7 of the errors in the logical string in Fig. 5. The uncorrectable error $O_U$ is in red and the correctable error $O_C$ is in blue. All three errors along the "cap" in the top right appear on the correctable side. For this reason, the correctable error has weight 8, which is higher than the uncorrectable error with weight 7. We call this a weight-7 exceptional term.
In every exceptional term, the correctable error O C will contain the outermost qubits around some of the steps, forming a cap.

Figure 7: Again, one possible partition of the logical string in Fig. 5 is illustrated. The uncorrectable error $O_U$ is in red, and the correctable error $O_C$ is in blue. The correctable error includes all the errors along both the cap in the top right and the bottommost cap of the logical string. For this reason, the correctable error has weight 9, while the uncorrectable error has weight 6. Therefore, we call this partition a weight-6 exceptional term.
Now that we have a simple necessary condition for an exceptional term, we will bound the number of exceptional terms for each short logical string with a typical shape according to Definition 6. Start with a logical string of length $\ell$. Consider first the partitions into errors of weight $(\ell+1)/2$ and $(\ell-1)/2$. Of course, those partitions for which the weight-$(\ell+1)/2$ error is correctable will be exceptional. Every exceptional term like this will have the property that the correctable error contains some number of "caps," where all of the qubits around three sides of a rectangle are part of the correctable error. To bound the number of exceptional terms, we will count the number of partitions with this property.
Each partition of a weight-$\ell$ connected logical string into errors of weight $(\ell+1)/2$ and $(\ell-1)/2$ is formed by choosing $(\ell-1)/2$ out of the $\ell$ errors in the logical string. This is what we mean by a partition. We want to count the number of ways of choosing these errors such that the correctable error (of weight $(\ell+1)/2$, because we are counting exceptional terms) contains all the errors along a "cap." This means that the subset of $(\ell-1)/2$ errors contains no errors along one or more of the "caps." A typical short logical string running left to right across the code consists of horizontal segments separated by single steps up and down. The outermost of these steps form "caps." The number of such "caps" depends on the particular pattern of steps in the logical string. However, we can bound the number of exceptional terms by counting the number of ways of choosing no qubits along one of the horizontal segments of length $\gamma\sqrt{L}$. This is because every "cap" consists of an outermost horizontal segment combined with the up and down steps on either side. This counting gives
$$\#\{\text{ways of choosing no qubits along a horizontal segment of length }\gamma\sqrt{L}\} = \binom{\ell-\gamma\sqrt{L}}{(\ell-1)/2}. \qquad (182)$$
We want the number of ways of choosing no qubits along at least one of the horizontal segments. There are at most $2k$ steps up and down along the logical string, and therefore at most $2k$ horizontal segments. We can use a union bound to write
$$\#\{\text{weight-}(\ell-1)/2\text{ exceptional terms}\} \le 2k\binom{\ell-\gamma\sqrt{L}}{(\ell-1)/2}.$$
This is relative to the total number of partitions into weights $(\ell+1)/2$ and $(\ell-1)/2$ for our logical string of length $\ell$, which is
$$\binom{\ell}{(\ell-1)/2}.$$
We can expand the ratio of exceptional terms to the total using Stirling's approximation; the resulting expression holds up to corrections $O(1/\ell)$. We then rewrite it, squaring the $(1-\gamma\sqrt{L}/\ell)$ term in order to combine terms, and upper bound the term inside the radical as well as the term raised to the power $\ell/2$. The second of the three resulting terms is exponentially decaying to $\exp(\gamma^2/2)$.
As long as $\ell \ge 4$, we can bound this term as well. Next, we bound the remaining ratio from below by 2 and assemble one term raised to the power $L$ and another to the power $(\ell-L)/2$. We chose some small value for $\gamma$ in Lemma 10, and then the number of exceptional terms with a weight-$(\ell-1)/2$ logical error on one side and a weight-$(\ell+1)/2$ correctable error on the other is exponentially small in $L$.
For the chosen connected logical string of weight $\ell$, we have calculated the fraction of exceptional terms among the partitions into errors of weight $(\ell-1)/2$ and $(\ell+1)/2$. We will also have exceptional terms among the partitions into other weights, possibly all the way down to partitions into weight $(L+1)/2$ and $\ell-(L+1)/2$. Above, we applied the condition in Eq.(181) that for every exceptional term the correctable error must have higher weight than the minimal-weight correction. If we apply this same method to bound the number of exceptional terms among partitions into $(\ell-3)/2$ and $(\ell+3)/2$, we find that the correctable error must be at least 4 longer than the minimal-weight correction. This means we want to count the number of configurations where at least two of the "caps" are contained in the correctable error, which is clearly far fewer than the number of configurations where one "cap" is contained. Therefore, the ratio of exceptional terms to total partitions is bounded by the ratio we found for partitions into $(\ell-1)/2$ and $(\ell+1)/2$. In the end we see that the number of weight-$(\ell-1)/2$ exceptional terms is exponentially small in $L$ for fixed $k$, and further that the weight-$(\ell-3)/2$ exceptional terms are exponentially small in $L$ relative to the higher-weight exceptional terms, and so on. Then for large $L$, exceptional terms are negligible.
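The union bound in the proof of Lemma 4 can be evaluated exactly for concrete sizes. The sketch below computes $2k\binom{\ell-g}{(\ell-1)/2}\big/\binom{\ell}{(\ell-1)/2}$ with $\ell=L+2k$ and $g=\lfloor\gamma\sqrt{L}\rfloor$, following Eq.(182) and the union bound after it. The choices $k=2$ and $\gamma=0.5$ are illustrative only, the function name is hypothetical, and the Stirling-based bound of the text is not reproduced.

```python
import math


def exceptional_fraction_bound(L, k, gamma):
    """Union bound on the fraction of partitions of a typical string of length
    l = L + 2k into an (l-1)/2 subset and its complement such that the (l-1)/2
    subset misses every qubit of at least one horizontal segment of
    g = floor(gamma * sqrt(L)) qubits (there are at most 2k segments)."""
    l = L + 2 * k                    # odd, since L is odd
    m = (l - 1) // 2                 # size of the uncorrectable subset
    g = math.floor(gamma * math.sqrt(L))
    avoiding_one_segment = math.comb(l - g, m)
    all_partitions = math.comb(l, m)
    return 2 * k * avoiding_one_segment / all_partitions


for L in (101, 401, 1601):
    print(L, exceptional_fraction_bound(L, k=2, gamma=0.5))
```

Because $\binom{\ell-g}{m}/\binom{\ell}{m}=\prod_{j=0}^{g-1}\frac{\ell-m-j}{\ell-j}$ and each factor is close to $1/2$ when $m=(\ell-1)/2$, this bound behaves roughly like $2k\cdot 2^{-\gamma\sqrt{L}}$.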
Lemma 4 allows us to approximate the sum over partitions for a typical, short logical string $\mathcal{L}$. Neglecting exceptional terms, the sum over partitions resembles the calculation of what we called $\delta$ in the repetition code in Eq.(86) and Eq.(117). Let $\mathcal{L}$ have length $\ell$. Each partition contributes a term of magnitude $\left(\sin\frac{\theta}{2}\cos\frac{\theta}{2}\right)^{\ell}$ with a phase. The sum over partitions is then given by this repetition-code-like sum plus an error term from the exceptional terms, which is upper bounded by two times the expression in Eq.(190), because each exceptional term contributes to the sum over partitions with the opposite sign relative to a non-exceptional term.
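The phase bookkeeping in the sum over partitions can be checked by brute force. The sketch below assumes $U_Z(\theta)=\exp(-i\frac{\theta}{2}Z)$, so that a qubit of the uncorrectable (left) part contributes $-i\sin\frac{\theta}{2}\cos\frac{\theta}{2}$ and a qubit of the correctable (right) part contributes $+i\sin\frac{\theta}{2}\cos\frac{\theta}{2}$; the opposite sign convention would complex-conjugate every term without changing magnitudes. It keeps only non-exceptional partitions and compares the result with the closed form given by the alternating binomial identity $\sum_{w=0}^{m}(-1)^w\binom{\ell}{w}=(-1)^m\binom{\ell-1}{m}$. Function names are hypothetical.

```python
import itertools
import math


def partition_phase_sum(l):
    """Brute-force phase sum over the non-exceptional partitions of a length-l
    string (l odd): the correctable, right-acting part O_C has weight w <= (l-1)/2.

    A qubit in O_U contributes -i s c and a qubit in O_C contributes +i s c, so
    each term is (s c)^l times the phase (-i)^(l-w) * i^w; the common factor
    (s c)^l is omitted here.
    """
    total = 0
    for in_correctable in itertools.product((0, 1), repeat=l):
        w = sum(in_correctable)
        if w <= (l - 1) // 2:
            total += (-1j) ** (l - w) * (1j) ** w
    return total


def partition_phase_sum_closed(l):
    """Closed form from sum_{w<=m} (-1)^w C(l, w) = (-1)^m C(l-1, m), m = (l-1)/2."""
    m = (l - 1) // 2
    return (-1j) ** l * (-1) ** m * math.comb(l - 1, m)


for l in (3, 5, 7, 9, 11):
    # |sum| compared with 2^(l-1), its value if all phases were aligned
    closed = partition_phase_sum_closed(l)
    print(l, closed, abs(closed) / 2 ** (l - 1))
```

The last printed column is $\binom{\ell-1}{(\ell-1)/2}/2^{\ell-1}$, which decreases roughly like $\ell^{-1/2}$: the phases partially cancel, so treating all $2^{\ell-1}$ phases as equal, as in Lemma 3, is conservative.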

The Disconnected Part
In the preceding subsections, we analyzed the coherent component of the logical noise channel, expressed as a sum over many physical noise terms. So far we have only considered the connected logical string associated with each coherent term. In this subsection, we analyze the disconnected errors in more detail and describe more rigorously how they affect the evaluation of the coherent terms in the logical channel. In Sec. 6.4 we described how to decompose a contribution to $\bar\chi_{Z_1I}$ into a connected piece and some number of disconnected pieces. The left- and right-hand sides of each coherent term can be expanded as the product of the errors contained in the connected logical string and the errors outside of it; schematically,
$$(E_sL_aS_f\;\rho\;S_gE_s) = (\mathrm{Conn}_L\;\rho\;\mathrm{Conn}_R)\times\text{Disconnected}.$$
The factor "Disconnected" means the contribution to the coherent term from the disconnected components that appeared in Eq.(171). The product of the two (disjoint) factors $\mathrm{Conn}_L$ and $\mathrm{Conn}_R$ yields the connected logical string, with no additional disjoint loops included. The connected factor includes $\sin\frac{\theta}{2}\cos\frac{\theta}{2}$ for each qubit along the connected logical string. The disconnected factor includes $\cos^2\frac{\theta}{2}$ on every qubit not in the connected logical string, in addition to a sum over all possible disconnected errors. Fix a partition $(O_U\,\rho\,O_C)$ of a short, typical logical string, and consider dressing it with disconnected errors. We can distinguish two types of added errors: incoherent and coherent. If the disconnected error is $D_L$ acting on the density operator from the left and $D_R$ acting from the right, then if a particular qubit is hit by the same error in both $D_L$ and $D_R$, we say that the disconnected error acting on that qubit is incoherent. If a particular qubit is hit by distinct errors in $D_L$ and $D_R$, then the error is coherent. The product $D_LD_R$ of the errors added on the left and right must be a stabilizer operator; when coherent-type errors are present it is a non-identity stabilizer, i.e., a closed loop or a set of disjoint closed loops.
(Here, because we are investigating the encoded Z errors in the logical channel, only the Z-type physical errors are considered.) The two types of added error, incoherent and coherent, are shown in Fig. 8, where (A) is an incoherent-type added error and (B) through (D) are coherent-type. Let us first treat the case of incoherent-type added errors, where $D_L = D_R \equiv D$. These are the ones with the same disconnected error added to both operators in the partition, for example (A) in Fig. 8. These terms do not change the phase of the original partition, and they multiply the magnitude by $\left(\sin\frac{\theta}{2}\right)^{2m}$ if $m$ is the weight of the error added on each side. The disconnected part contains $\cos^2\frac{\theta}{2}$ on each qubit corresponding to no disconnected errors, plus many configurations of disconnected errors. The incoherent-type added errors on each qubit in the disconnected part supply the $\sin^2\frac{\theta}{2}$ term, giving $\cos^2\frac{\theta}{2}+\sin^2\frac{\theta}{2}=1$ on the qubits not contained in the connected logical string. This reasoning applies to each incoherent-type added error that does not change how the operators $O_U$ and $O_C$ are decoded. In other words, if $D$ is the disconnected error we add to $O_U$ and $O_C$, we require that $DO_U$ is an uncorrectable error.
We must be careful because in some cases the added incoherent-type errors can change how the correctable and uncorrectable errors in the partition are decoded. The added error can "flip" the uncorrectable error to a correctable one. This means that the noise term that contributes to the logical $\bar\chi_{Z_1I}$ component is not $(DO_U\,\rho\,DO_C)$, as we would have expected, but is instead $(DO_C\,\rho\,DO_U)$. This term has the opposite sign relative to the expected term. This is only possible when the added error $D$ is located very near the connected logical string, and only for special partitions. We prove in Lemma 11 in Appendix E that the contribution from these disconnected terms is negligible.
What of the coherent-type added errors? Again, fix a partition of a connected logical string, and let $O_U$ and $O_C$ be its uncorrectable and correctable errors, respectively. Now choose a stabilizer operator, i.e. a closed loop $\ell$, that is disjoint from the connected logical string, and let its length be $|\ell|$. Choose a subset of $p$ of the qubits in the loop, and let the disconnected error $D_L$ act on these $p$ qubits from the left, while the disconnected error $D_R$ acts on the remaining $|\ell| - p$ qubits from the right. Suppose further that the loop and the partition are such that the uncorrectable error $O_U$ combined with the additional error $D_L$ remains uncorrectable. This need not always be true; we consider the case where $O_U D_L$ is correctable shortly.
Supposing that the disconnected error $D_L$ does not change the decoding, we can sum over all the ways of choosing the $p$ errors in $D_L$ from among the $|\ell|$ qubits in the loop. The number of ways of choosing $p$ errors is given by a binomial coefficient, and the magnitude of each term is suppressed by $\sin^{|\ell|}(\theta/2)$ relative to the original partition of the connected logical string without any additional disconnected errors. The phase of each term depends, as always, on the relative weight of the errors on the right and the left. The disconnected part contributes a phase of $i^{p}(-i)^{|\ell|-p}$, and since it is a closed loop, $|\ell|$ is even. The sum yields
$$\sum_{p=0}^{|\ell|}\binom{|\ell|}{p}\, i^{p}(-i)^{|\ell|-p} = \big(i + (-i)\big)^{|\ell|} = 0 .$$
That is, when we sum over all ways of forming disconnected terms out of the original loop $\ell$, the result is 0. This holds for any loop such that the disconnected part does not change how the connected part is decoded.
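The cancellation can be confirmed by direct enumeration. This sketch, with illustrative function names, sums $\binom{|\ell|}{p}\, i^{p}(-i)^{|\ell|-p}$ over every split of a loop of length $|\ell|$ into $p$ errors on the left and $|\ell| - p$ on the right. Powers of $i$ are read from a lookup table so that the arithmetic stays exact.

```python
from math import comb

# i**0, i**1, i**2, i**3, used to compute powers of i without rounding error.
_I_POWERS = (1, 1j, -1, -1j)


def i_pow(n):
    """Return i**n exactly for any integer n."""
    return _I_POWERS[n % 4]


def loop_phase_sum(loop_length):
    """Sum over all splits of a closed loop into p left errors and
    loop_length - p right errors, weighted by binom(loop_length, p)
    and the phase i**p * (-i)**(loop_length - p)."""
    total = 0
    for p in range(loop_length + 1):
        # (-i)**q equals i**(3*q), because -i = i**3.
        total += comb(loop_length, p) * i_pow(p) * i_pow(3 * (loop_length - p))
    return total


for n in (4, 6, 8, 10):
    print(n, loop_phase_sum(n))
```

Every loop length gives 0, consistent with $(i - i)^{|\ell|} = 0$.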
In the examples of Fig. 8, the additional disconnected errors do not change how the connected part is decoded; this is the same condition we encountered in the discussion of incoherent-type added errors. In certain cases, however, the error $D_L$ that we add to the $O_U$ side of the partition is such that $D_L O_U$ is correctable, so that the partition is "flipped" by the disconnected error. We account for this case in Lemma 11 and prove that the contribution to the logical noise from these special disconnected terms is negligible for short logical strings.
Using Lemma 11, we can neglect the added errors that change how the partition is decoded. We then conclude that the net contribution from coherent-type added errors is 0, while the incoherent-type added errors contribute a $\sin^2(\theta/2)$ factor on each qubit not in the connected logical string. Hence the "Disconnected Sum" term in Eq.(171) is equal to 1 plus a small correction, which leads to Eq.(195). There, $L$ is a connected, short, typical logical string, "partitions" refers to the partitions of $L$ denoted $(O_U\,\rho\,O_C)$, and $E$ is a noise term. The error term in Eq.(195) comes from Lemma 11, i.e. from the added errors that change how the partition is decoded, and is bounded there. The term "High Weight" in Eq.(195) is the error from Lemma 3, corresponding to the contributions of logical strings with length $> L + 2k$. We have not yet justified that this error is small relative to the contribution of the short strings, because we do not yet have a lower bound on that contribution. The justification comes from our subsequent discussion of the incoherent logical noise components.

Incoherent Sum
Now that we have simplified the sum for the coherent components of the logical noise channel, factored out the disconnected pieces, and performed the sum over syndromes for the connected pieces, we turn our attention to the incoherent logical noise components. We start by making several of the simplifications we made in the coherent sum. Of the incoherent logical components $(L_a\,\rho\,L_a)$, we neglect all the terms where $L_a$ is a logical $Y$ operator or acts non-trivially on both encoded qubits, and we retain only the terms where $L_a$ is a logical $X$ or $Z$ on one of the two encoded qubits. The reason is the same as for the coherent sum: the neglected terms have much higher weight, so the path counting excludes them. We are then left with a sum over noise terms in which $L_a$ is an $X$ or $Z$ logical operator on one of the encoded qubits and the identity on the other. Again, we suppose that all single-qubit rotations have the same angle $\theta$; we will extend to general rotations in Lemma 8. As before, we divide each term into connected and disconnected pieces. For the incoherent logical noise components, Definition 2 must be modified: the noise terms that enter into the incoherent logical noise contain an uncorrectable error on both sides of $\rho$, so our definition must involve two logical strings.
Definition 8. The "connected part" of a noise term $(E_s L_a S_f\,\rho\,S_g L_a E_s)$ is the noise term defined as follows. Let $L_1$ equal $L_a S_f$ with all topologically trivial closed loops removed, and let $L_2$ equal $L_a S_g$ with all trivial closed loops removed. Let $A \subset L_1 \cup L_2$ denote the set of qubits where $E_s L_a S_f$ or $E_s L_a S_g$, or both, act non-trivially. The connected part of $(E_s L_a S_f\,\rho\,S_g L_a E_s)$ is then
$$(E_s L_a S_f)\big|_A\;\rho\;(S_g L_a E_s)\big|_A ,$$
where $|_A$ denotes the restriction of an operator to the set of qubits $A$.
If the incoherent term is $(O_U\,\rho\,O_U')$, then this definition captures the set of qubits in the support of $O_U$ or $O_U'$ that lie along the two logical strings formed by $O_U$ and $E_s$ and by $O_U'$ and $E_s$, pruned of all trivial closed loops. Fig. 9 illustrates the connected and disconnected parts of a noise term that enters into the incoherent logical noise. The connected part of the noise term in the figure features a factor $\sin(\theta/2)\cos(\theta/2)$ for each qubit that appears in exactly one of $O_U$ and $O_U'$, and $\sin^2(\theta/2)$ for each qubit that appears in both. We can therefore lower bound the connected part of each incoherent noise term by $\big(\sin(\theta/2)\cos(\theta/2)\big)^{|O_U|+|O_U'|}$, which will be useful later on when we sum over many possible choices for the operators $O_U$ and $O_U'$.
Definition 9. The "disconnected part" of a noise term $(E_s L_a S_f\,\rho\,S_g L_a E_s)$ is the restriction of the noise term to the qubits not in the connected part. With $A$ the set of qubits in the connected part, constructed in Definition 8, the disconnected part is
$$(E_s L_a S_f)\big|_{A^c}\;\rho\;(S_g L_a E_s)\big|_{A^c},$$
where $|_{A^c}$ denotes the restriction of an operator to the complement of $A$. For the example in Fig. 9, the disconnected part features factors of $\sin(\theta/2)\cos(\theta/2)$ for the six qubits along the trivial closed loop near the top of the figure and $\cos^2(\theta/2)$ for the rest of the qubits. For a given connected part, we can imagine adding disconnected errors to form many different noise terms. The disconnected part includes $\cos^2(\theta/2)$ for every qubit not in the connected part, plus a sum over all possible coherent-type and incoherent-type disconnected errors. Just as in Section 6.7, when the disconnected errors do not change how the connected term is decoded, the incoherent-type errors give $\cos^2(\theta/2) + \sin^2(\theta/2) = 1$ on qubits not in the connected part.
The coherent-type disconnected errors, which form loops split between left and right, sum to zero because of the alternating signs.

[Fig. 9 panel labels: "Disconnected Part", "Connected Part"]
Just as in the case of the coherent logical noise components, some disconnected errors will not be allowed because they change how the connected term is decoded. We will set the disconnected part equal to 1 plus an error term that comes from these disallowed disconnected errors. In Lemma 12, we justify this by proving that the error term is small. This is analogous to Lemma 11, where we prove that the disconnected part of the coherent logical noise components is equal to 1 up to small corrections.
We now continue with an argument similar to the one for the coherent terms. The next step is to restrict the set of connected terms we consider. We break each noise term into connected and disconnected pieces and restrict ourselves to noise terms with a low-weight connected part, namely those whose connected part has total weight at most $L + 2k + 1$; here $k$ is the same $L$-independent cutoff as in the coherent sum. Just as in the analysis of the coherent logical noise in Section 6.5, we require $\theta$ to scale like $1/L$ to justify this truncation of the noise terms contributing to the connected part.
Lemma 5. Consider an incoherent logical noise component, say $\tilde\chi_{Z_1 Z_1}$, written as a sum over physical noise terms $(O_U\,\rho\,O_U')$. If $|\sin\theta| < 1/L$, we can truncate the sum to include only those noise terms where $|O_U| + |O_U'| \le L + 2k + 1$, where $k$ is the same cutoff constant as in Lemma 3; the truncation error is exponentially small in $k$ relative to the lowest-weight terms. Proof. We split each noise term into connected and disconnected parts. We show in Lemma 12 that the disconnected part decreases as the weight of the connected part increases. Moreover, the disconnected part is approximately equal to 1 for connected terms with total weight $\le L + 2k + 1$. Therefore, we need only consider the connected part as we truncate the sum and upper bound the error.
Let us denote the connected part of a noise term that enters into the logical $\tilde\chi_{Z_1 Z_1}$ component by $(O_U\,\rho\,O_U')$, with $w = |O_U|$ and $w' = |O_U'|$. All such noise terms have the shape drawn in Fig. 10, because the lowest-weight operator with the same syndrome and logical action has weight $\le w$. Then $O_U'$ consists of this lowest-weight operator combined with a number of additional deviations like those we considered to derive Eq.(174). (Here we neglect a factor that is polynomial in $w$ and $w'$; bounding the exponential dependence on $w' - w$ suffices for what follows.) Altogether this gives an upper bound on the number of noise terms with fixed $w$ and $w'$. Each of these terms has magnitude at most $\sin^{w+w'}(\theta/2)$, which is positive because $w + w'$ is even. As in Lemma 3, we truncate the sum and keep only those connected noise terms with $w + w' \le L + 2k + 1$. If we let $w + w' = w_{\rm total}$, then for each $w_{\rm total}$ there are several combinations of $w$ and $w'$ with the same total. Because $w$ and $w'$ must both be at least $(L + 1)/2$, there are at most $w_{\rm total} - L$ combinations. We then sum over all $w_{\rm total} > L + 2k + 1$ up to the maximum weight, estimating the sum with the same method as in the derivation of Eq.(178); this bounds the contribution from the higher-weight connected terms to $\tilde\chi_{Z_1 Z_1}$. We compare this error to the contribution from the lowest-weight noise terms, which have $w = w' = (L + 1)/2$ and contribute at least an amount $\zeta$; the ratio of the two gives the relative error associated with our truncation. We have neglected a polynomial factor in $L$ in our counting of noise terms. Nevertheless, as long as $L|\sin\theta| < 1$, the relative error is exponentially small in $k$, and the higher-weight connected terms are negligible.
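The count of weight combinations can be checked by enumeration. The sketch assumes, as the text does implicitly, that $L$ is odd, so that the minimum weight of an uncorrectable error on a minimal-length string is $(L+1)/2$; the function name is illustrative.

```python
def weight_pairs(L, w_total):
    """All pairs (w, w') with w + w' = w_total and both w, w' >= (L + 1) / 2.

    L is assumed odd, so (L + 1) // 2 is the minimum uncorrectable weight."""
    w_min = (L + 1) // 2
    return [(w, w_total - w) for w in range(w_min, w_total - w_min + 1)]


L = 7
for w_total in range(L + 1, L + 12, 2):
    pairs = weight_pairs(L, w_total)
    print(w_total, len(pairs), w_total - L)  # the last two columns agree
```

The number of pairs is exactly $w_{\rm total} - L$ whenever $w_{\rm total} \ge L + 1$, matching the bound used in the proof.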

The Incoherent Sum Over Strings
The connected part of the incoherent components is not as simply expressed as a sum over strings as that of the coherent components, because each uncorrectable error $E_s L_a S_x$ can generally be completed to many different logical strings by multiplying by different correctable errors. Nevertheless, we can rewrite the sum in a similar way, and this will be our primary tool for comparing coherent and incoherent logical noise components. Fix a connected logical string $L$ with length $\ell$, and choose an uncorrectable subset $O_U$ of the logical string with weight $w \ge (\ell + 1)/2$. We also require that the complement of $O_U$, denoted $O_C$, has the same weight as the minimal-weight correction $E_s$. Then choose a second uncorrectable error $O_U'$ with the same syndrome and with weight $w'$. This procedure produces every incoherent connected term $(O_U\,\rho\,O_U')$. However, each uncorrectable error $O_U$ appears many times as we sum over logical strings: once for every correctable error $O_C'$ with the same weight and syndrome as $O_C$. These correctable errors complete $O_U$ to a length-$\ell$ logical string in every possible way. We chose $O_U$ such that $O_C$ has minimal weight so that this over-counting has a relatively simple form: the number of $O_C'$ operators for a given $O_U$ is the over-counting of that $O_U$ in the sum over logical strings. Each incoherent logical noise component can thus be written as a sum over connected logical strings $L$ times a disconnected factor, a form that allows us to compare with Eq.(171). We sum over logical strings; for each logical string we sum over possible choices of $O_U$ and $O_U'$, and we divide by the over-counting factor for each $O_U$. This gives Eq.(205), which expresses an incoherent logical noise component as a sum over connected logical strings.
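For orientation, the sum described in words here can be written schematically as follows. This is a rendering of the verbal description for the $\tilde\chi_{Z_1 Z_1}$ component, not a transcription of Eq.(205), whose exact form (in particular how the disconnected factor is written) may differ:

$$\tilde\chi_{Z_1 Z_1} = \sum_{L}\;\sum_{O_U \subset L}\;\frac{1}{|\{O_C'\}|}\;\sum_{O_U'}\;\big(O_U\,\rho\,O_U'\big)\times(\text{Disconnected}),$$

where $L$ runs over connected $Z_1$ logical strings, $O_U$ over uncorrectable subsets of $L$ whose complement $O_C$ has minimal weight, $O_U'$ over uncorrectable operators with the same syndrome as $O_U$, and $(O_U\,\rho\,O_U')$ stands for the coefficient of the corresponding connected noise term.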
For each string $L$ with length $\ell$, we sum over all uncorrectable subsets $O_U$ of weight $\ge (\ell + 1)/2$ such that the complement $O_C$ has weight equal to that of the minimal-weight correction of $O_U$, namely $E_s$. For each $O_U$ we divide by the number of correctable errors $O_C'$ with the same syndrome and weight as $O_C$, in order to cancel the over-counting; $\{O_C'\}$ is the set of such operators, and $|\{O_C'\}|$ is its cardinality. Finally, we sum over all uncorrectable operators $O_U'$ with the same syndrome to produce the complete set of incoherent terms. We will prove the following lemma, which provides a lower bound on the contribution of each logical string to the incoherent logical noise component. We will apply this lemma to lower bound the incoherent logical noise strength in terms of the coherent logical noise strength.
Lemma 6. As long as $|\sin\theta| < 1/L$, we can apply Lemma 5, so in Eq.(205) we can restrict to the case where $|O_U| + |O_U'| \le L + 2k + 1$. Let us also suppose that $|O_U| = |O_U'|$; this assumption will be justified by Lemma 7. Pick a connected logical string $L$ with $|L| = \ell$ such that $\ell \le L + 2k$; $L$ is a $Z_1$ logical string if we are calculating the $\tilde\chi_{Z_1 Z_1}$ logical noise component. $O_U$ is a subset of $L$ such that $O_U$ is corrected to a logical $Z_1$ operator and $|O_U| = (\ell + 1)/2$, and $O_U'$ is an operator with the same weight, syndrome, and logical action as $O_U$. Then, for each fixed $L$ with length $\ell \le L + 2k$, we obtain a lower bound on the contribution of $L$ to Eq.(205). Proof. For each short logical string $L$ with length $\ell$, we partition it into an uncorrectable operator $O_U$ of weight $w = (\ell + 1)/2$ and a correctable operator $O_C$ with $|O_C| = |E_s|$. Then we consider the alternative uncorrectable and correctable paths $O_U'$ and $O_C'$, with weights $w$ and $|E_s|$, respectively. The logical string $L$ is short, so we can use Lemmas 9 and 10. Say the logical string runs right to left across the code. By studying Fig. 12, we observe that there are multiple possible strings of the same weight exactly when both a vertical error and some number of adjacent horizontal errors are contained in either the correctable or the uncorrectable part. Given a partition consisting of an uncorrectable operator $O_U$ and a correctable operator $O_C$, we map it to a partner partition of the same string whose uncorrectable part consists of $O_C$ plus one additional error from $O_U$. This is shown in Fig. 11, where we have kept the error on the farthest-left vertical segment fixed and flipped the rest relative to the term in Fig. 12. For every $O_U$ there are $w$ possible mappings, one for each of the $w$ choices of the single-qubit error that remains fixed. In the same way, every $O_U$ is reached by $w$ different mappings acting on $w$ other operators with the same logical action as $O_U$. Then there exists a convention that selects exactly one partner for each $O_U$.

Figure 12: This is a partner of the partition shown in Fig. 11.
It is another partition of the same logical string, in which the errors in $O_U$ and $O_C$ are interchanged except for one qubit. In this case, that qubit is the one on the farthest-left vertical segment, and the error on that qubit is part of $O_U$ in both partitions. Once again, the operator $O_U$ is shown in red and the operator $O_C$ in blue. The alternative operators with the same weight, syndrome, and logical action are given by the dashed lines. For this partition $|\{O_U'\}| = 12$ and $|\{O_C'\}| = 2$.

We assumed that, for the operator $O_U^{(1)}$ that we started with, the ratio $|\{O_U'\}|/|\{O_C'\}|$ equals some value $c$ (Eq.(208)). The mapping we described constructs a partner $O_U^{(2)}$ for which the corresponding ratio is at least $1/c$. Then, for each pair $(O_U^{(1)}, O_U^{(2)})$, we can lower bound the contribution to the incoherent logical noise, and finally we apply the lower bound to the entire sum over $O_U$. The number of terms in the sum over $O_U$ is at most $\binom{\ell}{w}$, where $\ell$ is the length of the logical string $L$. For typical, short logical strings, this binomial coefficient equals the number of terms in the sum over $O_U$ up to a small correction.
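The pairing step can be made explicit. Assume, as in the proof above, that the partners split the operators $O_U$ into disjoint pairs $(O_U^{(1)}, O_U^{(2)})$, and write $r = |\{O_U'\}|/|\{O_C'\}|$ for the ratio attached to each operator (the symbols $r$ and $N_U$ are introduced here only for this sketch), with $r^{(1)} = c$ and $r^{(2)} \ge 1/c$. Then

$$r^{(1)} + r^{(2)} \;\ge\; c + \frac{1}{c} \;=\; 2 + \Big(\sqrt{c} - \frac{1}{\sqrt{c}}\Big)^{2} \;\ge\; 2 .$$

Summing over pairs gives $\sum_{O_U} r \ge N_U$, where $N_U$ is the number of operators $O_U$ in the sum, so on average each $O_U$ contributes at least as much as it would if $|\{O_U'\}| = |\{O_C'\}|$.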

Noise Terms with Mismatched Weight
We have already shown that we can neglect the high-weight noise terms in the incoherent logical noise components, and that we can write these components as a sum over logical strings. Next, we show that among the low-weight noise terms, we may neglect the terms with different error weights on the two sides of $\rho$. This step is crucial to our proof that the coherence of the logical noise is suppressed, which combines a lower bound on the incoherent logical noise components with an upper bound on the coherent logical noise components. The noise terms with mismatched weight enter with a phase of $-1$ whenever the difference between the weights on the left and right is $2 \bmod 4$, so a large contribution from such terms could spoil our lower bound on the incoherent logical noise. Fortunately, no such contribution occurs.
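The phase rule can be tabulated directly. Under the assumed convention $U = \exp(-i\theta Z/2)$, each $Z$ error on the left of $\rho$ carries a factor $-i$ and each $Z$ error on the right (through $U^\dagger$) carries $+i$, so a term with weights $w$ and $w'$ has phase $(-i)^{w}\,i^{w'} = i^{w'-w}$. The helper names below are hypothetical.

```python
# Assumed convention: U = exp(-i*theta*Z/2). Each Z on the left of rho brings -i,
# and each Z on the right (through U^dagger) brings +i.
_I_POWERS = (1, 1j, -1, -1j)


def i_pow(n):
    """Return i**n exactly for any integer n."""
    return _I_POWERS[n % 4]


def term_phase(w_left, w_right):
    """Phase (-i)**w_left * (+i)**w_right of a noise term, using -i = i**3."""
    return i_pow(3 * w_left + w_right)


for w_left, w_right in [(5, 5), (5, 3), (6, 2), (6, 4)]:
    # With w_left + w_right even, the phase is -1 exactly when the
    # weight difference is 2 mod 4.
    print(w_left, w_right, term_phase(w_left, w_right))
# 5 5 1
# 5 3 -1
# 6 2 1
# 6 4 -1
```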
Lemma 7. If $|\sin\theta| < 1/L$, then the incoherent logical noise component $\tilde\chi_{Z_1 Z_1}$ can be written in the form of Eq.(212), in which the sum over $L$ includes all typical, short logical strings with length $\ell \le L + 2k$; the sum over $O_U$ includes the uncorrectable subsets of $L$ with weight $(\ell + 1)/2$ whose complement $O_C$ has minimal weight; and the sum over $O_U'$ includes the operators with the same syndrome and the same weight as $O_U$. The error term comes from the high-weight terms we neglected in Lemma 5.
Proof. Using Lemma 5, we can truncate the sum over noise terms in the incoherent logical noise component $\tilde\chi_{Z_1 Z_1}$ to include only those noise terms with total weight $\le L + 2k + 1$.
In doing so we make an error that is exponentially small in the cutoff $k$, assuming that the single-qubit rotation angle $\theta$ satisfies $|\sin\theta| < 1/L$. We will use Eq.(205) to express the incoherent logical noise components as a sum over strings, and we begin by reviewing how that form of the sum is constructed. We denote the weight of $O_U$ by $w$ and that of $O_U'$ by $w'$. We upper bound the mismatched-weight terms, where $w \ne w'$, by taking $w > w'$ and multiplying by two. As in Eq.(205), we can generate the complete set of connected incoherent terms with fixed $w$ and $w'$ by summing over connected logical strings $L$, whose length we denote $|L| = \ell$.
To produce the incoherent terms with fixed $w + w'$, it suffices to sum over logical strings with $\ell < w + w'$. We have already restricted to low-weight terms, so $w + w' \le L + 2k + 1$. For each logical string, we sum over the uncorrectable subsets $O_U$ with weight $w$, requiring that the complement of $O_U$, denoted $O_C$, has minimal weight; this controls the over-counting of each incoherent term $(O_U\,\rho\,O_U')$. Then, for each $O_U$, we sum over the operators $O_U'$ with weight $w'$ that have the same syndrome and logical action as $O_U$. As discussed in Section 6.9, we must also divide by an over-counting factor $1/|\{O_C'\}|$, which is a function of $O_U$ and equals one over the number of times $O_U$ appears in the sum over $L$. The contribution to the incoherent logical noise is lower bounded as in Eq.(213), where the inequality comes from the cosine factors. If the operators $O_U$ and $O_U'$ act on the same set of qubits, then we have $\sin^{2w}(\theta/2)$ with no cosine factors in the connected part. The lower bound corresponds to the case where $O_U$ and $O_U'$ act on disjoint sets of qubits, so that we pick up a cosine factor on each qubit in the connected part. We also have the upper bound of Eq.(214), which has $\sin^{w+w'}(\theta/2)$; this corresponds to the case where $O_U$ and $O_U'$ act on the same qubits, yielding no cosine factors.
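As a sketch of where the two bounds come from, assume $0 < \theta < \pi$ and treat the disconnected factor as 1 (as justified, up to small corrections, by Lemma 12). Fix $L$ and $O_U$, and let $|\{O_U'\}|$ count the weight-$w'$ operators in the sum over $O_U'$. A qubit in exactly one of $O_U$, $O_U'$ contributes $\sin\frac{\theta}{2}\cos\frac{\theta}{2}$, and a qubit in both contributes $\sin^2\frac{\theta}{2}$; each of these lies between the corresponding power of $\sin\frac{\theta}{2}\cos\frac{\theta}{2}$ and of $\sin\frac{\theta}{2}$, and the powers add up to $w + w'$. Hence

$$\frac{|\{O_U'\}|}{|\{O_C'\}|}\Big(\sin\tfrac{\theta}{2}\cos\tfrac{\theta}{2}\Big)^{w+w'} \;\le\; \frac{1}{|\{O_C'\}|}\sum_{O_U'}\big|(O_U\,\rho\,O_U')\big| \;\le\; \frac{|\{O_U'\}|}{|\{O_C'\}|}\Big(\sin\tfrac{\theta}{2}\Big)^{w+w'} .$$

This shows how the ratio $|\{O_U'\}|/|\{O_C'\}|$ discussed below enters both bounds; it is not meant to reproduce Eq.(213) and Eq.(214) verbatim.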
Consider first the terms with $w = w'$. These are generated from logical strings of length $\ell \le 2w - 1$. Some strings $L$ with length $\ell$ such that $L \le \ell \le L + 2k$ have typical shape and some do not. We first prove that the contribution from a given string of atypical shape is smaller than that of a string of typical shape. Because there are fewer strings of atypical shape, we conclude that we can safely neglect their contribution. This is the same simplification we made in our discussion of the coherent logical noise components in Section 6.6.
We require that $O_U$ is an uncorrectable error, so we cannot choose an arbitrary subset of $w$ qubits in $L$. We ignore the subsets that correspond to exceptional partitions, as discussed in Section 6.6. Now, given two connected logical strings of the same length, one with a typical shape and one with an atypical shape, we want to compare their terms with $w = w'$. The first thing to notice is that exceptional terms are exponentially unlikely for the string with typical shape, while for the atypical string exceptional terms may be a significant fraction of the total partitions. Hence the sum over $O_U$ contains many more terms for the typical string than for the atypical string. We have argued about the number of terms in the sums in Eq.(213) and Eq.(214), but we must also consider the magnitude of each term, which is governed by the ratio of $|\{O_U'\}|$ to $|\{O_C'\}|$.
We must argue that the ratio $|\{O_U'\}|/|\{O_C'\}|$, summed over $O_U$, is smaller for an atypical string than for a typical string. Here $\{O_U'\}$ is the set of uncorrectable operators with the same weight and syndrome as $O_U$, and $\{O_C'\}$ is the set of correctable weight-$(\ell - w)$ operators with the same syndrome as $O_C$. Suppose the logical string runs left to right across the code. The set $\{O_U'\}$ contains more than one element whenever $O_U$ contains a set of contiguous qubits around one or more of the vertical steps in the logical string, as discussed in detail in Section 6.9. Thus $|\{O_U'\}|$ and $|\{O_C'\}|$ are large when $O_U$ or $O_C$, respectively, contains contiguous sets of qubits around the vertical steps. The typical logical string has at least $\gamma\sqrt{L}$ horizontal steps around each of the vertical steps, while the atypical string does not. Hence the typical string has more possible sets of qubits around each vertical step that make $|\{O_U'\}|$ or $|\{O_C'\}|$ large, and both will tend to be larger for the typical string. The ratio of $|\{O_U'\}|$ to $|\{O_C'\}|$ is what determines the contribution to the incoherent logical noise. In Lemma 6 we showed how to match up terms such that, for each $O_U$ with $|\{O_U'\}|/|\{O_C'\}| = c$, the partner has $|\{O_U'\}|/|\{O_C'\}| \ge 1/c$. If $c$ is large, then $c + 1/c \gg 2$. It follows that, because the typical string has more operators $O_U$ in the sum and its ratios $|\{O_U'\}|/|\{O_C'\}|$ tend to be larger, the contribution to the incoherent logical noise is smaller for an atypical string than for a typical string. Combined with the fact that atypical strings are a small minority, this means we can neglect the atypical strings among the $w = w'$ terms in the incoherent logical noise, with the error given by Lemma 10. Now consider the mismatched-weight terms that are the subject of this lemma. Fix $w + w'$ and suppose $w > w'$.
For each $L$ we can construct a number of incoherent terms with mismatched weight, depending on the length and shape of $L$. Let $|L| = \ell$. Once again, $O_U$ is an uncorrectable subset of the logical string $L$ with weight $w$. For each $O_U$, there may exist an operator $O_U'$ with the same syndrome as $O_U$ and lower weight. We sum over the set of $O_U$ for which there exists an $O_U'$ with weight $w'$. Summing over all logical strings of length $\ell < w + w'$ produces every connected incoherent term with $|O_U| = w$ and $|O_U'| = w'$. We proceed by fixing a logical string and upper bounding the sum over $w$ of the noise terms derived from this logical string with $w + w'$ fixed. In this sum the terms alternate in sign as $w$ increases, and the terms with $w = w'$ have a positive sign. To bound the contribution of these mismatched-weight terms to the incoherent logical noise, we need to control two things. First, we must understand the combinatorics that govern the number of operators $O_U$ that permit lower-weight $O_U'$. Second, we must bound the factor $|\{O_U'\}|/|\{O_C'\}|$ for each such $O_U$.
Suppose that the logical string $L$ has a typical shape. To be concrete, consider the set of operators $O_U$ with weight $w = (\ell + 3)/2$. If there exist lower-weight $O_U'$, then $O_U$ must contain all of the qubits around a "cap", a condition similar to the one we discussed in Section 6.6. Each cap has width at least $\gamma\sqrt{L}$ because the string is typical, so such $O_U$ are exponentially few relative to the total set of uncorrectable $O_U$ with weight $w$; this is the same calculation as in Lemma 4. We compare these terms to the terms with $|O_U| = (\ell + 1)/2 = |O_U'|$, of which there are exponentially more. This means that, in Eq.(213) and Eq.(214), the sum over $O_U$ contains exponentially more terms when $w = w'$ for a typical logical string. The summand also tends to be smaller for the $w > w'$ terms, by an argument similar to the one we used above for the $w = w'$ terms from typical and atypical strings. $|\{O_U'\}|$ is large when $O_U$ contains many qubits around several of the different steps up and down along the logical string. In the terms with $w > w'$, the operators $O_U$ contain at least one of the "caps" along the logical string, which removes at least two of the vertical steps; these steps cannot contribute to $|\{O_U'\}|$. Then, by the argument we used above, the ratio $|\{O_U'\}|/|\{O_C'\}|$ tends to be smaller for the $w > w'$ terms than for the $w = w'$ terms. We chose $w = (\ell + 3)/2$, but we could have chosen any $w > (\ell + 1)/2$ and any $w' < w$; we would again find exponentially fewer mismatched-weight terms, with a suppression factor of $2^{\gamma\sqrt{L}}$ for each cap contained in $O_U$. We conclude that the mismatched-weight terms are negligible for strings of typical shape.
Finally, consider a logical string $L$ with an atypical shape, and fix $w + w'$. We already neglected the contribution of atypical strings to the $w = w'$ terms; we now seek a bound on the contribution from the terms with $w > w'$ for atypical strings. We compare two sets of terms for a fixed $L$ with length $\ell$. On the one hand, take the terms with $w = w_1$ for some $w_1 > (\ell - 1)/2$ and $w' = w_2 < w_1$. On the other hand, take the terms with $w = w_1 + 1$ and $w' = w_2 - 1$. We will show that the latter set of terms contributes less than the former. This will tell us that the sum over mismatched-weight terms for the fixed string $L$ is bounded by the contribution from terms with $|O_U| = |O_U'|$.
When $O_U'$ has lower weight than $O_U$, $O_U$ must contain all the qubits along a cap. If $|O_U| - |O_U'| = 2j$, then $O_U$ must contain caps with total height at least $j$. Because the logical string $L$ has an atypical shape, these caps may have width one or height greater than one, and it is not exponentially unlikely that all qubits around a small cap are contained in $O_U$. For the terms with $w = w_1$ and $w' = w_2$, $O_U$ contains all the qubits around the caps that account for the weight difference $w_1 - w_2$ relative to $O_U'$. These terms are a fraction of the $\binom{\ell}{w_1}$ subsets of $w_1$ qubits in the logical string $L$. We compare these terms to the ones with $w = w_1 + 1$ and $w' = w_2 - 1$, keeping the logical string $L$ fixed. These terms include all the qubits around an additional cap: on one of the caps, instead of containing at least one but not all of its qubits, $O_U$ contains all of them. Because of this stricter condition on $O_U$, the fraction of the $\binom{\ell}{w_1+1}$ weight-$(w_1+1)$ subsets of $L$ that admit an $O_U'$ with weight $w_2 - 1$ is smaller than the fraction of the $\binom{\ell}{w_1}$ weight-$w_1$ subsets of $L$ that admit an $O_U'$ with weight $w_2$. This means that, in Eq.(213) and Eq.(214), in the sum over $O_U$ for our fixed logical string, the number of possible $O_U$ at a given weight $w > (\ell + 1)/2$ is given by a binomial coefficient times a function that decreases monotonically as $w$ increases. For the summand $|\{O_U'\}|/|\{O_C'\}|$, we apply the same reasoning as above: for each cap contained in $O_U$, there are fewer vertical steps available to create many operators $O_U'$, so the summand tends to be smaller as $w$ increases. The sum over the different values of $w$ therefore has the form of Eq.(215), an alternating sum over $w$, where $c \ge (\ell + 1)/2$ and $f$ is a monotonically decreasing function. The inequality in Eq.(215) is proven by pairing adjacent terms in the sum, positive and negative, to produce small positive contributions bounded by the contributions in the case where $f(w) = 1$ for all $w$.
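The positivity of the alternating sum over $w$ can be illustrated with the standard alternating-series bound: if the magnitudes $\binom{\ell}{w}f(w)$ decrease in $w$, an alternating sum that starts with a positive term lies between 0 and its first term. Since $\binom{\ell}{w}$ decreases for $w \ge (\ell+1)/2$, any positive, decreasing $f$ qualifies. The sketch below assumes the sum runs over $w = c, \dots, \ell$ and uses a hypothetical choice of $f$; it checks this bound, not the precise form of Eq.(215).

```python
from math import comb


def alternating_sum(ell, c, f):
    """S = sum_{w=c}^{ell} (-1)**(w - c) * binom(ell, w) * f(w)."""
    return sum((-1) ** (w - c) * comb(ell, w) * f(w) for w in range(c, ell + 1))


def f_example(w):
    """Hypothetical positive, monotonically decreasing weight factor."""
    return 0.5 ** w


for ell in (9, 15, 21):
    c = (ell + 1) // 2                    # matched-weight starting point
    first_term = comb(ell, c) * f_example(c)
    s = alternating_sum(ell, c, f_example)
    print(ell, 0 <= s <= first_term)      # True for every ell
```

With $f \equiv 1$ the sum has the closed form $\binom{\ell-1}{c-1}$, which follows from the identity $\sum_{w=0}^{m}(-1)^w\binom{\ell}{w} = (-1)^m\binom{\ell-1}{m}$.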
It follows that the sum over the mismatched-weight terms derived from the logical string $L$ is positive and, moreover, is bounded by the $w = w'$ terms. We already argued that the $w = w'$ terms from atypical strings are negligible relative to those from typical strings. Finally, we lower bound the incoherent logical noise component by neglecting the atypical strings and, for the typical strings, neglecting the mismatched-weight terms. This yields Eq.(212).
We are left with only the incoherent terms that have the same weight of uncorrectable error on each side, with that weight at most $(L + 2k + 1)/2$. These terms all have the same phase $+1$, so the incoherent terms with different weights add constructively, which gives us lower bounds on the logical incoherent noise strength. Each logical string with length $\ell \le L + 2k$ contributes at least $\binom{\ell}{(\ell+1)/2}\sin^{\ell+1}(\theta/2)$. When $\ell$ is much larger than the minimum logical string length $L$, the number of logical strings is given by Eq.(172). In particular, the incoherent logical noise components must be larger than the lowest-order term. This at last completes the argument begun in Section 6.5 about neglecting the contribution to the coherent logical noise from connected logical strings with length $> L + 2k$ for a cutoff constant $k$. In Lemma 3 we proved that the contribution from long logical strings is upper bounded by $\alpha L^{2k+1}|\sin\theta|^{L+2k}$, where $\alpha = (1 - L|\sin\theta|)^{-1}$. This bound is exponentially small in $k$ relative to the lowest-order incoherent logical noise component, which comes from the minimal-length strings ($\ell = L$), each contributing at least $\binom{L}{(L+1)/2}\sin^{L+1}(\theta/2)$. Our aim is to compare the logical coherent and incoherent noise components, and we have shown that the contribution to the coherent logical noise from long strings is small relative to the incoherent logical noise components. Therefore, we can safely neglect the long connected logical strings. The same applies to the truncation error in Lemma 5, which is negligible relative to the lowest-order incoherent terms for large enough $k$.
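The exponential decay in $k$ can be seen numerically. This sketch takes the Lemma 3 bound to be $\alpha L^{2k+1}|\sin\theta|^{L+2k}$ with $\alpha = (1 - L|\sin\theta|)^{-1}$, and the per-string lower bound for a minimal-length string to be $\binom{L}{(L+1)/2}\sin^{L+1}(\theta/2)$, as read from this paragraph; $L$ is assumed odd and the parameter values are illustrative. Only the numerator depends on $k$, so each increment of $k$ multiplies the ratio by $(L\sin\theta)^2$. The $k$-independent prefactor of the ratio can be sizable, so the sketch illustrates the decay in $k$ rather than a specific cutoff.

```python
from math import asin, comb, sin


def long_string_bound(L, k, sin_theta):
    """Lemma 3 bound: alpha * L**(2k+1) * |sin theta|**(L+2k),
    with alpha = 1 / (1 - L*|sin theta|); requires L*|sin theta| < 1."""
    x = abs(sin_theta)
    alpha = 1.0 / (1.0 - L * x)
    return alpha * L ** (2 * k + 1) * x ** (L + 2 * k)


def shortest_string_lower_bound(L, theta):
    """Per-string lower bound binom(L, (L+1)/2) * sin(theta/2)**(L+1)
    for a minimal-length logical string (L assumed odd)."""
    return comb(L, (L + 1) // 2) * sin(theta / 2) ** (L + 1)


L = 11
sin_theta = 0.5 / L                     # chosen so that L*|sin theta| = 0.5
theta = asin(sin_theta)
ratios = [long_string_bound(L, k, sin_theta) / shortest_string_lower_bound(L, theta)
          for k in range(1, 8)]
for k, r in enumerate(ratios, start=1):
    print(k, f"{r:.3e}")
# Each step in k multiplies the ratio by (L*sin_theta)**2, about 0.25 here.
```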

More General Rotation Angles
In Sections 6.4 and 6.8, we simplified the problem by assuming that all qubits are rotated by the same single-qubit unitary rotation. Now we extend our result to more general single-qubit rotations, allowing both the magnitude of the rotation angle and the axis of rotation to vary from qubit to qubit. Here we assume that each rotation axis lies in the X-Z plane. Physical Y errors are treated in Appendix G, where we prove that rotations partly along the Y-axis produce less coherent logical noise channels than those arising from rotations along axes in the X-Z plane. The idea of the proof is the same as that of Lemma 2: we consider the coherent and incoherent logical noise components as functions of the individual qubit rotation angles and prove that the coherent component is maximized relative to the incoherent component when all rotation angles are equal.
Lemma 8. Consider the toric code with qubits subject to single-qubit rotations, where each rotation axis lies in the X-Z plane and both the rotation axis and the angle of rotation may vary from qubit to qubit. The bound on the coherence of the logical noise channel proved in Theorem 3 continues to apply if the rotations are sufficiently close to uniform, that is, provided that each rotation axis and angle stays within a bounded region around a fixed common value.
Proof. Suppose at first that all rotations are about the Z-axis, and denote the rotation angle for the ith qubit by $\theta_i$. Each logical coherent or incoherent component is a sum of physical noise terms, which are functions of all the angles $\theta_i$. By the coherent or incoherent logical noise strength we mean the sum of the squared norms of the off-diagonal or diagonal components, respectively, of the chi matrix for the logical noise channel. We are interested in the coherence of the logical noise channel, that is, the relative magnitude of the coherent and incoherent logical noise strengths. Our approach is to fix the coherent logical noise strength and calculate how the incoherent logical noise strength varies as we change the rotation angles while remaining in the submanifold with constant coherent logical noise strength.
We begin at a point where all single-qubit rotation angles are equal. Suppose that this common rotation angle is positive; the proof is similar if it is negative. We will perturb away from this point, moving along the submanifold with fixed coherent logical noise strength. These perturbations can be built out of small elementary steps, in each of which two qubits, i and j, with $\theta_i \ge \theta_j$, are selected. The elementary step consists of increasing $\theta_i$ by some amount and decreasing $\theta_j$ such that we remain on the submanifold with constant coherent logical noise strength. We will prove that such elementary steps increase the incoherent logical noise strength, and therefore conclude that the coherence of the logical noise is maximized when all single-qubit rotation angles are equal. Our calculation will be limited to configurations of angles not too far from the point where all angles are equal.
In Lemmas 3 and 5, we proved that when all the rotation angles are equal and satisfy |sin θ| < 1/L, the logical noise is dominated by the contributions of the low-weight connected terms. We bounded the absolute magnitude of the sum over high-weight connected terms. These high-weight terms were negligible relative to the low-weight connected terms in the incoherent logical noise. If the rotation angles are allowed to differ, so long as all the angles $\theta_i$ satisfy $|\sin\theta_i| < 1/L$, our upper bound on the absolute magnitude of the error from the high-weight terms continues to hold. We require that this error is negligible relative to the low-weight terms we keep in the incoherent logical noise components. This was true when all angles are equal and will continue to be true for a wide range of configurations; only certain edge cases will violate this condition. For instance, one such edge case arises if all the rotation angles are 0 except for the qubits along a long logical string with a shape such that it contains no low-weight uncorrectable subsets.
We previously defined the connected and disconnected parts of a noise term (Definitions 2, 3, 8, and 9). As we described in Section 6.7 and Lemmas 11 and 12, the disconnected part has a value of 1 up to corrections. These corrections are small for low-weight connected terms when all rotation angles are equal. If the angles are different, we can still apply our analysis, so long as the absolute error from the corrections is small relative to the low-weight connected terms in the incoherent logical noise components. This holds in a region around the point where all angles are equal. Hence, in this proof we will compare only the connected terms in the coherent and incoherent logical noise components with the understanding that the error terms we are neglecting are small relative to the low-weight connected terms we have kept.
We can build any general perturbation out of elementary (non-infinitesimal) perturbations, in each of which we increase one rotation angle $\theta_i$ and decrease a second angle $\theta_j$ such that the connected contribution to the coherent logical noise strength is unchanged. The perturbation looks different depending on how the two qubits are positioned. If qubits i and j are adjacent to each other and aligned in the correct direction, they appear together in many short logical strings; otherwise, i and j do not appear together in short logical strings. Throughout this section, we approximate $\sin(\theta_i/2) \approx \theta_i/2$ to simplify the equations. This incurs a relative error of at most $\theta_i^2/4$ (to leading order, $\theta_i^2/24$), which is always small, since we have assumed that $|\sin\theta_i| < 1/L$ for every i. The contribution to the coherent logical noise strength from the low-weight connected terms, as a function of $\theta_i$ and $\theta_j$, is then given by Eq.(216), with coefficients $\gamma_0$, $\gamma_1$, and $\gamma_2$. The coherent logical noise strength is a sum of norms squared and is therefore positive; this implies that $\gamma_0 > 0$. Moreover, the sum over partitions has the same phase for every short logical string, as in Eq.(191). This means that each logical string makes a positive contribution to the noise strength; therefore, $\gamma_1$ and $\gamma_2$ are both non-negative. The relative size of $\gamma_1$ and $\gamma_2$ depends on how close the two chosen qubits i and j are. When i and j lie along the same horizontal or vertical line, many low-weight logical strings contain both qubits. These strings contribute to $\gamma_2$, so the $\gamma_2$ term may be comparable to the $\gamma_1$ term. On the other hand, if qubits i and j do not lie along a horizontal or vertical line, then none of the minimal-weight logical strings contain both qubits. Also, for any fixed length ℓ ≤ L + 2k, the number of length-ℓ logical strings that contain both qubits i and j is negligible relative to the number of length-ℓ logical strings that contain qubit i but not qubit j. In this case, the $\gamma_2$ term is negligible relative to the $\gamma_1$ term.
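The size of the small-angle error is easy to check numerically. The sketch below (the function name is ours) evaluates the relative error of replacing sin(θ/2) by θ/2, measured against the exact value, and compares it with θ²/24 and θ²/4 for a few angles in the regime |sin θ| < 1/L.

```python
import math

def small_angle_relative_error(theta):
    """Relative error |theta/2 - sin(theta/2)| / |sin(theta/2)|, for 0 < |theta| < pi."""
    x = abs(theta) / 2
    return x / math.sin(x) - 1

for theta in (0.3, 0.1, 0.01):
    err = small_angle_relative_error(theta)
    print(f"theta={theta}: error={err:.3e}, "
          f"theta^2/24={theta ** 2 / 24:.3e}, theta^2/4={theta ** 2 / 4:.3e}")
```

The error tracks θ²/24 closely and stays well below θ²/4, so either expression serves as a bound in this regime.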
In either case, we can write down the perturbation that leaves the coherent logical noise strength unchanged. Let $\theta_i = c_i\theta$ and $\theta_j = c_j\theta$ for some θ; we then solve for $c_j$ such that the connected coherent sum is constant. When $\gamma_2 = 0$, the solution is simply $c_j = 2 - c_i$. We can expand the incoherent logical noise strength in the same way. The noise terms that enter into the incoherent logical noise have the form $(O_U \rho O'_U)$. As we expand in the angles $\theta_i$ and $\theta_j$, there are cases where qubits i and j are contained in neither, one, or both of $O_U$ and $O'_U$; this expansion, with coefficients $\delta_1, \dots, \delta_5$, is Eq.(218), and the resulting change in the incoherent logical noise strength along the perturbation is Eq.(219). We see that the $\delta_1$ term is positive for all $c_i > 1$. We will show that the positive terms are larger than the other terms for all $c_i > 1$. Each term has the same denominator and contains a factor of $(c_i - 1)^2$ in the numerator; this means immediately that the first derivative with respect to $c_i$ vanishes at the point $c_i = 1$. Pulling out the shared factor $(c_i - 1)^2(\gamma_1 + c_i\gamma_2\theta)^2$ and rearranging terms in Eq.(219) gives Eq.(220). If the conditions in Eq.(221) are satisfied, then each line of Eq.(220) is greater than 0. Recall that $\gamma_1$, $\gamma_2$, $\delta_1$, and $\delta_2$ are non-negative. Each elementary perturbation increases the value of $c_i$. Therefore, if the conditions in Eq.(221) are satisfied, each elementary step along the submanifold with constant coherent noise strength increases the incoherent logical noise strength. It remains for us to argue that these conditions are satisfied when the rotation angles are close to equal. Consider two cases for the relative positions of qubits i and j. In the first case, suppose qubits i and j are positioned so that no short logical string contains both. Then the strings that contribute to $\gamma_2$, as well as to $\delta_2$, $\delta_4$, and $\delta_5$, are all long. Such strings do not appear in the sum over low-weight connected terms, so $\gamma_2 = 0$ in Eq.(220). Therefore, the only condition is the first line of Eq.(221). In this inequality $\delta_2$, $\delta_4$, and $\delta_5$ are 0, and the condition is satisfied.
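As an illustration only, suppose the connected coherent strength has the bilinear form γ0 + γ1(θ_i + θ_j) + γ2·θ_i·θ_j. This is our assumption, chosen because it reproduces c_j = 2 − c_i when γ2 = 0 and yields the denominator γ1 + c_iγ2θ seen above. Solving for c_j then gives the hypothetical helper `solve_cj` below. The check confirms that the coherent strength is unchanged along the step, and that c_j − 1 is proportional to (1 − c_i), which is why factors of (c_i − 1) appear in the expansion.

```python
import math

def coherent_strength(theta_i, theta_j, g0, g1, g2):
    """Hypothetical bilinear model of the connected coherent strength (assumption)."""
    return g0 + g1 * (theta_i + theta_j) + g2 * theta_i * theta_j

def solve_cj(ci, theta, g1, g2):
    """c_j keeping coherent_strength fixed, starting from theta_i = theta_j = theta."""
    return (2 * g1 + g2 * theta - g1 * ci) / (g1 + g2 * theta * ci)

g0, g1, g2, theta = 1.0, 2.0, 5.0, 0.1
for ci in (1.0, 1.1, 1.2):
    cj = solve_cj(ci, theta, g1, g2)
    before = coherent_strength(theta, theta, g0, g1, g2)
    after = coherent_strength(ci * theta, cj * theta, g0, g1, g2)
    print(f"ci={ci}: cj={cj:.6f}, strength before={before:.6f}, after={after:.6f}")
```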
In the other case, qubits i and j lie on a horizontal or vertical line, so that both qubits are contained in several short logical strings. In this case $\theta\gamma_2$ is comparable to $\gamma_1$. Now consider the incoherent contribution from the strings that contain both qubits i and j. For each logical string of weight 2w − 1 containing both qubits i and j, the errors $O_U$ are weight-w subsets of the logical string. Nearly half of these subsets contain exactly one of qubits i and j, and nearly one quarter contain both. This means that more terms contribute to $\theta^2\delta_1$ and $\theta\delta_3$ than to $\theta^4\delta_2$, $\theta^2\delta_4$, and $\theta^3\delta_5$; in particular, $\theta^2\delta_1 > 2\theta^4\delta_2$. For each $O_U$, the set $\{O'_U\}$ contains operators with the same syndrome, logical action, and weight as $O_U$. Each $O'_U$ differs from $O_U$ only near certain transverse steps along the logical string, as we described in Section 6.9. For most logical strings, if $O_U$ does not contain qubit i, then most of the operators $O'_U$ do not contain it either; if $O_U$ contains qubit i, most operators $O'_U$ do as well. This implies that the terms in which $O_U$ and $O'_U$ differ in whether they contain qubit i are comparatively small. Together with our earlier statement that $\theta^2\delta_1 > 2\theta^4\delta_2$, this implies that even when i and j are near each other, the conditions in Eq.(221) are satisfied. We have proven that each of the elementary perturbations starting from uniform angles increases the incoherent logical noise strength. However, we must also consider elementary perturbations applied after a previous elementary perturbation has already taken us away from uniform angles. In that case, Eq.(216) and Eq.(218) would no longer be exactly symmetric between $\theta_i$ and $\theta_j$; in other words, Eq.(216) would acquire different coefficients γ′ for the terms involving $\theta_i$ and $\theta_j$. However, as long as we are not too far from uniform angles, the primed coefficients remain close to the unprimed ones, and the change in the incoherent logical noise strength will be almost the same as in Eq.(219). We have argued that the conditions in Eq.(221) are satisfied by a large margin.
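The "nearly half" and "nearly one quarter" fractions follow from elementary counting. For a string of 2w − 1 qubits, a weight-w subset contains a given pair of qubits with probability w/(2(2w − 1)) and exactly one of them with probability w/(2w − 1). The sketch below (function name is ours) computes both fractions exactly with `math.comb` and `fractions.Fraction`.

```python
from fractions import Fraction
from math import comb

def subset_fractions(w):
    """Fractions of weight-w subsets of a (2w-1)-qubit string containing
    exactly one of two fixed qubits, and containing both of them (w >= 2)."""
    n = 2 * w - 1
    total = comb(n, w)
    exactly_one = Fraction(2 * comb(n - 2, w - 1), total)  # i without j, or j without i
    both = Fraction(comb(n - 2, w - 2), total)
    return exactly_one, both

for w in (2, 5, 10, 50):
    one, both = subset_fractions(w)
    print(f"w={w}: exactly one = {float(one):.4f}, both = {float(both):.4f}")
```

Both fractions approach 1/2 and 1/4 from above as w grows, consistent with the counting used in the argument.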
Therefore, there exists a region around the point with uniform angles where every elementary perturbation on the submanifold with fixed coherent logical noise strength increases the incoherent logical noise strength. We conclude that, within a region around the symmetric point where every single-qubit rotation is by the same angle θ, this symmetric point gives the largest coherent logical noise strength relative to the incoherent logical noise strength. In other words, the connected contribution to the incoherent logical noise strength has a local minimum at the point with uniform rotations, within the submanifold with constant connected contribution to the coherent logical noise strength. This implies that the upper bound on logical coherence derived in Theorem 3 for the case where all angles are equal also bounds the logical coherence in a region around that point. Now consider changing the axes of rotation while keeping the total rotation angle the same. Let the noise model be a rotation in the X-Z plane such that the rotation angles are $\theta_X$ and $\theta_Z$ for every qubit. (In Lemma 13 we show that Y rotations on the qubits produce less coherent logical noise.) The X and Z rotations contribute to independent components of the logical noise channel. Each noise term that enters into the X-type logical noise components depends on at least L powers of $\theta_X$; similarly, the Z-type logical noise depends on at least L powers of $\theta_Z$. Therefore, if $\theta_X^2 + \theta_Z^2$, the squared total rotation angle for each qubit, is fixed, the logical noise strength is greatest when either $\theta_X$ or $\theta_Z$ is 0. Also, because the X- and Z-type errors contribute to different logical noise components, we can apply our analysis of coherent and incoherent logical noise components to the two types of errors separately.
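A crude model makes the last step concrete. Assume (our simplification) that the X-type and Z-type logical strengths scale as the L-th powers of θ_X = θ cos φ and θ_Z = θ sin φ with equal prefactors. The total is then proportional to cos^L φ + sin^L φ, which for L > 2 is largest on the axes (φ = 0 or π/2) and smallest at φ = π/4. The helper name below is hypothetical.

```python
import math

def relative_logical_strength(phi, L):
    """cos(phi)^L + sin(phi)^L: crude model of the X-type plus Z-type logical
    strength when theta_X = theta*cos(phi), theta_Z = theta*sin(phi) (assumption)."""
    return math.cos(phi) ** L + math.sin(phi) ** L

L = 7
grid = [i * (math.pi / 2) / 100 for i in range(101)]
values = [relative_logical_strength(phi, L) for phi in grid]
best = max(range(len(grid)), key=lambda i: values[i])
print(f"maximum {values[best]:.6f} at phi = {grid[best]:.4f}")
```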
If both $|\sin\theta_X|$ and $|\sin\theta_Z|$ are less than 1/L, then Lemmas 3 and 5 imply that the logical coherent and incoherent X and Z noise is dominated by the contributions of short logical strings. With this noise model we should in general expect logical Y noise, but Lemma 14 implies that logical Y-type errors are negligible relative to X- and Z-type errors. The $\theta_X$ rotations contribute to the $X_1$ and $X_2$ logical noise, while the $\theta_Z$ rotations contribute to the $Z_1$ and $Z_2$ logical noise. The bounds we proved on coherent and incoherent logical noise components apply equally well to the X- and Z-type noise separately in this noise model. Lemma 8 states that among noise models consisting of single-qubit rotations where each rotation is close to the same, the coherence of the logical channel is greatest for the noise model consisting of Z-axis rotations on every qubit by the same angle. The same does not necessarily hold for noise models consisting of single-qubit rotations with wildly different angles of rotation on each qubit. This is not surprising, because if we allow for wildly different rotation angles, we encounter the case where all the rotation angles are 0 except for the qubits along some very long logical string. This kind of high-weight connected term is beyond the scope of the present work, cf. Lemmas 3 and 5.

Correlations
We can apply Theorem 2 to study the toric code with minimal-weight decoding subject to correlated unitary noise. In the repetition code, we found that adding two-body correlations did not change the relation between coherent and incoherent components of the logical noise when the code size n is large. We can transfer that conclusion to the toric code using what we have already proven. Consider a single logical string, and sum over its partitions. With correlated unitary noise, instead of a factor of sin(θ/2) or cos(θ/2) for each qubit in the logical string, we have a sum over the one- and two-body couplings in the Hamiltonian, $h_1$ and $h_2$. The model of correlated noise that we considered in Theorem 2 included two-body coupling terms between every pair of qubits in the code; therefore, the magnitude of each multi-qubit error is a function only of its weight. We found that the coherent and incoherent logical noise in the toric code is dominated by the contributions from short logical strings with typical shape. In Theorem 3 we finish proving a relation between the coherent and incoherent terms that is based on the number of terms and their magnitudes, which we always assumed to be sin(θ/2) raised to the weight of the term. In the correlated case we alter the magnitude of each term, but Theorem 2 tells us that, string by string, we can bound the coherent logical noise contribution in terms of the incoherent logical noise contribution.
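The claim that all-to-all two-body couplings make an error's amplitude depend only on its weight can be checked by brute force on a few qubits. Purely for illustration, we take commuting X-type couplings, H = h1·Σ X_i + h2·Σ_{i<j} X_i X_j; the operator content of the Theorem 2 model may differ. Since every term is diagonal in the X eigenbasis, the coefficient of each X-string in exp(−iH) is a normalized trace. For h2 = 0 it reduces to the uncorrelated product form (cos h1)^{n−w}(−i sin h1)^w, the analogue of the sin(θ/2), cos(θ/2) factors with h1 = θ/2. Function names are hypothetical.

```python
import cmath
import math
from itertools import combinations

def pauli_x_amplitudes(n, h1, h2):
    """Coefficient a_E of X^E in exp(-iH), H = h1*sum X_i + h2*sum_{i<j} X_i X_j,
    keyed by the bitmask E. Computed as a_E = 2^-n * Tr(X^E exp(-iH)) in the X basis."""
    eigen = []
    for b in range(1 << n):
        sigma = [1 - 2 * ((b >> i) & 1) for i in range(n)]  # X_i eigenvalue on basis state b
        lam = h1 * sum(sigma) + h2 * sum(sigma[i] * sigma[j] for i, j in combinations(range(n), 2))
        eigen.append((sigma, cmath.exp(-1j * lam)))
    amps = {}
    for e in range(1 << n):
        support = [i for i in range(n) if (e >> i) & 1]
        total = 0j
        for sigma, phase in eigen:
            sign = 1
            for i in support:
                sign *= sigma[i]
            total += sign * phase
        amps[e] = total / (1 << n)
    return amps

def product_form_amplitude(n, h1, e):
    """Amplitude of X^E in (cos h1 I - i sin h1 X)^{tensor n}, i.e. the uncorrelated case."""
    w = bin(e).count("1")
    return math.cos(h1) ** (n - w) * (-1j * math.sin(h1)) ** w

amps = pauli_x_amplitudes(4, 0.05, 0.02)
for e in (0b0001, 0b0010, 0b0011, 0b1100):
    print(f"E={e:04b}: |a_E| = {abs(amps[e]):.6e}")
```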

Main Theorem
Theorem 3. Consider the L × L toric code without boundaries subject to single-qubit unitary noise acting on each qubit. We choose minimal-weight decoding and assume that syndrome extraction is perfect. Suppose that each qubit is rotated by an angle θ about some axis and that |sin θ| < 1/L. Our conclusion also holds for angles and axes that differ among the qubits, so long as the deviation is small, as discussed in Lemma 8. Let Ñ be the logical noise channel produced by encoding into the toric code, acting with single-qubit unitary noise, and then decoding. Denote by $\tilde\chi$ the chi matrix for the logical noise channel Ñ. Then the coherent and incoherent components of the logical channel are related by Eq.(224), where the error term E is bounded as in Eq.(225) and k is an arbitrary L-independent constant. We denote the diamond-norm distance of Ñ from the identity channel by D♦(Ñ). It follows that D♦(Ñ) satisfies Eq.(226), for a constant c whose definition involves $d_L$, the dimension of the code space ($d_L = 4$ for the 2D toric code without boundaries).
Here r is the average infidelity of the logical noise channel Ñ, and E is the error term bounded in Eq.(225). (If the logical noise channel Ñ were unitary, then D♦(Ñ) would be proportional to √r.) We can also consider the growth of the average infidelity as we apply the logical noise channel many times in succession. Let $r_m$ be the average infidelity after m applications of Ñ; then, using Eq.(224), we obtain Eq.(228), where $d_L = 4$ for the 2D toric code without boundaries, E is the error term upper bounded in Eq.(225), and r is the average infidelity for a single application of the logical noise channel. As long as the physical noise strength is below the fault-tolerant threshold, r is exponentially small in the code distance L. Therefore, Eq.(228) states that the term growing quadratically in m is exponentially small in L relative to the term growing linearly with m. In this sense, the coherence of the logical channel is heavily suppressed.
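For intuition about the two diagnostics (these are single-qubit toy channels, not the toric-code logical channel), the sketch compares a Z rotation exp(−iφZ/2) with a dephasing channel ρ → (1−p)ρ + pZρZ. It uses the standard qubit formulas r = (2/3)sin²(mφ/2) for m repeated rotations and r = 2p_m/3 for dephasing, where 1 − 2p_m = (1 − 2p)^m. Diamond distances use the half-norm convention, giving |sin(φ/2)| for the rotation and p for dephasing. The infidelity therefore grows quadratically in m for the rotation and linearly for dephasing, while D♦ scales like √r and like r, respectively. Function names are ours.

```python
import math

def infidelity_rotation(phi, m):
    """Average infidelity of m uses of the qubit rotation exp(-i*phi*Z/2)."""
    return (2 / 3) * math.sin(m * phi / 2) ** 2

def infidelity_dephasing(p, m):
    """Average infidelity of m uses of the dephasing channel rho -> (1-p) rho + p Z rho Z."""
    p_m = (1 - (1 - 2 * p) ** m) / 2  # composing dephasing channels multiplies (1 - 2p)
    return 2 * p_m / 3

def diamond_rotation(phi):
    """Half-diamond-norm distance of exp(-i*phi*Z/2) from the identity, for |phi| <= pi."""
    return abs(math.sin(phi / 2))

def diamond_dephasing(p):
    """Half-diamond-norm distance of the dephasing channel from the identity."""
    return p

phi, p = 0.01, 0.001
for m in (1, 10, 100):
    print(m, infidelity_rotation(phi, m) / infidelity_rotation(phi, 1),
          infidelity_dephasing(p, m) / infidelity_dephasing(p, 1))
```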
Proof. We start with a noise model consisting of Z rotations by angle θ on every qubit in the L × L block of the toric code. We seek to approximate the coherent logical noise component $\tilde\chi_{Z_1 I}$ and relate it to the incoherent logical noise component $\tilde\chi_{Z_1 Z_1}$. First, let us calculate the coherent component. We write $\tilde\chi_{Z_1 I}$ as a sum over strings and partitions, with a connected and a disconnected part, as in Eq.(171). Applying Lemma 3 to the connected part, we neglect high-weight terms, keeping only the logical strings with length ℓ ≤ L + 2k for a fixed k and making an error that is exponentially small in k. For this step, the magnitudes of the sines of the single-qubit rotation angles must be below the threshold value 1/L. We apply Lemma 11 to argue that the disconnected part is equal to 1 up to a small correction. Lemmas 9 and 10 tell us that we can treat all short logical strings $\mathcal{L}$ as typical, at the cost of another small error. Now that only short typical logical strings remain, we apply Lemma 4 to perform the sum over partitions. We obtain Eq.(229), where the sum is over all connected logical $Z_1$ strings $\mathcal{L}$ with length ℓ ≤ L + 2k, and $\alpha = (1 - L|\sin\theta|)^{-1}$. The error terms come from Lemmas 10, 3, and 11. A further error due to neglecting exceptional partitions is subdominant according to Lemma 4 and is not shown in Eq.(229). Now we lower-bound the incoherent logical noise component $\tilde\chi_{Z_1 Z_1}$. Lemma 5 implies that we can neglect the contributions of noise terms $(O_U \rho O'_U)$ such that $|O_U| + |O'_U| > L + 2k + 1$. The error we make by truncating the sum is exponentially small in k, so long as the rotation angle θ satisfies |θ| < 1/L. The incoherent logical noise component can be put in the form of a sum over logical strings as in Eq.(205). Using Lemma 7, we restrict to the connected terms where the logical string $\mathcal{L}$ is short and has typical shape and $|O_U| = |O'_U|$.
We can also keep only the terms with $|O_U| = (\ell + 1)/2$, because, just as in the repetition code, higher-weight partitions are suppressed by powers of sin(θ/2), and the binomial coefficients decrease as we consider higher-weight $O_U$. The disconnected part is equal to 1 up to a small correction, according to Lemma 12. Finally, Lemma 6 gives a lower bound on the contribution of each logical string to the incoherent logical noise. Altogether, we obtain the lower bound on $\tilde\chi_{Z_1 Z_1}$ in Eq.(230). The error terms come from Lemmas 10, 5, and 12; a subdominant error term from Lemma 4 is suppressed in Eq.(230). Putting together Eq.(229) and Eq.(230), we conclude that the coherent and incoherent terms in the logical chi matrix are related by Eq.(231), where the error term is given in Eq.(232); an auxiliary bound, Eq.(233), was used to arrive at Eq.(231).
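The suppression of higher-weight partitions can be seen in a repetition-code-style model. Assume (for illustration) that a length-ℓ string with independent rotations contributes C(ℓ, w)·sin^{2w}(θ/2)·cos^{2(ℓ−w)}(θ/2) to the incoherent noise for each uncorrectable weight w ≥ (ℓ+1)/2. Successive terms then differ by the factor (ℓ − w)/(w + 1)·tan²(θ/2), which is below 1. The helper name is ours.

```python
import math

def incoherent_terms(ell, theta):
    """Contributions C(ell, w) sin^(2w)(theta/2) cos^(2(ell-w))(theta/2) for
    w = (ell+1)/2, ..., ell in a repetition-code-style model (assumption; ell odd)."""
    s2 = math.sin(theta / 2) ** 2
    c2 = 1 - s2
    w0 = (ell + 1) // 2
    return [math.comb(ell, w) * s2 ** w * c2 ** (ell - w) for w in range(w0, ell + 1)]

terms = incoherent_terms(11, 0.05)
for offset, t in enumerate(terms):
    print(f"|O_U| = {6 + offset}: {t:.3e}")
```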
We have restricted our attention to the coherent logical component $(L_a\rho)$ and the corresponding incoherent component $(L_a\rho L_a)$, where $L_a$ is either X or Z on one of the two encoded qubits. We did this because these are the largest components of the logical noise, as proven in Appendices H and I. In Lemma 14 we prove that we can neglect the logical noise components $(L_a\rho)$ where $L_a$ is a logical Y or a nontrivial operator on both encoded qubits. In Lemma 15 we prove that we can neglect coherent terms of the form $(L_a\rho L_b)$ with $a \neq b$. Using these results, we can bound the sum of all coherent logical terms relative to the sum of all incoherent logical terms. There are two off-diagonal terms for each diagonal term (e.g., $\tilde\chi_{Z_1 I}$ and $\tilde\chi_{I Z_1}$ are matched with $\tilde\chi_{Z_1 Z_1}$), which yields Eq.(234) and proves Eq.(224). The term on the right-hand side is proportional to the infidelity by Eq.(295). Going back to Lemma 1 and Eq.(43), we can write the corresponding relation, in which $d_L$ is the dimension of the code space, which is 4 for the toric code without boundaries. We can combine this with Eq.(234).
Finally, we can use Lemma 16 with Eq.(234) to derive Eq.(226). So far we have considered a noise model consisting of the same Z rotation by angle θ on every qubit in the code block. We can use Lemma 8 and Lemma 13 to prove that this noise channel produces maximally coherent logical noise in a region around uniform rotations. The single-qubit rotation angles are allowed to differ so long as the deviation is not too great. Therefore, the relation we found between coherent and incoherent logical noise components for the Z rotation noise model bounds the coherence of the logical noise channel for small rotations about any axis, so long as the rotations are close to uniform across the qubits.
There are some subtleties in the interpretation of Theorem 3. We address these in the next subsection, but first we make a remark about the error bound in Eq.(225). This error bound is satisfactory for finite code size L; however, we need to make a small modification before the bound is suitable for the L → ∞ limit. This is because the term $O((L\sin\theta)^{2k})$ in Eq.(225) contains a factor polynomial in L. If the single-qubit rotation angle θ satisfies |sin θ| ∝ 1/L, then this polynomial factor would make the truncation error large as L → ∞. The polynomial factor comes partly from the fact that the truncation error in Eq.(229) has a factor of $2^L$ relative to the factor $\binom{L}{(L+1)/2}$ that appears in the lowest-weight incoherent noise terms; this ratio is proportional to √L. The other contribution to the polynomial factor comes from the path counting in Lemma 5, where we neglected a factor polynomial in $w + w'$ in Eq.(201). We can cancel the polynomial factor by slightly modifying our truncation procedure. Denote this polynomial factor by p(L). Instead of neglecting noise terms with weight > L + 2k in the coherent logical noise components, we neglect noise terms with weight > $L + 2k'\log L$ for a constant k′. We perform a similar truncation for the incoherent logical noise components. Then we can choose k′ large enough that $(L\sin\theta)^{2k'\log L}\,p(L)$ is decreasing with L. The minimum value of k′ for which this holds depends on the degree of p(L) and the magnitude of L sin θ. If k′ exceeds this minimum value, then $(L\sin\theta)^{2k'\log L}\,p(L)$ is bounded above by $a|L\sin\theta|^{\lambda k'\log L}$, where a is a constant determined by the coefficients of the polynomial p(L), and λ is a constant that depends on k′, the degree of p(L), and the magnitude of L sin θ. Our new truncation rule slightly alters the other error terms in Eq.(232). The new error term is given in Eq.(237), where a, k′, and λ are constants, with a and λ determined as described above.
We are free to choose k′, so long as it is greater than this minimum value. If we now take the limit L → ∞, we find that the error term in Eq.(237) remains small. Therefore, we may apply Theorem 3 in the limit L → ∞ with Eq.(237) replacing Eq.(225).
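The role of the log L cut-off can be made explicit under simplifying assumptions: the natural logarithm is used, p(L) = L^d is a monomial of degree d (so a = 1), and b = L|sin θ| < 1 is held fixed. Then b^{2k′ ln L}·L^d = L^{d − 2k′ ln(1/b)}, which decreases with L exactly when k′ > d/(2 ln(1/b)). It also equals b^{λk′ ln L} with λ = 2 − d/(k′ ln(1/b)), so λ indeed depends on k′, the degree of p(L), and L sin θ. Helper names are hypothetical.

```python
import math

def truncated_error(L, b, k_prime, d):
    """b^(2 k' ln L) * L^d with b = L*|sin(theta)| held fixed and p(L) = L^d (assumptions)."""
    return b ** (2 * k_prime * math.log(L)) * L ** d

def min_k_prime(b, d):
    """Threshold on k' above which truncated_error decreases with L."""
    return d / (2 * math.log(1 / b))

def lam(b, k_prime, d):
    """Exponent lambda with truncated_error(L) == b ** (lambda * k' * ln L)."""
    return 2 - d / (k_prime * math.log(1 / b))

b, d = 0.5, 3
Ls = [11, 21, 41, 81]
good = [truncated_error(L, b, 3.0, d) for L in Ls]  # k' = 3 is above the threshold
bad = [truncated_error(L, b, 1.0, d) for L in Ls]   # k' = 1 is below it
print(f"min k' = {min_k_prime(b, d):.3f}")
```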

Interpreting Bounds on Coherence
We have proved a relation between the diagonal and off-diagonal components of the chi matrix of the logical noise channel. The interpretation is a bit subtle, so it is worth commenting on here. We upper bounded the off-diagonal components by 1/|sin θ| times the diagonal components, and we were forced to assume that |sin θ| < 1/L because our analysis only applies to logical strings with length ≤ L + 2k, where k is a constant. With this assumption, the factor of 1/|sin θ| implies that the coherent component of the logical channel may be L times larger than the incoherent component. This might seem to indicate that the coherence of the logical channel is not suppressed for large L, but that is not the best way to think about the comparison. In Eq.(228), the term quadratic in m has a coefficient proportional to $r/\sin^2\theta$ relative to the linear term. But the average infidelity r is exponentially small in L. Thus the coefficient of the quadratic term is really exponentially smaller in L than the coefficient of the linear term. In Eq.(226), $c^2$ is not really a constant, since it scales like $L^2$ if θ scales like 1/L. The point is that if the logical noise channel were fully coherent, i.e. unitary, then c would scale like $1/\sqrt{r}$, but we find that $1/\sqrt{r}$ scales like $L^{L/2}$, which is vastly greater than $L^2$. We conclude that, although the logical noise channel is not exactly incoherent, it is quite close to an incoherent channel, as measured by our statements about the growth of average infidelity and the relation between diamond distance and average infidelity.
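The comparison can be illustrated numerically with a rough proxy. Assume (our simplification, up to constant factors) that r ≈ L·C(L,(L+1)/2)·sin^{L+1}(θ/2) with θ = a/L, mimicking the lowest-order incoherent term. The relative coefficient r/sin²θ of the quadratic term then collapses rapidly as L grows, while 1/√r quickly outruns L². The proxy and the choice a = 0.5 are illustrative assumptions, not the quantities of Theorem 3.

```python
import math

def infidelity_proxy(L, theta):
    """Lowest-order proxy for the logical infidelity (assumption, constants dropped)."""
    return L * math.comb(L, (L + 1) // 2) * math.sin(theta / 2) ** (L + 1)

a = 0.5  # theta = a / L, so L*sin(theta) stays just below a
rows = []
for L in range(3, 23, 2):
    theta = a / L
    r = infidelity_proxy(L, theta)
    rows.append((L, r / math.sin(theta) ** 2, 1 / math.sqrt(r)))
for L, quad_over_lin, inv_sqrt_r in rows:
    print(f"L={L:2d}: r/sin^2(theta)={quad_over_lin:.2e}, 1/sqrt(r)={inv_sqrt_r:.2e}, L^2={L * L}")
```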
We could also consider writing our logical noise channel as a product of a unitary rotation and a Pauli channel. We can solve for the single parameter in each of these two channels. In the limit of low logical noise strength, the angle of rotation of the unitary channel approximately equals one of the off-diagonal chi matrix elements, and the probability of error in the Pauli channel is comparable to one of the diagonal components of the chi matrix. Theorem 3 implies that the logical channel can be written as a product of a unitary channel and a Pauli channel where the angle of rotation of the unitary is larger than the error probability of the Pauli channel by a factor which is approximately 1/| sin θ|, and therefore enhanced by a factor of L if θ scales like 1/L. Again, this might make it seem like the coherence is not suppressed; however, the coherent channel makes a contribution to the average infidelity proportional to the rotation angle squared. This is why we find that the growth of average infidelity becomes nearly linear in m as the code size L increases. As the code block becomes large, the diamond distance for the logical noise channel is much smaller than what one would expect for a coherent channel based on the value of the average infidelity r. This is another way of making the same point as in the previous paragraph.
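A single-qubit example shows how the two parameters appear in the chi matrix. For a Z rotation U = exp(−iφZ/2) followed by dephasing with probability p, a direct calculation from the Kraus operators √(1−p)·U and √p·ZU gives |χ_IZ| = sin(φ/2)cos(φ/2)(1 − 2p) ≈ φ/2 and χ_ZZ = (1 − p)sin²(φ/2) + p·cos²(φ/2) ≈ p + φ²/4. The off-diagonal element is thus set by the rotation angle (exactly, in the convention exp(−iεZ) with ε = φ/2), the diagonal one by the error probability, and the rotation enters the diagonal, and hence the infidelity, only quadratically. The sketch below computes these elements generically from Kraus operators in plain Python. Helper names are ours.

```python
import math

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = {"I": I, "X": X, "Y": Y, "Z": Z}

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def pauli_coeffs(K):
    """Coefficients a_P with K = sum_P a_P P, using a_P = Tr(P K) / 2."""
    coeffs = {}
    for name, P in PAULIS.items():
        PK = matmul(P, K)
        coeffs[name] = (PK[0][0] + PK[1][1]) / 2
    return coeffs

def chi_matrix(kraus):
    """Process (chi) matrix in the Pauli basis: chi[a, b] = sum_k a_{k,a} * conj(a_{k,b})."""
    coeffs = [pauli_coeffs(K) for K in kraus]
    return {(a, b): sum(c[a] * c[b].conjugate() for c in coeffs) for a in PAULIS for b in PAULIS}

def rotation_then_dephasing(phi, p):
    """Chi matrix of rho -> (1-p) U rho U^dag + p Z U rho U^dag Z with U = exp(-i phi Z / 2)."""
    half = phi / 2
    U = [[complex(math.cos(half), -math.sin(half)), 0], [0, complex(math.cos(half), math.sin(half))]]
    K1 = [[math.sqrt(1 - p) * x for x in row] for row in U]
    K2 = [[math.sqrt(p) * x for x in row] for row in matmul(Z, U)]
    return chi_matrix([K1, K2])

chi = rotation_then_dephasing(0.02, 1e-4)
print(abs(chi[("I", "Z")]), chi[("Z", "Z")].real)
```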
One might wonder whether a tighter upper bound than Theorem 3 can be derived on the strength of the coherent part of the logical channel relative to the incoherent part. In fact, a substantially tighter upper bound is not possible if we want it to hold for arbitrarily small rotation angles. For instance, we could set every rotation angle equal to zero except for the qubits along a single logical string of length L. For this case, the computation of the logical channel is similar to our computation for the repetition code, where we were able to compute the logical channel quite precisely. Alternatively, for a fixed code size we could choose sufficiently small uniform rotation angles θ such that the lowest-weight terms dominate in the logical noise. In this case the computation of the logical channel is again similar to that of the repetition code. Since the bound we proved for the toric code nearly matches what we found for the repetition code, we know that our result is optimal in this special case. Of course, for some other particular set of single-qubit rotations, the logical noise channel may be less coherent than our upper bound predicts.
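The repetition-code comparison can be reproduced by brute force for small codes. The sketch below assumes a bit-flip repetition code of odd length n with exp(−iθX/2) on every qubit and majority-vote (minimal-weight) decoding; this is equivalent to the Z-rotation picture after a Hadamard on every qubit. An error pattern and its complement share a syndrome, and the decoder undoes the lighter one, so the heavier one becomes a logical X. The code checks that |χ_IX| equals (cos(θ/2) sin(θ/2))^n·C(n−1,(n−1)/2), which follows from an alternating binomial sum, and that |χ_IX|/χ_XX is of order 1/sin θ, the scaling of the bound. The function name is ours.

```python
import math

def repetition_logical_chi(n, theta):
    """Logical chi elements (II, XX, IX) for an odd-n bit-flip repetition code with
    exp(-i*theta*X/2) on every qubit and majority-vote decoding (illustrative model)."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    full = (1 << n) - 1
    amp_low, amp_high = {}, {}
    for e in range(1 << n):
        w = bin(e).count("1")
        amp = c ** (n - w) * (-1j * s) ** w  # coefficient of X^e in the product unitary
        key = min(e, e ^ full)               # e and its complement share a syndrome
        if 2 * w < n:
            amp_low[key] = amp   # decoder undoes e: logical identity
        else:
            amp_high[key] = amp  # decoder applies the complement: logical X
    chi_ii = sum(abs(a) ** 2 for a in amp_low.values())
    chi_xx = sum(abs(a) ** 2 for a in amp_high.values())
    chi_ix = sum(amp_low[k] * amp_high[k].conjugate() for k in amp_low)
    return chi_ii, chi_xx, chi_ix

n, theta = 7, 0.1
chi_ii, chi_xx, chi_ix = repetition_logical_chi(n, theta)
print(f"|chi_IX|/chi_XX = {abs(chi_ix) / chi_xx:.3f}, 1/sin(theta) = {1 / math.sin(theta):.3f}")
```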

Conclusions
We have studied characterizations of coherence in quantum channels. One useful method for diagnosing the coherence of a channel N is to apply N m times in succession and to investigate how the average infidelity r of the composite channel $N^m$ increases with m. For incoherent channels r is linear in m, while for highly coherent channels it can grow quadratically with m. Another useful diagnostic is provided by the relationship between r and D♦(N), the distance between N and the identity channel as measured by the diamond norm. For incoherent channels this distance scales linearly with r, while for highly coherent channels it scales like √r. Using these criteria, we have investigated the coherence properties of logical channels. To define a logical channel, we choose a particular quantum error-correcting code and decoding method; then we consider encoding an initial input state, subjecting the physical qubits to a noise model, and finally applying the decoder to obtain the channel's output. Our main conclusion is that, for the code families we examined, even if the physical noise model is highly coherent, the coherence of the logical channel is heavily suppressed in the limit of a large code block.
For the case of the quantum repetition code, we can compute the logical channel precisely, and verify that the logical channel is highly incoherent for large block size. Most of this paper was devoted to the analysis of a more challenging case, the L × L two-dimensional toric code subject to independent unitary noise. Our main conclusion about this case is encompassed by Theorem 3. Regrettably, for the case of the toric code we concluded that the coherence of the logical channel is suppressed only under an unrealistic assumption: that as the size L of the code block increases, the rotation angle θ applied to each qubit scales like 1/L.
Under this assumption, we can estimate the logical channel well enough for our purposes by expanding it to a constant (L-independent) order in θ, and argue that the higher-order terms we ignore make a contribution that can be safely neglected. A key step in our argument is the observation, backed up by Lemmas 9 and 10, that, for L| sin θ| < 1, the logical channel is dominated by logical strings with an easily characterized typical shape. For the logical strings of this typical shape, Lemmas 4, 7, 11, and 12 provide a sufficiently accurate estimate of the logical channel to prove Theorem 3.
Our main conclusion, that the coherence of the logical channel is heavily suppressed, applies to unitary physical noise such that each qubit is rotated independently, even if the rotation axis and rotation angle vary from qubit to qubit, as long as the rotations are close to the same and sufficiently small. It also applies for some highly correlated noise models. The result also extends to physical noise channels which are convex combinations of unitary channels, or convex combinations of unitary channels and depolarizing channels. (Depolarizing physical noise is mapped to an incoherent depolarizing logical channel under error correction.) We emphasize that our result is an asymptotic statement in the limit of large code size L, albeit under the assumption that the noise strength scales like 1/L. For codes of fixed size our results may not be tight; the coherence of logical channels for finite code blocks has been studied elsewhere [17,11,13]. Our goal was instead to study a family of codes with an accuracy threshold. When the noise is below threshold, the logical channel approaches the identity as the code block increases in size. In addition, under conditions where Theorem 3 applies, the coherent component of the logical channel vanishes much more rapidly than the incoherent component.
It is reasonable to expect that our conclusion, that the logical channel becomes increasingly incoherent as L grows, continues to hold even if we allow L to increase while the rotation angle θ has a fixed constant value. But proving this will be challenging. For one thing, if θ is a constant, we cannot accurately estimate the logical channel by expanding to a constant order in θ. Instead, logical strings with length ≤ L(1 + β) need to be included for some constant β. These logical strings are not easy to count. A logical string can be regarded as a self-avoiding walk on the square lattice whose endpoints are a distance L apart, but previously derived upper bounds on the number of self-avoiding walks with specified length [29,30,31] do not treat the case where the distance between the endpoints differs from the length of the walk by an O(1) multiplicative factor. And even if we could count the logical strings accurately, we would still need to overcome some additional obstacles to prove that the coherence of the logical channel is suppressed.
First, to prove Theorem 3, we disposed of the "exceptional" terms (Definition 7), those in which the uncorrectable error on a logical string has lower weight than the correctable error, by arguing that these terms are sufficiently rare as to make a negligible contribution to the coherent part of the logical channel. But for logical strings with length L(1 + β), exceptional terms will be far more common.
Second, when we calculated the contribution to the coherent or incoherent logical noise, we separated the computation into a sum over a connected part and a disconnected part, and argued in Lemmas 11 and 12 that the disconnected part contributes a multiplicative factor close to 1. But the proofs of these lemmas required the logical strings to be short, of length L + 2k for constant k; these proofs don't apply for longer logical strings of length L(1 + β).
Third, and even more dauntingly, our proof of Theorem 3 made use of a relationship between the physical noise terms that contribute to the coherent and incoherent logical noise. But as the logical string length increases, the contributions to the coherent and incoherent component of the logical channel become less and less alike. Each contribution to the coherent logical noise is associated with a logical string. In contrast, each contribution to the incoherent logical noise is associated with a pair of logical strings; these strings have segments in common, but they fluctuate relative to one another apart from those shared segments. For short logical strings, these fluctuations are relatively mild, and did not prevent us from relating the incoherent and coherent logical noise, as described in Section 6.9. For longer logical strings, the combinatorics become much harder to handle.
Unable to overcome these obstacles ourselves, we settled for proving a weaker result that applies for L|sin θ| < 1 rather than constant θ. Perhaps a more ambitious combinatorial analysis can push the proof through even for constant θ. Or perhaps a completely different approach will be more successful. Conceivably, the coherence of the logical channel does not become heavily suppressed for large L and sufficiently small constant θ, though we consider that possibility unlikely.
Further numerical studies of logical coherence may also prove to be instructive. The problem has already been studied numerically [17,18,16,11]; however, our methods for organizing the estimate of the logical channel suggest different approaches to numerically simulating the logical channel. Numerics could help to resolve the issues that prevented us from extending Theorem 3 to the case where θ is an L-independent constant.
In our analysis of the toric code subject to single-qubit unitary rotations, we used minimal-weight decoding because it can be systematically analyzed. However, we don't expect our conclusion about suppression of logical coherence to be very sensitive to the choice of decoding method. The suppression arises from averaging over many error syndromes, and therefore should occur for other families of stabilizer codes with good decoders. Many of the elements from which we built the proof of Theorem 3 can be applied to more general stabilizer codes, including "logical strings," partitions, exceptional terms, and the decomposition into connected and disconnected parts.
It has long been suspected that error correction suppresses the coherence of noise. Such suppression had been observed numerically for the toric code [17], but no rigorous argument supporting this conclusion had been previously known for any code family with an accuracy threshold. Our goal in this project was to prove that, for the toric code subject to sufficiently weak independent or weakly correlated unitary noise, the logical channel after decoding is highly incoherent in the limit of a large code block. We fell short of this goal, settling for a proof that coherence is suppressed in the case where the noise strength decreases as the code block grows. Nevertheless, we hope and expect that the tools we have developed will prove to be useful in future studies of quantum error correction.

A Chi matrix and Pauli transfer matrix for qubits
Here we verify Lemma 1 for qubits by expressing all non-diagonal terms in $N_{kl}$ explicitly in terms of $\chi_{ij}$. When we collect all the terms in $\sum_{a\neq b} N_{ab}^2$ that are quadratic in $\{\chi_{XY}, \chi_{YX}\}$, we obtain an expression that simplifies using $\chi_{ij} = \chi_{ji}^*$, as required by complete positivity. The same applies to the terms involving $\chi_{IX}$, $\chi_{IY}$, $\chi_{IZ}$, $\chi_{ZX}$, $\chi_{YZ}$, and their complex conjugates.
To prove the claim we must verify that the linear terms cancel. This can be shown using the general argument in Lemma 1, but in the qubit case it may be easier to verify the cancellation explicitly. For example, writing out the contributions to $N_{ab}$ that involve $\chi_{IX}$, $\chi_{XI}$, $\chi_{YZ}$, and $\chi_{ZY}$, we see that the cross terms cancel in $N_{IX}^2 + N_{XI}^2$ and in $N_{YZ}^2 + N_{ZY}^2$. Similar cancellations occur for all other cross terms.
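As a concrete check of the objects in this appendix, the sketch below builds the chi matrix and the Pauli transfer matrix of a single-qubit X rotation numerically. It assumes the conventions $\mathcal{N}(\rho) = \sum_{ij}\chi_{ij}P_i\rho P_j$ and $N_{ab} = \tfrac{1}{2}\mathrm{Tr}(P_a\,\mathcal{N}(P_b))$, which may differ in normalization from those used for Lemma 1; the function names are ours. For the rotation, off-diagonal entries such as $\chi_{IX}$ are purely imaginary, and the coherence appears in the transfer matrix as the antisymmetric pair $N_{ZY} = -N_{YZ} = \sin\theta$.

```python
import numpy as np

# Single-qubit Paulis in the order I, X, Y, Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def chi_from_unitary(U):
    """Chi matrix of rho -> U rho U^dagger, convention N(rho) = sum_ij chi_ij P_i rho P_j."""
    # Expand U = sum_i u_i P_i with u_i = Tr(P_i U) / 2.
    u = np.array([np.trace(P @ U) / 2 for P in PAULIS])
    # U rho U^dagger = sum_ij u_i conj(u_j) P_i rho P_j, so chi_ij = u_i conj(u_j).
    return np.outer(u, u.conj())


def apply_channel(chi, rho):
    """Apply N(rho) = sum_ij chi_ij P_i rho P_j."""
    return sum(chi[i, j] * PAULIS[i] @ rho @ PAULIS[j]
               for i in range(4) for j in range(4))


def ptm_from_chi(chi):
    """Pauli transfer matrix N_ab = Tr(P_a N(P_b)) / 2 (real for Hermiticity-preserving maps)."""
    return np.array([[np.trace(PAULIS[a] @ apply_channel(chi, PAULIS[b])).real / 2
                      for b in range(4)] for a in range(4)])


theta = 0.3
# exp(-i theta X / 2) = cos(theta/2) I - i sin(theta/2) X
U = np.cos(theta / 2) * PAULIS[0] - 1j * np.sin(theta / 2) * PAULIS[1]
chi = chi_from_unitary(U)
N = ptm_from_chi(chi)
print(np.round(N, 4))
```

The chi matrix of any unitary channel is a rank-one outer product, which makes Hermiticity, $\chi_{ij} = \chi_{ji}^*$, immediate.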

B Approximating sums
We wish to evaluate the sum in Eq.(86), where $p = s^2 = \sin^2(\theta/2)$ and $1-p = c^2 = \cos^2(\theta/2)$. Note that $P_n(p)$ is the probability of a decoding error for the $n$-bit repetition code subject to independent noise with bit-flip probability $p$. It is convenient to redefine the summation index. From the Stirling approximation we then obtain an approximate expression, neglecting a multiplicative $(1 + O(1/n))$ correction. Making another $(1 + O(1/n))$ multiplicative error, we may replace the exponential inside the sum over $r$ by 1, and we also make a negligible error (assuming $p < 1/2$) by extending the upper limit on the sum to infinity. Summing over $r$ then gives a closed-form estimate of $P_n(p)$, valid for $p < 1/2$, which we simplify further to obtain our final result.

C Correlated Noise: Leading behavior for large n
Here we describe an alternative way of understanding Eq.(163), where we saw that the coefficient of $h_2^q$ in the logical channel is $O(m^{3q/2})$. This leading behavior results from cancellations of terms of higher order in $m$ that occur when we perform the sum over $k_R$ in Eq.(160). What is the explanation for these cancellations?
In Section 5 we calculated the coherent and incoherent logical components for the bit-flip code of size $n$ subject to correlated unitary rotations generated by a Hamiltonian with one-body terms and two-body $XX$ terms with coupling $h_2$. We expressed the logical coherent component $\tilde\chi_{XI}$ and the logical incoherent component $\tilde\chi_{XX}$ in terms of functions $\Omega$ and $\Delta$, with one term for each order $q$ in $h_2$; only even values of $q$ contribute. Here $k_R$ is the number of times the Hamiltonian term $ih_2 XX$ acts on the density operator from the right, and $k_L = q - k_R$ is the number of times $-ih_2 XX$ acts from the left. We were able to compute $\Omega$ and $\Delta$ by counting the ways of decomposing each physical noise term into combinations of one- and two-body Hamiltonian terms. Equations (148) and (151), repeated here as Eq.(252) and Eq.(253), give these functions explicitly. Let's evaluate the sum over $k_R$ to leading order in $1/m$ in both Eq.(252) and Eq.(253). We focus on the second factor in each equation, which contains all of the $k_R$ dependence. In Eq.(252) we isolate the dominant term of this factor for large $m$. When we sum this term over $k_R$, we make use of the identity Eq.(257), in which $P_c(b)$ denotes any polynomial in $b$ of degree $c < a$. The situation for $\Delta$ is the same except that $m$ is replaced by $m + 1$. Therefore, in the leading term for large $m$ in Eq.(252) and Eq.(253), the prefactor multiplying $\binom{q}{k_R}$ is a second-degree polynomial in $k_R$, so that Eq.(257) implies that the sum over $k_R$ vanishes if $q > 2$. Likewise, the coefficient of $m^{2q-r}$ is a polynomial in $k_R$ of degree $2r$, and the sum over $k_R$ vanishes if $2r < q$. Recalling that only even $q$ contribute, we see that the leading term that survives the summation over $k_R$ has $r = q/2$ and is therefore of order $m^{2q-r} = m^{3q/2}$. We have now seen why terms of higher order in $m$ cancel. The term of order $m^{3q/2}$ can be evaluated using the identity Eq.(259). The identities Eq.(257) and Eq.(259) can be derived by performing the binomial expansion of $(1 + x)^a$, differentiating repeatedly, and then setting $x = -1$.
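The two identities invoked above can be checked directly. This sketch assumes that Eq.(257) and Eq.(259) are the standard alternating binomial sums $\sum_{b=0}^{a}(-1)^b\binom{a}{b}P_c(b) = 0$ for a polynomial $P_c$ of degree $c < a$, and $\sum_{b=0}^{a}(-1)^b\binom{a}{b}b^a = (-1)^a a!$; the exact normalization used in the text may differ. In the application, $b$ plays the role of $k_R$ and $a$ that of $q$, so a prefactor of degree $2r < q$ in $k_R$ is annihilated by the sum.

```python
from math import comb, factorial


def alternating_binomial_sum(a, poly):
    """Return sum_{b=0}^{a} (-1)^b * C(a, b) * poly(b), computed exactly with integers."""
    return sum((-1) ** b * comb(a, b) * poly(b) for b in range(a + 1))


for a in range(1, 9):
    # Every monomial b^c with c < a, hence every polynomial of degree c < a, sums to zero.
    for c in range(a):
        assert alternating_binomial_sum(a, lambda b: b ** c) == 0
    # The monomial b^a survives with coefficient (-1)^a a!.
    assert alternating_binomial_sum(a, lambda b: b ** a) == (-1) ** a * factorial(a)
```

The alternating sum is the $a$-th forward difference at zero, which is why it annihilates low-degree polynomials.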

D The Shape of the Logical String
In this appendix, we prove that among short logical strings nearly all have typical shape as in Definition 6.
Lemma 9. In a size L toric code, all but order 1/L of the logical strings running left to right across the code with length ≤ L + 2k consist of single steps up and down, so that no vertical segment is longer than one qubit.
Proof. Consider a code of size L and all logical strings of length L + 2k for fixed k. We first count the strings in which each step up or down has length one. We start with a horizontal logical string of length L and pick sites along it: k sites for the upward steps and another k sites for the downward steps. In total, the number of strings of this type is
$$\#\{\text{strings with steps of one}\} = \binom{L}{k}\binom{L-k}{k} = \binom{L}{2k}\binom{2k}{k}. \qquad (261)$$
Next, we count the total number of strings with no backward steps, that is, strings that, starting from the left of the code block, move only right, up, and down. These strings may contain upward and downward steps of length greater than one. In general, such a string involves $q_1$ distinct steps up with total length k and $q_2$ steps down also totaling k in length. The number of ways of writing k as an ordered sum of $q_1$ positive terms is the number of compositions of the integer k into $q_1$ parts, which is $\binom{k-1}{q_1-1}$. Each of the $q_1$ steps up and $q_2$ steps down can be placed independently, which gives $\binom{L}{q_1}\binom{L-q_1}{q_2}$ possible configurations. In total we have
$$\#\{\text{strings with } q_1 \text{ steps up and } q_2 \text{ steps down}\} = \binom{k-1}{q_1-1}\binom{k-1}{q_2-1}\binom{L}{q_1}\binom{L-q_1}{q_2} \qquad (262)$$
such strings. When $q_1 = q_2 = k$, we recover the case where each step up or down is by one lattice site. Isolating the L dependence in Eq.(262), we find that for fixed k it grows as $L^{q_1+q_2}$, whereas Eq.(261) grows as $L^{2k}$. Hence there are fewer paths with steps larger than one, and the ratio is
$$\frac{\#\{\text{with } q_1 \text{ steps up and } q_2 \text{ steps down}\}}{\#\{\text{with } k \text{ single steps up and down}\}} = O\!\left(L^{q_1+q_2-2k}\right).$$
Then, if we count the paths with a single step of two while all other steps are of length one, there are order 1/L of these relative to the number of paths with single steps up and down. We must also count the number of logical strings where the string backtracks on itself. There are even fewer of these than of strings with jumps up or down by two. Each string with backtracking can be produced from a string with a jump up or down of at least two lattice spacings, by adding a cap onto a vertical segment of length at least 2. The number of strings with one instance of backtracking, like the one in Fig. 13, is therefore proportional to the number of strings that are two steps shorter and have at least one step up or down of more than one. For this reason, strings like the one in Fig. 13 are an exponentially smaller minority than the strings with steps up and down of more than one. We conclude that nearly all short logical strings spanning the code from left to right consist of steps up and down by only one qubit.
Figure 13: The logical string L has back-tracking. Among short logical strings, those with back-tracking are unlikely relative to strings without.
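The counting in the proof of Lemma 9 can be evaluated directly. This sketch (the function names are ours) sums Eq.(262) over $q_1, q_2 = 1, \dots, k$ to count the strings with no backward steps and compares the total with Eq.(261). Backtracking strings, which the proof argues are rarer still, are not included.

```python
from math import comb


def unit_step_strings(L, k):
    """Eq.(261): length-(L + 2k) strings with k unit steps up and k unit steps down."""
    return comb(L, 2 * k) * comb(2 * k, k)


def no_backtrack_strings(L, k):
    """Sum of Eq.(262) over q1, q2 = 1..k: strings that move only right, up, and down."""
    total = 0
    for q1 in range(1, k + 1):
        for q2 in range(1, k + 1):
            total += (comb(k - 1, q1 - 1) * comb(k - 1, q2 - 1)  # compositions of k
                      * comb(L, q1) * comb(L - q1, q2))         # placements of the steps
    return total


for L in (100, 1_000, 10_000):
    atypical = 1 - unit_step_strings(L, 2) / no_backtrack_strings(L, 2)
    print(L, atypical, atypical * L)
```

For k = 2 the two counts give an atypical fraction of exactly $4(L-1)/(L^2 - L + 2) \approx 4/L$, consistent with the O(1/L) statement of Lemma 9.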
Lemma 10. For the class of length L + 2k strings described in Lemma 9 (those with exactly k steps up and k steps down as they span the code block from left to right), for large L nearly all have spacings between the steps that grow proportionally to √L. We choose a small constant γ and define typical strings as those for which all vertical steps are separated by at least γ√L. If we fix the length of logical strings and combine this lemma with Lemma 9, we obtain a bound on the ratio (number of strings with atypical shape)/(total number of strings) for strings of that length.
Proof. The total number of strings of the type in Lemma 9 is $\binom{L}{2k}\binom{2k}{k}$. Now let us compute the number of strings in which each step up or down is separated from the others by at least γ√L for some constant γ. We can lower bound this number by starting with a length-L string running left to right across the code and placing our steps up and down one at a time. We suppose that each step we place prohibits placing another step on a further 2γ√L of the sites. This gives a lower bound because in the true count these excluded intervals will sometimes overlap. Compared to the total, the number of available sites L − i for each successive step has been replaced by L − i(2γ√L + 1), so that when γ = 0 we recover the total number of strings. For fixed k and γ, the ratio of the number of length L + 2k strings with widely separated steps to the total number of length L + 2k strings can then be lower bounded by an expression that approaches 1 as L increases. We see that, with high probability, a short logical string has its steps up and down separated by more than γ√L when L is large.
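To see how quickly the well-separated fraction approaches 1, the sketch below counts placements exactly instead of using the lower bound from the proof. It makes two simplifying assumptions of ours: the 2k step positions lie on an open line of L columns (ignoring the periodic boundary of the torus), and "separated by at least γ√L" means that consecutive steps are at least ⌈γ√L⌉ columns apart.

```python
from math import ceil, comb, sqrt


def well_separated_fraction(L, k, gamma):
    """Fraction of placements of 2k vertical steps on L columns (open line) whose
    consecutive steps are at least g = ceil(gamma * sqrt(L)) columns apart.

    Stars and bars: shifting the i-th chosen column (0-indexed) left by i*(g - 1)
    maps such placements one-to-one onto arbitrary 2k-subsets of
    L - (2k - 1)(g - 1) columns. The C(2k, k) up/down labelling is common to the
    numerator and denominator and cancels.
    """
    g = max(1, ceil(gamma * sqrt(L)))
    free = L - (2 * k - 1) * (g - 1)
    if free < 2 * k:
        return 0.0
    return comb(free, 2 * k) / comb(L, 2 * k)


for L in (10**4, 10**6, 10**8):
    f = well_separated_fraction(L, k=2, gamma=0.1)
    print(L, f, (1 - f) * sqrt(L))  # last column approaches 2k(2k - 1) * gamma
```

For these parameters $(1 - f)\sqrt{L}$ approaches $2k(2k-1)\gamma = 1.2$, so the atypical fraction decays like $1/\sqrt{L}$, in line with the lemma.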

E Disconnected Errors
Fix a coherent logical noise component and consider the sum in Eq.(171). In Section 6.7 we argued that the disconnected part is 1 for disconnected errors that do not change how a given connected term is decoded. This allows us to rewrite the sum as a sum over connected logical strings L and their likely partitions, plus an error term E. The sum over L includes all typical short connected logical strings. P(L) is the set of likely partitions of the connected logical string L; it excludes the partitions we called "exceptional terms" in Definition 7 and Lemma 4. The error term E contains all the terms we have neglected: the contributions of long logical strings, short logical strings with atypical shape, and exceptional terms. It also includes the terms with disconnected pieces that we did not consider in Section 6.7, namely all of the terms where the disconnected errors flip the way the partition is decoded: we start with a partition, and after disconnected errors are added to each side, the error that was originally uncorrectable becomes correctable and vice versa. These terms are not covered by the analysis of Section 6.7. We describe these terms now and show in the following lemma that they are negligible.
Lemma 11. In Eq.(269) the error from the neglected terms can be expressed as $E = E_1 + E_2$, where $E_1$ contains the contributions that we have already proven are negligible (long connected logical strings, logical strings with an atypical shape, and exceptional partitions), and $E_2$ contains the contributions from terms where the disconnected errors have flipped the way the partition is decoded. These are the terms we neglected in Section 6.7. Then $E_2$ is smaller than $E_1$.
Proof. We start with a typical short connected logical string and take a partition into a correctable operator and an uncorrectable operator, denoted $(O_U \rho O_C)$. Now we add disconnected errors $D_L$ and $D_R$ to the left and right sides of the partition. In some cases the uncorrectable error may become correctable and vice versa; that is, $O_U D_L$ will be correctable, while $O_C D_R$ is uncorrectable. For example, the term that contributes to the $\tilde\chi_{Z_1 I}$ component of the logical noise might be $(O_C D_R \rho O_U D_L)$. Our treatment of the disconnected part in Section 6.7 assumed that the added errors did not flip the correctable and uncorrectable sides of the original partition. Here we justify this assumption by proving that such terms are negligible. First, we must understand the conditions under which an added error turns the uncorrectable side of a partition into a correctable error. Because $O_U$ is the uncorrectable side of the partition, the minimal-weight correction to $O_U$ is equal to $O_C$ up to stabilizers. For $O_U D_L$ to be correctable, the minimal-weight correction must be equal to $O_U D_L$ up to stabilizers and not equal to $O_C D_L$ up to stabilizers. Note that we write $D_L$ and not $D_R$ because $D_L$ and $D_R$ have the same syndrome, so as far as the decoder is concerned they are equivalent. This implies
$$\min_{S_x \in \mathcal{S}} |O_U D_L S_x| < \min_{S_x \in \mathcal{S}} |O_C D_L S_x|,$$
where $S_x$ is an element of the stabilizer group $\mathcal{S}$, and $|\cdot|$ denotes the weight of a Pauli operator.
The weight of the minimal-weight operator equivalent up to stabilizers to $O_C D_L$ is no greater than the sum of the weights of the minimal-weight operators equivalent up to stabilizers to $O_C$ and $D_L$ individually. We can therefore continue:
$$\min_{S_x \in \mathcal{S}} |O_U D_L S_x| < \min_{S_x \in \mathcal{S}} |O_C D_L S_x| \le \min_{S_x \in \mathcal{S}} |O_C S_x| + \min_{S_x \in \mathcal{S}} |D_L S_x|. \qquad (273)$$
We conclude that the added error must be such that the minimal-weight correction of $O_U D_L$ has lower weight than the minimal-weight correction of $O_U$ plus the minimal-weight correction of $D_L$. This happens when the disconnected error $D_L$ lies near $O_U$, so that the minimal-weight decoder tends to form a loop out of parts of $D_L$ and $O_U$. This is possible only in cases like the one in Fig. 14.
The condition in Eq.(273) requires a special combination of disconnected error and original partition. Such combinations are possible for both coherent- and incoherent-type disconnected errors, as defined in Section 6.7. Let us consider incoherent-type disconnected errors first; this is the case illustrated in Fig. 14. The disconnected error causes the uncorrectable side of the partition to become correctable when $D_L$ contains at least two errors in a row adjacent to $O_U$. From this condition we see that the number of added errors that flip the partition is greatest for the lowest-weight partitions, since these require the fewest added errors to flip. We also see that the number of such added errors increases with the length of the logical string, because a longer string has more adjacent qubits. This implies that the value of the disconnected part decreases with string length, a fact that was used in Lemma 3.
We seek to prove that terms like the one in Fig. 14 are negligible in the coherent logical noise components. To do this, we map each combination of a partition of a connected logical string and a set of disconnected errors that flips the uncorrectable and correctable sides of the partition to a partition of a longer connected logical string. There exists a unique stabilizer operator that, when multiplied with the starting partition plus disconnected errors, produces a partition of a longer logical string, as illustrated in Fig. 15. Our condition in Eq.(273) says that the minimal-weight operator stabilizer-equivalent to $O_U D_L$ has lower weight than $O_U D_L$. The stabilizer operator we need to map Fig. 14 to Fig. 15 is the product of $O_U D_L$ and its minimal-weight correction. The resulting connected logical string is longer than the original connected logical string, but the total weight of the noise term (connected and disconnected) is smaller. This must be true because we have lowered the weight of the errors in blue ($O_U D_L$) and have not changed the weight of the errors in red ($O_C D_R$).
In our previous analysis of the connected part of the coherent logical noise, we neglected logical strings with length > L + 2k for a cut-off constant k. Then we neglected short logical strings with atypical shape. Finally, we neglected the unlikely partitions of each string, which we called exceptional terms. In this proof we began with a likely partition of a short, typical connected logical string. We added disconnected errors, and in cases like the one in Fig. 14, where the added errors flipped the partition, we mapped these terms to partitions of a different connected logical string, as in Fig. 15. The final part of our proof is to argue that we can neglect this class of terms, in which disconnected errors change how the connected partition is decoded. First, we observed above that the weight of the new connected term produced by our mapping is less than the weight of the original term with disconnected errors. This means that the term in Fig. 14 is suppressed by powers of sin(θ/2) relative to the term in Fig. 15. Second, we argue that the new connected term is one we have already neglected. Recall how we constructed new connected terms like the one in Fig. 15: we took a likely partition of a typical, short, connected logical string and added disconnected errors to it in such a way that we flipped how the partition is decoded, so that the original uncorrectable side became correctable and vice versa. Then we multiplied the correctable error $O_U D_L$ by a particular stabilizer operator to produce a new term that is a partition of a new connected logical string. One of two things must then be true. The first possibility is that the new logical string has an atypical shape: if the logical string runs left to right across the code, some of its steps up and down in the lattice are separated by less than γ√L, where γ is the constant chosen in Lemma 10 to lower bound the separation between the vertical steps of a typical logical string.
[Figure caption: Notice that the connected logical string is longer but the total weight of the term is smaller.]
The alternative is that the new connected logical string has a typical shape but the partition we produce is unlikely. If the stabilizer we multiply by in Fig. 15 has a width of at least γ√L, the shape of the new connected logical string may be typical. However, in that case the partition we obtain for the longer connected logical string has a row of γ√L qubits all belonging to the blue error. We proved in Lemma 4 that partitions with this feature are an exponentially small (in √L) fraction of all partitions, and we neglected these partitions in our earlier sum over connected terms. We find that the terms with disconnected errors that flip how the partition is decoded are in one-to-one correspondence with terms we have already neglected; moreover, the magnitudes of the terms with disconnected errors are smaller by powers of sin(θ/2). We conclude that such terms contribute less to the logical noise than the terms we have already neglected.
So far in this proof we have dealt with incoherent-type added errors, for simplicity, so that we had only one picture in mind. The argument for coherent-type added errors is the same. Coherent-type disconnected errors can also flip the correctable and uncorrectable sides of a partition of a connected logical string. We have already stated the condition under which this occurs: the disconnected error $D_L$ on the uncorrectable side must contain a contiguous set of errors near a contiguous set of errors in the uncorrectable part $O_U$ of the partition.
We bound the contribution of the disconnected coherent-type errors that flip the correctable and uncorrectable sides of the partition in the same way as we did for the incoherent type. We use a mapping that takes such a term and produces a partition of a longer connected logical string; the mapping multiplies by a suitable stabilizer operator, as depicted in Fig. 17. In this case the new connected term is negligible for the same reasons as in the incoherent-type case: the connected logical string produced from the original partition plus the disconnected coherent-type errors either has an atypical shape or the original partition was exponentially unlikely (in √L). We conclude that the contribution of terms with disconnected errors that flip the partition is negligible in the logical noise.

F The Disconnected Part of the Incoherent Logical Noise
In Appendix E we proved that for the dominant noise terms in the coherent logical noise components, the disconnected part was equal to 1 up to small corrections. We will now prove the same statement for the dominant noise terms in the incoherent logical noise components.
Lemma 12. In Eq.(205) we wrote an incoherent logical noise component as a sum over the contributions from individual logical strings, including a disconnected factor. Here we prove that we can set the disconnected factor equal to 1 and make only a small error. In other words, suppose we write the incoherent logical noise component $\tilde\chi$ as a sum over logical strings L with the disconnected factor set to 1, plus error terms $E_1$ and $E_2$, where the sum over L includes all short typical logical strings; $O_U$, $|\{O_C\}|$, and $O_U'$ are all as described in Section 6.9; $E_1$ is the error we make by neglecting various connected terms, including high-weight terms and terms with mismatched weight, which we bounded in Lemma 5; and $E_2$ represents the error we make when we set the disconnected factor equal to 1. Then $E_2$ is smaller than $E_1$.
Figure 17: This is the connected string and partition that corresponds to Fig. 16, with the new uncorrectable error $O_U$ in red and the new correctable error $O_C$ in blue. We can always multiply the right- or left-hand side by a stabilizer to produce different coherent terms. This term is produced from Fig. 16 by multiplying the correctable side in blue by the stabilizer operator in gray crosshatching. The connected logical string is longer than the one in Fig. 16, but the total weight of the term is less.
Proof. We can follow the argument from Section 6.7 and Appendix E. The connected noise terms we considered there had the form $(O_U \rho O_C)$. Here we consider noise terms of the form $(O_U \rho O_U')$, which we arrive at in the manner of Section 6.9. Namely, we begin with a short logical string with a typical shape and partition it into $O_U$ and $O_C$. Then we choose an operator $O_U'$ with the same syndrome as $O_U$; we denoted the set of possible $O_U'$ by $\{O_U'\}$. Now that we have a connected noise term $(O_U \rho O_U')$, we can dress it with disconnected errors in exactly the same way as we did in the coherent case. In Section 6.7 we observed that the added errors that make up the disconnected part can be divided into coherent- and incoherent-type. Coherent-type added errors arise when we add different errors to $O_U$ and $O_U'$; in this case the errors we add to $O_U$ and $O_U'$ form a loop (with nonzero area). We saw that, as long as the loop was positioned such that the added errors did not change how the connected noise term was decoded, the sum over the possible ways of dividing the errors in the loop between $O_U$ and $O_U'$ gave zero. Incoherent-type added errors arise when we add the same error to $O_U$ and $O_U'$; in this case the contribution is nonzero. As long as the added error did not change how the connected noise term was decoded, the incoherent-type added errors contributed a $\sin^2(\theta/2)$ term on each qubit. Together with the $\cos^2(\theta/2)$ term corresponding to no error on that qubit, this gives 1 for the disconnected part.
This applies to the added errors that do not change how the connected part was decoded. Therefore, our approach is to write the disconnected part as 1 plus a correction that comes from the configurations of added errors that change how the connected part is decoded.
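The two facts used above, that the coherent-type splittings of a loop cancel and that the incoherent-type errors together with the no-error terms sum to 1, can be checked by brute force. This is a toy model of ours, not the full argument: it uses a single-qubit X rotation $U = \cos(\theta/2)I - i\sin(\theta/2)X$ on every qubit, treats each qubit's left and right factors independently, and ignores decoding altogether, which corresponds to the case of added errors that do not change how the connected term is decoded.

```python
import itertools
import math


def loop_split_sum(m, theta):
    """Sum over the 2**m ways of assigning each of m loop errors to the left or right side.

    U = cos(theta/2) I - i sin(theta/2) X acts on the left of rho and
    U^dagger = cos(theta/2) I + i sin(theta/2) X acts on the right.
    """
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    total = 0j
    for sides in itertools.product(("left", "right"), repeat=m):
        amp = 1 + 0j
        for side in sides:
            # Error on the left only: (-i s) * c; error on the right only: c * (+i s).
            amp *= (-1j * s) * c if side == "left" else c * (1j * s)
        total += amp
    return total


def incoherent_sum(n, theta):
    """Sum over subsets of n qubits carrying the same X error on both sides: each qubit
    contributes cos^2(theta/2) (no error) or sin^2(theta/2) (error on both sides)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return sum(math.comb(n, a) * s ** (2 * a) * c ** (2 * (n - a)) for a in range(n + 1))


print(abs(loop_split_sum(6, 0.2)), incoherent_sum(12, 0.2))
```

Per qubit, the two one-sided contributions $(-is)c$ and $c(is)$ cancel, while $c^2 + s^2 = 1$; the brute-force sums simply multiply these per-qubit factors.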
In Lemma 11 we considered added errors that change how $O_U$ is decoded in the coherent logical noise components. For connected noise terms that enter into the incoherent logical noise, the correction to the disconnected part comes from the same source: certain added errors that change how the connected term is decoded. In the coherent case the added errors flipped the correctable and uncorrectable sides of the partition, which gives a phase of −1. In the incoherent case, if $O_U$ is made correctable by the added errors, then so is $O_U'$, and the resulting term contributes to the identity part of the logical noise. In effect, there are disallowed added errors, which reduce the value of the disconnected part. The counting of such terms is identical to what we did in Lemma 11. Recall that these added-error terms were related to connected terms that had either an atypical shape or an unlikely partition. The contribution to $E_2$ from the former is proportional to the fraction of atypical logical strings from Lemma 10, and the contribution from the unlikely partitions is exponentially small in √L, as before. There is another class of added errors that contributes to the correction to the disconnected part: some errors lie near the correction $E_s$, so that once they have been added they become part of a new connected term. An example is shown in Fig. 18. This class of terms contains only incoherent-type added errors; coherent-type added errors placed here still give 0 after we sum over the ways of splitting the errors into the left and right disconnected errors, just as for coherent-type added errors far from the connected logical string. For a likely partition of a short logical string with typical shape, $E_s$ has the same weight as $O_C$. The incoherent-type added errors that join the connected part may either lie near $E_s$ or be contained in $E_s$. We first study the case where the added errors are not contained in $E_s$.
These terms are closely related to the situation we just analyzed, where the added errors sit next to $O_U$. Let D denote the added incoherent-type error and $\mathcal{S}$ the stabilizer group; the condition on D is analogous to Eq.(273). If this condition is satisfied, then the added error becomes part of a new connected term: together with its new lowest-weight correction, $O_U D$ forms a new connected logical string. This logical string is similar to the old logical string, but it contains a detour where it veers off to include the error D. Either this logical string has an unlikely shape, because it includes two closely spaced vertical steps, or the error D has width > γ√L. In the latter case we also require that the original correction $E_s$ included more than γ√L consecutive qubits, which is exponentially unlikely according to the counting we did in Lemma 4 and the bound in Eq.(190). We conclude that the contribution to the error term $E_2$ from the noise terms we arrive at by adding errors proximate to $O_C$ is small. This leaves the added errors that lie within $E_s$ or within one of the operators with the same syndrome and weight. In this case the new connected logical string that results from adding the errors is the same as the old logical string. We compare the set of terms we arrive at by adding errors in this manner to the set of terms with the same $O_U$ but a higher-weight $O_U'$, and argue that there are more of the latter. We have already neglected such noise terms in Lemma 7, so we conclude that the correction to the disconnected part is small.
We start with a connected noise term $(O_U \rho O_U')$, where $|O_U| = |O_U'| = w$. We form a higher-weight noise term by adding an incoherent-type error D within one of the operators $O_C$ to produce a new connected noise term $(O_U D \rho O_U' D)$. This adds a total of two to the weight of the term. We can place the added error anywhere within one of the operators $O_C$, so the number of possibilities is O(w). Now consider the possible connected noise terms with the same $O_U$ where, instead of choosing $O_U'$ with weight w, we set $|O_U'| = w + 2$. Starting from an operator $O_U'$ with weight w, we can construct one with weight w + 2 by adding an extra "cap" consisting of three qubits around a single plaquette or star, as illustrated in Fig. 19. Such caps can be attached at O(w) positions along $O_U'$, and the full set of possibilities will generally be larger. If we consider a pair of added errors lying within $O_C$, then we compare to the connected noise terms with $|O_U'| = |O_U| + 4$. The noise terms where $O_U$ and $O_U'$ have different weights were discussed in Section 6.10, and we proved in Lemma 7 that their contribution is negligible. Finally, we conclude, just as in Lemma 11, that each of the disconnected errors that contribute to the error $E_2$ can be matched with a connected noise term that we have already neglected. In other words, $E_2$ is smaller than $E_1$. Therefore, we can set the disconnected part equal to 1 and make only a small error for low-weight connected terms.

G Physical Y Errors
In Lemma 8 we considered rotations in the X-Z plane where the single-qubit rotation angles were allowed to differ. Here we prove that allowing for rotations partly around the Y axis on the physical qubits will decrease the coherence of the logical noise channel.
Lemma 13. Consider an L × L toric code and a noise channel that consists of single-qubit rotations by an angle θ about an arbitrary axis. Suppose that |sin θ| < 1/L, as in Lemmas 3 and 5. Then the connected contribution to the logical noise from low-weight terms is most coherent when the single-qubit rotations are about an axis in the X-Z plane. We proved elsewhere that the low-weight connected contribution dominates the logical noise components.
Proof. Let $\theta_X$, $\theta_Y$, $\theta_Z$ denote the rotation angles about the X-, Y-, and Z-axes, respectively, so that $\theta_X^2 + \theta_Y^2 + \theta_Z^2 = \theta^2$. According to Lemmas 14 and 15, the dominant coherent logical noise components are $(\bar{L}_a \rho)$, where $\bar{L}_a$ is a logical X or Z operator on one of the encoded qubits. We apply several of our lemmas to restrict the noise terms we consider, just as in Theorem 3. Among the noise terms that contribute to the coherent logical noise, we keep the terms with short, typical logical strings and nonexceptional partitions. Among the noise terms that contribute to the incoherent logical noise, we keep the terms where the logical string L is short and typical, $|O_U| = (|L| + 1)/2$, $|O_C| = |E_s|$, and $|O_U'| = |O_U|$.

Error Locations
First suppose that $\theta_Z = 0$. Then the logical $(\tilde X_1\tilde\rho)$ noise component is generated from noise terms $(O_U\rho\, O_C)$ where $O_U$ and $O_C$ together contain $X$ acting on every qubit along an $X_1$ logical string. Meanwhile, the incoherent logical $(\tilde X_1\tilde\rho\tilde X_1)$ noise component is also generated by $X_1$ logical strings. In Theorem 3 we state a bound on the relative magnitude of these logical noise components; here $\theta_X$ plays the role of $\theta$ in Eq.(231). Under our $\theta_X$ and $\theta_Y$ rotation noise model, we also have a nonzero logical $(\tilde Z_1\tilde\rho)$ noise component. This is generated by connected noise terms $(O_U\rho\, O_C)$, where $O_U$ and $O_C$ together contain both $X$ and $Y$ acting on every qubit along a $Z_1$ logical string. The number of $Z_1$ logical strings with length $\ell$ is the same as the number of $X_1$ logical strings with length $\ell$. However, the weight of the noise terms that contribute to $(\tilde Z_1\tilde\rho)$ is $2\ell$, and the contribution of each such noise term is $(\sin(\theta_X/2))^{\ell}(\sin(\theta_Y/2))^{\ell}$. In contrast, the noise terms that contribute to $(\tilde X_1\tilde\rho)$ are all proportional to $(\sin(\theta_X/2))^{\ell}$. Therefore, $(\tilde Z_1\tilde\rho)$ is exponentially smaller in $L$ than $(\tilde X_1\tilde\rho)$ for any choice of rotation axis in the X-Y plane, and it has a negligible effect on the relative magnitudes of the coherent and incoherent logical noise components. We also have an incoherent $(\tilde Z_1\tilde\rho\tilde Z_1)$ noise component. This is generated by noise terms $(O_U\rho\, O_U')$, where $O_U$ and $O_U'$ contain $Y$ errors along an uncorrectable subset of a $Z_1$ logical string. These noise terms have magnitude $(\sin(\theta_Y/2))^{\ell+1}$, which is exponentially larger than the noise terms that contribute to $(\tilde Z_1\tilde\rho)$. It follows that the logical coherence is maximized when $\theta_Y = 0$ and $\theta_X = \theta$. We began by supposing $\theta_Z = 0$; next, we consider the case where $\theta_X$, $\theta_Y$, and $\theta_Z$ are all nonzero.
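The per-term scaling in this argument can be checked with a few lines of Python. The sketch below is illustrative only: it evaluates the single-term magnitudes quoted above for a string of length $\ell$, deliberately ignores all combinatorial prefactors (numbers of strings and partitions), and uses hypothetical angles chosen so that $|\sin\theta| < 1/L$. The helper name `term_magnitudes` is our own.

```python
import math

def term_magnitudes(theta_x, theta_y, ell):
    """Single-term magnitudes quoted in the text for a logical string of length
    ell, with theta_z = 0 and angles in radians. Combinatorial prefactors
    (numbers of strings and partitions) are deliberately omitted."""
    sx = abs(math.sin(theta_x / 2))
    sy = abs(math.sin(theta_y / 2))
    return {
        "coherent_X1": sx**ell,             # terms contributing to (X1 rho)
        "coherent_Z1": (sx * sy) ** ell,    # weight-2*ell terms contributing to (Z1 rho)
        "incoherent_Z1": sy ** (ell + 1),   # terms contributing to (Z1 rho Z1)
    }

# Hypothetical example: shortest strings (ell = L = 5), total angle theta with
# |sin(theta)| < 1/L, split so that theta_x**2 + theta_y**2 = theta**2.
L, theta, phi = 5, 0.15, 0.3
theta_x, theta_y = theta * math.cos(phi), theta * math.sin(phi)
m = term_magnitudes(theta_x, theta_y, L)

print(m["coherent_Z1"] / m["coherent_X1"])    # equals sin(theta_y/2)**L
print(m["coherent_Z1"] / m["incoherent_Z1"])  # equals sin(theta_x/2)**L / sin(theta_y/2)
```

With these example angles the $(\tilde Z_1\tilde\rho)$ terms come out suppressed by a factor below $10^{-8}$ relative to the $(\tilde X_1\tilde\rho)$ terms, and they are also smaller than the incoherent $(\tilde Z_1\tilde\rho\tilde Z_1)$ terms, in line with the argument above.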
Suppose $|\theta_Z| \ge |\theta_X|$; if not, switch the roles of $X$ and $Z$ in what follows. Fix a $Z_1$ logical string $L$. The contribution of the logical string $L$ to $\tilde\chi_{Z_1 I}$ is a sum over the partitions of $L$. For each partition $(O_U\rho\, O_C)$, we can replace a $Z$ error in $O_U$ with a $Y$ error if we add an $X$ error on the same qubit to $O_C$. Similarly, we can replace a $Z$ error in $O_C$ by a $Y$ error if we add an $X$ error on the same qubit to $O_U$. The $Z$ syndrome is unchanged, but now we also have a nontrivial $X$ syndrome corresponding to the $X$ error on the chosen site. This does not change how any partitions are decoded, but it does change the weight. The contribution of each partition to $\tilde\chi_{Z_1 I}$ is a sum over all combinations of either a $Z$ error, or a $Y$ and an $X$ error, on every qubit in $L$. The terms with $Y$ errors have higher weight, so they contain extra factors of $\sin(\theta_Y/2)$, which is small since $|\sin\theta| < 1/L$. At the same time, the logical string $L$ contributes to the $\tilde\chi_{Z_1 Z_1}$ logical noise component. These noise terms include some that feature only $Z$ errors and others with some number of $Z$ errors replaced by $Y$ errors. Unlike the contributions to $\tilde\chi_{Z_1 I}$, these terms with $Y$ errors are not higher-weight. There are no extra factors of $\sin(\theta_Y/2)$, and we conclude that the incoherent logical noise components are made larger relative to the coherent logical noise components. Therefore, the logical coherence is maximized when $\theta_Y = 0$.

H Other Logical Maps
In Section 6.4, we restricted our attention to logical coherent terms of the form $(\tilde L_a\tilde\rho)$, where $\tilde L_a$ is an $X$ or $Z$ operator on one of the encoded qubits. Now we consider the case where $\tilde L_a$ acts nontrivially on both encoded qubits, or as $Y$ on one or both of the encoded qubits.
Lemma 14. Consider the toric code with minimal-weight decoding and a noise model that consists of uniform single-qubit unitary rotations about a fixed axis. Then the coherent logical noise components $(\tilde L_a\tilde\rho)$, where $\tilde L_a$ is a $Y$-type logical operator or a nontrivial logical operator on both encoded qubits, are negligible relative to the components where $\tilde L_a$ is an $X$- or $Z$-type logical operator on one encoded qubit.
Proof. Suppose we have $\tilde L_a = \tilde X_1\tilde Z_2$. Logical strings of this type are the product of two operators of the type we have already considered. Each connected noise term that contributes to the logical noise component $(\tilde X_1\tilde Z_2\tilde\rho)$ is a product of a connected noise term that contributes to $(\tilde X_1\tilde\rho)$ and a connected noise term that contributes to $(\tilde Z_2\tilde\rho)$. It follows that, up to corrections that come from the disconnected part, $(\tilde X_1\tilde Z_2\tilde\rho) \approx (\tilde X_1\tilde\rho)(\tilde Z_2\tilde\rho)$. The logical components $(\tilde X_1\tilde\rho)$ and $(\tilde Z_2\tilde\rho)$ are both small if error correction is working, so the logical $(\tilde X_1\tilde Z_2\tilde\rho)$ component will be negligible. If $\tilde L_a$ is a $Y$-type operator on the first encoded qubit, the argument is the same, since $Y$-type logical strings are products of $X$- and $Z$-type logical strings: $\tilde Y_1 \propto \tilde X_1\tilde Z_1$.
If we have $\tilde L_a = \tilde Z_1\tilde Z_2$, the logical component $(\tilde Z_1\tilde Z_2\tilde\rho)$ is no longer a product of $(\tilde Z_1\tilde\rho)$ and $(\tilde Z_2\tilde\rho)$. This is because $Z_1$- and $Z_2$-type logical strings can overlap, which changes the counting of logical strings of a fixed weight. Fig. 20 shows two examples of this kind of logical string. At length $2L$, where $L$ is the code distance, there are many connected logical strings, because a single connected string can wrap the torus along both directions. Counting the shortest paths between two points in the square lattice separated by $l_1$ steps in the horizontal direction and $l_2$ steps in the vertical direction, we get

$$\text{number of shortest paths travelling } l_1 \text{ horizontal and } l_2 \text{ vertical spaces} = \binom{l_1 + l_2}{l_1}.$$
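As a sanity check of this formula, here is a minimal sketch that counts shortest (monotone) lattice paths by dynamic programming and compares the result with the binomial coefficient from Python's standard library. The helper name `count_shortest_paths` is hypothetical.

```python
import math

def count_shortest_paths(l1, l2):
    """Count shortest paths on the square lattice that take l1 horizontal and
    l2 vertical unit steps, by dynamic programming over lattice points."""
    # ways[i][j] = number of shortest paths from (0, 0) to (i, j)
    ways = [[0] * (l2 + 1) for _ in range(l1 + 1)]
    for i in range(l1 + 1):
        for j in range(l2 + 1):
            if i == 0 or j == 0:
                ways[i][j] = 1  # only one way along an edge of the rectangle
            else:
                # the last step came either from the left or from below
                ways[i][j] = ways[i - 1][j] + ways[i][j - 1]
    return ways[l1][l2]

for l1, l2 in [(3, 2), (5, 5), (7, 4)]:
    print(l1, l2, count_shortest_paths(l1, l2), math.comb(l1 + l2, l1))
```

Each shortest path is fixed by choosing which $l_1$ of its $l_1 + l_2$ steps are horizontal, which is why the recursion reproduces the binomial coefficient.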
We can use this to bound the number of weight-$2L$ logical $Z_1Z_2$ strings. Fix two sites in the code, qubit $i$ along the vertical edge of the code and qubit $j$ along the horizontal edge, and count the shortest paths that connect these points, for each of the two ways of linking the edge points. Applying Eq.(278), we find

$$\text{number of weight-}2L\ Z_1Z_2 \text{ logical strings} \approx \frac{4^L}{\sqrt{\pi L}}.$$

Figure 20: Here are two examples of lowest-weight $Z_1Z_2$ logical strings, $L_1$ and $L_2$, that act as $Z$ on both encoded qubits. Notice that red and green connect the edge points in different (but topologically equivalent) ways.
In Section 6.5, we counted logical strings that act as $X$ or $Z$ on one of the encoded qubits starting from length $L$, and we found exponentially many logical strings at higher weights. If we consider weight-$2L$ logical strings, we find of order $\mu^{2L}$ logical strings, where $\mu \approx 2.64$ for the 2D square lattice. Since $\mu^2 \approx 7 > 4$, this is more than $4^L$, so there are more high-weight logical strings that act on only one encoded qubit. Further, in our path counting in Lemma 3 we neglected all logical strings of length $> L + 2k$ for a constant $k$; in particular, strings of length $\ge 2L$ contribute negligibly for large $L$. We conclude that the logical noise components $(\tilde L_a\tilde\rho)$, where $\tilde L_a$ is a $Y$-type logical operator or acts on both encoded qubits, are negligible relative to the noise components where $\tilde L_a$ acts as $X$ or $Z$ on one of the encoded qubits.
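To make the comparison concrete, the sketch below evaluates $4^L/\sqrt{\pi L}$, the exact central binomial coefficient $\binom{2L}{L}$ (which has that Stirling asymptotic form), and $\mu^{2L}$ with the value $\mu \approx 2.64$ quoted above. It only compares growth rates; it does not reproduce the string counting itself, and `compare_counts` is a hypothetical helper.

```python
import math

MU = 2.64  # approximate connective constant of the square lattice (from the text)

def compare_counts(L):
    """Return (stirling_form, central_binomial, mu_count) at weight 2L."""
    stirling_form = 4**L / math.sqrt(math.pi * L)  # asymptotic count of weight-2L Z1Z2 strings
    central_binomial = math.comb(2 * L, L)         # exact C(2L, L), same asymptotic form
    mu_count = MU ** (2 * L)                       # order of weight-2L single-qubit strings
    return stirling_form, central_binomial, mu_count

for L in (10, 50, 100):
    s, c, m = compare_counts(L)
    # first ratio tends to 1; second grows roughly like (MU**2 / 4)**L
    print(L, c / s, m / s)
```

The second ratio grows without bound because $\mu^2/4 > 1$, which is the sense in which the single-qubit strings dominate at weight $2L$.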

I More General Coherent Terms
We have considered coherent logical noise components $(\tilde L_a\tilde\rho)$, where $\tilde L_a$ is a logical operator that acts as $X$ or $Z$ on exactly one of the encoded qubits. We must also consider logical noise components $(\tilde L_a\tilde\rho\tilde L_b)$, where $\tilde L_a$ and $\tilde L_b$ are different nontrivial operators on the encoded qubits.
Lemma 15. Consider the $L \times L$ toric code with noise that consists of single-qubit unitary rotations about a fixed axis by angle $\theta$ on every qubit, where $|\sin\theta| < 1/L$ as in Lemma 3. Each coherent logical noise component of the form $(\tilde L_a\tilde\rho\tilde L_b)$, where $\tilde L_a$ and $\tilde L_b$ are different nontrivial logical operators, is negligible relative to the coherent logical noise components with $\tilde L_b = \widetilde{\mathrm{id}}$. Each of the more general coherent terms is approximately the product of $(\tilde L_a\tilde\rho)$ and $(\tilde\rho\tilde L_b)$, which are both small (because we are interested in the regime where error correction succeeds with high probability). Therefore, we may safely neglect all logical noise components $(\tilde L_a\tilde\rho\tilde L_b)$, where $\tilde L_a$ and $\tilde L_b$ are different nontrivial logical operators.
Proof. Our approach here is to bound the coherent logical noise components $(\tilde L_a\tilde\rho\tilde L_b)$, where $\tilde L_a$ and $\tilde L_b$ are different nontrivial logical operators, by the coherent logical noise components we have already considered. This works because short connected logical strings with different logical action do not overlap much. Overlap here means that the strings contain the same error acting on the same qubits. One possible overlap is between $Z_1$ and $Z_2$ logical strings. Pick a $Z_1$ logical string, $L_1$, and a $Z_2$ logical string, $L_2$; one string runs left to right, and the other runs top to bottom. If the horizontal string is longer than $L$, the code distance, then it has vertical steps along it, and these steps may overlap with the vertical logical string. An example is given in Fig. 21. We assume $L_1$ and $L_2$ both have length $\le L + 2k$ because of Lemma 3. Then, we use Lemma 9 to restrict to the case where all the steps are one lattice spacing at a time. Any possible overlap is then on at most two sites, as shown in Fig. 21. Further, among all possible pairs of a $Z_1$ logical string and a $Z_2$ logical string, only a fraction of order $1/L$ have any overlap at all, so we can neglect possible overlap. Because the two logical strings $L_1$ and $L_2$ are approximately disjoint, when we sum over partitions, each partition approximately factors into a partition of $L_1$ times a partition of $L_2$. That is, each connected noise term in the sum for the logical $\tilde\chi_{Z_1Z_2}$ is built from a partition of $L_1$ and a partition of $L_2$. Therefore, $\tilde\chi_{Z_1Z_2} \approx \tilde\chi_{Z_1I}\,\tilde\chi_{IZ_2}$, up to small corrections from the overlap between $L_1$ and $L_2$ and from the disconnected part. Each of the terms $\tilde\chi_{Z_1I}$ and $\tilde\chi_{IZ_2}$ will be $\ll 1$ if we are in a regime where error correction succeeds. Therefore, the $\tilde\chi_{Z_1Z_2}$ logical noise component will be negligible relative to the $\tilde\chi_{Z_1I}$ logical noise component. The same holds for the other logical noise components with a nontrivial logical operator on each side of $\tilde\rho$.
Then, we may safely neglect the more general coherent terms and consider only the $(\tilde L_a\tilde\rho)$ components.
Figure 21: Here we have a $Z_1$ and a $Z_2$ logical string. They overlap on two qubits, but if we fix one string and consider all possible paths for the other string, only a fraction of order $1/L$ of them have any overlap.

J Growth of Infidelity
The expression from [20] for the average infidelity after $m$ applications of the noise channel, Eq.(282), is an upper bound. In it, $r_m$ is the average infidelity after $m$ applications of a fixed noise channel, $r$ is the average infidelity after one application of the channel, $d$ is the dimension of the Hilbert space on which the channel acts, and $\Theta$ is the coherence angle. For any channel other than a unitary (completely coherent) channel, the upper bound has a linear component. We expect that this linear part is not only an upper bound, but that the average infidelity will grow linearly to lowest order. Working in the Pauli transfer matrix representation, we write a unital noise channel as

$$N = \begin{pmatrix} 1 & 0 \\ 0 & M \end{pmatrix}, \qquad M_{j,j} = 1 - \lambda_j, \qquad M_{j,l} = \beta_{j,l}\ \ (l \ne j).$$

When channels are composed, we multiply their Pauli transfer matrices. After applying the same noise channel twice, the diagonal entries are

$$(N^2)_{j,j} = (1 - \lambda_j)^2 + \sum_{l \ne j} \beta_{j,l}\,\beta_{l,j}.$$
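After $m$ applications of the noise channel, the diagonal entries are $(1 - \lambda_j)^m$ plus corrections of second order in the off-diagonal entries $\beta_{j,l}$. The infidelity after composing the channel $m$ times is proportional to $\sum_j \left[1 - (N^m)_{j,j}\right]$, so to lowest order it grows as $m\sum_j\lambda_j$, that is, proportionally to $m r$, the first term in the upper bound in Eq.(282).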
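A quick numerical check of this lowest-order behaviour, assuming numpy: the sketch builds the Pauli transfer matrix of a single-qubit channel, composes it $m$ times with a matrix power, and computes $r = (d^2 - \mathrm{Tr}\,N)/(d(d+1))$, which follows from $F = (dF_e + 1)/(d + 1)$ with entanglement fidelity $F_e = \mathrm{Tr}\,N/d^2$. The two example channels (a $Z$ rotation and $Z$ dephasing) and their parameters are illustrative choices, not ones taken from the text.

```python
import numpy as np

# Single-qubit Pauli basis: I, X, Y, Z
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def ptm(kraus, d=2):
    """Pauli transfer matrix N[i, j] = Tr[P_i E(P_j)] / d from Kraus operators."""
    N = np.zeros((4, 4))
    for j, Pj in enumerate(PAULIS):
        out = sum(K @ Pj @ K.conj().T for K in kraus)
        for i, Pi in enumerate(PAULIS):
            N[i, j] = np.trace(Pi @ out).real / d
    return N

def avg_infidelity(N, d=2):
    """r = 1 - F, using F = (d * F_e + 1) / (d + 1) and F_e = Tr N / d**2."""
    return (d**2 - np.trace(N)) / (d * (d + 1))

def growth_ratio(kraus, m):
    """r_m / r for m repeated applications of the same channel."""
    N = ptm(kraus)
    return avg_infidelity(np.linalg.matrix_power(N, m)) / avg_infidelity(N)

theta, p, m = 0.01, 0.005, 10
rotation_kraus = [np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])]  # coherent
dephasing_kraus = [np.sqrt(1 - p) * PAULIS[0], np.sqrt(p) * PAULIS[3]]     # incoherent

print(growth_ratio(dephasing_kraus, m))  # close to m: linear growth
print(growth_ratio(rotation_kraus, m))   # close to m**2: quadratic growth
```

For the dephasing channel, $\lambda_j = 2p$ on the $X$ and $Y$ entries and all $\beta_{j,l} = 0$, so the ratio is $[1 - (1-2p)^m]/(2p) \approx m$; the coherent rotation instead shows the quadratic growth discussed in the introduction.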

K Diamond Distance Bound
The diamond distance from identity can be bounded in terms of the average infidelity, r, and the sum of squares of the off-diagonal (coherent) components of the chi matrix.
Lemma 16. In Eq.(47), we upper bounded the diamond distance from identity for a channel by a function $f$ based on [9]. This function depends on the components of the Pauli transfer matrix for the channel. With a little algebra, we can bound $f$ in terms of the average infidelity $r$ and the off-diagonal chi matrix elements, as stated in Eq.(287), where the constants are given by $c_1 = d_L^2$ and $c_2 = 2(d_L + 1)^2$, and $d_L$ is the dimension of the logical space.
Proof. We start with Eq.(47) and rewrite the Pauli transfer matrix in terms of the chi matrix. We expand $(1 - N_{i,i})^2$ and compare to $r^2$. In Eq.(49), $N$ is the Pauli transfer matrix representation of the noise channel. The diamond distance from identity is bounded by a constant times $f$, and we can expand $f$ in terms of the chi matrix elements. Recall that we already have Lemma 1 concerning the off-diagonal elements. Also, the infidelity $r$ is related to the trace of the Pauli transfer matrix, or equivalently to the (0,0) element of the chi matrix.
We can write the diagonal components of the Pauli transfer matrix in terms of the diagonal components of the chi matrix as

$$N_{i,i} = \sum_{\sigma_j \in C_i} \chi_{j,j} - \sum_{\sigma_l \in A_i} \chi_{l,l},$$

where the set $C_i$ includes all the Pauli operators $\sigma_j$ that commute with $\sigma_i$ and the set $A_i$ is all Pauli operators $\sigma_l$ that anticommute with $\sigma_i$. For example, for a single-qubit channel, $N_{1,1} = \chi_{0,0} + \chi_{1,1} - \chi_{2,2} - \chi_{3,3}$.
Then, we can sum over all the diagonal components of $N$. The identity operator commutes with every Pauli operator, while each non-identity Pauli operator commutes with exactly half of them, so all terms except $\chi_{0,0}$ cancel:

$$\sum_i N_{i,i} = d_L^2\,\chi_{0,0},$$

where $d_L$ is the dimension of the logical space. Next, we can expand the diagonal term from Eq.(288), using the trace preservation condition $\sum_i \chi_{i,i} = 1$. Because the chi matrix of a completely positive map is positive semidefinite (for the unitary noise considered here it has rank one), its diagonal components are real and nonnegative, which lets us bound the diagonal term. When we substitute into Eq.(288) and use Lemma 1 for the off-diagonal terms, we obtain a bound on the diamond-norm distance from identity. Finally, the average infidelity is given by $r = d_L(1 - \chi_{0,0})/(d_L + 1)$ in the chi matrix representation, and Eq.(287) follows.
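The chi-matrix identities used in this proof can be checked numerically for a single qubit ($d_L = 2$). The sketch below, assuming numpy, builds the chi matrix from Kraus operators via their Pauli expansion, $\mathcal{E}(\rho) = \sum_{j,k}\chi_{j,k}\sigma_j\rho\sigma_k$, and the Pauli transfer matrix $N_{i,j} = \mathrm{Tr}[\sigma_i\mathcal{E}(\sigma_j)]/d$. It then checks the commute/anticommute formula for $N_{i,i}$, the trace identity, trace preservation, and the chi-matrix expression for $r$. The example channel (a rotation followed by $Z$ dephasing) and all helper names are arbitrary choices for illustration.

```python
import numpy as np

# Single-qubit Pauli basis sigma_0..sigma_3 = I, X, Y, Z
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
d = 2

def chi_matrix(kraus):
    """chi[j, k] such that E(rho) = sum_{j,k} chi[j, k] sigma_j rho sigma_k."""
    chi = np.zeros((4, 4), dtype=complex)
    for K in kraus:
        a = np.array([np.trace(P @ K) / d for P in PAULIS])  # K = sum_j a_j sigma_j
        chi += np.outer(a, a.conj())
    return chi

def ptm(kraus):
    """Pauli transfer matrix N[i, j] = Tr[sigma_i E(sigma_j)] / d."""
    N = np.zeros((4, 4))
    for j, Pj in enumerate(PAULIS):
        out = sum(K @ Pj @ K.conj().T for K in kraus)
        for i, Pi in enumerate(PAULIS):
            N[i, j] = np.trace(Pi @ out).real / d
    return N

def rotation(theta, axis):
    """exp(-i theta n.sigma / 2) for the unit vector n along `axis`."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    n_sigma = n[0] * PAULIS[1] + n[1] * PAULIS[2] + n[2] * PAULIS[3]
    return np.cos(theta / 2) * PAULIS[0] - 1j * np.sin(theta / 2) * n_sigma

# Illustrative channel: a rotation followed by Z dephasing with probability p
U, p = rotation(0.3, (1.0, 0.5, 0.2)), 0.02
kraus = [np.sqrt(1 - p) * U, np.sqrt(p) * (PAULIS[3] @ U)]
chi, N = chi_matrix(kraus), ptm(kraus)

# signs[i, j] = +1 if sigma_j commutes with sigma_i (set C_i), -1 if it anticommutes (A_i)
signs = np.array([[1 if np.allclose(P @ Q, Q @ P) else -1 for Q in PAULIS] for P in PAULIS])
chi_diag = np.diag(chi).real

print(np.allclose(np.diag(N), signs @ chi_diag))            # N_ii from the chi diagonal
print(np.isclose(np.trace(N), d**2 * chi_diag[0]))          # sum_i N_ii = d^2 chi_00
print(np.isclose(chi_diag.sum(), 1.0), chi_diag.min() >= -1e-12)
r_ptm = (d**2 - np.trace(N)) / (d * (d + 1))
print(np.isclose(r_ptm, d * (1 - chi_diag[0]) / (d + 1)))   # r in terms of chi_00
```

Row 1 of `signs` is $(+1, +1, -1, -1)$, which reproduces the single-qubit example $N_{1,1} = \chi_{0,0} + \chi_{1,1} - \chi_{2,2} - \chi_{3,3}$.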