Logical Error Rate Scaling of the Toric Code

To date, a great deal of attention has focused on characterizing the performance of quantum error correcting codes via their thresholds, the maximum correctable physical error rate for a given noise model and decoding strategy. Practical quantum computers will necessarily operate below these thresholds meaning that other performance indicators become important. In this work we consider the scaling of the logical error rate of the toric code and demonstrate how, in turn, this may be used to calculate a key performance indicator. We use a perfect matching decoding algorithm to find the scaling of the logical error rate and find two distinct operating regimes. The first regime admits a universal scaling analysis due to a mapping to a statistical physics model. The second regime characterizes the behavior in the limit of small physical error rate and can be understood by counting the error configurations leading to the failure of the decoder. We present a conjecture for the ranges of validity of these two regimes and use them to quantify the overhead -- the total number of physical qubits required to perform error correction.


Introduction
Quantum computers are sensitive to the effects of noise due to unwanted interactions with the environment. To overcome this, fault-tolerant protocols that utilize error correction codes have been developed. These schemes allow arbitrary quantum gates to be performed in spite of the noise that is ubiquitous in current models of quantum computing.
Footnote 1: Deceased 19 October 2012.
The surface code [10,11] is one of a family of topological codes, and is the basis for an approach to fault-tolerant quantum computing for which high thresholds have been reported [12][13][14][15]. The toric code [16] is among the most extensively studied of this family of codes, revealing much insight into related topologically ordered systems. A great deal of work has concentrated on calculating thresholds for various error models [17][18][19], and on the discovery and implementation of new classical decoding algorithms [19][20][21][22][23][24][25]. The toric code performs well, with high thresholds for some commonly studied noise models.
A high threshold is a very desirable property of an error correcting code since for all error rates below the threshold, increasing the number of physical qubits encoding the quantum information reduces the logical error rate. In a realistic setting the code must be operating at an error rate below the threshold. Other quantities then become important to characterize the performance of a quantum computer, for example the code overhead, the number of physical qubits comprising the code that are required to adequately protect the encoded quantum information. This is an important consideration for the practical implementation of fault-tolerant quantum computation and has recently begun to draw some attention [26][27][28].
The logical failure rate of the error correction, denoted here as $P_{\mathrm{fail}}$, is a key metric of the performance of a code, since it describes the likelihood of failing to protect the encoded quantum information. In this work we seek the logical failure rate of the toric code for fixed code distance and physical error rate, p. The code distance is the minimum length of a stringlike operator that has a non-trivial effect on the code space, and in the case of the toric code such operators have a length equal to the lattice size L.
The toric code is a simple model that is closely related to other, more physically realistic systems. We expect therefore that results for the logical error rate scaling of the toric code could be applied in a range of other physical systems, most obviously the planar code (with open, rather than periodic, boundary conditions) and codes with noisy syndrome measurements. The techniques to determine the scaling of the logical error rate should be analogous although the numerics would be expected to differ from the toric code case [29]. Furthermore, once the scaling has been determined it can be used to calculate the fault-tolerant overhead for the planar code using the methods presented in this paper.
Below the threshold, the logical failure rate of a topological code is expected to decrease exponentially as we increase the code distance [16]. Although the code performance improves rapidly with increasing L, in the lattice of the toric code the total number of physical qubits scales as $O(L^2)$. Manufacturing, storing, and manipulating resources with such a scaling is a non-trivial task with technology available at present. We should then ask not simply how large we can make the code, but how many physical qubits are required to achieve a desired error correction performance.
In order to answer this question, we examine the behaviour of the toric code in the presence of uncorrelated bit-flip and phase-flip noise. We numerically simulate the error correction procedure and use this to find the failure rate as a function of the input parameters L and p and find two operating regimes. The first of these, which we will call the universal scaling hypothesis, extends ideas by Wang et al [30] and uses rescaling arguments based on a mapping to a well-studied model in statistical physics (the two-dimensional random-bond Ising model, or RBIM). This approach provides a good estimate for P fail when the error weight (the number of qubits an operator acts on non-trivially) is high and code distance is large.
Rescaling arguments apply in the thermodynamic limit, and close to criticality, where the correlation length of the RBIM diverges and the appropriate length scale is the ratio of the lattice size to the correlation length, $L/\xi$. As p decreases there is a point at which finite-size effects begin to dominate and we no longer expect the universal scaling hypothesis to apply. This limit corresponds to low physical error rates, as well as small lattices.
The second approach extends ideas by Raussendorf et al [12] and Fowler et al [15] to find an analytic expression for $P_{\mathrm{fail}}$ in the limit $p \to 0$. When the error weight is low and the code distance is small this expression gives a good estimate of the logical failure rate. We will refer to this as the low p expression.
Although we know the limits in which each of these approaches is valid, we would like to make some quantitative statements about the range of parameters for which each is applicable. We shall present a heuristic argument for the range of L and p for which each regime gives a good approximation to the numerical data.
The structure of the paper is as follows. In section 2 we review the toric code and its properties. Readers familiar with this material may wish to skip to section 3 which discusses the universal scaling regime, in which rescaling arguments are used to estimate the logical failure rate. Section 4 describes the regime in which finite-size effects dominate the logical failure rate and the failure rate is dominated by spanning errors. In section 5 we present our conjectures regarding the ranges of validity of each of the two regimes described. In section 6 we use these results to demonstrate techniques to determine the overhead as a function of the single qubit error rate and the logical error rate. We conclude in section 7.

Background
In the toric code, physical qubits reside on the edges of an $L \times L$ square lattice, as shown in figure 1. There are $n = 2L^2$ physical qubits comprising the code. Periodic boundary conditions are imposed and the lattice can be imagined to be embedded on the surface of a torus.
The toric code is described by a set of two types of commuting stabilizer generators, the so-called vertex, $A_v$, and plaquette, $B_p$, operators, defined as

$A_v = \prod_{i \in v} X_i, \qquad B_p = \prod_{i \in p} Z_i,$

where X and Z are the conventional single-qubit Pauli operators, v indicates a vertex and p a plaquette of the lattice. The $A_v$ operators therefore act on the four qubits surrounding a vertex of the lattice, and the $B_p$ operators act on the four qubits surrounding a plaquette, see figure 1. These four-body measurements can be decomposed into four two-qubit CNOT gates with the addition of an ancilla [16].
We denote the logical encoded state of the toric code by $|\psi\rangle_{\mathrm{toric}}$. In the absence of noise, measuring any element of the set of stabilizer generators $S = \{A_v, B_p\}$ yields the trivial eigenvalue $+1$,

$S_i |\psi\rangle_{\mathrm{toric}} = |\psi\rangle_{\mathrm{toric}} \quad \text{for all } S_i \in S.$

The stabilizer group is generated by S with multiplication being the group action. All elements of the stabilizer group act trivially on the code space. The code space of the toric code is four-dimensional and hence can encode two logical qubits. This is independent of n, hence the toric code protects a constant number of logical qubits regardless of its lattice size.
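The stabilizer structure described above can be checked directly on a small lattice. The following is a minimal sketch, assuming a particular (hypothetical) edge-indexing convention that is not taken from the paper: horizontal edges are numbered first, then vertical edges. It builds the supports of the vertex and plaquette operators, checks that every vertex operator overlaps every plaquette operator on an even number of qubits (so they commute), and counts the encoded qubits as n minus the number of independent generators.

```python
from itertools import product

def edge_h(r, c, L):
    """Index of the horizontal edge leaving vertex (r, c) to the right."""
    return (r % L) * L + (c % L)

def edge_v(r, c, L):
    """Index of the vertical edge leaving vertex (r, c) downwards."""
    return L * L + (r % L) * L + (c % L)

def vertex_support(r, c, L):
    """Qubits acted on by A_v: the four edges meeting at vertex (r, c)."""
    return {edge_h(r, c, L), edge_h(r, c - 1, L),
            edge_v(r, c, L), edge_v(r - 1, c, L)}

def plaquette_support(r, c, L):
    """Qubits acted on by B_p: the four edges bounding plaquette (r, c)."""
    return {edge_h(r, c, L), edge_h(r + 1, c, L),
            edge_v(r, c, L), edge_v(r, c + 1, L)}

def gf2_rank(supports):
    """Rank over GF(2) of a list of supports, each reduced as an int bitmask."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for s in supports:
        row = sum(1 << q for q in s)
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # clears the leading bit
    return len(pivots)

def toric_code_summary(L):
    """Return (number of qubits, all generators commute?, logical qubits)."""
    sites = list(product(range(L), repeat=2))
    stars = [vertex_support(r, c, L) for r, c in sites]
    plaqs = [plaquette_support(r, c, L) for r, c in sites]
    # An X-type and a Z-type operator commute iff their supports share
    # an even number of qubits.
    commute = all(len(a & b) % 2 == 0 for a in stars for b in plaqs)
    n = 2 * L * L
    k = n - gf2_rank(stars) - gf2_rank(plaqs)
    return n, commute, k
```

For any lattice size with L of at least 3 the summary should report two logical qubits, in line with the statement that the number of encoded qubits is independent of n.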
The symmetry between the primal lattice and the dual lattice (constructed by replacing plaquettes of the primal lattice by vertices and vice versa) shown in figure 1, reveals a useful symmetry in the stabilizers of the toric code. On the dual lattice the A v operators act on the qubits surrounding a plaquette, as shown in figure 1. By considering both the primal and dual lattices we can view all stabilizers as closed loops, meaning that all plaquette-type operators on the primal lattice have an analogous vertex-type operator on the dual lattice. It follows that all results calculated for either bit-flip or phase-flip errors are interchangeable with results for the other type.
In the language of algebraic topology, all of the stabilizers correspond to homologically trivial cycles. In figure 2 we show an example of a homologically trivial cycle that is generated by multiplying two adjacent stabilizer generators together. We see that all homologically trivial cycles act trivially on the code space.
The logical operators are also represented by cycles of Pauli operators. However, these cycles wrap around the torus and are not homologically equivalent to stabilizers. The logical operators correspond to homologically non-trivial cycles and have a non-trivial effect on the code space. The minimum weight of a logical operator is L.
There are two sets of Z and X logical operators addressing the two encoded qubits (overbar indicates a logical operation). One of these, labeled Z 1 , is shown in figure 2 spanning the lattice vertically. The corresponding X 1 is also shown, and forms a closed horizontal loop on the dual lattice. By multiplying a logical operator by a subset of stabilizers we can continuously deform the minimum-weight cycle Z 1 into any other operator spanning the lattice vertically. The set of operators that are equivalent up to stabilizer operations belong to the same homology class [31].
Errors are detectable if they anticommute with at least one element of the set of stabilizer generators S. In this work we assume that stabilizers are measured perfectly. It follows that if any non-trivial eigenvalues are observed, this indicates the presence of errors with certainty. The pattern of stabilizers that anticommute with a given error reveals some information about the location and most likely type of error, although it cannot uniquely identify the error. This ambiguity is due to the code degeneracy.
The set of all errors on the lattice is called a chain, E. We use notation from algebraic topology to indicate the boundary of the chain of errors as $\partial E$ (a good introduction to algebraic topology can be found in many textbooks, for example see ref. [32]). The errors commute with the stabilizers except at the boundary of the chain where the measured eigenvalues are non-trivial. The full set of stabilizer eigenvalues is called the syndrome. Figure 3 shows a string of X errors and the two plaquette operators that anticommute with it.

Figure 2. Left: $\bar{Z}_1$ is a minimum-weight homologically non-trivial cycle, equivalent to a logical operator acting on the encoded information. Top: the $\bar{X}_1$ operator, drawn as a cycle on the dual lattice (lattice not shown). The $\bar{X}_1$ logical operator shares a single physical qubit with $\bar{Z}_1$ and hence they anticommute. Right: an example of a homologically trivial cycle generated by multiplication of two adjacent plaquette operators.

Figure 3. [...] $-1$ eigenvalues because the stabilizer and error chain anticommute at these locations. Note that if the X error chain forms a cycle then it will not be detectable.
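As an illustration of how the syndrome marks only the boundary of an error chain, the sketch below computes which plaquette operators anticommute with a given set of X errors. The edge indexing is a hypothetical convention chosen for this example (horizontal edges first, then vertical edges) and is not taken from the paper.

```python
def edge_h(r, c, L):
    """Index of the horizontal edge leaving vertex (r, c) to the right."""
    return (r % L) * L + (c % L)

def edge_v(r, c, L):
    """Index of the vertical edge leaving vertex (r, c) downwards."""
    return L * L + (r % L) * L + (c % L)

def plaquette_support(r, c, L):
    """The four edges bounding plaquette (r, c) on the periodic lattice."""
    return {edge_h(r, c, L), edge_h(r + 1, c, L),
            edge_v(r, c, L), edge_v(r, c + 1, L)}

def syndrome(x_errors, L):
    """Plaquettes that anticommute with the X-error chain (odd overlap)."""
    return {(r, c)
            for r in range(L) for c in range(L)
            if len(plaquette_support(r, c, L) & x_errors) % 2 == 1}

L = 5
# Two adjacent X errors: only the plaquettes at the ends of the string fire.
open_string = {edge_v(0, 1, L), edge_v(0, 2, L)}
# X errors on every vertical edge of one row: a closed loop on the dual lattice.
closed_loop = {edge_v(0, c, L) for c in range(L)}
```

With this convention the open string flags exactly two plaquettes, while the closed loop produces an empty syndrome and is therefore undetectable.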
Once the syndrome has been established we employ a classical algorithm called a decoder to decide which correction chain, $E'$, to apply. The goal of the decoder is to pair the non-trivial syndromes such that the total operator $C = E + E'$ has the highest probability of being a homologically trivial cycle and thus a member of the stabilizer group. Failure of the decoding algorithm corresponds to the creation of a homologically non-trivial cycle. The decoder used in this work, the minimum-weight perfect matching algorithm (MWPMA), is described in the next section.

Error correction
The optimal threshold for the independent noise model that we consider here has been calculated using numerical techniques to be $p_c = 0.1093$ [33][34][35][36]. However, there are no known efficient decoding algorithms that can obtain this threshold for the independent noise model on the toric code.
Several classes of sub-optimal efficient decoding algorithm exist [19,22,25,37]. The one used in this work is a version of Edmonds' MWPMA [38,39]. This algorithm pairs the non-trivial syndromes via a correction chain that has the least weight possible while satisfying the condition that its boundary matches the error chain boundary, i.e. $\partial E' = \partial E$. This ensures that the total operator, $C = E + E'$, is a cycle. We denote the threshold for the MWPMA by $p_{c0}$. Numerical simulations suggest that $p_{c0} = 0.1031 \pm 0.0001$ [30]. Although this algorithm gives a high threshold [30], we shall consider a heuristic modification described in detail by Stace and Barrett [40], that includes the effects of the degeneracy of $E'$ and can give thresholds up to $p_{c0} \approx 0.106$. Degeneracy counts the number of possible paths that the chain can take, given that its boundary and weight are fixed. Matchings with higher degeneracy have a higher probability of arising so they may be a priori more likely than some matchings with a lower weight.
The degeneracy itself is simple to calculate for a given (minimum-weight) matching. For instance, for a path m between two non-trivial syndromes, a and b, the degeneracy of that path $D_m$ is given by the number of different combinations of the links in the matching. The product of all individual $D_m$ is the total degeneracy of the matching, $D_M$.
To take degeneracy into account we compute the matching using the MWPMA, where the edge weights $d_{ab}$ are modified by the effect of the degeneracy of that path. Then the weight passed to the algorithm becomes $d_{ab} - \tau \ln D_m$, so that a more degenerate path is assigned a lower weight. Here τ is a weighting that we assign to the degeneracy term. The degeneracy is added in such a way due to entropic considerations, see ref. [40] for details. The decoding algorithm minimizes this quantity globally and this has been shown to lead to an improved threshold [40]. We refer to this enhanced version of the minimum-weight perfect matching simply as the PMA decoder.
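The modified edge weight can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of ref. [40]: the degeneracy of a pair of syndromes separated by (dx, dy) is taken to be the number of shortest lattice paths between them, a binomial coefficient, and the extra paths that arise on a torus when a separation equals exactly half the lattice size are ignored. The function names and the value of tau used in the example are hypothetical.

```python
from math import comb, log

def torus_separation(a, b, L):
    """Shortest (dx, dy) separation of two sites on an L x L periodic lattice."""
    dx = abs(a[0] - b[0]) % L
    dy = abs(a[1] - b[1]) % L
    return min(dx, L - dx), min(dy, L - dy)

def path_degeneracy(dx, dy):
    """Assumed degeneracy: number of shortest lattice paths spanning (dx, dy)."""
    return comb(dx + dy, dx)

def modified_weight(a, b, L, tau):
    """Edge weight d_ab - tau * ln(D) passed to the matching algorithm."""
    dx, dy = torus_separation(a, b, L)
    distance = dx + dy
    return distance - tau * log(path_degeneracy(dx, dy))

# A diagonal pair has two shortest paths, so its weight is reduced;
# a pair on the same row has a single shortest path and is unchanged.
w_diagonal = modified_weight((0, 0), (1, 1), L=5, tau=0.5)
w_straight = modified_weight((0, 0), (0, 4), L=5, tau=0.5)
```

Setting tau to zero recovers the unmodified minimum-weight matching weights.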

Simulating noise and error correction
An important tool in this work is the numerical simulation of the detection and correction of errors on a toric code. Repeating random trials allows us to examine the failure probability of the code over a wide range of parameters. As stated earlier, we consider uncorrelated bit-flip and phase-flip errors arising at a rate p. It suffices to perform simulations for only one of these types of error since the results will be equivalent for the other.
The behaviour of the toric code is simulated by placing an error with probability p on each individual qubit of the toric code lattice of linear dimension L, giving rise to a (usually disjoint) error chain E. The syndromes are measured and the PMA decoder is used to determine the correction chain E′. These correction chains are added, modulo 2, to E and a parity check with each of the appropriate logical operators is used to determine the homology class of the total operator C. The result of this random sample indicates whether the error correction succeeds or fails.
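The final step of this procedure, the parity check against the logical operators, can be sketched as below. The edge indexing and the particular representatives chosen for the two logical Z operators are assumptions made for this example. The decoder itself is not implemented here: the correction chain is supplied by hand, so the block only illustrates how success or failure is decided once $E'$ is known.

```python
def edge_h(r, c, L):
    """Index of the horizontal edge leaving vertex (r, c) to the right."""
    return (r % L) * L + (c % L)

def edge_v(r, c, L):
    """Index of the vertical edge leaving vertex (r, c) downwards."""
    return L * L + (r % L) * L + (c % L)

def homology_class(chain, L):
    """Parity of the overlap of an X-type cycle with two logical Z operators.

    (0, 0) is the trivial class; anything else is a logical error.
    """
    logical_z1 = {edge_v(r, 0, L) for r in range(L)}  # vertical primal loop
    logical_z2 = {edge_h(0, c, L) for c in range(L)}  # horizontal primal loop
    return (len(chain & logical_z1) % 2, len(chain & logical_z2) % 2)

def correction_failed(error, correction, L):
    """Add the two chains modulo 2 and test the homology class of the result."""
    total = error ^ correction  # symmetric difference = addition modulo 2
    return homology_class(total, L) != (0, 0)

L = 5
# Three errors in a row: a spanning error of weight ceil(L/2).
error = {edge_v(0, c, L) for c in (0, 1, 2)}
# Completing the loop with the remaining floor(L/2) qubits causes a failure...
wrong_correction = {edge_v(0, c, L) for c in (3, 4)}
# ...whereas undoing the errors directly succeeds.
right_correction = set(error)
```

Both candidate corrections share the same boundary as the error, so both are consistent with the syndrome; only the homology class of the total operator distinguishes them.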
The outcome of the Bernoulli trial (a single simulation of error correction) is assigned the value $n_f = 0$ if C is in the trivial homology class and $n_f = 1$ if it is in any of the non-trivial homology classes. To gather statistics we repeat this procedure N times for the same input parameters (L, p). Of these N trials, $N_f$ will have failed to perform error correction successfully. We therefore estimate the error correction failure probability as $P_{\mathrm{fail}} = N_f / N$ and the variance of such a distribution is $\sigma^2 = P_{\mathrm{fail}}(1 - P_{\mathrm{fail}})/N$. We use $P_{\mathrm{fail}}$ to characterize the toric code performance.
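The estimator and its variance are the standard ones for repeated Bernoulli trials. A minimal sketch, with hypothetical function and variable names, is:

```python
from math import sqrt

def estimate_failure(outcomes):
    """Estimate P_fail and its variance from a list of 0/1 trial outcomes.

    Each outcome is n_f = 0 (success) or n_f = 1 (logical failure).
    """
    n_trials = len(outcomes)
    n_failed = sum(outcomes)
    p_fail = n_failed / n_trials
    variance = p_fail * (1.0 - p_fail) / n_trials
    return p_fail, variance

# Toy record of four trials with a single failure.
p_fail, variance = estimate_failure([0, 1, 0, 0])
standard_error = sqrt(variance)
```

Because the variance scales as 1/N, resolving a small failure rate to a fixed relative precision requires a number of trials that grows roughly as the inverse of that rate.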

The universal scaling hypothesis
In [30], Wang et al used ideas from the theory of critical phenomena in finite-sized systems to show that there is a critical point in the failure probability of the toric code. To do this, they used the two-dimensional RBIM which is a model of ferromagnetism in which antiferromagnetic couplings arise at random. The probability distribution of antiferromagnetic couplings in this model matches the probability distribution of errors in the toric code, hence a mapping between the two models can be constructed [16,21,30]. The RBIM has been extensively studied and it is known to undergo a phase transition from an ordered to a disordered phase as the concentration of antiferromagnetic bonds is increased. This implies a phase transition in the corresponding quantity of the toric code: its logical failure rate.
Close to the critical point, where $\xi \sim |p - p_{c0}|^{-\nu_0}$ is the RBIM correlation length, we expect scale-invariant behaviour. This argument leads to the conjecture that in this regime the failure probability of the toric code is a function only of $L/\xi$ [30].
Below the threshold the failure rate is expected to depend exponentially on the system size [12,16], and also more generally in the fault-tolerant case [41]

$\ln P_{\mathrm{fail}} \propto -L. \qquad (3)$

Numerical evidence for this will be provided later, in figure 4.
Together, the exponential dependence on L and the scaling hypothesis fix the functional form of $P_{\mathrm{fail}}$:

$P_{\mathrm{fail}} = A e^{-a L/\xi}. \qquad (4)$

In this expression A and a are constants that can be determined using numerical fitting techniques, see section 5.1 and appendix A.
In practice the toric code will be operating in the correctable ($p < p_{c0}$) regime so we use the rescaled variable $x = (L/\xi)^{1/\nu_0}$ (alternatively this may be written as $x = (p_{c0} - p) L^{1/\nu_0}$). With this variable we can rewrite the universal scaling hypothesis as

$P_{\mathrm{fail}} = A e^{-a x^{\nu_0}}. \qquad (5)$

We determine the values of A, $p_{c0}$ and $\nu_0$ from a fit to data close to the threshold. In the remainder of this section we give evidence that the numerical data meet the two conditions required for the universal scaling hypothesis, namely an exponential decay of the failure rate as L increases and scale invariance.
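To make the role of the rescaled variable concrete, the sketch below evaluates the scaling ansatz under the assumption that it takes the form $P_{\mathrm{fail}} = A\,e^{-a x^{\nu_0}}$ with $x = (p_{c0} - p)L^{1/\nu_0}$, which is equivalent to $A\,e^{-a (p_{c0}-p)^{\nu_0} L}$. The values of A and $\nu_0$ used in the example are placeholders, not fitted values from this work.

```python
from math import exp, isclose

def rescaled_x(p, L, p_c0, nu0):
    """Rescaled variable x = (p_c0 - p) * L**(1/nu0), valid for p < p_c0."""
    return (p_c0 - p) * L ** (1.0 / nu0)

def p_fail_scaling(x, A, a, nu0):
    """Scaling ansatz P_fail = A * exp(-a * x**nu0), assumed form."""
    return A * exp(-a * x ** nu0)

def p_with_same_x(p1, L1, L2, p_c0, nu0):
    """Error rate on a lattice of size L2 giving the same x as (p1, L1)."""
    return p_c0 - (p_c0 - p1) * (L1 / L2) ** (1.0 / nu0)

# Placeholder parameters: only p_c0 and a are quoted in the text.
A, a, p_c0, nu0 = 0.1, 32.31, 0.1031, 1.5

p1, L1, L2 = 0.08, 5, 21
p2 = p_with_same_x(p1, L1, L2, p_c0, nu0)
x1 = rescaled_x(p1, L1, p_c0, nu0)
x2 = rescaled_x(p2, L2, p_c0, nu0)
```

Two different lattices at error rates chosen to give the same x produce the same predicted failure rate, which is the data collapse that the scale-invariance argument requires.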

Evidence for the universal scaling hypothesis
To observe the dependence of $P_{\mathrm{fail}}$ on L and p we have generated a set of Monte Carlo data for $0.01 \leqslant p \leqslant 0.08$ and odd lattice sizes in the range $5 \leqslant L \leqslant 23$. We use the simulation method outlined in section 2.3 with each simulation repeated $N = 10^7$ times using Kolmogorov's Blossom V MWPMA implementation [42]. We pass modified weights to the algorithm to account for degeneracy as described in section 2.2.
In figure 4 we plot the logical failure rate on a logarithmic scale, as a function of the lattice size. The shaded portion of the figure indicates the region where this exponential relationship is not expected to hold according to a conjecture that will be explained in section 5.
Each set of data in figure 4 is fitted using a quadratic ansatz in L

$\ln P_{\mathrm{fail}} = \alpha + \beta L + \gamma L^2.$

For data in the range $0.035 \leqslant p \leqslant 0.08$ and $5 \leqslant L \leqslant 23$ the quadratic coefficient γ is typically 2-3 orders of magnitude smaller than the linear coefficient β. This is strong evidence for a linear fit to the (logarithmic) data, suggesting a fit of the form $P_{\mathrm{fail}} \propto e^{-L}$, matching equation (3). For data with values of $p < 0.035$ the quadratic coefficient was comparable in magnitude to the linear coefficient. A selection of this data is also shown in figure 4, demonstrating that the behaviour of the data for these values of physical error rate is ambiguous. Nevertheless, figure 4 establishes an exponential dependence of the logical failure probability on L for a wide range of the data.

The universal scaling hypothesis in equation (4) also requires the system to be scale invariant which implies that the behaviour of $P_{\mathrm{fail}}$ should depend only on the length scale $L/\xi$. This is demonstrated in figure 5 which shows the results of numerical simulations of the toric code failure rate close to threshold. The plot will be explained in detail in appendix A but now we simply note that rescaling the numerical data using the variable $x = (L/\xi)^{1/\nu_0}$ leads to data collapse. This phenomenon describes the situation when data generated in different systems, in this case different lattice sizes, fall onto the same curve after an appropriate rescaling has been applied.
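The quadratic fit used to test for exponential decay can be reproduced with an ordinary least-squares fit. The sketch below solves the normal equations in pure Python; the fitting routine is a generic one and is not claimed to be the procedure used to produce the results in this work, and the data in the example are synthetic.

```python
def fit_quadratic(Ls, log_p):
    """Least-squares fit of log_p = alpha + beta*L + gamma*L**2.

    Returns (alpha, beta, gamma).
    """
    rows = [[1.0, float(L), float(L) ** 2] for L in Ls]
    # Normal equations: (X^T X) coeffs = X^T y.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * y for r, y in zip(rows, log_p))]
           for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, 3):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, 4):
                aug[i][j] -= factor * aug[col][j]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(aug[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (aug[i][3] - rest) / aug[i][i]
    return tuple(coeffs)

# Synthetic, noiseless example on the odd lattice sizes 5 to 23.
Ls = list(range(5, 24, 2))
log_p = [-1.0 - 0.4 * L + 0.001 * L * L for L in Ls]
alpha, beta, gamma = fit_quadratic(Ls, log_p)
ratio = abs(gamma / beta)  # small ratio indicates near-linear ln P_fail
```

A ratio of the quadratic to the linear coefficient that is orders of magnitude below one is the signature of exponential decay in L described in the text.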

The low single qubit error rate regime
The universal scaling hypothesis is a good model for the logical failure rate when the lattice size is large and when there are sufficiently many errors. For a fixed lattice size, as p is reduced the universal scaling behaviour should not be expected to hold indefinitely. Indeed the numerical evidence suggests that when p becomes sufficiently small the scaling hypothesis fails. In the $p \to 0$ limit the behaviour is given by the low p analytic approximation

$P_{\mathrm{fail}} = 2L \binom{L}{\lceil L/2 \rceil} p^{\lceil L/2 \rceil}. \qquad (8)$

This is justified by considering the uncorrectable error configurations in the $p \to 0$ limit and calculating $P_{\mathrm{fail}}$ directly. Restricting ourselves to low single qubit error rates we consider the minimum number of errors that can cause the error correction to fail, $\lceil L/2 \rceil$. To cause the error correction to fail these errors must lie along a single minimum-weight homologically non-trivial cycle of the toric code. If they fall in this way the PMA will certainly apply the remaining $\lfloor L/2 \rfloor$ single qubit operators required to ensure $C = E + E'$ is a logical operator. Figure 6 shows a sketch of how this happens.
Thus the expression in equation (8) for the failure rate is constructed via a counting argument. The first factor, $2L$, is the number of minimum-weight homologically non-trivial cycles of the code that exist (L in each of the two directions around the torus). The second is the binomial coefficient which counts the possible combinations of $\lceil L/2 \rceil$ errors along a cycle of weight L. Finally we include a factor that accounts for the likelihood of exactly $\lceil L/2 \rceil$ errors occurring on a lattice constructed from $2L^2$ qubits, which is $p^{\lceil L/2 \rceil}(1-p)^{2L^2 - \lceil L/2 \rceil}$. The single qubit error rate is small so we can neglect the final factor of $(1-p)^{2L^2 - \lceil L/2 \rceil}$ to obtain equation (8). In the low p limit the L dependence is $P_{\mathrm{fail}} \propto e^{-\lceil L/2 \rceil}$ and we see that it is quantitatively different to the universal scaling regime, $P_{\mathrm{fail}} \propto e^{-L}$.
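The counting argument translates directly into a short function. This sketch assumes that the number of minimum-weight non-trivial cycles is 2L, so that the low p expression reads $2L\binom{L}{\lceil L/2\rceil}p^{\lceil L/2\rceil}$; the optional flag, a hypothetical addition for illustration, restores the factor $(1-p)^{2L^2-\lceil L/2\rceil}$ that is neglected at small p.

```python
from math import comb

def p_fail_low_p(L, p, keep_no_error_factor=False):
    """Low p estimate of the logical failure rate from the counting argument.

    Assumes 2L minimum-weight non-trivial cycles, each of weight L.
    """
    half = (L + 1) // 2  # ceil(L/2) for integer L
    value = 2 * L * comb(L, half) * p ** half
    if keep_no_error_factor:
        # Probability that the remaining 2L^2 - ceil(L/2) qubits are error free.
        value *= (1.0 - p) ** (2 * L * L - half)
    return value

# Worked values at p = 0.01:
#   L = 3: 2*3 * C(3,2) * p^2 = 18 * 1e-4  = 1.8e-3
#   L = 5: 2*5 * C(5,3) * p^3 = 100 * 1e-6 = 1.0e-4
small = p_fail_low_p(3, 0.01)
larger = p_fail_low_p(5, 0.01)
```

Comparing the two variants at a given (L, p) gives a quick indication of how much is lost by dropping the final factor.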

The validity of the two regimes
The range of parameters we consider in our numerical simulations encompasses both the small p limit and the universal scaling limit. For small single qubit error rates the weight of the errors is typically much smaller than the code distance and the low p analytic expression is applicable. Conversely, for large L the number of errors can be much larger than the code distance and we expect a universal scaling hypothesis to apply. These regimes are distinct, as we see from their differing dependence on the code distance. Each of the two regimes will provide a good approximation to the numerical data over some region of parameter space. We shall now make a heuristic argument to quantify those regions.
In order to make a conjecture about the validity of the regimes we consider the distribution of the number of errors that arise on a lattice of fixed size, at a known physical error rate. We will relate this distribution to $\lceil L/2 \rceil$, half the code distance. This number is significant to the PMA decoder because if the weight of the error chain, $|E|$, is less than this number then the error is certainly correctable. In the case when $|E| \geqslant \lceil L/2 \rceil$ a subset of the possible error configurations will lead to an incorrect pairing of syndromes, causing a logical failure. These are the spanning errors illustrated in figure 6.
The typical weight of errors on the lattice can be shown to be $2L^2 p$. If $2L^2 p < \lceil L/2 \rceil$ then the expected number of errors is less than half the code distance and logical errors are dominated by spanning chains, see figure 6. For a fixed p, as L increases this inequality is violated. When the number of errors is much greater than L but they are typically correctable, this is the universal scaling limit.
Requiring $pL \gg 1$ (up to a numerical factor) leads to a relationship between L and p that determines a minimum single qubit error rate for a given lattice size below which the universal scaling hypothesis breaks. We make the arbitrary but natural choice that the mean number of errors on the lattice must be two standard deviations above $\lceil L/2 \rceil$, leading to the expression

$p_{\mathrm{USH}} \approx \dots + \dots + \dots \qquad (9)$

This expression, derived fully in appendix B, determines whether the behaviour can be considered to be within the universal scaling regime. We can find an equivalent expression for $pL \ll 1$, giving the single qubit error rate above which the low p expression no longer provides a good approximation to the numerical data. This can be shown to be

$p_{\mathrm{Lp}} = \dots \qquad (10)$

Between these two bounds there is a 'crossover' region, in which the logical failure rate cannot be considered to be well approximated by either regime.
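The two validity conditions can also be evaluated numerically without a closed-form expression. The sketch below is an illustration under explicit assumptions and is not the derivation of appendix B: the number of errors is taken to be binomially distributed over the $2L^2$ qubits, the universal scaling bound is the error rate at which the mean exceeds $\lceil L/2\rceil$ by exactly two standard deviations, and the low p bound is the error rate at which the mean equals $\lceil L/2\rceil$. The function names are hypothetical.

```python
from math import sqrt

def mean_errors(L, p):
    """Mean number of errors on 2L^2 qubits at error rate p."""
    return 2 * L * L * p

def std_errors(L, p):
    """Binomial standard deviation of the number of errors."""
    return sqrt(2 * L * L * p * (1.0 - p))

def low_p_bound(L):
    """Error rate at which the mean number of errors equals ceil(L/2)."""
    return ((L + 1) // 2) / (2 * L * L)

def ush_bound(L, n_sigma=2.0, tol=1e-12):
    """Error rate at which mean = ceil(L/2) + n_sigma * std, by bisection."""
    target = (L + 1) // 2

    def gap(p):
        return mean_errors(L, p) - n_sigma * std_errors(L, p) - target

    lo, hi = 0.0, 0.5  # gap(lo) < 0 < gap(hi) for the lattice sizes of interest
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi

L = 11
p_low = low_p_bound(L)
p_ush = ush_bound(L)
```

Error rates between the two values fall in the crossover region where neither expression is expected to describe the data well.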

Testing the range of validity of the universal scaling hypothesis
Substituting $p_{\mathrm{USH}}$ given by equation (9) into the universal scaling hypothesis in equation (4) yields an expression for the minimum $P_{\mathrm{fail}}$, for a fixed L, that belongs to the universal scaling regime. This expression is plotted as a grey line in figure 4 and hence the grey region indicates the region of parameter space where we do not expect the universal scaling hypothesis to hold. This supports the previous observation that most of the data we have obtained for $p < 3.5\%$ would lie outside the universal scaling region and therefore be poorly fit by equation (5).
We have fitted the universal scaling ansatz, equation (4), to the data that fall outside this grey region (the values of A, $p_{c0}$ and $\nu_0$ are all determined from the fit to the data around threshold). From the fit to the data in the universal scaling regime we find $a = 32.31 \pm 0.13$. The data obeying the validity condition and the fit are shown in figure 7.
Let us now fix the code distance L and vary the single qubit error rate to see how the full set of data behaves in relation to the universal scaling limit. For each fixed L in figure 8, reducing x corresponds to reducing p. When p becomes sufficiently small the scaling hypothesis fails and as expected the failure rate deviates below the universal scaling law.

Testing the range of validity of the low error rate regime
We have proposed that, in the low p limit, spanning errors of the type illustrated in figure 6 dominate when $2L^2 p < \lceil L/2 \rceil$. This is the validity condition we use for the low p regime, see equation (10).
We can rewrite equation (8) in terms of L and the rescaling variable, x. Figure 9 shows this analytic expression plotted for some small values of L along with the numerical data. As the probability of errors decreases on a fixed lattice the mean number of errors will approach $\lceil L/2 \rceil$. As expected, the low p expression gives a good approximation for small lattice sizes and low physical qubit error rates. The data and low p analytic expression converge as x decreases, so for fixed lattice size as the physical error rate decreases the approximation improves.

Figure 7. [...] runs. Also shown in black is the fit of the ansatz, equation (4), with all values taken from the threshold fit (see appendix A) except for a which was extracted using a fit to the data set shown.

Figure 8. [...] The universal scaling fit is also shown in black. The data are plotted on a logarithmic scale and coloured according to lattice size L. For fixed L, decreasing x corresponds to reducing p. As we do this the universal scaling hypothesis breaks at a point predicted by equation (9). This is indicated for a single lattice size (L = 11) as a vertical line.

Comparison of the overhead in the two regimes
So far we have concentrated on determining the logical error rate as a function of the lattice size and single qubit error rate. Now we wish to demonstrate that it is possible to invert these relationships to find the overhead, Ω. This will be a function of the experimentally determined single qubit error rate, p, and maximum tolerable logical failure rate P fail .
In this work we demonstrate the calculation for the toric code with perfect stabilizer measurements. However the same techniques shown here will also be applicable to more physically realistic settings, for example a planar code with noisy stabilizer measurements. Although the numerics will differ from those presented here, the methods used are expected to be directly analogous.
The first step in calculating the overhead is to determine which of the two regimes (universal scaling or low p) the code is operating within. To do this we use the expression for p USH in equation (9), to find the minimum error rate for which the universal scaling hypothesis holds. Similarly we find p Lp , the maximum error rate for which the low p expression holds, using equation (10). In figure 10 we plot these two bounds, and the regions of validity that they indicate. Figure 10 therefore shows the region of (P fail , p) parameter space for which each of the regimes is expected to give a good approximation to the logical error rate. Once the correct regime has been identified, the overhead can be calculated.
Figure 9. The full set of renormalized data, coloured by lattice size. The low p analytic expression, equation (8), is shown for some small lattice sizes. As x decreases the analytic expression tends towards the data. This numerical evidence suggests that the analytic expression is an underestimate of the failure rate for this range of parameters.

In the universal scaling region the logical failure rate is $P_{\mathrm{fail}} = A e^{-a (p_{c0} - p)^{\nu_0} L}$. By using this to find the lattice size L as a function of $P_{\mathrm{fail}}$ and p, and recalling that there are $2L^2$ physical qubits comprising the toric code, we find the overhead in the universal scaling regime is given by

$\Omega_{\mathrm{USH}}(P_{\mathrm{fail}}, p) = 2 \left[ \frac{\ln (P_{\mathrm{fail}}/A)}{a (p_{c0} - p)^{\nu_0}} \right]^2,$
where the constant a has been determined from fits to the data in this work, see section 5.1. The remaining parameters, A, p c0 and ν 0 , can be determined from a fit to data generated close to threshold, see appendix A for this calculation and for their numerical values.
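The inversion in the universal scaling regime can be sketched numerically as follows, assuming the failure rate takes the form $P_{\mathrm{fail}} = A\,e^{-a (p_{c0}-p)^{\nu_0} L}$. The values of A and $\nu_0$ below are placeholders chosen for illustration and are not the fitted values of appendix A.

```python
from math import exp, log

def p_fail_ush(L, p, A, a, p_c0, nu0):
    """Assumed scaling form P_fail = A * exp(-a * (p_c0 - p)**nu0 * L)."""
    return A * exp(-a * (p_c0 - p) ** nu0 * L)

def lattice_size_ush(P_fail, p, A, a, p_c0, nu0):
    """Lattice size L at which the scaling form reaches the target P_fail."""
    return log(A / P_fail) / (a * (p_c0 - p) ** nu0)

def overhead_ush(P_fail, p, A, a, p_c0, nu0):
    """Number of physical qubits, 2 L^2, for the target P_fail."""
    return 2.0 * lattice_size_ush(P_fail, p, A, a, p_c0, nu0) ** 2

# Placeholder parameters: only p_c0 and a are quoted in the text.
params = dict(A=0.1, a=32.31, p_c0=0.1031, nu0=1.5)

# Round trip: the failure rate predicted for L = 15 maps back to L = 15.
target = p_fail_ush(15, 0.05, **params)
recovered_L = lattice_size_ush(target, 0.05, **params)
qubits = overhead_ush(target, 0.05, **params)
```

In practice the lattice size returned is not an integer, so it would be rounded up to the next admissible L before the qubit count is evaluated.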
The analytic expression for the low p regime, equation (8), can be simplified by assuming that $\lceil L/2 \rceil = \lfloor L/2 \rfloor = L/2$ and using Stirling's approximation $n! \approx (n/e)^n \sqrt{2\pi n}$. Inverting this simplified expression we obtain a solution for L that uses the Lambert W function [43]. We can simplify this using the approximate form for the lower branch of the function [44]. It follows that an approximate expression for the overhead in this regime is given by

$\Omega(P_{\mathrm{fail}}, p) \approx \dots$

Figure 11 shows a 3D plot of the overhead as a function of $P_{\mathrm{fail}}$ and p. There is a significant gap between the two plots for most of parameter space (see figure 12) and an increase in overhead is seen as p is increased and as $P_{\mathrm{fail}}$ is reduced. Allowing a higher logical failure rate will naturally reduce the overhead required, as will reducing the single qubit error probability.

This plot reveals the gap between the two regimes over the whole region of parameter space considered. It also reveals a drop in overhead as the single qubit error rate is reduced, which is particularly striking for the low p regime.

Figure 12 shows the difference between the required overhead in the two different regimes. For the range of parameters considered the low p expression always gives an estimate of the overhead that lies below the value given by the universal scaling hypothesis.
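As an alternative to the Stirling and Lambert W route, the low p overhead can be found by a direct search over lattice sizes. The sketch below is not the approximate closed form referred to above: it assumes the low p expression is $2L\binom{L}{\lceil L/2\rceil}p^{\lceil L/2\rceil}$ and restricts the search to odd L, as used in the simulations. The search limit is a hypothetical safeguard.

```python
from math import comb

def p_fail_low_p(L, p):
    """Low p estimate, assuming 2L minimum-weight non-trivial cycles."""
    half = (L + 1) // 2  # ceil(L/2) for integer L
    return 2 * L * comb(L, half) * p ** half

def smallest_lattice_low_p(P_target, p, L_max=199):
    """Smallest odd L whose low p failure estimate is at most P_target."""
    for L in range(3, L_max + 1, 2):
        if p_fail_low_p(L, p) <= P_target:
            return L
    raise ValueError("no lattice size up to L_max meets the target")

def overhead_low_p(P_target, p):
    """Number of physical qubits, 2 L^2, for the smallest adequate odd L."""
    L = smallest_lattice_low_p(P_target, p)
    return 2 * L * L

# At p = 0.01 the estimates are 4.9e-6 for L = 7 and about 2.3e-7 for L = 9,
# so a target of 1e-6 first becomes reachable at L = 9.
L_needed = smallest_lattice_low_p(1e-6, 0.01)
qubits_needed = overhead_low_p(1e-6, 0.01)
```

Since the low p expression tends to underestimate the failure rate for the simulated parameters, a lattice size found this way is best read as a lower estimate of what is required.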
The low p expression tends to underestimate the logical failure rate for the range of numerical data simulated. Hence this may be considered to be a practical lower bound on the overhead required for those parameters. Conversely, the universal scaling hypothesis is an overestimate of the logical failure rate for most of the numerical data, and hence can be considered to be a practical upper bound to the resources required.
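As a numerical aside on the simplification of the low p expression, the accuracy of Stirling's approximation at the small values of n relevant for modest lattice sizes can be checked directly. The sketch below is illustrative only; the helper names are hypothetical and it does not reproduce equation (8) itself.

```python
import math


def stirling(n):
    """Leading-order Stirling approximation: n! ~ sqrt(2 pi n) (n / e)^n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n


def relative_error(n):
    """Fractional amount by which the approximation falls short of n!."""
    exact = math.factorial(n)
    return (exact - stirling(n)) / exact


for n in (1, 2, 5, 10, 20):
    # The shortfall is positive and bounded above by 1 / (12 n).
    print(n, relative_error(n), 1.0 / (12.0 * n))
```

The approximation always underestimates n!, with a fractional error of roughly 1/(12n), so it is already accurate to better than one percent for n = 10.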

Conclusions
We have found two distinct operating regimes of the toric code. In one, the data can be rescaled, and an ansatz based on this scaling and the exponential dependence of the failure rate on L can be used to find an empirical expression for $P_{\text{fail}}$. In the other, a counting argument gives rise to an analytic expression for the failure rate in the $p \to 0$ limit. Using the probability distribution of the error weight for fixed (L, p), we propose heuristic conditions for the range of validity of each expression. The expressions describing the two regimes have been inverted to calculate the system size required to achieve a desired logical success rate for a given single qubit error rate. We have used the expressions for the logical failure rate to demonstrate techniques to calculate the overhead, $\Omega(P_{\text{fail}}, p)$.

We expect that the techniques we have demonstrated in this work will be applicable in a wide range of settings, in particular to more physically realistic geometries such as the planar code, whose logical failure rate is expected to be higher than that of the toric code [29]. Furthermore, we expect that the methods we have demonstrated can be used to calculate the overhead of a fault-tolerant quantum memory, in which the stabilizer measurements are imperfect. Since all topological codes are based on similar principles, the techniques outlined in this work can be expected to be directly applicable, although the numerics in these cases will differ from those presented here.
Based on the numerical evidence, we claim that for most practical purposes the two regimes bound the required overhead. The numerical results presented in this work are dependent on the choice of the decoder. Similar scaling relationships would be expected for other decoding algorithms, particularly renormalization group-based decoders such as [22,37].
This work raises several open questions. It has been shown that the MWPMA decoder has a quadratically lower logical failure rate than the renormalization group algorithm [45]. However, we still believe that a comprehensive comparison of all existing decoders over the whole region of (relevant) parameter space would be interesting and worthwhile. A possible scenario is that the size of the topological code that can be realized will be fixed by technological limitations. In that case, a comparison of the analysis presented in this work for all known decoders below threshold would reveal which should be implemented to minimize the logical failure rate.
Decoders with high thresholds usually require a longer running time than those with more modest thresholds. We expect a tradeoff between time and space resources, suggesting that those decoders with longer running times may have smaller physical qubit overheads. This is interesting, because although a high threshold is desirable, for practical implementations the running time and physical overhead are also important constraints. Therefore it seems that a balance between these three figures of merit may be of interest for practical quantum computation.
Several of the limitations we faced have been addressed by Bravyi and Vargo in [26] during the preparation of this manuscript. The first of these concerns the crossover region between the two regimes we have identified. Bravyi and Vargo have constructed a heuristic ansatz that interpolates between the dependence on L of the low p regime, $P_{\text{fail}} \propto e^{-\lceil L/2 \rceil}$, and the dependence expected for larger physical error rates, $P_{\text{fail}} \propto e^{-L}$. These functional forms match the two regimes we have identified, so the ansatz by Bravyi and Vargo could lead to a method for interpolating between them.
Another benefit of the technique by Bravyi and Vargo is that it provides a fit to the numerical data in the small and moderate p regimes. A significant limitation we faced was the availability of resources to run the Monte Carlo simulations of the error correction procedure. For example, it was impossible to obtain data for $P_{\text{fail}} < 10^{-7}$ due to the running time of the decoder. Bravyi and Vargo have discovered a new technique for probing very low error rates on surface codes [26]. Obtaining data for very low logical error rates using this algorithm would help us to verify the conjecture of the range of validity of the low p expression, particularly for larger lattice sizes than we were able to test.

The universal scaling analysis rescales the data using the variable $x = (p - p_{c0}) L^{1/\nu_0}$. In order to do this, we must first establish the values of the threshold, $p_{c0}$, and critical exponent, $\nu_0$. The universal scaling hypothesis, equation (4), also relies on knowing the failure rate at threshold in the large L limit. In this appendix we show how these quantities are obtained from a fit to data close to the threshold.
The threshold for the stand-alone MWPMA decoding has been calculated previously as $10.306 \pm 0.008\%$ [30]. Since we allow the degeneracy of the matching to affect the choice of correction chain, we repeat the calculation in this work to obtain the threshold for our enhanced PMA decoder.
To find the logical failure rate $P_{\text{fail}}$ we numerically simulate the error correction protocol, enhanced minimum-weight perfect matching (PMA), using the same method described in section 3.1. We performed $N = 10^6$ simulations of the error correction procedure for p close to 10.3% and for odd lattice sizes in the range $5 \leq L \leq 25$. This set of data was only used for the purpose of finding the threshold and critical exponent, and is not the main data set used in this work.
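To give a sense of the statistical resolution available from $N = 10^6$ runs, the following sketch estimates a failure rate and its binomial standard error from a count of decoder failures. The source does not state how uncertainties were computed, so this is only one standard choice; the function name and the failure count in the usage line are hypothetical.

```python
import math


def failure_rate_estimate(failures, trials):
    """Point estimate of P_fail and its binomial standard error."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    p_hat = failures / trials
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat, std_err


# Hypothetical count of failures out of N = 10^6 simulated runs.
p_hat, std_err = failure_rate_estimate(250_000, 10**6)
print(p_hat, std_err)
```

Near threshold, where the failure rate is of order one, this gives a relative uncertainty well below one percent, whereas for very small failure rates only a handful of failure events are observed in the same number of runs.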
The lattice sizes we use are far from the large L limit, so following the method from Wang et al the fitting ansatz was constructed by taking a quadratic expansion in x around the threshold x = 0 and accounting for finite-size effects by adding a single non-universal term that is dependent on the lattice size [30]. The resulting ansatz is equation (A.1), where $\nu_0$ is the critical exponent and $p_{c0}$ is the threshold error rate for our PMA decoder. Figure 5 shows the rescaled data with finite-size effects subtracted, and the fit to the data. The relevant parameters were obtained from this fit.

The threshold for our modified decoding algorithm was found to be in agreement with the value found by Wang et al for the unmodified MWPMA [30]. This does not achieve the maximum threshold of $p_{c0} \simeq 10.6\%$ that is possible when the degeneracy of the matching is included [40]. This is because in the simulations performed for this paper the choice of matching in our modified PMA decoder depends only weakly on the degeneracy, so the effect on the threshold is small. The value of the critical exponent $\nu_0$ found here is in agreement with the value found by Merz and Chalker when calculating the optimal threshold value [34], although it does not agree with the value found by Wang et al for the MWPMA decoder.
The analysis presented in this appendix establishes the validity of the rescaling approach for this choice of decoder by demonstrating that the scaling ansatz, equation (A.1), provides a good fit to the collapsed data close to the threshold.
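The structure of the fitting ansatz used in this appendix (a quadratic expansion in the rescaled variable x plus one lattice-size-dependent correction) can be sketched as below. The specific finite-size term $D L^{-1/\mu}$ and the coefficient names B, C, D and μ are assumptions modelled on the form used by Wang et al [30]; they are not taken from this text and may differ in detail from equation (A.1).

```python
def rescaled_variable(p, L, p_c0, nu0):
    """Rescaled variable x = (p - p_c0) * L**(1 / nu0); x = 0 at threshold."""
    return (p - p_c0) * L ** (1.0 / nu0)


def threshold_ansatz(p, L, A, B, C, D, p_c0, nu0, mu):
    """Assumed form: A + B x + C x^2 + D L^(-1/mu).

    A plays the role of the failure rate at threshold in the large L limit;
    the last term is the single non-universal finite-size correction.
    """
    x = rescaled_variable(p, L, p_c0, nu0)
    return A + B * x + C * x ** 2 + D * L ** (-1.0 / mu)


# Hypothetical coefficients, for illustration only.
coeffs = dict(A=0.25, B=1.0, C=2.0, D=0.1, p_c0=0.103, nu0=1.5, mu=1.0)
for L in (5, 15, 25):
    print(L, threshold_ansatz(0.103, L, **coeffs))
```

A function of this shape could be handed to a nonlinear least-squares routine to extract the threshold and critical exponent from data collected near threshold for several lattice sizes.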

Appendix B. Deriving the validity conditions
In this appendix we outline the derivation of the validity condition for the universal scaling hypothesis, $p_{\text{USH}}$, given in equation (9). The validity condition for the low p expression, $p_{\text{Lp}}$, given in equation (10), is not explicitly shown, but can be reproduced using a similar argument.
The single qubit errors occur independently and at a rate p. The weight of the error that arises, $|E|$, obeys a binomial distribution over the $2L^2$ physical qubits, with a mean that coincides with the typical error weight, $\mu = 2L^2 p$. For the universal scaling hypothesis, the condition we have proposed is that μ, the mean of the probability distribution, is large with respect to $\lceil L/2 \rceil$. This implies that the weight of the error chain that results is larger than $\lceil L/2 \rceil$ with high probability. We can write this as $\mu \gg \lceil L/2 \rceil$, made precise by requiring the mean to lie n standard deviations above $\lceil L/2 \rceil$. We have chosen n = 2 for both the universal scaling hypothesis and the corresponding condition for the low p expression. Substituting