Magic State Distillation with Low Space Overhead and Optimal Asymptotic Input Count

We present an infinite family of protocols to distill magic states for $T$-gates. The protocols have low space overhead and, for a given target error, use a number of input magic states that is asymptotically optimal (conjecturally). The space overhead, defined as the ratio of the number of physical qubits to the number of output magic states, is asymptotically constant. Both the number of input magic states used per output state and the $T$-gate depth of the circuit scale linearly in the logarithm of the target error $\delta$ (up to $\log \log 1/\delta$). Unlike other distillation protocols, this protocol achieves this performance without concatenation, and the input magic states are injected at various steps in the circuit rather than all at the start. The protocol can be modified to distill magic states for other gates at the third level of the Clifford hierarchy, with the same asymptotic performance. The protocol relies on the construction of weakly self-dual CSS codes with many logical qubits and large distance, which allow us to implement control-SWAPs on multiple qubits. We call this code the "inner code". The control-SWAPs are then used to measure properties of the magic state and detect errors, using another code that we call the "outer code". Alternatively, for the inner code we use weakly self-dual CSS codes that implement controlled Hadamards, which reduces circuit depth. We present several specific small examples of this protocol.

A large-scale quantum computer relies on fault-tolerant architectures, in which errors are corrected faster than they are created [1][2][3]. The standard approach is to protect logical qubits from noise with stabilizer codes [4,5] and to perform quantum gates at the encoded level. The overhead of fault tolerance is only polynomial in the logarithm of the desired accuracy, but in practice it is estimated to be overwhelmingly large [6,7]. Non-Clifford gates, such as the $\pi/4$-rotation ($T$-gate) and the Toffoli gate, are particularly expensive. A compelling approach is to inject a special state, called a magic state, into a Clifford-only circuit. This moves the cost of implementing the non-Clifford operation to the preparation of magic states, which are distilled from noisy ones [8][9][10].
There are several distillation protocols for the magic state ($T$-state) that use specialized quantum error correcting codes [8,10,11,12]. Each code provides a fixed degree of fidelity improvement, determined by the code distance. To achieve arbitrarily good fidelity, one typically concatenates small routines. In terms of the number of low-fidelity input magic states per high-fidelity output magic state, the best protocols to date are those in Refs. [12,13]. However, these protocols are only useful with large batches of thousands of magic states.
In this paper, we introduce an infinite family of distillation protocols that extends two earlier ones: one of the very first protocols, by Knill [8], and another by Meier, Eastin, and Knill [11]. Our protocol produces $n$ $T$-magic states using at most $cn$ qubits and achieves at least $c'n$-th order error suppression, assuming that the $T$ gate is the sole noise source. Here $c, c'$ are small universal constants. Because the degree of error suppression is high, there is no need to concatenate small routines, which reduces the space overhead significantly.
Our protocol is also asymptotically superior (conjecturally optimal) in terms of noisy $T$ count. For any fixed odd $d \ge 5$, we show that the number of noisy $T$ gates per output magic state, with error suppressed to $d$-th order, converges to exactly $d$ in the limit of large code length.
Beyond magic states for $T$ gates, our protocol can distill magic states for rotations by $\pi/2^k$ for $k = 3, 4, \ldots$, adapting the idea of Ref. [14]. It can also distill magic states for any gate in the third level of the Clifford hierarchy [15]. (See also Ref. [16] for smaller-angle ($k \ge 3$) rotations, though we do not use the ideas there.) For gates in the third level, the asymptotic performance is similar to the $T$ gate case.
Small instances of our family reduce space overhead while keeping the input $T$ count modest. Suppose noisy $\pi/4$ rotations can be applied directly to qubits. Then one instance of our family operates on 34 qubits, including measurement ancillas, and produces 15 $T$-magic states with 5th-order error suppression, using 29 noisy $T$ gates per output. To the authors' knowledge, every previous protocol that operates on fewer than 50 qubits either has a lower order of error suppression or requires more non-Clifford gates per output.
Recent innovations show that the $\pi/4$-rotation and the Toffoli gate can be implemented fault-tolerantly on a class of error correcting codes [17][18][19][20][21][22][23][24][25]. These schemes achieve computational universality through local operations. They circumvent no-go theorems [26,27] by switching back and forth between two code spaces. This approach removes the need for magic states, but it is not simple to determine which approach is better. The answer depends on the architecture and on the characteristics of the underlying physical qubits, so we leave a realistic cost analysis and comparison to future work.
The paper is organized as follows. Section 1 explains the basic ideas through small examples. Section 2 explains how to convert magic state distillation protocols for $T$ gates into protocols for Toffoli gates. Section 3 shows that any weakly self-dual CSS code can be used in distillation protocols by implementing measurements of Clifford operators. Section 4 gives asymptotic constructions of the codes, in the limit of either large distance or large code length. Section 5 gives results of numerical simulations, together with some additional small-size protocols that are not described elsewhere in the paper. We conclude with a discussion in Section 6. Appendix A gives details, including stabilizer checks, for some of the specific codes used in the paper. Appendix B gives circuits for some of the protocols. Appendix C describes unexpected relations among different distillation protocols. Appendix D explains an extension to qudits, which uses the classification of symmetric forms over finite fields given in Appendix E.
Throughout this paper, all classical codes that we consider are linear codes over the binary field $\mathbb{F}_2$, and all quantum codes are qubit stabilizer codes (except in Appendix D).
Given a bit vector $v$, we use $|v|$ to denote its Hamming weight. Our magic state is the $(+1)$-eigenstate $|H\rangle = \cos\frac{\pi}{8}|0\rangle + \sin\frac{\pi}{8}|1\rangle$ of the Hadamard operator $H$. We use matrices …

Basic Distillation Protocols
Distillation protocols for magic states to date fall roughly into three classes. Protocols in the first class apply a non-Clifford $\pi/4$-rotation to a stabilizer state such as $|+\rangle$ or $|0\rangle$ [10,12,16]. The non-Clifford rotation must be done fault-tolerantly, so the protocols in this class focus on finding error correcting codes that admit a transversal non-Clifford rotation. This requires the underlying code to have a special symmetry, which is rather rare. Protocols in the second class [8,9,11,13,14,28] measure "stabilizers" of the magic state. They rely on the fact that a magic state in the third level of the Clifford hierarchy [15] is an eigenstate of a Clifford operator. Measuring a Clifford operator requires a non-Clifford operation, which so far has been implemented fault-tolerantly using a distance-two code. The third class uses yet other symmetries of codes [10,29] and has a high threshold for distillation, but its success probability does not reach 1 even with perfect input magic states.
Our scheme belongs to the second class and extends the idea of Knill [8]. It uses two levels of error correcting codes, which we call inner and outer codes. Roughly speaking, the outer codes specify a set of measurements of Clifford operators on the input magic states, and the inner codes specify how to implement these measurements. We illustrate aspects of our ideas with two examples. They are not the best protocols with regard to, e.g., the total number of non-Clifford gates and states, but they are simple to explain. A more general class of protocols is presented in later sections.
By a standard Clifford twirling argument, we can assume without loss of generality that each $\pi/4$ rotation and each undistilled magic state suffers an independent $Y$ error with probability $\epsilon$. We refer to this error model as the stochastic error model.
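The sketch below is a numerical sanity check of the stochastic error model. It is plain Python with no external libraries, and the helper names are our own. It confirms two things: $|H\rangle$ is the $+1$ eigenvector of $H$, and a $Y$ error maps it to the orthogonal $-1$ eigenvector. This is why each noisy magic state can be treated as either "good" or "bad".

```python
import math

def apply(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def is_eigvec(m, v, lam, tol=1e-12):
    """True if m v = lam v up to numerical tolerance."""
    return all(abs(x - lam * y) < tol for x, y in zip(apply(m, v), v))

r = 1 / math.sqrt(2)
HAD = [[r, r], [r, -r]]                                   # Hadamard operator H
Y = [[0, -1j], [1j, 0]]                                   # Pauli Y
magic = [math.cos(math.pi / 8), math.sin(math.pi / 8)]    # |H> = cos(pi/8)|0> + sin(pi/8)|1>
bad = apply(Y, magic)                                     # magic state after a Y error

overlap = sum(a.conjugate() * b for a, b in zip(magic, bad))
print(is_eigvec(HAD, magic, 1), is_eigvec(HAD, bad, -1), abs(overlap) < 1e-12)
# prints: True True True
```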

Trivial outer code
If we could implement the control-Hadamard, distillation would be trivial:

1. Prepare an ancilla qubit in the $|+\rangle$ state.
2. Apply the control-Hadamard, with the control on the ancilla and the Hadamard on an arbitrary target qubit.
3. Measure the ancilla in the $X$ basis, and accept on the $+1$ outcome.

On acceptance, the target qubit is projected onto the magic state.
The control-Hadamard belongs to the third level of the Clifford hierarchy, so it cannot be implemented with Clifford operations. To build an approximate control-Hadamard from noisy non-Clifford rotations, we must use an error correcting code that can implement $H$ on the logical qubits fault-tolerantly.

[[7, 1, 3]] inner code
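To this end, we use the Steane code [30]. Its $X$-type and $Z$-type stabilizers are given by the rows of the parity check matrix of the classical $[7,4,3]$ Hamming code. The group they generate is fixed under $H^{\otimes 7}: X_i \leftrightarrow Z_i$, and the transversal Hadamard interchanges the logical operator pair $\bar X = X^{\otimes 7}$, $\bar Z = Z^{\otimes 7}$. Consider the identity
$$H = e^{-i\pi Y/8}\, Z\, e^{i\pi Y/8}. \qquad (1.3)$$
Replacing the middle $Z$ with a control-$Z$ therefore implements the logical control-Hadamard.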
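The sketch below checks identity (1.3) and its controlled version numerically. It assumes that the $\pi/4$ rotation is $R = e^{-i\pi Y/8}$, a rotation about the $Y$ axis. This convention is our assumption, chosen to be consistent with the $Y$-error model. The code verifies two claims:

- $R Z R^\dagger = H$.
- $(I \otimes R)\,\mathrm{CZ}\,(I \otimes R^\dagger)$ equals the control-Hadamard, with the control on the first qubit.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
R = [[c, -s], [s, c]]            # assumed pi/4 rotation: exp(-i pi Y / 8)
Z = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]
r = 1 / math.sqrt(2)
HAD = [[r, r], [r, -r]]

# Identity (1.3): conjugating Z by the rotation gives the Hadamard.
print(close(matmul(matmul(R, Z), dagger(R)), HAD))

# Replacing the middle Z by a control-Z (control = first qubit) gives control-H.
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CH = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, r, r], [0, 0, r, -r]]
circuit = matmul(matmul(kron(I2, R), CZ), kron(I2, dagger(R)))
print(close(circuit, CH))
```

Applied transversally, the same construction uses one control-$Z$ from the ancilla to each of the 7 physical qubits. Together these form a control-$Z^{\otimes 7}$, so the circuit implements control-$H^{\otimes 7}$.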
The $T$ gates may be noisy, since they act on the physical qubits of the Steane code. This gives a fault-tolerant Hadamard measurement routine. A magic state distillation protocol then works as follows:
1. Prepare a noisy magic state in the "data" register and $|0\rangle$ in 6 check registers, and embed them into the Steane code.
2. Prepare an ancilla in $|+\rangle$ and implement control-$H^{\otimes 7}$ using Eq. (1.3), with the ancilla as control and the physical qubits of the Steane code as targets.
3. Invert the embedding into the Steane code.
4. Measure the ancilla in the $X$ basis and the check qubits in the $Z$ basis.
5. If all 7 measurements give the $+1$ outcome, the data qubit holds a distilled magic state.
Let us examine the pattern of errors that may go undetected.There are two possibilities.
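• The initial magic state is faulty, and this goes undetected because the control-Hadamard malfunctions.
• Errors in the $T$ gates acting on the Steane code produce a logical error on the data qubit that the code does not detect.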
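The first possibility arises because a pair of simultaneous errors sandwiching the control-$Z$ on the same physical qubit can flip the ancilla measurement outcome. Since $YZY = -Z$, $Y$ errors on both rotations turn the control-$Z$ into a control-$(-Z)$. The first possibility therefore occurs with probability about $7\epsilon^3$ to leading order: the input must be faulty (probability $\epsilon$), and both rotations on one of the 7 physical qubits must fail (probability about $7\epsilon^2$). One can easily see that this is the only way for weight-2 errors from the control-Hadamard to escape detection. The second possibility occurs at order $\epsilon^3$, since the Steane code has distance 3. Overall, the protocol operates on 8 qubits (assuming $T$ gates are applied in place). It consumes 14 $T$ gates and 1 magic state of error rate $\epsilon$, and it produces 1 output $T$-state with error rate $O(\epsilon^3)$.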
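The sketch below checks the first failure mechanism numerically. It keeps the assumed convention $R = e^{-i\pi Y/8}$ from identity (1.3), and the helper names are our own. It shows that $Y$ errors on both rotations acting on one physical qubit turn the control-Hadamard into $(Z \otimes I)$ times the control-Hadamard. The extra factor is a phase flip on the ancilla, which reverses its $X$-basis outcome. Since $Y$ commutes with $R$, it does not matter whether each error occurs before or after its rotation.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
R = [[c, -s], [s, c]]                  # assumed pi/4 rotation exp(-i pi Y / 8)
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
I2 = [[1, 0], [0, 1]]
r = 1 / math.sqrt(2)
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CH = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, r, r], [0, 0, r, -r]]   # control on qubit 1

# Y commutes with R, so an error before or after a rotation has the same effect.
print(close(matmul(R, Y), matmul(Y, R)))

# Y errors on both rotations of the target qubit, around the control-Z.
faulty = matmul(matmul(kron(I2, R), kron(I2, Y)),
                matmul(CZ, matmul(kron(I2, Y), kron(I2, dagger(R)))))
# Result: a Z (phase flip) on the control together with the ideal control-Hadamard.
print(close(faulty, matmul(kron(Z, I2), CH)))
```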
It is useful to view the above protocol as a Hadamard measurement (H-measurement) routine. It introduces a new error of order $\epsilon^3$ on the target and another error of order $\epsilon^2$ on the control. The error on the control is easy to fix: repeat the measurement. The error on the target is inherent to the choice of inner quantum code, so it must be overcome by using a different code.

[[17, 1, 5]] inner code
There is a distance-5 code on 17 qubits for which $H^{\otimes 17}$ is the logical Hadamard. It is an instance of the color code [21,31], and its binary matrix is given in Appendix A. As above, the resulting H-measurement routine has error rate $O(\epsilon^5)$ on the target and $O(\epsilon^2)$ on the control. Repeating the H-measurement twice with this inner code lowers the control's error rate to $O(\epsilon^4)$. A control error matters only if the initial magic state is also faulty, which brings this contribution to order $\epsilon^5$. Overall, only errors of weight 5 or more can go undetected. This protocol consumes $17 \times 4 + 1 = 69$ noisy $T$'s.
In fact, we can pipeline the two H-measurement routines. First, H-measure a noisy magic state using the [[7, 1, 3]] code, and then H-measure the outcome using the [[17, 1, 5]] code. This gives a distillation routine with fifth-order error suppression that operates on 18 qubits in total and consumes $1 + 7 \times 2 + 17 \times 2 = 49$ noisy $T$'s. Compare this with the 69-to-1 protocol in the preceding paragraph: using codes of smaller distance in the early stage of the protocol yields a more efficient protocol. We loosely call this modification a pipelined protocol. The circuit is in
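The $T$ counts quoted above follow from simple bookkeeping, sketched below with a hypothetical function name. An H-measurement with an $n$-qubit normal inner code applies one rotation before and one after the control-$Z$ on every physical qubit, for $2n$ noisy rotations. Each noisy input magic state counts as one more.

```python
def noisy_t_count(n_inputs, inner_sizes, rotations_per_qubit=2):
    """Count noisy pi/4 rotations for a sequence of H-measurement routines.

    Each routine with an n-qubit inner code costs rotations_per_qubit * n
    noisy T gates; the noisy input magic states are counted as well.
    """
    return n_inputs + sum(rotations_per_qubit * n for n in inner_sizes)

steane_only = noisy_t_count(1, [7])            # one [[7,1,3]] H-measurement
color_twice = noisy_t_count(1, [17, 17])       # two [[17,1,5]] H-measurements
pipelined = noisy_t_count(1, [7, 17])          # [[7,1,3]] followed by [[17,1,5]]
print(steane_only, color_twice, pipelined)     # 15 69 49
```

A routine built from two control-Swaps, as with the hyperbolic codes discussed next, would use `rotations_per_qubit=4`. For instance, `noisy_t_count(2, [4], rotations_per_qubit=4)` returns 18.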

Repetition outer code
Suppose we have $n_{\rm outer}$ noisy magic states to distill. Under the stochastic error model, we can view the noisy magic states as a probabilistic ensemble of $n_{\rm outer}$-bit strings, where 0 denotes a good magic state and 1 denotes a bad one. The protocol in the previous subsection examines one qubit at a time; in terms of bit strings, it checks individual bits. If our goal is to suppress the overall error only to $d$-th order, where $d < n_{\rm outer}$, bit-wise checks may be overkill. A better approach is a measurement routine that checks the parity of several bits at once.

[[4, 2, 2]] inner code
The simplest case is $n_{\rm outer} = 2$ with quadratic error suppression. If we can measure $H^{\otimes 2}$, then postselecting on the $+1$ outcome projects the noisy state onto the even-parity subspace. That subspace is $O(\epsilon^2)$ away from the pair of perfect magic states. In other words, we have a repetition code on $n_{\rm outer} = 2$ bits with one parity check. This is an outer code.
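The sketch below makes the $O(\epsilon^2)$ claim concrete. It is pure Python, and the helper names are our own. It builds the two-qubit state $\rho_1 \otimes \rho_1$ with $\rho_1 = (1-\epsilon)|H\rangle\langle H| + \epsilon\, Y|H\rangle\langle H|Y$, applies an ideal $H^{\otimes 2}$ measurement, and postselects on $+1$. Because the measurement is assumed perfect, the sketch isolates the outer-code effect. The resulting infidelity should equal $\epsilon^2/((1-\epsilon)^2 + \epsilon^2)$.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def outer(u, v):
    """|u><v| as a matrix."""
    return [[x * y.conjugate() for y in v] for x in u]

def mix(a, b, wa, wb):
    """Weighted sum wa * a + wb * b of two matrices."""
    return [[wa * x + wb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def postselected_error(eps):
    """Infidelity of two noisy magic states after an ideal H (x) H check accepts."""
    r = 1 / math.sqrt(2)
    had = [[r, r], [r, -r]]
    c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
    good, bad = [c, s], [-1j * s, 1j * c]            # |H> and Y|H>
    rho1 = mix(outer(good, good), outer(bad, bad), 1 - eps, eps)
    rho = kron(rho1, rho1)
    ident = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    proj = mix(ident, kron(had, had), 0.5, 0.5)      # projector onto H (x) H = +1
    post = matmul(matmul(proj, rho), proj)
    accept = sum(post[i][i] for i in range(4)).real   # acceptance probability
    psi = [x * y for x in good for y in good]         # target state |H>|H>
    overlap = sum(psi[i] * post[i][j] * psi[j] for i in range(4) for j in range(4))
    return 1 - overlap.real / accept

for eps in (0.1, 0.01):
    print(eps, postselected_error(eps), eps ** 2 / ((1 - eps) ** 2 + eps ** 2))
```

Only the case where both states are bad survives postselection undetected, which is where the $\epsilon^2$ in the numerator comes from.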
A corresponding inner code should implement control-$H^{\otimes 2}$ to accuracy $O(\epsilon^2)$ on both the target and the control. Meier, Eastin, and Knill have designed such a measurement routine [11]. The four-qubit code [[4, 2, 2]], whose stabilizers are $X^{\otimes 4}$ and $Z^{\otimes 4}$, admits the transversal Hadamard $H^{\otimes 4}$ as a logical operator. With a suitable choice of logical operators (a $(0,2)$-magic basis, in the language of Section 3), the transversal Hadamard swaps the two logical qubits. By Eq. (1.3), we can therefore implement control-Swap to accuracy $O(\epsilon^2)$.

The trick is to apply the control-Swap twice, with a common control, around a Hadamard $H_1$ on the first qubit (1.6). This implements control-$H^{\otimes 2}$ up to an extra $H_1$, which does no harm because the magic state is its eigenstate. The resulting control-$H^{\otimes 2}$ is accurate up to error $O(\epsilon^2)$ on the target, since the four-qubit code has distance 2. It is also accurate to $O(\epsilon^2)$ on the control, by Eq. (1.4). This gives a quadratic distillation protocol that operates on 5 qubits and consumes 18 noisy $T$'s to produce 2 outputs.

[[16, 6, 4]] inner code and pipelining
In the classical Hadamard code $[16, 5, 8]$, every pair of codewords has even overlap. Using these classical codewords as stabilizers, the CSS construction gives a [[16, 6, 4]] code; see Appendix A.1.2 for the stabilizers. We will later show that, for a suitable choice of logical operators, the transversal Hadamard $H^{\otimes 16}$ implements simultaneous pairwise swaps on the three pairs of logical qubits. Hence we can measure any even product $H^{\otimes 2}$, $H^{\otimes 4}$, or $H^{\otimes 6}$ of Hadamards on $k_{\rm inner} = 6$ magic states; the construction of Eq. (1.6) generalizes directly to these larger products. This H-measurement routine puts quadratic error on the control and quartic error on the target.

Now place $n_{\rm outer} = 6$ magic states on a ring, and measure $H^{\otimes 2}$ on every pair of nearest neighbors. This gives six checks in total. The measurement pattern follows the parity checks of the classical repetition code. The checks are redundant, and this redundancy turns out to be necessary.

Let us see how this achieves quartic error suppression. For an error on one of the $n_{\rm outer}$ magic states to pass the measurement routines, both checks that involve that input state must be faulty. The probability of that is $O(\epsilon^4)$, so including the error on the input magic state, this process gives an $O(\epsilon^5)$ error. Without the redundant check, i.e., with only 5 checks, some qubit would be checked only once, and we would achieve only third-order error suppression. More generally, any process involving one or more input magic state errors gives an error of at most $O(\epsilon^5)$. The dominant error after all the H-measurements is then the logical error of the H-measurement routine, which occurs with probability $O(\epsilon^4)$. Overall, the protocol consumes $6 + 6 \times (16 \times 4) = 390$ $T$'s to produce 6 outputs.
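The control-Swap trick of the [[4, 2, 2]] routine can be checked on three qubits: a control and two targets. The sketch below uses hypothetical helper names and verifies only the unencoded operator identity, not the fault-tolerant encoding. It shows that two control-Swaps with a common control, around $H_1$, equal control-$H^{\otimes 2}$ preceded by an extra $H_1$.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

r = 1 / math.sqrt(2)
HAD = [[r, r], [r, -r]]
I2 = [[1, 0], [0, 1]]
I4 = kron(I2, I2)

# Control-Swap on basis |c t1 t2> (index 4c + 2 t1 + t2): swap t1, t2 when c = 1.
CSWAP = [[0] * 8 for _ in range(8)]
for c in range(2):
    for t1 in range(2):
        for t2 in range(2):
            src = 4 * c + 2 * t1 + t2
            dst = 4 * c + 2 * t2 + t1 if c else src
            CSWAP[dst][src] = 1

H1 = kron(I2, kron(HAD, I2))          # Hadamard on target 1 only
HH = kron(HAD, HAD)
C_HH = [[0] * 8 for _ in range(8)]    # |0><0| (x) I + |1><1| (x) H (x) H
for i in range(4):
    for j in range(4):
        C_HH[i][j] = I4[i][j]
        C_HH[4 + i][4 + j] = HH[i][j]

lhs = matmul(matmul(CSWAP, H1), CSWAP)   # control-Swap, H_1, control-Swap
rhs = matmul(C_HH, H1)                   # extra H_1 first, then control-(H (x) H)
print(close(lhs, rhs))
```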
We can pipeline the [[4, 2, 2]] code routine in front of the [[16, 6, 4]] code routine to lower the complexity of the distillation circuit. For instance, we can first run three H-measurement routines with the [[4, 2, 2]] code on the pairs of magic states (12), (34), and (56). We then run three H-measurement routines with the [[16, 6, 4]] code on the pairs (23), (45), and (61). The number of $T$'s consumed drops to $6 + 3 \times (4 \times 4) + 3 \times (16 \times 4) = 246$, while the number of outputs is still 6. We leave it to the reader to verify that the modified version also achieves quartic error suppression. We have simulated the modified version; the results are in Section 5. The circuit is in Fig.
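The check-counting argument for the ring can be reproduced by brute force. In the sketch below, the function names are hypothetical and states are labeled 0 to 5 instead of 1 to 6. A pairwise check $H^{\otimes 2}$ on states $(a, b)$ is violated exactly when one of the two states is bad. The sketch does two things:

- It compares the six-check ring with the five-check path obtained by dropping one check.
- It confirms that, in the pipelined schedule, each magic state is covered by exactly one [[4, 2, 2]] check and one [[16, 6, 4]] check.

```python
from itertools import combinations

def ring_checks(n):
    """Pairwise H (x) H checks on nearest neighbours of a ring of n magic states."""
    return [(i, (i + 1) % n) for i in range(n)]

def violated(checks, bad):
    """Number of checks whose outcome flips: exactly one of the pair is bad."""
    return sum((a in bad) != (b in bad) for a, b in checks)

def min_violated(checks, n, max_weight):
    """Fewest violated checks over nonzero error patterns of weight <= max_weight."""
    return min(violated(checks, set(e))
               for w in range(1, max_weight + 1)
               for e in combinations(range(n), w))

ring = ring_checks(6)
path = ring[:-1]                     # drop the redundant check (5, 0)
print(min_violated(ring, 6, 5), min_violated(path, 6, 5))   # 2 1

# Pipelined schedule, with states labelled 0..5 instead of 1..6.
first = [(0, 1), (2, 3), (4, 5)]     # [[4,2,2]] routines
second = [(1, 2), (3, 4), (5, 0)]    # [[16,6,4]] routines
coverage = [(sum(q in c for c in first), sum(q in c for c in second)) for q in range(6)]
print(coverage)                      # every state: (1, 1)
```

On the ring, a single bad state therefore trips two checks, and both must suffer measurement errors for it to pass. On the path, an endpoint state is checked only once.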

Third level of Clifford hierarchy
The protocol above generalizes straightforwardly to distilling magic states for other gates at the third level of the Clifford hierarchy. Consider a state $|\psi\rangle$ on $q$ qubits with $|\psi\rangle = U|\phi\rangle$, where $U$ is a gate at the third level of the Clifford hierarchy [15] and $|\phi\rangle$ is a stabilizer state. We show here that any such state $|\psi\rangle$ can be distilled. One example is the magic state for the CCZ gate, which is equivalent to the Toffoli gate up to a Hadamard on the target [1].
Since $|\phi\rangle$ is a stabilizer state, we can choose $q$ operators $S(1), S(2), \ldots, S(q)$, each a product of Paulis, that generate the stabilizer group of $|\phi\rangle$. Then $|\phi\rangle$ is the unique (up to global phase) $+1$ eigenstate of those operators. For the CCZ state, $S(1) = X_1$, $S(2) = X_2$, and $S(3) = X_3$. The state $|\psi\rangle$ is then the unique $+1$ eigenstate of the operators
$$W(a) \equiv U S(a) U^\dagger, \quad a = 1, \ldots, q. \qquad (2.2)$$
These operators $W(a)$ commute with each other by construction and belong to the second level of the hierarchy, the Clifford group. For the CCZ state, $W(1) = X_1\,\mathrm{CZ}_{2,3}$, $W(2) = X_2\,\mathrm{CZ}_{1,3}$, and $W(3) = X_3\,\mathrm{CZ}_{1,2}$.

Here is an example protocol for CCZ state distillation that uses three copies of the [[4, 2, 2]] code, which together form a [[12, 6, 2]] code. We regard the three copies as a single [[12, 6, 2]] code and index its logical qubits by $1, \ldots, 6$. We encode one CCZ state stabilized by $W(a)$ into logical qubits 1, 3, …, where the control qubit is common to every gate. The simultaneous control-Swaps are implemented by control-$H^{\otimes 12}$ on the [[12, 6, 2]] code, which in turn is implemented with noisy $T$ gates. This gives a measurement routine for $W(a) \otimes W(a)$. The protocol then measures $W(1) \otimes W(1)$, $W(2) \otimes W(2)$, and $W(3) \otimes W(3)$. Overall, it takes 2 noisy CCZ states and $3 \times (12 \times 4) = 144$ noisy $T$ gates to produce 2 CCZ states at a lower error rate. This explicit example performs worse than existing protocols in terms of non-Clifford gate count [32][33][34].

Applying the Clifford stabilizers $W$ uniformly at random to a noisy CCZ magic state turns it into a mixture of eigenstates of the $W$'s. We may therefore assume an error model in which, with probability $\epsilon$, an error flips at least one of $W(1)$, $W(2)$, $W(3)$. The measurement routine has measurement error at rate $O(\epsilon^2)$ and logical error at rate $O(\epsilon^2)$, so the protocol achieves quadratic error reduction for the CCZ state. Higher-order reduction requires inner and outer codes of higher distance.
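The form of the stabilizers $W(a)$ for the CCZ state can be checked numerically on three qubits. The sketch below is pure Python with our own helper names. It verifies three facts:

- $\mathrm{CCZ}\, X_a\, \mathrm{CCZ} = X_a \mathrm{CZ}_{b,c}$ for each $a$.
- The three $W(a)$ commute.
- $\mathrm{CCZ}|{+}{+}{+}\rangle$ is their common $+1$ eigenstate.

```python
import math
from itertools import product

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

BITS = list(product(range(2), repeat=3))   # basis |a b c>, qubit 1 is the leftmost bit

def diag(f):
    """8x8 diagonal matrix with entries f(a, b, c)."""
    return [[f(*BITS[i]) if i == j else 0 for j in range(8)] for i in range(8)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
CCZ = diag(lambda a, b, c: (-1) ** (a * b * c))
X_ops = [kron(X, kron(I2, I2)), kron(I2, kron(X, I2)), kron(I2, kron(I2, X))]
CZ_ops = [diag(lambda a, b, c: (-1) ** (b * c)),    # CZ on qubits 2, 3
          diag(lambda a, b, c: (-1) ** (a * c)),    # CZ on qubits 1, 3
          diag(lambda a, b, c: (-1) ** (a * b))]    # CZ on qubits 1, 2

W = [matmul(X_ops[k], CZ_ops[k]) for k in range(3)]
conj_ok = all(matmul(matmul(CCZ, X_ops[k]), CCZ) == W[k] for k in range(3))
commute_ok = all(matmul(W[i], W[j]) == matmul(W[j], W[i]) for i in range(3) for j in range(3))

state = [(-1) ** (a * b * c) / math.sqrt(8) for a, b, c in BITS]   # CCZ|+++>
eigen_ok = all(
    all(abs(sum(W[k][i][j] * state[j] for j in range(8)) - state[i]) < 1e-12 for i in range(8))
    for k in range(3))
print(conj_ok, commute_ok, eigen_ok)   # True True True
```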
A related discussion of the error model for the $T$ state is given in Section 5.1.
In passing, we note that the Clifford unitary … stabilizes the CCZ state $\sum_{a,b,c=0,1}(-1)^{abc}|abc\rangle$. Since the CCZ state is permutation invariant, this gives six such stabilizers. They do not commute, but any triple of them uniquely determines the CCZ state. The controlled version can be implemented with only four $T$ gates [32] (see also [33]). These stabilizers might be usable with normal codes such as [[7, 1, 3]] and [[17, 1, 5]]. However, because they do not commute, the resulting measurement routine would reject faulty inputs with probability less than 1, even in the limit $\epsilon \to 0$.

Inner Codes
In this section, we find a general class of inner codes that can be used in distillation protocols. On a first reading, you may wish to skip the discussion of symmetric forms and note only the magic basis in Definition 3.3 and the construction of codes in Theorem 3.5.

Symmetric forms over $\mathbb{F}_2$
We consider finite-dimensional vector spaces over the binary field $\mathbb{F}_2$. The space $\mathbb{F}_2^n$ carries the symmetric dot product $v \cdot w = \sum_i v_i w_i \in \mathbb{F}_2$. This dot product is non-degenerate: for any nonzero vector $v \in \mathbb{F}_2^n$ there is a vector $w \in \mathbb{F}_2^n$ with $v \cdot w \neq 0$. Let $S$ be a null (self-orthogonal) subspace of $\mathbb{F}_2^n$, i.e., one on which the dot product vanishes identically. Let $S^\perp$ denote the orthogonal complement of $S$ with respect to the dot product; note that $S \subseteq S^\perp$. Because $S$ is null, the dot product of $\mathbb{F}_2^n$ canonically induces a dot product on the quotient space $S^\perp/S$:
$$[v] \cdot [w] = v \cdot w, \qquad v, w \in S^\perp,$$
where $[v]$ and $[w]$ denote the equivalence classes (members of the quotient space) represented by $v$ and $w$. This is well defined because $s \cdot w = 0$ for all $s \in S$ and $w \in S^\perp$.
Lemma 3.1. The induced dot product on $S^\perp/S$ is non-degenerate.
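Proof. First, we claim that $(S^\perp)^\perp = S$. By definition, $S \subseteq (S^\perp)^\perp$. Interpreting the orthogonal complement as the solution space of a system of linear equations, dimension counting shows equality. Now take any nonzero $[v] \in S^\perp/S$. Then $v \notin S = (S^\perp)^\perp$, so there is some $w \in S^\perp$ with $v \cdot w = 1$, i.e., $[v] \cdot [w] = 1$.

For any basis $\{[v^{(1)}], \ldots, [v^{(k)}]\}$ of $S^\perp/S$, consider the symmetric matrix $\Lambda$ representing the dot product, $\Lambda_{ij} = [v^{(i)}] \cdot [v^{(j)}]$. By Lemma 3.1, $\Lambda$ is non-singular. Changing basis by an invertible matrix $A$ replaces $\Lambda$ with the congruent matrix $A \Lambda A^T$.

Lemma 3.2. Every non-singular symmetric matrix over $\mathbb{F}_2$ is congruent to exactly one of $I_n$ or $\lambda_n$. Here $\lambda_n$, defined for even $n$, is the direct sum of $n/2$ copies of $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

Proof. The two options are not equivalent: under $\lambda_n$ every vector is self-orthogonal, whereas under $I_n$ not every vector is. For completeness, we give an elementary algorithmic proof by manipulating symmetric matrices.

First, we claim that any non-singular symmetric matrix can be brought to a direct sum of $I_p$ and $\lambda_q$ for some $p \ge 0$ and even $q \ge 0$. If there is a nonzero diagonal element, a permutation brings it to the top-left. Gaussian elimination on the first column and row then shows that $I_1$ is a direct summand. By induction we obtain a direct summand $I_p$, leaving a symmetric matrix $\Lambda'$ with zero diagonal. No column of $\Lambda'$ can be zero, since $\Lambda'$ is non-singular, so some permutation brings a 1 to the $(2,1)$ and $(1,2)$ entries of $\Lambda'$. Gaussian elimination on the first and second columns and rows reveals a direct summand $\lambda_2$, and the remaining block still has zero diagonal. By induction, the first claim is proved.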
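The second claim is that $I_{p+2} \oplus \lambda_{q-2} \cong I_p \oplus \lambda_q$ whenever $p, q > 0$. Its proof is immediate from the congruence $I_3 \cong I_1 \oplus \lambda_2$ (3.3). Therefore, whenever $p > 0$, we have $I_p \oplus \lambda_q \cong I_{p+q}$. If $p = 0$, there is nothing more to prove.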
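The second claim reduces to the single congruence $I_3 \cong I_1 \oplus \lambda_2$, which can be exhibited explicitly. The matrix $A$ below is one valid choice. It is an illustrative assumption, not necessarily the matrix intended in the text. The sketch checks three things:

- $A$ is invertible over $\mathbb{F}_2$.
- $A I_3 A^T = I_1 \oplus \lambda_2$.
- $I_2$ and $\lambda_2$ are not congruent, by exhaustive search, as the first sentence of the proof states.

```python
from itertools import product

def matmul_f2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % 2 for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det_f2(m):
    """Determinant mod 2 by cofactor expansion (signs are irrelevant over F_2)."""
    if len(m) == 1:
        return m[0][0] % 2
    return sum(m[0][j] * det_f2([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m))) % 2

def congruent(a, lam):
    """A Lambda A^T over F_2."""
    return matmul_f2(matmul_f2(a, lam), transpose(a))

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 1, 1], [1, 1, 0], [1, 0, 1]]
I1_plus_lambda2 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(det_f2(A), congruent(A, I3) == I1_plus_lambda2)    # 1 True

# Exhaustive search: no invertible 2x2 matrix maps I_2 to lambda_2.
I2 = [[1, 0], [0, 1]]
lambda2 = [[0, 1], [1, 0]]
mats = [[[a, b], [c, d]] for a, b, c, d in product(range(2), repeat=4)]
print(any(det_f2(m) == 1 and congruent(m, I2) == lambda2 for m in mats))   # False
```

The search fails because the diagonal of $A A^T$ records the row weights of $A$ mod 2. Zero diagonal would force both rows into $\{00, 11\}$, which cannot be independent.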
The classification motivates the following notion of bases.
Definition 3.3. Given a null subspace $S \subseteq \mathbb{F}_2^n$, a basis of $S^\perp/S$ is called $(p, q)$-magic if the symmetric matrix $\Lambda$ representing the dot product on $S^\perp/S$ among the basis vectors equals $I_p \oplus \lambda_q$ for some $p \ge 0$ and $q \ge 0$. A magic basis is normal if $q = 0$, and hyperbolic if $p = 0$.
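For small $n$, these objects can be computed by brute force. The sketch below encodes vectors of $\mathbb{F}_2^n$ as integer bitmasks, which is our own implementation choice. It enumerates $S^\perp$, greedily picks representatives of a basis of $S^\perp/S$, and checks that the Gram matrix $\Lambda$ is non-singular, as Lemma 3.1 asserts. There are two examples:

- $S$ spanned by $1111$, which gives the [[4, 2, 2]] code.
- $S$ spanned by the rows of the $[7,4,3]$ Hamming parity check matrix, which gives the Steane code.

In both examples the greedy basis happens to be magic already: $(0,2)$-magic for the first and $(1,0)$-magic for the second.

```python
def dot(u, v):
    """Dot product over F_2 of two bitmask-encoded vectors."""
    return bin(u & v).count("1") % 2

def span(gens):
    """All F_2-linear combinations of the given bitmasks."""
    out = {0}
    for g in gens:
        out |= {x ^ g for x in out}
    return out

def perp(gens, n):
    """Orthogonal complement S^perp, by enumerating all 2^n vectors."""
    return [v for v in range(2 ** n) if all(dot(v, g) == 0 for g in gens)]

def quotient_basis(s_gens, n):
    """Representatives of a basis of S^perp / S (greedy extension of S)."""
    reps, current = [], span(s_gens)
    for v in perp(s_gens, n):
        if v not in current:
            reps.append(v)
            current = span(list(s_gens) + reps)
    return reps

def gram(reps):
    """Gram matrix Lambda_ij = [v_i] . [v_j] as a list of rows."""
    return [[dot(a, b) for b in reps] for a in reps]

def nonsingular(mat):
    """A k x k matrix over F_2 is non-singular iff its rows span 2^k vectors."""
    rows = [int("".join(map(str, row)), 2) for row in mat]
    return len(span(rows)) == 2 ** len(mat)

four = quotient_basis([0b1111], 4)
steane = quotient_basis([0b0001111, 0b0110011, 0b1010101], 7)
print(gram(four), nonsingular(gram(four)))                    # [[0, 1], [1, 0]] True
print(len(steane), gram(steane), nonsingular(gram(steane)))   # 1 [[1]] True
```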
We summarize the results of this section in a theorem.

Theorem 3.4. For any self-orthogonal subspace $S \subseteq \mathbb{F}_2^n$, there exists a $(p, q)$-magic basis for $S^\perp/S$, where $p + q = \dim_{\mathbb{F}_2} S^\perp/S$. If $p > 0$ and $q > 0$, then a $(p + 2, q - 2)$-magic basis also exists.

Footnote 5. Any finite-dimensional non-degenerate symmetric form over the binary field is the intersection form of a closed topological surface $X$. The form is defined by the cup product $H^1(X; \mathbb{F}_2) \times H^1(X; \mathbb{F}_2) \to H^2(X; \mathbb{F}_2) \cong \mathbb{F}_2$ of cohomology. Indeed, $I_n$ corresponds to the connected sum of $n$ copies of $\mathbb{RP}^2$, and $\lambda_n$ to the connected sum of $n/2$ copies of the 2-torus. The classification in the lemma corresponds to the orientability of the surface. Eq. (3.3) expresses the fact that two $\mathbb{RP}^2$'s can be turned into a torus in the presence of another $\mathbb{RP}^2$. We thank Michael Freedman for pointing this out.

CSS codes from self-orthogonal matrices
It is standard to associate to a bit string $v = (v_1, \ldots, v_n)$ the Pauli operators
$$X(v) = \prod_{j=1}^n X_j^{v_j}, \qquad Z(v) = \prod_{j=1}^n Z_j^{v_j}, \qquad (3.4)$$
where $X_j$ ($Z_j$) is the Pauli $\sigma^x$ ($\sigma^z$) on qubit $j$. The CSS construction of quantum codes applies to a self-orthogonal (null) subspace $S \subseteq \mathbb{F}_2^n$: for every vector $v \in S$, we define an $X$-stabilizer $X(v)$ and a $Z$-stabilizer $Z(v)$. The equivalence classes of $X$-type ($Z$-type) logical operators are then in one-to-one correspondence with $S^\perp/S$. The number of logical qubits is thus $k = \dim_{\mathbb{F}_2} S^\perp/S = n - 2\dim_{\mathbb{F}_2} S$.

We encode logical qubits by choosing a complete set of logical operators $\bar X(j)$ and $\bar Z(j)$ as follows. Choose a $(p, q)$-magic basis $\{v^{(1)}, \ldots, v^{(p)}, w^{(1)}, \ldots, w^{(q)}\}$ of $S^\perp/S$, and define
$$\bar X(a) = X(v^{(a)}), \quad \bar Z(a) = Z(v^{(a)}) \qquad (a = 1, \ldots, p),$$
$$\bar X(p+2j-1) = X(w^{(2j-1)}), \quad \bar Z(p+2j-1) = Z(w^{(2j)}), \quad \bar X(p+2j) = X(w^{(2j)}), \quad \bar Z(p+2j) = Z(w^{(2j-1)}) \qquad (j = 1, \ldots, q/2).$$
By definition of the magic basis, these logical operators obey the canonical commutation relations of Pauli operators on $k$ qubits. The $\bar X$'s commute among themselves, as do the $\bar Z$'s, and $\bar X(a)$ anticommutes with $\bar Z(b)$ if and only if $a = b$. The commutation relations can be realized with arbitrary signs $\pm$ in the choice of logical operators, but the induced Clifford logical operators depend on the signs. We choose all signs $+$ as above. Then the transversal Hadamard $H^{\otimes n}$, which maps $X(u)$ to $Z(u)$, acts as the logical Hadamard $\bar H(a)$ on each logical qubit $a \le p$. This defines CSS codes from self-orthogonal subspaces over $\mathbb{F}_2$:

Theorem 3.5. Let $S \subseteq \mathbb{F}_2^n$ be a self-orthogonal subspace with a $(p, q)$-magic basis of $S^\perp/S$. Then there exists a CSS code on $n$ qubits with $p + q$ logical qubits and a choice of logical operators such that the transversal Hadamard $H^{\otimes n}$ implements the logical Hadamards on logical qubits $1, \ldots, p$, and simultaneously the swaps between logical qubits $p + 2j - 1$ and $p + 2j$ for $j = 1, \ldots, q/2$.
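The assignment of logical operators to a magic basis can be checked mechanically. The sketch assumes the following assignment, which is our reading of Theorem 3.5:

- A normal basis vector $v$ gives $\bar X = X(v)$ and $\bar Z = Z(v)$.
- A hyperbolic pair $(w^{(1)}, w^{(2)})$ gives $(\bar X, \bar Z) = (X(w^{(1)}), Z(w^{(2)}))$ and $(X(w^{(2)}), Z(w^{(1)}))$.

Vectors are integer bitmasks, and the function names are hypothetical. $X(u)$ and $Z(v)$ anticommute exactly when $u \cdot v = 1$, and the transversal Hadamard maps $X(u)$ to $Z(u)$. The sketch runs two cases:

- The [[4, 2, 2]] code with the pair $0011$, $0101$, which is one valid choice. It confirms the canonical commutation relations and that $H^{\otimes 4}$ swaps the two logical qubits.
- The Steane code with the normal vector $1111111$. It confirms that $H^{\otimes 7}$ is a logical Hadamard.

```python
def dot(u, v):
    return bin(u & v).count("1") % 2

def logical_ops(normal, pairs):
    """Return (xs, zs): logical qubit j has X-part X(xs[j]) and Z-part Z(zs[j])."""
    xs, zs = [], []
    for v in normal:                  # normal vectors: v . v = 1
        xs.append(v)
        zs.append(v)
    for w1, w2 in pairs:              # hyperbolic pairs: w1 . w1 = w2 . w2 = 0, w1 . w2 = 1
        xs += [w1, w2]
        zs += [w2, w1]
    return xs, zs

def canonical(xs, zs):
    """X-bar(a) anticommutes with Z-bar(b) iff a == b (X's and Z's commute among themselves)."""
    k = len(xs)
    return all(dot(xs[a], zs[b]) == (a == b) for a in range(k) for b in range(k))

def hadamard_action(xs, zs):
    """Transversal H sends X(u) to Z(u); report which logical Z each logical X becomes."""
    return [zs.index(x) for x in xs]

xs4, zs4 = logical_ops([], [(0b0011, 0b0101)])
print(canonical(xs4, zs4), hadamard_action(xs4, zs4))    # True [1, 0]  (a swap)

xs7, zs7 = logical_ops([0b1111111], [])
print(canonical(xs7, zs7), hadamard_action(xs7, zs7))    # True [0]     (logical Hadamard)
```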
We call a weakly self-dual CSS code normal if a normal magic basis exists, and hyperbolic otherwise. A normal code can have an even number of logical qubits, an even number of physical qubits, and an even distance. Every hyperbolic code, however, must have an even number of logical qubits, an even number of physical qubits, and an even distance. For instance, among the codes of Section 1, the Steane code [[7, 1, 3]] and the [[17, 1, 5]] color code are normal, while the [[4, 2, 2]] code and the [[16, 6, 4]] code are hyperbolic. We used a $(0,2)$-magic basis for [[4, 2, 2]] and a $(0,6)$-magic basis for [[16, 6, 4]]. The "H-code" by Jones [13] is a normal code with parameters …, where $k$ is even. Below, we mostly use normal codes with a $(p, 0)$-magic basis for distillation protocols.
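For small codes, one can decide by brute force whether a weakly self-dual CSS code is normal or hyperbolic. By Lemma 3.2 and Theorem 3.4, a normal basis exists exactly when the induced form is not alternating. That is, some $v \in S^\perp$ must have $v \cdot v = |v| \bmod 2 = 1$. The sketch below uses hypothetical helper names and computes $k$ together with this test for three examples from Section 1. For the examples it assumes standard generators: the Hamming-code checks for the Steane code, and the first-order Reed-Muller code $RM(1,4)$, i.e., the $[16,5,8]$ Hadamard code, for [[16, 6, 4]].

```python
def dot(u, v):
    return bin(u & v).count("1") % 2

def span(gens):
    out = {0}
    for g in gens:
        out |= {x ^ g for x in out}
    return out

def classify(s_gens, n):
    """Return (k, kind) for the CSS code built from self-orthogonal generators."""
    assert all(dot(a, b) == 0 for a in s_gens for b in s_gens), "S must be self-orthogonal"
    dim_s = len(span(s_gens)).bit_length() - 1
    s_perp = [v for v in range(2 ** n) if all(dot(v, g) == 0 for g in s_gens)]
    k = n - 2 * dim_s
    odd = any(dot(v, v) for v in s_perp)     # some v in S^perp with v . v = 1
    return k, "normal" if odd else "hyperbolic"

# S = RM(1,4): the all-ones vector and the 4 coordinate functions on 16 points.
rm14 = [0xFFFF] + [sum(1 << x for x in range(16) if (x >> i) & 1) for i in range(4)]

print(classify([0b1111], 4))                            # (2, 'hyperbolic')
print(classify([0b0001111, 0b0110011, 0b1010101], 7))   # (1, 'normal')
print(classify(rm14, 16))                               # (6, 'hyperbolic')
```

Equivalently, since $(S^\perp)^\perp = S$, the code is hyperbolic exactly when the all-ones vector lies in $S$. This also explains why hyperbolic codes have an even number of physical qubits.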
CSS codes derived from self-orthogonal matrices are not a very restrictive class. Representing each qubit of any stabilizer code with parameters [[n, k, d]] by Majorana modes gives a weakly self-dual CSS code with parameters [[4n, 2k, 2d]] [35]. We briefly review this mapping in Section 4.2, where we also present other families of such codes with improved rate.

Coding Theory and Asymptotic Performance
Asymptotic Performance
In this section we consider the asymptotic properties of the class of protocols defined above, for an appropriate choice of inner and outer codes. We ignore all possibilities of pipelining and use only a single inner and a single outer code to define each protocol. This reduces the question of asymptotic properties to the existence of code families with certain properties.
"Asymptotic" refers to one of two limits. In the first limit, we consider a family of protocols parametrized by $d$, the order of reduction in error. Each instance in the family reduces the error probability from $\epsilon$ to a constant times $\epsilon^d$ in the limit of small $\epsilon$. We prove the following.

Theorem 4.1. There is a family of protocols, parametrized by an integer $d \ge 1$, that obtains a $d$-th order reduction in error using a total of $\Theta(d)$ physical qubits and produces $n_{\rm outer} = \Theta(d)$ magic states. The total number of $T$ gates used is $n_T = \Theta(d^2)$, so the number of $T$ gates per magic state is $\Theta(d)$. The $T$-gate depth of the circuit is also $\Theta(d)$, where $T$-gate depth means the circuit depth assuming that an arbitrary Clifford can be executed in depth 1.
In the second limit, we fix $d$ and consider a family of protocols parametrized by $n_{\rm outer}$, the number of magic states produced. We prove the following.

Theorem 4.2. For any odd $d \ge 5$, there is a family of protocols that uses $n_{\rm outer} \cdot (1 + o(1))$ physical qubits and produces $n_{\rm outer}$ magic states with a $d$-th order reduction in error. The total number of $T$ gates used is $n_T = n_{\rm outer} \cdot (d + o(1))$.

We take $d$ odd because the minimal weight of an error that escapes the protocol through wrong H-measurement outcomes is always odd.
Consider one particular protocol with $\epsilon_{\rm out} = C\epsilon_{\rm in}^d$ that consumes $n_T/n_{\rm outer}$ $T$ gates per output. Concatenating it with itself defines an infinite family of protocols. For this concatenated family, the number of $T$ gates needed to reach an arbitrarily small output error rate $\delta$ scales like $O((\log 1/\delta)^\gamma)$, where the scaling exponent [10,12] is
$$\gamma = \frac{\log(n_T/n_{\rm outer})}{\log d}.$$
Smaller values of $\gamma$ mean asymptotically more efficient distillation protocols. The triorthogonal codes [12] achieve $\gamma \to \log_2 3$, and the "multilevel" protocol [13] achieves $\gamma \to 1^{+}$. We comment on the multilevel protocol in the Discussion (Section 6). It was conjectured that no protocol can achieve $\gamma < 1$ [12]. Both families in Theorems 4.1 and 4.2 achieve $\gamma \to 1^{+}$.
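The exponent is easy to evaluate for the small protocols of Section 1. The sketch below assumes the standard definition $\gamma = \log(n_T/n_{\rm outer})/\log d$ for a protocol with $d$-th order error suppression. It uses $T$ counts quoted earlier in the text and is a back-of-the-envelope comparison, not a performance claim.

```python
import math

def gamma(n_t, n_out, d):
    """Scaling exponent of the self-concatenated family: log(n_T / n_out) / log(d)."""
    return math.log(n_t / n_out) / math.log(d)

examples = {
    "[[7,1,3]] + [[17,1,5]] pipelined (49 -> 1, d = 5)": gamma(49, 1, 5),
    "ring of [[4,2,2]] / [[16,6,4]] checks (246 -> 6, d = 4)": gamma(246, 6, 4),
    "34-qubit instance (29 T per output, d = 5)": gamma(29, 1, 5),
}
for name, value in examples.items():
    print(f"{name}: gamma = {value:.3f}")

# A family whose T count per output tends to d, as in Theorem 4.2, has gamma -> 1.
print(gamma(5 * 1000, 1000, 5))
```

The small examples all have $\gamma$ well above 1. The asymptotic families approach 1 only because the $T$ count per output tends to $d$.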
The measure $\gamma$ slightly underestimates the $T$-count efficiency of the family in Theorem 4.1. Suppose we want an arbitrarily small final error rate $\delta$ from a fixed initial error rate, say $\epsilon = 0.01$. We can pick a member $P_d$ of the family, with error reduction degree $d$, such that $\delta > C_d(\epsilon/d^2)^d$. Here $C_d$ is the leading coefficient of the output error probability of $P_d$. It is at most the number of ways that weight-$d$ errors can occur among $n_T = O(d^2)$ $T$ gates, so $C_d \le \alpha d^{2d}$ for some $\alpha > 0$ independent of $d$. The condition $\delta > C_d(\epsilon/d^2)^d$ then holds whenever $d > (\log(1/\delta) + \log\alpha)/\log(1/\epsilon)$.

To use $P_d$, we first distill magic states from error rate $\epsilon$ down to $\epsilon' = \epsilon/d^2$ with a concatenated protocol $P_{\rm init}$. This takes $n_{\rm init} = O((\log d)^{\gamma'})$ input magic states per output magic state, for some $\gamma' > 1$. We then feed the outputs of $P_{\rm init}$, at error rate $\epsilon'$, into $P_d$. It follows that $O(d\,(\log d)^{\gamma'})$ magic states at error rate $\epsilon$ per output suffice to achieve final accuracy $\delta$. Thus $n_T/n_{\rm outer}$ scales linearly in $\log(1/\delta)$, up to a logarithmic correction. (One can iterate the argument recursively to further slow down the dependence on $1/\delta$.)

Theorem 4.2 uses normal codes. The smaller number of $T$ gates needed to implement checks with a normal code is essential for the $T$ count in the theorem; hyperbolic codes would need roughly twice as many. This is why $d$ is chosen odd. The case $d = 1$ is of course trivial, since no codes are needed. You may then wonder why the case $d = 3$ is excluded; this is explained further below. Theorems 4.1 and 4.2 follow almost immediately from the existence of families of inner and outer codes with certain properties, which we define below. In this subsection we prove the theorems assuming these properties, and in Subsections 4.2 and 4.3 we construct code families that have them.
Consider first the inner code. It has $k_{\rm inner}$ logical qubits, $n_{\rm inner}$ physical qubits, and distance at least $d$. Now consider the effect of errors in the $T$ gates inside the inner code, i.e., in the $T$ gates acting on the encoded state. For $d$-th order reduction in error, it suffices to consider the case where fewer than $d$ errors occur in such $T$ gates. Since the inner code distance is at least $d$, these errors cannot produce a logical error. There is one way, however, in which they can have an effect without being detected by the inner code: a pair of errors may hit the two $T$ gates acting on the same qubit. These errors cause an error in the check measured by the inner code. If the check was measuring a given product of $W$ operators specified by the outer code, we instead measure it with the opposite sign. We call this a "measurement error".
The possibility of measurement errors affects some of the properties that we require of the outer code. Of course, we need the outer code to have distance at least $d$, as otherwise a pattern of fewer than $d$ errors in the input magic states could cause an undetectable error, but this is not sufficient. A pattern of fewer than $d$ errors must also violate enough checks that, even in the presence of a small number of measurement errors, the code still detects an error. This requirement is captured by the property of "sensitivity", defined below.
The outer code will have $m$ parity checks, encoded in an $m$-by-$n_{\mathrm{outer}}$ outer parity check matrix $M$, where each row of the matrix indicates a given check. We can measure rows of this matrix with even weight using a hyperbolic inner code, and rows with odd weight using a normal inner code. For simplicity, we will either have all rows of even weight or all rows of odd weight, so that we can use the same inner code for all checks. (More generally, one could use both a hyperbolic and a normal code.) Then this inner code must have $k_{\mathrm{inner}}$ greater than or equal to the maximum row weight of $M$, and the difference between each row weight of $M$ and $k_{\mathrm{inner}}$ must be an even number. In this case, we say that the inner code can implement the checks of the outer code. We say that $M$ is $(\tilde d, s)$-sensitive if every vector $v$ with $1 \le |v| \le \tilde d$ violates many checks; that is, for any such vector, the number of violated parity checks $|Mv|$ is at least $s$.
We emphasize that sensitivity is a property of the check matrix of the outer code, rather than of the codewords of the outer code, and in some examples the rows of the check matrix may be linearly dependent. A $(\tilde d, s)$-sensitive parity check matrix is $(\tilde d - 1, s)$-sensitive by definition.
Lemma. Given an outer code with a $(d-1, (d-1)/2)$-sensitive check matrix $M$, and given an inner code of parameters $[[n_{\mathrm{inner}}, k_{\mathrm{inner}}, d]]$ that can implement the checks defined by $M$, the protocol yields $d$-th order reduction in error. The protocol overall takes $n_{\mathrm{outer}}$ noisy magic states, $2n_{\mathrm{inner}}m$ noisy $T$ gates when the inner code is normal (or $4n_{\mathrm{inner}}m$ when it is hyperbolic), and outputs $n_{\mathrm{outer}}$ magic states.
Proof. Any error pattern inside the inner code with weight less than $d$ cannot cause a logical error. Thus, if an error pattern inside the inner code does not violate a stabilizer of the inner code, it either has no effect or it leads to an error in the measurement of a check of the outer code; the latter possibility requires at least two errors inside the inner code. Any input state with $|v| \ge 1$ errors will violate at least $|Mv| \ge (d - |v|)/2$ checks of the outer code. If no violation of these checks is detected, each violated check must have been hidden by a measurement error, so there must be at least $2|Mv|$ errors on $T$ gates inside the inner code. Thus, there must be at least $|v| + 2|Mv| \ge d$ errors in total.
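The counting in this proof amounts to the condition $|v| + 2|Mv| \ge d$ for every error pattern $v$ on the outer bits with $1 \le |v| < d$. For small outer codes this can be checked exhaustively. The Python sketch below does so; the helper names are ours, and the $4 \times 4$ matrix (all four weight-3 rows on 4 bits) is used purely as an illustrative input.

```python
from itertools import combinations


def syndrome_weight(M, support):
    """Number of rows of M (lists of 0/1) with odd overlap with the error support."""
    return sum(sum(row[j] for j in support) % 2 for row in M)


def passes_criterion(M, d):
    """Check |v| + 2|Mv| >= d for all error patterns v with 1 <= |v| < d."""
    n = len(M[0])
    for weight in range(1, min(d, n + 1)):
        for support in combinations(range(n), weight):
            if weight + 2 * syndrome_weight(M, support) < d:
                return False
    return True


# Illustrative input: all four weight-3 rows on 4 bits.
M = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]
print(passes_criterion(M, 5))  # True: undetected patterns need at least 5 faults
print(passes_criterion(M, 6))  # False: a weight-3 error violates only one check
```

Enumerating all supports is exponential in $n_{\mathrm{outer}}$, so this kind of check is only practical for small outer codes; the asymptotic families rely on the probabilistic and graph-based arguments instead.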
The input T gate and state count is clear from Section 1.
We now define some asymptotic properties of the codes needed. Choose the row weight $w = k_{\mathrm{inner}}$. Choosing $\tilde d \ge d - 1$ and $s = (d-1)/2$, this gives us an outer code whose checks can be performed by the given inner code, and we need to perform $n_{\mathrm{outer}}(s/w) = n_{\mathrm{outer}}(s/k_{\mathrm{inner}})$ checks with the inner code. Each such check requires $2n_{\mathrm{inner}}$ $T$ gates, so the total number of $T$ gates needed to perform the checks with the inner code is $2n_{\mathrm{outer}}s(n_{\mathrm{inner}}/k_{\mathrm{inner}})$. Additionally, we need $n_{\mathrm{outer}}$ $T$ gates to create the input magic states to the outer code. Thus, the total number of $T$ gates is
$$n_T = n_{\mathrm{outer}} + 2n_{\mathrm{outer}}s\,\frac{n_{\mathrm{inner}}}{k_{\mathrm{inner}}} = n_{\mathrm{outer}}\left(1 + (d-1)\frac{n_{\mathrm{inner}}}{k_{\mathrm{inner}}}\right).$$
Taking $n_{\mathrm{inner}}$ large so that $n_{\mathrm{inner}}/k_{\mathrm{inner}} \to 1$, we conclude $n_T \to n_{\mathrm{outer}}d$.
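The $T$-count bookkeeping above can be reproduced exactly with rational arithmetic. In this sketch the function names are ours, $d$ is assumed odd so that $s = (d-1)/2$ is an integer, and the parameters ($n_{\mathrm{outer}} = 15$, $n_{\mathrm{inner}} = 21$, $k_{\mathrm{inner}} = 3$, $d = 5$, inner code assumed normal) are hypothetical values chosen only for illustration.

```python
from fractions import Fraction


def t_count(n_outer, n_inner, k_inner, d, normal=True):
    """Total T count: n_outer input states plus the T gates of all inner-code checks.

    Follows the counting in the text: s = (d - 1) / 2, row weight w = k_inner,
    m = n_outer * s / k_inner checks, and 2 * n_inner T gates per check for a
    normal inner code (4 * n_inner for a hyperbolic one).
    """
    s = Fraction(d - 1, 2)
    m = n_outer * s / k_inner
    per_check = (2 if normal else 4) * n_inner
    return n_outer + m * per_check


def t_count_closed_form(n_outer, n_inner, k_inner, d):
    """n_T = n_outer * (1 + (d - 1) * n_inner / k_inner) for a normal inner code."""
    return n_outer * (1 + (d - 1) * Fraction(n_inner, k_inner))


print(t_count(15, 21, 3, 5), t_count_closed_form(15, 21, 3, 5))  # 435 435
# As n_inner / k_inner -> 1, the T count per output approaches d.
print(t_count(15, 10**6, 10**6, 5) / 15)  # 5
```

Using `Fraction` keeps the check count exact even when $n_{\mathrm{outer}}s/k_{\mathrm{inner}}$ is not an integer, which is convenient when exploring the asymptotic limit.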
We can now see better why we needed $d \ge 5$ in Theorem 4.2. For $d = 3$, we have $s = 1$, and Lemma 4.15 does not apply. The reader will see later why the case $s = 1$ is excluded from that lemma; roughly, in this case each bit participates in only a single check, and we would lack certain expansion properties for a certain graph defined later.

Inner Codes
In this subsection, we give asymptotic constructions of inner codes. There are at least two constructions of weakly self-dual codes with good rate and distance in the literature. We review these before giving an alternative probabilistic proof which has some advantages.
First, in Ref. [36], it is shown that for any ratio $d/n$, one can find a family of weakly self-dual CSS codes with $n$ qubits and distance $d$ achieving a rate $k/n \to 1 - 2H_2(d/n)$, where $H_2$ is the binary entropy function. The codes found in that paper are all hyperbolic codes. However, we can obtain normal codes from them by a "puncturing procedure" (see also [37, Sec. 3.5]):
Definition 4.8. Given a hyperbolic weakly self-dual CSS code $C$ on $n$ qubits with $k$ logical qubits, define a "punctured code" $C'$ as follows. Choose a qubit $i$ (the code $C'$ may depend upon the choice of $i$). Write the stabilizer generators of $C$ such that only one $X$-type and one $Z$-type generator is supported on $i$. Define $C'$ by removing qubit $i$ and removing the stabilizer generators supported on $i$. Then $C'$ has $n' = n - 1$ qubits and $k + 1$ logical qubits (one qubit but two generators are removed). The code $C'$ is a normal code by construction.
If $C$ is non-degenerate with distance $d$, then $C'$ has distance $d' \ge d - 1$. More generally, $d' + 1$ is greater than or equal to the minimum weight of an operator which commutes with the stabilizer group of $C$, because given an $X$-type logical operator $O$ in $C'$, either $O$ or $OX_i$ must commute with the stabilizer group of $C$. Indeed, one may show that puncturing the codes of Ref. [36] reduces the distance by at most 1.
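Definition 4.8 acts directly on the $\mathbb{F}_2$ matrix $S$ whose rows are the stabilizer generators (for a weakly self-dual CSS code the same $S$ describes the $X$-type and the $Z$-type generators). The Python sketch below, with helper names of our own, performs the puncture and computes $[[n, k, d]]$ by brute force, using $k = n - 2\,\mathrm{rank}(S)$ and taking $d$ as the minimum weight of a vector orthogonal to every row of $S$ but outside its span. As an illustrative input we use the self-dual extended Hamming code $[8, 4, 4]$, which defines a code with no logical qubits; puncturing one coordinate yields the Steane code, in line with the count $k + 1$ above. The Golay-code construction in Appendix A follows the same pattern.

```python
def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]


def gf2_nullspace(rows, n):
    """Basis of {x in F_2^n : r . x = 0 for every row r}, via reduced row echelon form."""
    mat, pivots = [list(r) for r in rows], []
    for col in range(n):
        r = len(pivots)
        hit = next((i for i in range(r, len(mat)) if mat[i][col]), None)
        if hit is None:
            continue
        mat[r], mat[hit] = mat[hit], mat[r]
        mat = [xor(row, mat[r]) if i != r and row[col] else row
               for i, row in enumerate(mat)]
        pivots.append(col)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [0] * n
        x[free] = 1
        for i, p in enumerate(pivots):
            x[p] = mat[i][free]
        basis.append(x)
    return basis


def span(vectors, n):
    """All F_2 linear combinations of the given vectors, as a set of tuples."""
    words = {tuple([0] * n)}
    for v in vectors:
        words |= {tuple(xor(w, v)) for w in words}
    return words


def css_parameters(S, n):
    """(n, k, d) of the weakly self-dual CSS code whose X and Z stabilizers are both S."""
    stabilizers = span(S, n)
    rank = len(stabilizers).bit_length() - 1
    logicals = span(gf2_nullspace(S, n), n) - stabilizers
    return n, n - 2 * rank, min((sum(w) for w in logicals), default=None)


def puncture(S, i):
    """Keep one generator supported on qubit i, add it to the others, then drop it and column i."""
    rows = [list(r) for r in S]
    pivot = rows.pop(next(j for j, r in enumerate(rows) if r[i]))
    rows = [xor(r, pivot) if r[i] else r for r in rows]
    return [r[:i] + r[i + 1:] for r in rows]


# Self-dual extended Hamming code [8, 4, 4] (illustrative input).
S8 = [[1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 0, 0, 0, 0],
      [1, 1, 0, 0, 1, 1, 0, 0],
      [1, 0, 1, 0, 1, 0, 1, 0]]
print(css_parameters(S8, 8))               # (8, 0, None): no logical qubits
print(css_parameters(puncture(S8, 0), 7))  # (7, 1, 3): the Steane code
```

The brute-force distance enumerates $2^{n - \mathrm{rank}(S)}$ vectors, so it is only meant for small codes.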
The only disadvantage of this proof is that it is a greedy proof that we do not know how to implement efficiently. While it would be desirable to find an explicit family of codes achieving this rate, we do not know how to do this. However, another construction in the literature is a randomized construction which gives codes that, with high probability, have the desired distance. Unfortunately, this construction only achieves $k/n \to 1/2 - H_2(2d/n)$. It uses a general method to construct weakly self-dual CSS codes from Ref. [35].
Consider a stabilizer code $C_{\mathrm{qubit}}$ which acts on $n$ physical qubits and has $k$ logical qubits and distance $d$. From this code, one can derive a code for Majorana fermions $C_{\mathrm{Majorana}}$ which acts on $4n$ Majorana modes and has $k$ logical qubits and distance $2d$, where now the distance refers to the minimum weight of a product of Majorana operators that is a logical operator. The code $C_{\mathrm{Majorana}}$ is derived in the following way. For each physical qubit of $C_{\mathrm{qubit}}$, one introduces four Majorana modes, $\gamma_0, \gamma_1, \gamma_2, \gamma_3$, and declares that the product $\gamma_0\gamma_1\gamma_2\gamma_3$ is a stabilizer of $C_{\mathrm{Majorana}}$. For each stabilizer of $C_{\mathrm{qubit}}$, one defines a stabilizer of $C_{\mathrm{Majorana}}$ by replacing an operator $X$ on a qubit by $i\gamma_0\gamma_1$, $Y$ by $i\gamma_0\gamma_2$, and $Z$ by $i\gamma_0\gamma_3$. The stabilizer generators of $C_{\mathrm{Majorana}}$ are thus given by bit strings of length $4n$ (recording which Majorana operators appear) such that the dot product over $\mathbb{F}_2$ of any pair of such bit strings is 0. Thus, from $C_{\mathrm{Majorana}}$, one can define a weakly self-dual CSS code $C_{\mathrm{wsd}}$ with $4n$ physical qubits, $2k$ logical qubits and distance $2d$. Since a randomized construction (see, for example, Eq. 7.200 of Ref. [38]) gives stabilizer codes with good rate and distance, this yields weakly self-dual CSS codes with good rate and distance. Since the randomized construction gives a lower bound on the weight of any operator commuting with the stabilizer group, we can puncture these codes and reduce the distance by at most 1.
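The map from $C_{\mathrm{qubit}}$ to $C_{\mathrm{wsd}}$ is easy to carry out on bit strings. The Python sketch below (helper names ours) applies it to the five-qubit code $[[5, 1, 3]]$, which Appendix A notes yields a $[[20, 2, 6]]$ code this way. It checks by brute force that the generators are pairwise orthogonal over $\mathbb{F}_2$ and that the resulting code has $4n = 20$ qubits, $2k = 2$ logical qubits and distance $2d = 6$.

```python
# Majorana bits (gamma_0, gamma_1, gamma_2, gamma_3) for each single-qubit Pauli:
# X -> i g0 g1, Y -> i g0 g2, Z -> i g0 g3, I -> no Majorana operators.
BITS = {"I": (0, 0, 0, 0), "X": (1, 1, 0, 0), "Y": (1, 0, 1, 0), "Z": (1, 0, 0, 1)}


def majorana_generators(stabilizers):
    """F_2 generator matrix (length 4n) of C_wsd from Pauli-string stabilizers."""
    n = len(stabilizers[0])
    # The parity g0 g1 g2 g3 of the four modes attached to each physical qubit.
    parities = [[1 if 4 * q <= j < 4 * q + 4 else 0 for j in range(4 * n)]
                for q in range(n)]
    images = [[b for p in s for b in BITS[p]] for s in stabilizers]
    return parities + images


def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]


def gf2_nullspace(rows, n):
    """Basis of {x in F_2^n : r . x = 0 for every row r}, via reduced row echelon form."""
    mat, pivots = [list(r) for r in rows], []
    for col in range(n):
        r = len(pivots)
        hit = next((i for i in range(r, len(mat)) if mat[i][col]), None)
        if hit is None:
            continue
        mat[r], mat[hit] = mat[hit], mat[r]
        mat = [xor(row, mat[r]) if i != r and row[col] else row
               for i, row in enumerate(mat)]
        pivots.append(col)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [0] * n
        x[free] = 1
        for i, p in enumerate(pivots):
            x[p] = mat[i][free]
        basis.append(x)
    return basis


def span(vectors, n):
    """All F_2 linear combinations of the given vectors, as a set of tuples."""
    words = {tuple([0] * n)}
    for v in vectors:
        words |= {tuple(xor(w, v)) for w in words}
    return words


def css_parameters(S, n):
    """(n, k, d) of the weakly self-dual CSS code whose X and Z stabilizers are both S."""
    stabilizers = span(S, n)
    rank = len(stabilizers).bit_length() - 1
    logicals = span(gf2_nullspace(S, n), n) - stabilizers
    return n, n - 2 * rank, min((sum(w) for w in logicals), default=None)


FIVE_QUBIT = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
S = majorana_generators(FIVE_QUBIT)
print(all(sum(a * b for a, b in zip(r1, r2)) % 2 == 0 for r1 in S for r2 in S))
print(css_parameters(S, 20))
```

Commuting Paulis map to bit strings with even overlap (two distinct non-identity Paulis on the same qubit share exactly one Majorana operator), which is why the orthogonality check succeeds.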
Here we give another proof of the existence of such good weakly self-dual codes. This will lead to rate $k/n \to 1 - 2H_2(d/n)$; in particular, for any fixed distance $d$, one can obtain families of stabilizer codes with $n$ physical qubits and $k$ logical qubits with the ratio $k/n \to 1$ as $n \to \infty$. While this improvement is only by constant factors over the construction via Majorana codes, it will lead to nice asymptotic expressions for the number of $T$ gates, $n_T$, required to attain $d$-th order suppression in error. It is also a randomized construction, showing that codes in a certain ensemble have the desired properties with high probability.
Define a random ensemble of $c$-by-$n$ self-orthogonal matrices as follows, where a matrix $M$ is defined to be self-orthogonal if $MM^T = 0$. Choose the first row of the matrix to be the all-1s vector $\mathbf{1}$. Choose the second row uniformly at random subject to the constraint that it have vanishing dot product with the first row. Continue in this fashion, choosing the $j$-th row uniformly at random subject to the constraint that it have vanishing dot product with the first $j-1$ rows. (Remark: the requirement that the first row be the all-1s vector is simply chosen to simplify some notation, so that we do not need to add the requirement that each row have even weight; a row orthogonal to $\mathbf{1}$ has even weight and is therefore orthogonal to itself.)
Lemma 4.9. Consider a fixed $n$-component vector $v$, with $v \ne 0$ and $v \ne \mathbf{1}$. For a random $c$-by-$n$ self-orthogonal $M$, the probability that $Mv = 0$ is at most $2^{-c+1} + 2^{-n+c+1}$.
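The ensemble can be sampled directly: at each step, row $j$ is a uniformly random vector in the orthogonal complement of the rows chosen so far, obtained as a uniformly random $\mathbb{F}_2$-combination of a null-space basis. The Python sketch below (names ours) does this and gives a Monte Carlo estimate of $\Pr[Mv = 0]$ for a fixed $v$, to set against the bound of Lemma 4.9. It assumes $n$ is even, so that the all-1s first row is orthogonal to itself; the sample sizes are arbitrary.

```python
import random


def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]


def gf2_nullspace(rows, n):
    """Basis of {x in F_2^n : r . x = 0 for every row r}, via reduced row echelon form."""
    mat, pivots = [list(r) for r in rows], []
    for col in range(n):
        r = len(pivots)
        hit = next((i for i in range(r, len(mat)) if mat[i][col]), None)
        if hit is None:
            continue
        mat[r], mat[hit] = mat[hit], mat[r]
        mat = [xor(row, mat[r]) if i != r and row[col] else row
               for i, row in enumerate(mat)]
        pivots.append(col)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [0] * n
        x[free] = 1
        for i, p in enumerate(pivots):
            x[p] = mat[i][free]
        basis.append(x)
    return basis


def random_self_orthogonal(c, n, rng):
    """Row 1 is all ones; row j is uniform among vectors orthogonal to rows 1..j-1."""
    rows = [[1] * n]
    while len(rows) < c:
        w = [0] * n
        for b in gf2_nullspace(rows, n):
            if rng.random() < 0.5:  # a uniform F_2 combination of a basis is uniform
                w = xor(w, b)
        rows.append(w)
    return rows


def kernel_frequency(v, c, n, trials, seed=0):
    """Monte Carlo estimate of Pr[Mv = 0] for M drawn from the ensemble."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        M = random_self_orthogonal(c, n, rng)
        hits += all(sum(a * b for a, b in zip(row, v)) % 2 == 0 for row in M)
    return hits / trials


n, c = 12, 4                                 # n even, as assumed above
v = [1, 1] + [0] * (n - 2)                   # a fixed v with v != 0 and v != 1
bound = 2 ** (-c + 1) + 2 ** (-n + c + 1)    # Lemma 4.9
print(kernel_frequency(v, c, n, trials=2000), bound)
```

For an odd-weight $v$ the estimate is exactly zero, since the all-1s row already detects it; the lemma is only informative for even-weight $v$.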
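Proof. Let $w_1, \ldots, w_c$ be the rows of $M$. Let $V_j$ be the self-orthogonal subspace which is the span of the first $j$ rows of $M$. We will estimate the desired probability by a union bound, considering separately the event that $v \in V_c^\perp$ and $v \notin V_c$, and the event that $v \in V_c$. For the first event, for $j \ge 2$ we have
$$\Pr\left[v \in V_j^\perp,\ v \notin V_j \,\middle|\, v \in V_{j-1}^\perp,\ v \notin V_{j-1}\right] \le \frac{1}{2},$$
because, when $v \notin V_{j-1}$, the constraint that $(v, w_j) = 0$ is independent of the constraints on the vector $w_j$. Thus, for any $k$,
$$\Pr\left[v \in V_k^\perp,\ v \notin V_k\right] \le 2^{-k+1}. \qquad (4.6)$$
For $k = c$, we find in particular that
$$\Pr\left[v \in V_c^\perp,\ v \notin V_c\right] \le 2^{-c+1}.$$
Now we estimate the probability of the second event. Note that if $v \in V_c$, there is a least $j$ such that $v \in V_j$; since $v \ne 0$ and $v \ne \mathbf{1}$, we have $j \ge 2$, and since $V_j$ is self-orthogonal, $v \in V_{j-1}^\perp$ and $v \notin V_{j-1}$. So,
$$\Pr[v \in V_c] \le \sum_{j=2}^{c} \Pr\left[v \in V_j \,\middle|\, v \in V_{j-1}^\perp,\ v \notin V_{j-1}\right]\Pr\left[v \in V_{j-1}^\perp,\ v \notin V_{j-1}\right].$$
We have
$$\Pr\left[v \in V_{j-1}^\perp,\ v \notin V_{j-1}\right] \le 2^{-j+2},$$
where we used Eq. (4.6).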

Now we estimate the probability $\Pr[v \in V_j \mid v \in V_{j-1}^\perp,\ v \notin V_{j-1}]$. This is possibly nonzero only if $v \cdot v = 0$. Consider the space of all $n$-component vectors modulo vectors in $V_{j-1}$; this quotient space has dimension at least $n - (j-1)$. Let $\pi$ be the natural map from the space of all vectors to this quotient space. The vector $\pi v$ is nonzero by assumption, and $v \in V_j$ if and only if $\pi w_j = \pi v$. The vector $w_j$ is subject to at most $j-1$ independent constraints from $V_{j-1}$. Consider the space of possible $\pi w_j$, given that $w_j$ obeys those constraints; this space has dimension at least $n - 2(j-1)$, and so the probability that a random vector in this space is equal to $\pi v$ is at most $2^{-n+2(j-1)}$. Combining this with $\Pr[v \in V_{j-1}^\perp,\ v \notin V_{j-1}] \le 2^{-j+2}$, each term of the sum is at most $2^{-n+j}$, so $\Pr[v \in V_c] \le \sum_{j=2}^{c} 2^{-n+j} < 2^{-n+c+1}$. Adding the bound $2^{-c+1}$ on the first event proves the lemma.
Proof. This follows from Lemma 4.9 and by a first moment bound. For a random $M$ from the above ensemble, the expected number of vectors $v \ne 0$ with Hamming weight at most …
Proof. Immediate for the hyperbolic case. Since Lemma 4.9 upper bounds the probability that an operator commutes with the stabilizer group, one can also puncture these codes to obtain a normal code.

Outer Codes
In this subsection, we construct families of outer codes with good check rate and sensitivity. We begin with a randomized construction, and then show how to construct explicit families using previous results in coding theory.
Lemma 4.12. There exist families of outer codes with good check rate and sensitivity and even row weight. Similarly, there exist families of outer codes with good check rate and sensitivity and odd row weight.
Proof. We only give a proof for check matrices of even row weight; the proof for the odd row weight case is completely analogous. Consider a random $m$-by-$n_{\mathrm{outer}}$ outer parity check matrix $M$, and let $\tilde d = n_{\mathrm{outer}} - 1$. Choose each row independently but with the constraint that it should be of even weight. For any vector $v$ with $1 \le |v| \le \tilde d$ (so that $v \ne 0$ and $v \ne \mathbf{1}$), the syndrome vector $Mv$ has independent entries from the uniform distribution. Thus, the probability that $|Mv| \le s$ for $s \le m/2$ is bounded by
$$2^{-m}\sum_{j=0}^{s}\binom{m}{j} = O^*\!\left(2^{-m(1 - H_2(s/m))}\right),$$
where $H_2(x) = -x\log_2 x - (1-x)\log_2(1-x)$ is the binary entropy function, and $O^*$ hides polynomial factors. The number of such vectors $v$ is bounded by $2^{n_{\mathrm{outer}}}$. By a union bound, the probability that there is an error vector $v$ with $1 \le |v| \le \tilde d$ whose syndrome has weight at most $s$ is bounded by
$$O^*\!\left(2^{n_{\mathrm{outer}} - m(1 - H_2(s/m))}\right).$$
For sufficiently large ratio $m/n_{\mathrm{outer}}$ and sufficiently small ratio $s/m$, this quantity is exponentially small in $n_{\mathrm{outer}}$.
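The union bound in this proof can be evaluated numerically. The Python sketch below (names ours) computes $\log_2$ of $2^{n_{\mathrm{outer}}}\Pr[|Mv| \le s]$ exactly from the binomial tail, next to the entropy form $n_{\mathrm{outer}} - m(1 - H_2(s/m))$; the values $n_{\mathrm{outer}} = 100$ and $s/m = 0.05$ are arbitrary illustrative choices. A negative exponent means the union bound makes a bad matrix exponentially unlikely.

```python
import math


def binary_entropy(x):
    """H_2(x) in bits, with H_2(0) = H_2(1) = 0."""
    if x in (0, 1):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def log2_union_bound(n_outer, m, s):
    """log2 of 2**n_outer * Pr[|Mv| <= s] when the m syndrome bits are uniform."""
    tail = sum(math.comb(m, j) for j in range(s + 1))
    return n_outer + math.log2(tail) - m


def log2_entropy_estimate(n_outer, m, s):
    """n_outer - m * (1 - H_2(s / m)): the same exponent without polynomial factors."""
    return n_outer - m * (1 - binary_entropy(s / m))


for m in (100, 200, 400):  # check rates m / n_outer = 1, 2, 4 with s / m = 0.05
    s = m // 20
    print(m, round(log2_union_bound(100, m, s), 1),
          round(log2_entropy_estimate(100, m, s), 1))
```

At $m/n_{\mathrm{outer}} = 1$ the exponent is positive and the bound says nothing, while at check rates 2 and 4 it is already strongly negative, consistent with the requirement of a sufficiently large ratio $m/n_{\mathrm{outer}}$.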
The above randomized construction is very similar to randomized constructions of classical codes with good rate and distance, where we define:
Definition 4.13. A family of classical error correcting codes with increasing number of bits $n$ has good rate if the number of encoded bits $k$ is $\Theta(n)$, and has good distance if the distance $d$ is $\Theta(n)$.
That is, even though we are considering very different properties (number of violated checks rather than distance of the code), the first moment argument above is very similar to standard first moment arguments to construct codes with good rate and distance, with some additional technicalities required to ensure even weight of the parity checks. This is not a coincidence. As we now show, given a family of codes with good rate and distance, one can construct a family of codes with good check rate and sensitivity. In the construction of Lemma 4.14, the code with parity checks encoded by $M$ has only two codewords (the all-0 vector and the all-1 vector), and any message which is not a codeword will violate at least $d$ checks.
Proof. For any $(k+1)$-bit vector $v$, the vector $Mv$ is a codeword of $C$. If $v$ is nonzero and is not equal to the all-1 vector, then $Mv$ is a nonzero codeword of $C$ and hence has weight at least $d$.
Since $n_{\mathrm{outer}} = k + 1$, in order to obtain an even $n_{\mathrm{outer}}$ when $C$ has $k$ even, we can simply define a new code $C'$ which encodes $(k-1)$-bit messages into $n$-bit codewords by using any $(k-1)$-dimensional subspace of the codewords of $C$, in this way obtaining a parity check matrix for a code with $n_{\mathrm{outer}} = (k-1) + 1 = k$.
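The proof only uses two facts: the columns of $M$ span $C$, and $Mv = 0$ exactly when $v$ is $0$ or the all-1 vector. One way to meet both (an illustrative choice on our part) is to take the rows of a generator matrix $G$ of $C$ as the first $k$ columns of $M$ and their sum as the last column; each row of $M$ then also has even weight. The Python sketch below applies this to the $[7, 4, 3]$ Hamming code and confirms by enumeration that every $v \notin \{0, \mathbf{1}\}$ violates at least $d = 3$ checks.

```python
from itertools import product

# Generator matrix of the [7, 4, 3] Hamming code in systematic form [I | P].
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]


def check_matrix_from_generator(G):
    """n-by-(k+1) matrix M whose columns are the rows of G followed by their sum (mod 2).

    The last column makes M times the all-ones vector vanish, and every row of M
    has even weight.
    """
    k, n = len(G), len(G[0])
    return [[G[i][r] for i in range(k)] + [sum(G[i][r] for i in range(k)) % 2]
            for r in range(n)]


def min_violated_checks(M):
    """Minimum of |Mv| over all v other than 0 and the all-ones vector."""
    best = None
    for v in product((0, 1), repeat=len(M[0])):
        if len(set(v)) == 1:  # skip the two codewords, v = 0 and v = 1
            continue
        weight = sum(sum(a * b for a, b in zip(row, v)) % 2 for row in M)
        best = weight if best is None else min(best, weight)
    return best


M = check_matrix_from_generator(G)
print(len(M), len(M[0]))       # 7 checks on n_outer = k + 1 = 5 bits
print(min_violated_checks(M))  # 3, the distance of the Hamming code
```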
Using Lemma 4.14, we can construct explicit families of codes with good check rate and good sensitivity given any explicit family of codes with good rate and good distance. As an example of such a code family, we can use the expander codes of Ref. [39].
Proof. A parity check matrix $M$ defines a bipartite graph $G$, often called a Tanner graph. One set of vertices of the graph (which we call $B$, labeled by the columns of $M$) corresponds to bits of the code, and the other set (which we call $C$, labeled by the rows of $M$) corresponds to checks, with an edge between a pair of vertices if $M$ is nonzero in the corresponding entry. Equivalently, such a bipartite graph $G$ defines a parity check matrix. We claim that, given a bipartite graph with all vertices in $B$ having degree $s$, all vertices in $C$ having degree $w$, and girth $> 2\tilde d$, the corresponding parity check matrix defines a code with the desired properties. Once we have shown this, the lemma follows, since Ref. [40] shows the existence of such graphs.
Note first that the degree of vertices in $C$ corresponds to the row weight of $M$. Next, note that if all vertices in $C$ have degree $w$ and all in $B$ have degree $s$, then counting edges gives $m = |C| = n_{\mathrm{outer}}s/w$ with $n_{\mathrm{outer}} = |B|$.
To prove the claim, let $V \subseteq B$ be a nonempty set of erroneous bits. By assumption, $1 \le |V| \le \tilde d$. Consider the subgraph $H$ of $G$ defined by all vertices of $V$ and their neighbors. By the girth condition on $G$, the subgraph $H$ has to be a collection of disjoint trees. Thus, it suffices to prove the claim in the case where $H$ is connected. If $|V| = 1$, then the error violates $s$ checks, and we are done. If $|V| \ge 2$, let $v_1, v_2 \in V$ be a pair that are the furthest apart in $H$. The choice of the pair ensures that each of $v_1$ and $v_2$ has $s - 1$ leaves attached to it, each leaf being a check adjacent to a single erroneous bit. Therefore, $V$ violates at least $2s - 2 \ge s$ checks, where the last inequality uses $s \ge 2$.
Note that the ratio $m/n_{\mathrm{outer}} = s/w$ in Lemma 4.15 is the best possible, because each bit must participate in at least $s$ checks (i.e., every column of the parity check matrix must have weight at least $s$). Note also that though the existence of the desired graphs is guaranteed, they might be too large in practice; $w^s \le n_{\mathrm{outer}} \le \mathrm{poly}(w^s)$ [40]. However, one does not have to be too strict about the biregularity of the graph in practice: if a small violation of the biregularity gives a much smaller graph, then it might be more useful.

Numerical Simulation
In this section, we give results of numerical simulations. We begin by explaining the error model we used for simulations. We then explain two protocols that we simulate that were not explained previously; one of these protocols uses a [[21, 3, 5]] code. Then we give the simulation results. One interesting result of the simulation is how little effect the subleading terms have, even at fairly large noise values.

Magic state fidelity
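When we inject a magic state $\mu$ for a $\pi/4$ rotation into a quantum circuit, there is a probability for a correction $K$ by angle $\pi/2$ to be applied. If we represent the overall procedure by a quantum channel $\mathcal{C}_\mu$, it is $\mathcal{C}_\mu(\rho) = \Pi_+(\rho\otimes\mu)\Pi_+ + K\Pi_-(\rho\otimes\mu)\Pi_-K^\dagger$, where $\Pi_\pm$ denotes the measurement combined with a control-Pauli on the magic state and a target data qubit. Let $|\mu_0\rangle$ be the ideal magic state, and $|\mu_0^\perp\rangle$ be the orthogonal state. Then, extending $\mathcal{C}_\mu$ linearly in $\mu$, it is straightforward to calculate that
$$\mathcal{C}_{|\mu_0\rangle\langle\mu_0^\perp|} = \mathcal{C}_{|\mu_0^\perp\rangle\langle\mu_0|} = 0.$$
This implies that for any initial approximate magic state $\mu$, the result of the injection is the same as if $\mu$ had been through a twirling channel $\mathcal{E}$ that dephases the magic state in the basis $\{|\mu_0\rangle, |\mu_0^\perp\rangle\}$:
$$\mathcal{E}(\mu) = |\mu_0\rangle\langle\mu_0|\mu|\mu_0\rangle\langle\mu_0| + |\mu_0^\perp\rangle\langle\mu_0^\perp|\mu|\mu_0^\perp\rangle\langle\mu_0^\perp|.$$
The twirled state is $\epsilon$ away from the ideal state in the trace distance, resulting in error at most $\epsilon$ in the quantum circuit's outcome. The error can be expressed by the squared fidelity as
$$\epsilon = 1 - F^2(\mu_0, \mu) = 1 - \langle\mu_0|\mu|\mu_0\rangle = 1 - \langle\mu_0|\mathcal{E}(\mu)|\mu_0\rangle.$$
This formula is convenient in that it yields the same answer regardless of whether or not twirling is applied to $\mu$ (this is the last equality in the above formula). When a state $\mu_n$ that approximates $\mu_0^{\otimes n}$ is injected, the error from this multi-qubit magic state is given by $1 - F^2(\mu_0^{\otimes n}, \mu_n)$. Note that $F^2(\mu_0, \mu)$ is linear in $\mu$. Below, we use $1 - F^2$ as the probability of error to report our simulation results.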
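The effect of twirling can be checked numerically for a single qubit. The NumPy sketch below takes $|\mu_0\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$ (an assumed convention for the magic state of a $\pi/4$ rotation) and a coherently rotated input, a hypothetical error model chosen for illustration; the helper names are ours. It confirms that $\mathcal{E}$ leaves $1 - \langle\mu_0|\mu|\mu_0\rangle$ unchanged, and that the twirled state is exactly $\epsilon$ from the ideal state in trace distance, whereas the untwirled pure state is $\sqrt{\epsilon}$ away.

```python
import numpy as np

# Assumed convention: |mu_0> = (|0> + e^{i pi/4}|1>)/sqrt(2) and its orthogonal partner.
phase = np.exp(1j * np.pi / 4)
mu0 = np.array([1, phase]) / np.sqrt(2)
mu0_perp = np.array([1, -phase]) / np.sqrt(2)
P0 = np.outer(mu0, mu0.conj())
P1 = np.outer(mu0_perp, mu0_perp.conj())


def twirl(mu):
    """The channel E: dephase mu in the {|mu_0>, |mu_0^perp>} basis."""
    return P0 @ mu @ P0 + P1 @ mu @ P1


def infidelity(mu):
    """epsilon = 1 - F^2(mu_0, mu) = 1 - <mu_0| mu |mu_0>."""
    return 1 - np.real(mu0.conj() @ mu @ mu0)


def trace_distance(a, b):
    """Half the sum of absolute eigenvalues of the Hermitian matrix a - b."""
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(a - b)))


# Hypothetical coherent error: rotate |mu_0> by angle a toward |mu_0^perp>.
a, phi = 0.1, 0.7
psi = np.cos(a) * mu0 + np.exp(1j * phi) * np.sin(a) * mu0_perp
mu = np.outer(psi, psi.conj())

eps = infidelity(mu)
print(eps, infidelity(twirl(mu)))     # the twirl leaves epsilon unchanged
print(trace_distance(mu, P0))         # sqrt(eps) for the untwirled pure state
print(trace_distance(twirl(mu), P0))  # eps after twirling
```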

Error Models
The typical model to analyze distillation protocols is the stochastic error model, in which each $T$ gate independently rotates by $5\pi/4$ instead of $\pi/4$ with probability $\epsilon$. In typical distillation protocols, one has only a single output magic state, and so one is interested in the probability that the output magic state has an error, as a function of the input error rate, conditioned on no error being detected by the code; this error probability is a ratio of polynomials in $\epsilon$, with the leading term being of order $\epsilon^d$ for some $d$, with an integer coefficient.
For our purposes, since the codes used are fairly large, enumeration of all possible error patterns becomes difficult, especially if one wishes to go beyond leading order in $\epsilon$. For this reason, we use numerical simulation. One could simulate a mixed state, using a quantum channel to describe an approximate $T$ gate; however, this is numerically prohibitive, and so we prefer an approach that involves only pure states. One could numerically simulate pure states using the stochastic error model by choosing errors to occur with probability $\epsilon$ and sampling the output error probability. However, this simulation also becomes difficult, precisely because the codes lead to a high suppression in the error. For example, if the target error probability is $10^{-10}$, one would require $\sim 10^{10}$ samples, with a fairly large number of qubits to be simulated in each run, to determine the output error probability accurately.
While there may be ways to overcome this sampling issue using importance sampling, we use another method. Instead of rotating by either $\pi/4$ or $5\pi/4$ as in the stochastic error model, each $T$ gate rotates by an angle chosen uniformly in the interval $[\pi/4 - \theta, \pi/4 + \theta]$, for some angle $\theta > 0$. Then, conditioned on the code not detecting an error, we determine the error in the output state.
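In fact, the model with input angles in $[\pi/4 - \theta, \pi/4 + \theta]$ and the stochastic error model describe the same average input state, assuming an appropriate choice of $\epsilon$ and $\theta$. Averaging a rotation by $\pi/4 + x$ over $x$ uniform in $[-\theta, \theta]$ multiplies the off-diagonal elements of the state (in the $Z$ basis) by $\sin\theta/\theta$, while the stochastic error model multiplies them by $(1 - \epsilon) - \epsilon = 1 - 2\epsilon$; equating the two gives $1 - 2\epsilon = \sin\theta/\theta \approx 1 - \theta^2/6$.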
Hence, one wants $\epsilon \approx \theta^2/12$. (We emphasize that this is in a notation where $\theta$ is the rotation angle in the Bloch sphere; the $T$ gate is a rotation by $\pi/4$, not by $\pi/8$.) In the stochastic error model with small $\epsilon$, one must do roughly $1/\epsilon$ runs to obtain meaningful statistics, while here, one needs only a constant number of runs. The reason is as follows. Since $\theta$ is small, the simulated circuit can be approximated by an analytic series in $\theta$, and the linear term amounts to a single error, which is projected out by the post-selection on measurement outcomes, as our protocol always has $d \ge 2$. Thus, in our post-selected simulation, a circuit with $\rho_x$, $x \in [-\theta, \theta]$, is equivalent at leading order to a circuit with $\rho'_x$, where $\rho'_x$ is $\rho_x$ with the linear term in $x$ dropped. Then, the distance from the $n$-sample average of $\rho'_x$ to $\mu$ is $O(\epsilon/\sqrt{n})$, whereas the distance from the $n$-sample average of $\mu_i$ ($i = 0, 1$) to $\mu$ is $O(\sqrt{\epsilon/n})$; relative to $\epsilon$, the statistical error is $O(1/\sqrt{n})$ in the former case but $O(1/\sqrt{\epsilon n})$ in the latter. The acceptance probability depends on the fidelity to the ideal magic state $\mu_0$, which is $1 - O(\epsilon) = 1 - O(\theta^2)$ in either case.
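The relation $\epsilon \approx \theta^2/12$ can be checked against the exact average. For a rotation by $\pi/4 + x$ of an equatorial magic state, the single-shot infidelity is $\sin^2(x/2)$, and averaging over $x$ uniform in $[-\theta, \theta]$ gives $(1 - \sin\theta/\theta)/2$. The Python sketch below (names ours) compares this closed form with a midpoint-rule average and with $\theta^2/12$.

```python
import math


def avg_infidelity(theta):
    """Exact (1 - sin(theta)/theta) / 2 = E[sin^2(x/2)] for x uniform in [-theta, theta]."""
    return (1 - math.sin(theta) / theta) / 2


def avg_infidelity_numeric(theta, steps=10_000):
    """Midpoint-rule average of the single-shot infidelity sin^2(x/2), as a cross-check."""
    h = 2 * theta / steps
    return sum(math.sin((-theta + (i + 0.5) * h) / 2) ** 2 for i in range(steps)) / steps


for theta in (0.05, 0.1, 0.2):
    print(theta, avg_infidelity(theta), avg_infidelity_numeric(theta), theta**2 / 12)
```

The closed form agrees with $\theta^2/12$ up to relative corrections of order $\theta^2$, which are negligible at the small angles relevant here.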

[[16, 2, 4]] Inner Code
In subsubsection 1.2.2 we explained a protocol using a [[16, 6, 4]] inner code. This required a total of 17 physical qubits, namely 16 for the code and one ancilla. We can also modify this inner code to a [[16, 2, 4]] inner code by turning some of the logical operators into checks. This inner code suffices to implement the H-measurements on pairs of states (2,3), (4,5), (6,1), and so it can implement the checks of the outer code used in subsubsection 1.2.2. Using a [[16, 2, 4]] inner code, if we want to have $n_{\mathrm{outer}} = 6$, we need a total of 21 physical qubits, since we need 16 for the code, plus 4 for the logical qubits not encoded in the code, plus one ancilla. Thus, this requires additional physical qubits compared to the [[16, 6, 4]] code. The reason for considering the [[16, 2, 4]] code in numerics is to see if it reduces the prefactor in the error, since the [[16, 2, 4]] code has fewer logical operators than the [[16, 6, 4]] code. The number of $T$ gates in the protocol using [[16, 2, 4]] is the same as that using [[16, 6, 4]]. We pipeline the protocol with the [[16, 2, 4]] inner code in the same way as we did with the [[16, 6, 4]] inner code.

[[21, 3, 5]] Inner Code
Another inner code that we used is a [[21, 3, 5]] inner code, described in Appendix A. This allows us to obtain fifth-order reduction in error. We used $n_{\mathrm{outer}} = 4$ with the outer code having check matrix
$$M = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.$$
A simple pipelining can reduce the noisy $T$ gate count compared to this protocol. Distill three independent magic states using the [[7, 1, 3]] inner code. (The outer code is trivial in this case.) The three distilled magic states are then pipelined into the [[21, 3, 5]] inner code. This produces 3 magic states with error $O(\epsilon^5)$, consuming, per output, 28 $T$ gates and one $T$ state with error $\epsilon$.

Results
The results of the simulations are shown in Fig. 5.1. Note that the plots are close to linear on a log-log plot, with only small deviations at high error rate. Each data point represents the average of at least $10^4$ runs, with statistical fluctuations negligible on the scale of the plot. The asymptotic behavior is within statistical error of that given by an enumeration of minimum-weight error patterns.
We emphasize that $\epsilon_{\mathrm{out}}$ indicates the probability that there is any error in the output state, which is a multi-qubit state. Suppose that two protocols give the same value of $\epsilon_{\mathrm{out}}$ for a given $\epsilon_{\mathrm{in}}$, but one protocol has a larger $n_{\mathrm{outer}}$. If the total number of magic states needed in our computation is large compared to $n_{\mathrm{outer}}$, the number of times we need to call the protocol is inversely proportional to $n_{\mathrm{outer}}$, and so the protocol with the larger $n_{\mathrm{outer}}$ for the given $\epsilon_{\mathrm{out}}$ is less likely to produce an error.
The probability that no error is detected by the protocol is roughly $(1 - \epsilon_{\mathrm{in}})^{n_T}$. This result would be exact if any error in an input $T$ gate led to the protocol detecting an error. Instead, some high-weight error patterns do not lead to any error detected by the code, leading to slight corrections to this formula.

Discussion
We have given a general scheme to construct distillation protocols using inner and outer codes. If desired, our protocols can be concatenated with other protocols. However, on their own, they achieve asymptotic behavior which is conjectured to be optimal, as well as having small examples which perform well. The concatenation can be useful in an architecture where Clifford gates are performed at a logical level, so that one can tune the fidelity of Clifford gates to match the relatively low fidelity of output magic states from early stages of concatenation [41][42][43]. In contrast, our schemes without concatenation can be useful if Clifford gates are already of high fidelity.
One of the major advantages of our protocols is the small number of qubits that they use, as they maintain a constant ratio of physical to logical qubits in the asymptotic limit. It is interesting to consider the asymptotics of this overhead between physical and logical qubits. Note that given any distillation protocol, there is a trivial way to define a new protocol with a fixed ratio of physical to logical qubits. Suppose, for example, that some protocol uses $n_{\mathrm{phys}}$ qubits to produce 1 output magic state. Here $n_{\mathrm{phys}}$ includes all physical qubits used in the protocol, not only the initial noisy $T$ states. Call this protocol $P$. One can define a new protocol $P'$ that works on $2n_{\mathrm{phys}}$ qubits to produce $n_{\mathrm{phys}}$ output magic states. First, by working on $n_{\mathrm{phys}}$ qubits and leaving the other $n_{\mathrm{phys}}$ inactive, we obtain 1 output, which is put away. Next, we use $n_{\mathrm{phys}}$ qubits out of the remaining $2n_{\mathrm{phys}} - 1$ qubits to obtain the second output, which is put away. Next, we use $n_{\mathrm{phys}}$ qubits out of $2n_{\mathrm{phys}} - 2$ qubits to obtain the third output, which is put away. Continuing, we apply $P$ a total of $n_{\mathrm{phys}}$ times sequentially. However, the circuit depth of $P'$ is now proportional to $n_{\mathrm{phys}}$ times the depth of $P$.
In contrast, in Theorem 4.1, we obtain $d$-th order error reduction at a fixed ratio of physical to logical qubits with a $T$-gate depth proportional to $d$. That is, the protocols we have constructed are space and time efficient in terms of $T$ gates. It should be noted that we have ignored all Clifford gates, and thus the $T$-efficiency claim is meaningful when $T$ gates are much more costly than Clifford gates.
Including Clifford gates, the entire circuit of Theorem 4.1 has depth $O(d^2\log d)$ if geometric locality of the gates is ignored. Indeed, an $n$-qubit stabilizer code's encoding circuit can be constructed in depth $O(n\log n)$: it is essentially Gaussian elimination of a binary symplectic matrix. The Gaussian elimination for a single column can be done by a circuit of depth $O(\log n)$, and hence the entire Gaussian elimination can be done by a circuit of depth $O(n\log n)$. Since we are using a good family of codes, the circuit depth of an encoding circuit is $O(d\log d)$, and there are $O(d)$ encoding/decoding steps. The total gate count (spacetime) per output is $O(d^2\log d)$.
Jones [13] constructed a family of protocols giving $\gamma \to 1$. (For the definition of $\gamma$, see the discussion below Theorem 4.1.) This protocol builds upon Knill's [8], and is in fact a subclass of ours. Implicitly in Ref. [13], the inner code is obtained by concatenating a … (For the definition of normal and hyperbolic codes, see Section 3.) The outer code is a hypercubic grid with checks along coordinate axes. The dimension of the grid is proportional to $\nu \cdot 2^\nu$, since a check using an inner code of distance $2^\nu$ has to be implemented $2^{\nu-1}$ times along $2^{\nu-1}$ independent axes, and the concatenation for the inner codes is also performed on a grid. For a given $\nu$, in the large $k$ limit, one obtains a protocol that consumes $2^\nu + 1 = d + 1$ noisy $T$ gates per output at $d$-th order of reduction in error. The asymptotic performance of this family is similar to our Theorem 4.2. If we worked with normal codes of even distance in Theorem 4.2, then we would have concluded that the input $T$ count per output is $d + 1$. Note that the space requirement of Jones' scheme is much larger than that of Theorem 4.1, as the grid outer code used by Jones would require roughly $k^{\nu 2^\nu} = k^{d\log_2 d}$ qubits. This exponential dependence on $d$ also holds for the protocols in the proof of Theorem 4.2. In contrast, Theorem 4.1 requires only $O(d)$ qubits.
In comparison with Jones' scheme, our main technical contribution is to explicitly separate inner and outer codes, with general criteria for them to be useful in distillation. The criteria are that inner codes have to be weakly self-dual, and outer codes have to be sensitive (i.e., the parity check matrix $M$ should satisfy $2|Mv| + |v| \ge d$ for any nonzero error vector $v$). These requirements are rather simple, so we were able to consider random constructions for them. In fact, a large pool of existing codes can be incorporated. In particular, we note that there are quantum BCH codes [44], some of which have encoding rate greater than 1/2 with code distances 5 and 7 at modest code lengths.
For Theorem 4.2 we have resorted to a graph-theoretic construction of outer codes from Ref. [40]. This is sufficient for the proof, but one may wish to have more concrete examples. In fact, a hypercubic grid of dimension $D \ge 3$ yields an outer code of the desired sensitivity for $d = 2D + 1$, which will be analyzed in detail elsewhere [45].

A Specific Small Inner and Outer Codes
In this appendix, we give some specific inner and outer codes, either giving the stabilizers or referring to the literature. Some of these codes are explained in the basic distillation section (Section 1) or in the numerical simulations (Section 5) in the body of the paper. Other codes have other useful properties that we describe for specific codes.
When we give stabilizers for an inner code, each row gives one stabilizer generator. Each row consists of a binary string, of length equal to the number of qubits, with a 1 indicating that the stabilizer acts on that qubit; i.e., we give the parity check matrix.
This is explained in Section 1.

A.1.4 [[17, 1, 5]] Inner Code
This is an instance of the color code [21, 31]. It is the smallest normal code that we found with $k_{\mathrm{inner}} = 1$ and distance 5. The stabilizers are:
11011010101000010
01100011001100110
00110110010011001
00010101000111110
00001110010011101
00000101000110000
00000011111011010
00000001010100001

A.1.5 [[21, 3, 5]] and [[23, 1, 7]] Inner Codes
The (extended) Golay code is a classical self-dual code which has parameters [24, 12, 8]. Puncturing a bit, by collecting all codewords that have zero on that bit and then deleting that bit, we obtain a self-orthogonal [23, 11, 8] code. From this, we obtain a weakly self-dual CSS code with parameters [[23, 1, 7]]. (Reichardt has used this code in a very different distillation protocol [29].) There are many positions to puncture, but due to the high symmetry of the Golay code, the resulting codes have the same weight enumerators. One can pipeline the [[23, 1, 7]] code after the protocol of section 1.1.2 to give a protocol with one output magic state and seventh-order suppression in error.
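Both codes in this part of the appendix can be verified by brute force. The Python sketch below (helper names ours) computes $[[n, k, d]]$ of a weakly self-dual CSS code from its generator matrix $S$, using $k = n - 2\,\mathrm{rank}(S)$ and the minimum weight of vectors orthogonal to every row of $S$ but outside its span. It checks the [[17, 1, 5]] stabilizers listed in A.1.4, and rebuilds the [[23, 1, 7]] code from the extended Golay code. Here the extended Golay code is generated (our choice of presentation) by the shifts of the standard Golay generator polynomial $g(x) = 1 + x^2 + x^4 + x^5 + x^6 + x^{10} + x^{11}$, each with an overall parity bit appended, and it is punctured on that parity bit.

```python
def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]


def gf2_nullspace(rows, n):
    """Basis of {x in F_2^n : r . x = 0 for every row r}, via reduced row echelon form."""
    mat, pivots = [list(r) for r in rows], []
    for col in range(n):
        r = len(pivots)
        hit = next((i for i in range(r, len(mat)) if mat[i][col]), None)
        if hit is None:
            continue
        mat[r], mat[hit] = mat[hit], mat[r]
        mat = [xor(row, mat[r]) if i != r and row[col] else row
               for i, row in enumerate(mat)]
        pivots.append(col)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [0] * n
        x[free] = 1
        for i, p in enumerate(pivots):
            x[p] = mat[i][free]
        basis.append(x)
    return basis


def span(vectors, n):
    """All F_2 linear combinations of the given vectors, as a set of tuples."""
    words = {tuple([0] * n)}
    for v in vectors:
        words |= {tuple(xor(w, v)) for w in words}
    return words


def css_parameters(S, n):
    """(n, k, d) of the weakly self-dual CSS code whose X and Z stabilizers are both S."""
    stabilizers = span(S, n)
    rank = len(stabilizers).bit_length() - 1
    logicals = span(gf2_nullspace(S, n), n) - stabilizers
    return n, n - 2 * rank, min((sum(w) for w in logicals), default=None)


def puncture(S, i):
    """Keep one generator supported on bit i, add it to the others, then drop it and column i."""
    rows = [list(r) for r in S]
    pivot = rows.pop(next(j for j, r in enumerate(rows) if r[i]))
    rows = [xor(r, pivot) if r[i] else r for r in rows]
    return [r[:i] + r[i + 1:] for r in rows]


# Stabilizers of the [[17, 1, 5]] code as listed in A.1.4.
COLOR_17 = ["11011010101000010", "01100011001100110", "00110110010011001",
            "00010101000111110", "00001110010011101", "00000101000110000",
            "00000011111011010", "00000001010100001"]
S17 = [[int(b) for b in row] for row in COLOR_17]
print(css_parameters(S17, 17))

# Extended Golay code: shifts of g(x) inside length 23, plus an overall parity bit.
g = [1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1]  # coefficients of x^0 .. x^11
golay = []
for shift in range(12):
    row = [0] * 23
    for j, coeff in enumerate(g):
        row[shift + j] = coeff
    golay.append(row + [sum(row) % 2])
print(css_parameters(golay, 24))                # self-dual: no logical qubits
print(css_parameters(puncture(golay, 23), 23))  # punctured on the parity bit
```

By the symmetry of the Golay code noted above, puncturing any other coordinate should give a code with the same parameters.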

A.1.6 Other Inner Codes
Some other examples of inner codes can be found in Ref. [46], from which we reproduce the optimal $k_{\mathrm{inner}}$ found for given distance and $n_{\mathrm{inner}}$ in Table A.1. For stabilizers, see Ref. [46].
$n_{\mathrm{inner}}$: 16, 20, 24, 28, 30, 20, 28
Table A.1: Parameters of small hyperbolic weakly self-dual CSS codes [46]. The code [[20, 2, 6]] can be constructed from the five-qubit code [[5, 1, 3]] by going through the Majorana operators [35], while the others cannot be constructed in this way.

A.2.1 Petersen Graph Code
The outer code in section 5.3.2 has 4 bits and uses 4 checks of weight 3. However, from Lemma 4.15, we know that there is some $n_{\mathrm{outer}}$ such that there is a $(4, 2)$-sensitive code with weight-3 checks which has only $(2/3)n_{\mathrm{outer}}$ checks. We now explain this code. The proof of Lemma 4.15 reduces the problem of finding such a code to finding a bipartite graph $G$. Since the vertices in the set $B$ of that lemma have degree 2, we can equivalently define the code by a graph $H$ such that the vertices of $H$ correspond to checks and the edges correspond to bits; i.e., in the case that $B$ has degree 2, the possible bipartite graphs $G$ are in one-to-one correspondence with degree-3 graphs $H$. Then, from the proof of Lemma 4.15, we know that if $H$ has girth at least 5 (so that $G$ has girth at least $10 > 2 \cdot 4$), then the corresponding code is $(4, 2)$-sensitive. The smallest such graph $H$ is known to be the Petersen graph. This is a degree-3 graph with 10 vertices and 15 edges, giving $n_{\mathrm{outer}} = 15$ bits and 10 checks. This graph can be thought of as the dodecahedron with antipodes identified. Note that the girth being 5 is optimal in this case, because if $H$ has girth 4, then there is a weight-4 error that violates no checks.
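The Petersen-graph code is small enough to check exhaustively. In the Python sketch below (names ours), the Petersen graph is built as an outer 5-cycle, an inner pentagram and five spokes; each vertex is a check and each edge a bit. Brute force confirms that the code is $(4, 2)$-sensitive, and that the 5 edges of a 5-cycle form a weight-5 error violating no checks, so the sensitivity cannot be extended to weight 5.

```python
from itertools import combinations

# Petersen graph: outer 5-cycle on 0..4, inner pentagram on 5..9, spokes from i to i + 5.
EDGES = ([(i, (i + 1) % 5) for i in range(5)]
         + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
         + [(i, i + 5) for i in range(5)])

# Check matrix: one row per vertex (check), one column per edge (bit).
M = [[1 if v in edge else 0 for edge in EDGES] for v in range(10)]


def syndrome_weight(M, support):
    """Number of checks with odd overlap with the error support."""
    return sum(sum(row[j] for j in support) % 2 for row in M)


def is_sensitive(M, d_tilde, s):
    """True if every error v with 1 <= |v| <= d_tilde violates at least s checks."""
    n = len(M[0])
    return all(syndrome_weight(M, support) >= s
               for weight in range(1, d_tilde + 1)
               for support in combinations(range(n), weight))


print(len(M), len(M[0]))             # 10 checks, 15 bits
print({sum(row) for row in M})       # {3}: every check has weight 3
print(is_sensitive(M, 4, 2))         # True
print(syndrome_weight(M, range(5)))  # 0: the outer 5-cycle is undetected
```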

B Circuits
In this section we give circuits for some of the protocols above. Boxes labelled Enc and Enc† denote encoding and decoding circuits, which are Cliffords. The number in the box indicates which code is used. H denotes Hadamard, M denotes measurement in the Z basis, M_x denotes measurement in the X basis, and CZ denotes control-Z operations.

C Coincidence among protocols
The Steane code has 7 Y-logical operators of weight 3. In the distillation protocol using the Steane code as the inner code, each such logical error may appear in 4 different ways in the column that implements control-$H^{\otimes 7}$ [8]. The measurement error at the lowest order can happen in 7 ways. Overall, the cubic error can happen in $7 \cdot 4 + 7 = 35$ ways. This number matches the number of logical operators of weight 3 in the Bravyi-Kitaev 15-to-1 protocol [10].
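The counts 7 and 35 can be reproduced by enumeration. This sketch assumes the standard identification, not spelled out in the text, of the Steane code's weight-3 logical operators with the weight-3 codewords of the [7, 4, 3] Hamming code, and of the weight-3 logical operators of the 15-qubit Bravyi-Kitaev code with the weight-3 codewords of the [15, 11, 3] Hamming code.

```python
from itertools import combinations


def hamming_weight3_count(r):
    """Number of weight-3 codewords of the length-(2^r - 1) Hamming code.

    Column j of the parity-check matrix is the binary expansion of j, so a set of
    positions is a codeword iff the XOR of its labels is zero."""
    n = 2 ** r - 1
    return sum(1 for a, b, c in combinations(range(1, n + 1), 3) if a ^ b ^ c == 0)


steane = hamming_weight3_count(3)    # weight-3 logical operators of [[7, 1, 3]]
fifteen = hamming_weight3_count(4)   # weight-3 logical operators of the 15-qubit code
print(steane, fifteen, 7 * 4 + 7)    # 7 35 35
```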
Reichardt [29] has noted this equivalence. When we pipeline [[7, 1, 3]] to [[17, 1, 5]], there are 48 T gates and 1 T state. The number of logical operators of weight 5 in [[17, 1, 5]] is 51. Each logical operator can appear in 16 different configurations in the column that implements control-$H^{\otimes 17}$. The measurement error from the 17-qubit code routine occurs in 17 ways at the leading order. Thus, the fifth-order output error can happen in $51 \cdot 16 + 17 = 833$ ways.

Bravyi and Cross [21] gave a recursive construction for triply even codes. They showed how to convert a pair consisting of a (classical) triply even code of length $n_{t-1}$ with dual distance $2t - 1$ and a (classical) self-orthogonal code of length $m_t$ with dual distance $2t + 1$ into a triply even code of length $n_t = 2m_t + n_{t-1}$ with dual distance $2t + 1$. This formula gives another coincidence with our pipeline: $n_{t-1}$ is the number of T gates/states sitting before the final H-measurement routine in the pipeline, and $m_t$ is the code length of the final H-measurement routine. Thus, the recursive formula $n_t = 2m_t + n_{t-1}$ correctly counts the number of T gates/states used in the pipeline.
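The counting here is simple arithmetic. The sketch below evaluates the recursion $n_t = 2m_t + n_{t-1}$ for routines of lengths 7 and 17, assuming a single initial T state ($n_0 = 1$, which reproduces $2 \cdot 7 + 1 = 15$), and the fifth-order count obtained with the same logic as in the Steane case.

```python
def pipeline_t_counts(routine_lengths, n0=1):
    """Bravyi-Cross recursion n_t = 2 m_t + n_{t-1}, starting from n0 T states."""
    counts = [n0]
    for m in routine_lengths:
        counts.append(2 * m + counts[-1])
    return counts


print(pipeline_t_counts([7, 17]))  # [1, 15, 49]: 49 = 48 T gates + 1 T state
print(51 * 16 + 17)                # 833 fifth-order configurations
```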
A similar coincidence was observed by Jones [13]: the leading-order error probabilities of the distillation protocols based on a family of weakly self-dual [[k + 4, k, 2]] codes with (k, 0)-magic basis and of those based on a family of triorthogonal codes [12] are shown to be the same, namely $(3k + 1)$ times the square of the input error rate. The total numbers of T gates/states are also the same, $3k + 8$.

D Qudits
In this section, we consider an extension to qudits with local Hilbert space dimension $p > 2$, with $p$ a prime. Previously, Reed-Muller codes over prime fields were used [47,48], but our approach is more efficient. In terms of the scaling exponent γ (see Sec. 4), previous schemes for a fixed $p$ did not achieve $\gamma \to 1$, whereas our protocols below will. Specifically, consider a basis of states $|j\rangle$, where $j = 0, 1, \ldots, p-1$ is periodic mod $p$. We use the following operators and phase factor, which generate the Clifford group. It holds that $ZX = \omega XZ$. We will work with a generalization of normal codes throughout this section, ignoring hyperbolic codes. One reason is that we cannot achieve control-Swap in the same way as we could previously. The general method in the qubit case was to use some non-Clifford operation, such as a T gate, conjugating a controlled Pauli to obtain a control-Swap on the code space of some code. However, Swap is of order 2 while control-Z is of order $p$. One might hope to obtain a control-permutation of order $p$, but we do not consider this possibility. For normal codes, we do not try to implement the control-Hadamard as was done before, because the Hadamard is of order 4 for $p > 2$ and hence is not conjugate to control-Z.
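Since the displayed operator definitions are not reproduced here, the following NumPy sketch assumes the standard conventions $\omega = e^{2\pi i/p}$, $X|j\rangle = |j+1\rangle$, $Z|j\rangle = \omega^j|j\rangle$, and $H$ the discrete Fourier transform. It checks $ZX = \omega XZ$ and the orders used in the argument above: $X$ and $Z$ have order $p$, while $H$ has order 4 for $p > 2$.

```python
import numpy as np


def qudit_operators(p):
    """Assumed conventions: X|j> = |j+1 mod p>, Z|j> = omega^j |j>, H = discrete Fourier transform."""
    omega = np.exp(2j * np.pi / p)
    X = np.roll(np.eye(p), 1, axis=0)
    Z = np.diag(omega ** np.arange(p))
    j, k = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    H = omega ** (j * k) / np.sqrt(p)
    return omega, X, Z, H


def order(U, max_order=64):
    """Smallest t >= 1 with U^t = I (None if not found up to max_order)."""
    V = np.eye(U.shape[0], dtype=complex)
    for t in range(1, max_order + 1):
        V = V @ U
        if np.allclose(V, np.eye(U.shape[0])):
            return t
    return None


for p in (3, 5, 7):
    omega, X, Z, H = qudit_operators(p)
    assert np.allclose(Z @ X, omega * X @ Z)   # ZX = omega XZ
    print(p, order(X), order(Z), order(H))     # p p 4
```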

D.1 Preliminary
Let us first define a T-gate [47]. The cases $p = 3$ and $p > 3$ are going to be different. Define $g$; for $p > 3$, the second line of its definition is valid because 6 is invertible in $\mathbb{F}_p$, and this ensures that $g$ is a well-defined function on $\mathbb{F}_p$. All arithmetic in the exponents of ω, Z, X, and S will be over $\mathbb{F}_p$ for both $p = 3$ and $p > 3$. Define the T-gate as

These show that in both cases the T gate is at the third level of the generalized Clifford hierarchy. More generally, we find

Any state $|\psi_m\rangle$ for $m = 1, \ldots, p-1$ will be a "magic state." How would one use these magic states? Suppose $p > 3$. Consider a pair of qudits in a state $\sum_j a_j |j\rangle \otimes |\psi_m\rangle$. Apply a control-X operation with the first qudit as source and the second qudit as target. This maps the state to $\sum_j a_j |j\rangle \otimes X^j |\psi_m\rangle$. Now measure the second qudit in the computational basis, obtaining a result ℓ. This leaves the first qudit in the state $\sum_j a_j \omega^{m g(\ell - j)} |j\rangle$. Thus, the transformation implemented on the first qudit is $\sum_j \omega^{m g(\ell - j)} |j\rangle\langle j|$. Expanding the exponent gives Eq. (D.9). The first term on its right-hand side corresponds to an irrelevant global phase factor. The second term, $-m g(j)$, corresponds to implementing the transformation $T^{-m}$ on the first qudit. The third term gives a phase factor that can be corrected by applying a power of the S gate, and the last term gives phase factors that can be corrected by a power of the Z gate. Thus, the state injection procedure works, in that we can use a magic state $|\psi_m\rangle$ to produce a transformation $T^{-m}$ up to Clifford corrections. When $p = 3$, we use the same state injection, with $m = 1$. One finds after some calculation that if the measurement outcome is $\ell = 0$, the operation implemented on the source is $T^{-1}$; if $\ell = 1$, it is $e^{-2\pi i/9} S T^{-1}$; and if $\ell = 2$, it is $e^{2\pi i/9} Z^{-1} S^{-1} T^{-1}$. Thus, in all cases, the implemented operation is $T^{-1}$ up to a Clifford correction.
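The four-term structure of the expansion (D.9) can be checked with modular arithmetic. Because the definition of $g$ is not reproduced here, this sketch uses the hypothetical choice $g(j) = j^3/6$ over $\mathbb{F}_p$ ($p > 3$); with it, $m g(\ell - j)$ splits into a $j$-independent phase, $-m g(j)$ (the $T^{-m}$ part), a term quadratic in $j$ (an S-type correction), and a term linear in $j$ (a Z-type correction).

```python
def injection_decomposition_holds(p, m, ell):
    """Check, for all j in F_p, that
        m*g(ell - j) = m*g(ell) - m*g(j) + (m*ell/2) j^2 - (m*ell^2/2) j   (mod p)
    for the hypothetical choice g(j) = j^3 / 6 over F_p, p > 3."""
    inv2, inv6 = pow(2, -1, p), pow(6, -1, p)

    def g(j):
        return inv6 * pow(j, 3, p) % p

    for j in range(p):
        lhs = m * g((ell - j) % p) % p
        rhs = (m * g(ell)                     # j-independent: global phase
               - m * g(j)                     # T^{-m}
               + m * ell * inv2 * j * j       # quadratic in j: power of S
               - m * ell * ell * inv2 * j     # linear in j: power of Z
               ) % p
        if lhs != rhs:
            return False
    return True


print(all(injection_decomposition_holds(p, m, ell)
          for p in (5, 7, 11, 13) for m in range(1, p) for ell in range(p)))  # True
```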
The injected T gates together with Cliffords form a universal gate set [47, App. D]. This is a corollary of [49, Thm. 7.3], which says that the Clifford group is a maximal finite subgroup of $U(p^n)$ up to global phase factors, and of [50, Cor. 6.8.2], which says that any infinite subgroup (even after quotienting out phase factors) containing the Clifford group is dense in $U(p^n)$.
Note that $T^m$ and $T^{-m}$ are interconvertible by Cliffords. More generally, it is possible to use Clifford operations to convert a gate $T^m$ into another gate $T^{m'}$ with $m' = m n^3$ for $n \neq 0$, using the gate $U = U(n) = \sum_j |nj\rangle\langle j|$. For $p > 3$, we have $U^\dagger T^m U = \sum_j \omega^{m g(nj)} |j\rangle\langle j|$, where, by Eq. (D.10), the exponent differs from $m n^3 g(j)$ only by terms that are removed by powers of S and Z; so indeed $T^{m'} = C_1 T^m C_2$ for some Cliffords $C_1, C_2$. For $p = 3$, we see $T = U(-1) T^{-1} U(-1)$. Now, for which pairs $m, m'$ can we find an $n$ such that $m' = m n^3$? The multiplicative group $\mathbb{F}_p^\times$ is cyclic of order $p - 1$. Therefore, when $p - 1$ is not a multiple of 3, the map $n \mapsto n^3$ is a bijection of $\mathbb{F}_p^\times$, and any $T^m$ can be interconverted into any other $T^{m'}$. If $p - 1$ is a multiple of 3, there are three distinct classes of T gates. Since $-1 = (-1)^3$, $T^m$ and $T^{-m}$ are always interconvertible.
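The class count follows from the cosets of the cubes in $\mathbb{F}_p^\times$. This short Python sketch groups the exponents $m \in \mathbb{F}_p^\times$ into classes $\{m n^3 : n \neq 0\}$ and compares the number of classes with the rule stated above.

```python
def t_gate_classes(p):
    """Classes {m * n^3 : n != 0} of exponents m in F_p^x (interconvertible T^m gates)."""
    cubes = {pow(n, 3, p) for n in range(1, p)}
    return {frozenset(m * c % p for c in cubes) for m in range(1, p)}


for p in (5, 7, 11, 13, 19):
    # 3 classes exactly when 3 divides p - 1; otherwise cubing is a bijection.
    print(p, (p - 1) % 3 == 0, len(t_gate_classes(p)))
```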

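For an arbitrary vector $v \in \mathbb{F}_p^{n}$, we write $X(v) = \bigotimes_i X^{v_i}$ and $Z(v) = \bigotimes_i Z^{v_i}$.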
As in the weakly self-dual CSS code construction for qubits, it is straightforward to define a stabilizer code starting from a self-orthogonal subspace $S \subseteq S^\perp \subseteq \mathbb{F}_p^{n_{\text{inner}}}$: the stabilizer group is generated by $X(v)$ and $Z(v)$ where $v \in S$. The quotient space $S^\perp/S$ is in one-to-one correspondence with the set of X-type (Z-type) logical operators, and the induced dot product on $S^\perp/S$ is non-degenerate. In Section E below, we show that there is a basis $\{v^{(1)}, \ldots, v^{(k_{\text{inner}})}\}$ of $S^\perp/S$ such that $v^{(i)} \cdot v^{(j)} = \alpha_j \delta_{ij}$, where the scalars $\alpha_j$ are all 1 except possibly the last one. For simplicity we restrict ourselves to cases where

$(1, 1, \ldots, 1) \in S$, (D.11)
$v^{(i)} \cdot v^{(j)} = \delta_{ij}$, (D.12)

i.e., the second condition is that all scalars $\alpha_j$ are equal to 1. The first condition demands that $n_{\text{inner}}$ be a multiple of $p$. The second is a mild restriction, since $(S \oplus S)^\perp/(S \oplus S)$ always has a basis such that (D.12) holds. Given a basis $\{v^{(j)}\}$ satisfying (D.12), we define logical operators of the inner code, which indeed obey the commutation relations of the generalized Pauli operators on $k_{\text{inner}}$ qudits. Thus, this is a generalization of the normal codes in the qubit case. Due to (D.11), the transversal gate $\bar S = S^{\otimes n_{\text{inner}}}$ is a logical operator; in the corresponding calculation, the phase factor vanishes when $v \in S$.
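The commutation relations above rest on the identity $Z(v)X(w) = \omega^{v \cdot w} X(w) Z(v)$. The NumPy sketch below checks it for random vectors, assuming the tensor-product convention $X(v) = \bigotimes_i X^{v_i}$, $Z(v) = \bigotimes_i Z^{v_i}$ and the standard single-qudit $X$, $Z$; with $v^{(i)} \cdot v^{(j)} = \delta_{ij}$ it gives the generalized Pauli relations for the (presumed) logical operators $X(v^{(j)})$, $Z(v^{(j)})$.

```python
from functools import reduce

import numpy as np


def qudit_paulis(p):
    omega = np.exp(2j * np.pi / p)
    X = np.roll(np.eye(p), 1, axis=0)        # X|j> = |j+1 mod p>
    Z = np.diag(omega ** np.arange(p))       # Z|j> = omega^j |j>
    return omega, X, Z


def tensor_pauli(P, v):
    """P(v) = P^{v_1} (x) ... (x) P^{v_n} for an integer vector v."""
    return reduce(np.kron, [np.linalg.matrix_power(P, int(vi)) for vi in v])


def commutation_holds(p, n, trials=20, seed=0):
    """Check Z(v) X(w) = omega^{v.w} X(w) Z(v) for random v, w in F_p^n."""
    omega, X, Z = qudit_paulis(p)
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        v, w = rng.integers(0, p, size=n), rng.integers(0, p, size=n)
        lhs = tensor_pauli(Z, v) @ tensor_pauli(X, w)
        rhs = omega ** (int(v @ w) % p) * tensor_pauli(X, w) @ tensor_pauli(Z, v)
        if not np.allclose(lhs, rhs):
            return False
    return True


print(commutation_holds(3, 3), commutation_holds(5, 2))  # True True
```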
We will implement the measurement of the stabilizer $T^m X T^{-m}$ of the magic state $|\psi_m\rangle$ using the inner codes. The measurement becomes feasible if $C(T^m X T^{-m})$ can be implemented for logical qudits. We begin our search for a fault-tolerant implementation by observing the identity $C(T^m X T^{-m}) = T^m (CX) T^{-m}$, which enables us to implement some controlled Clifford on logical qudits. The actual action on logical qudits depends on the inner code, but our conditions (D.11, D.12) will make it uniform across all logical qudits.
Recall that $T^m X T^{-m} = \eta^{-1} S^m X$, where $\eta = 1$ if $p > 3$ and $\eta = e^{2\pi i/9}$ if $p = 3$. The action of the transversal gate $T^m X T^{-m}$ can be deduced by looking at the logical operators and the phase. Suppose $p > 3$. In order to implement $C(\bar S^m \bar X)$, we consider an equation and a solution, where the control is common to every gate, and $u, x, y, z, s, t$ are variables. (Using $C A = \sum_j |j\rangle\langle j| \otimes A^j$, one can evaluate matrix elements on both sides.) Note that the operators in the brackets are powers of $C(S^m Z^{m/2})$. This implies that simultaneous $C(\bar S^m \bar X)$ on all logical qudits can indeed be implemented using $T^m (C X^{3m/4}) T^{-m}$, $T^m (C X^{m/4}) T^{-m}$, controlled Pauli logical operators, and a power of Z on the control.
When $p = 3$ it suffices to consider $m = 1$. To remove the phase factor $\eta^{k_{\text{inner}} - n_{\text{inner}}}$ we require that $k_{\text{inner}}$ be a multiple of 3. This can be achieved by considering three copies of a given code if necessary; $n_{\text{inner}}$ is already a multiple of 3 due to (D.11). We have shown that it is possible to build a fault-tolerant routine to measure $T^m \bar X T^{-m}$.
We have not yet shown how to construct such inner codes. It is possible to generalize Lemma 4.9 to the case of matrices over a field $\mathbb{F}_p$ for $p > 2$; however, the generalization is more difficult, since the self-orthogonality constraint implies a nonlinear constraint on the rows of the matrix, namely that each row is null; see Lemma E.4. Let us give an alternative construction which achieves a scaling similar to Lemma 4.11, namely that for any distance $d$, one can find a family of normal weakly self-dual qudit CSS codes with $X(\mathbf{1})$ in the stabilizer group such that the ratio $k_{\text{inner}}/n_{\text{inner}} \to 1$ as $n_{\text{inner}} \to \infty$. This construction is derived from Reed-Muller codes. Let $C = RM_{\mathbb{F}_p}(r, m)$ be a classical Reed-Muller code over $\mathbb{F}_p$; the codewords have length $p^m$. The dual code is $C^\perp = RM_{\mathbb{F}_p}(m(p-1) - r - 1, m)$; see Theorem 5.4.2 of Ref. [51]. For any fixed $r$ and large enough $m$ (precisely, $2r + 1 \leq m(p-1)$), $C \subseteq C^\perp$, so the codespace of $C$ is self-orthogonal, and $\mathbf{1}$ is in the codespace of $C$. We use the codespace of $C$ as the space $S$, and use the CSS construction to define a weakly self-dual code. For fixed $r$, the rate of $C$ tends to zero at large $m$, so the rate of the resulting weakly self-dual code tends to 1. See Ref. [52] for weakly self-dual qubit codes derived from Reed-Muller codes. To make (D.12) hold, it may be necessary to use $S \oplus S$ instead of $S$.
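Self-orthogonality of $RM_{\mathbb{F}_p}(r, m)$ and the condition $2r + 1 \leq m(p-1)$ can be confirmed on small cases. This Python sketch (our illustration) builds spanning vectors of the code by evaluating monomials $x^a$ with $0 \leq a_i \leq p - 1$ and $\sum_i a_i \leq r$ at all points of $\mathbb{F}_p^m$, then checks all pairwise dot products mod $p$. The first spanning vector is the all-ones vector, so $\mathbf{1} \in C$.

```python
from itertools import product


def rm_generators(p, r, m):
    """Spanning vectors of RM_{F_p}(r, m): evaluations of monomials x^a
    (0 <= a_i <= p - 1, sum(a) <= r) at all p**m points of F_p^m."""
    points = list(product(range(p), repeat=m))
    gens = []
    for a in product(range(p), repeat=m):
        if sum(a) > r:
            continue
        row = []
        for pt in points:
            val = 1
            for x, e in zip(pt, a):
                val = val * pow(x, e, p) % p   # pow(0, 0, p) == 1, so x^0 = 1 everywhere
            row.append(val)
        gens.append(row)
    return gens


def is_self_orthogonal(p, r, m):
    gens = rm_generators(p, r, m)
    return all(sum(u * v for u, v in zip(a, b)) % p == 0 for a in gens for b in gens)


for p, r, m in [(3, 1, 2), (3, 2, 2), (3, 2, 3), (5, 2, 2), (5, 2, 1)]:
    print(p, r, m, is_self_orthogonal(p, r, m), 2 * r + 1 <= m * (p - 1))
```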

D.3 Outer codes
If the inner code has code distance $d$, then we should use an outer code with a parity check matrix that is $(d-1, (d-1)/2)$-sensitive. In full generality, one would want to use a parity check matrix with entries in $\mathbb{F}_p$, where an entry $\beta \neq 0$ would mean a stabilizer $(\eta^{-1} S^m X)^\beta$. This makes it necessary to choose the logical operators differently from the choice we made above.
However, a check matrix given by the adjacency matrix of a biregular graph with large girth is sufficient for us. Such a check matrix has only 0 and 1 entries, so no choice of logical operators beyond the one given above is necessary. Recall that a graph with large girth is locally a tree. Hence, a bad magic state will be caught by many checks, because it flips a single stabilizer in each of these checks, and the required sensitivity is guaranteed.

E Symmetric forms over finite fields
We have classified nondegenerate symmetric forms over the binary field $\mathbb{F}_2$ in Section 3. Over a field of odd characteristic, the isometry classes of finite-dimensional vector spaces with nondegenerate symmetric forms (quadratic spaces for short) constitute an abelian group under the direct sum, after identifying hyperbolic planes with the identity. This group is known as the Witt group of the field, and its structure is well known. Here we present a self-contained and elementary treatment of the Witt group of $\mathbb{F}_p$, and classify the quadratic spaces over fields of odd characteristic. A square element, or a square for short, is any member of the set $\{x^2 : x \in \mathbb{F}_p^\times\}$. It is natural to distinguish two cases depending on whether $-1 \in \mathbb{F}_p$ is a square, since a one-dimensional quadratic space is classified by an element of $\mathbb{F}_p^\times/(\mathbb{F}_p^\times)^2$. Since $\mathbb{F}_p^\times$ is a cyclic group of order $p - 1$, the element $-1$, being the unique element of $\mathbb{F}_p^\times$ with multiplicative order 2, is a square if and only if $p \equiv 1 \bmod 4$.
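The case distinction can be checked directly for small primes. This Python sketch lists the nonzero squares of $\mathbb{F}_p$ and confirms that $-1$ (represented as $p - 1$) is among them exactly when $p \equiv 1 \bmod 4$, and that half of $\mathbb{F}_p^\times$ consists of squares.

```python
def squares(p):
    """Nonzero squares (F_p^x)^2."""
    return {x * x % p for x in range(1, p)}


for p in (3, 5, 7, 11, 13, 17, 19, 23, 29):
    print(p, p % 4, (p - 1) in squares(p))   # True exactly when p % 4 == 1
```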
The part of the argument in Section 3 in which we inductively converted any non-degenerate symmetric matrix into a direct sum of a diagonal matrix and blocks $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, each representing a hyperbolic plane, applies here without any change. Below, we assume that symmetric matrices are block diagonal in this form. It is then easy to explain why quadratic spaces constitute a group: for any $a \neq 0$, the vectors $e_1 + e_2$ and $(e_1 - e_2)/(2a)$ form a basis in which $\mathrm{diag}(a, -a)$ becomes $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. This means that the one-dimensional quadratic space with form $(-a)$ is the inverse of the space with form $(a)$. It is important here that 2 is an invertible element of the field. We note that the determinant of the symmetric form up to squares is a nontrivial invariant valued in the multiplicative group $\mathbb{F}_p^\times/(\mathbb{F}_p^\times)^2$, which is isomorphic to the additive group $\mathbb{Z}/2\mathbb{Z}$. Let $\alpha \in \mathbb{F}_p$ be a non-square.
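The change of basis that turns $\mathrm{diag}(a, -a)$ into a hyperbolic plane can be verified mod $p$. In this sketch, $M$ has columns $u = e_1 + e_2$ and $w = (e_1 - e_2)/(2a)$, and we compute $M^T \mathrm{diag}(a, -a) M$; the modular inverse of $2a$ exists because $p$ is odd.

```python
def hyperbolic_basis_gram(a, p):
    """Gram matrix of diag(a, -a) in the basis u = e1 + e2, w = (e1 - e2)/(2a), over F_p."""
    inv = pow(2 * a, -1, p)
    M = [[1, inv], [1, -inv]]          # columns are u and w
    L = [[a, 0], [0, -a]]
    return [[sum(M[k][i] * L[k][l] * M[l][j] for k in range(2) for l in range(2)) % p
             for j in range(2)] for i in range(2)]


print(hyperbolic_basis_gram(3, 7))     # [[0, 1], [1, 0]]
```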
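Case I: $p \equiv 1 \bmod 4$, so that $-1 \in (\mathbb{F}_p^\times)^2$. Consider a block $\mathrm{diag}(a, a)$ of the symmetric matrix. Since $-1$ is a square, we see $\mathrm{diag}(a, a) \cong \mathrm{diag}(a, -a) \cong \mathrm{diag}(1, -1) \cong \mathrm{diag}(1, 1)$ under congruent transformations. Therefore, there are four classes of symmetric matrices up to hyperbolic planes: $\mathrm{diag}(1)$, $\mathrm{diag}(\alpha)$, $\mathrm{diag}(1, \alpha)$, and $\mathrm{diag}(1, 1)$. By looking at the determinant of the form and the parity of the dimension, we see that the four classes are distinct elements of the Witt group, which is hence isomorphic to $\mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/2\mathbb{Z}$. Given the dimension of a quadratic space, there are only two mutually exclusive possibilities:

$\mathrm{diag}(1, 1, \ldots, 1, 1)$ and $\mathrm{diag}(1, 1, \ldots, 1, \alpha)$. (E.2)

Case II: $p \equiv 3 \bmod 4$, so that $-1 \notin (\mathbb{F}_p^\times)^2$. In this case, we can set $\alpha = -1$. We claim that $\mathrm{diag}(1, 1)$ is not hyperbolic. If $v = a v_1 + b v_2$ is a vector in this two-dimensional space, where $v_1, v_2$ are basis vectors with $v_i \cdot v_i = 1$ and $a, b \in \mathbb{F}_p$, then $v \cdot v = a^2 + b^2$. Since $-1$ is not a square, the equation $a^2 + b^2 = 0$ has no nonzero solution, and this proves the claim. Next, we show that $\mathrm{diag}(1, 1) \cong \mathrm{diag}(-1, -1)$. To this end, we will find a solution to $a^2 + b^2 + 1 = 0$ over $\mathbb{F}_p$. Once we have such a solution, the vectors $a v_1 + b v_2$ and $b v_1 - a v_2$ are orthogonal, each has norm $a^2 + b^2 = -1$, and they form a basis (the change of basis has determinant $-(a^2 + b^2) = 1$); hence $\mathrm{diag}(1, 1) \cong \mathrm{diag}(-1, -1)$. The existence of the solution follows from $(\mathbb{F}_p^\times)^2 + (\mathbb{F}_p^\times)^2 \not\subseteq (\mathbb{F}_p^\times)^2$, which implies that $-1 \in (\mathbb{F}_p^\times)^2 + (\mathbb{F}_p^\times)^2$: a sum $x^2 + y^2$ that is not a square is nonzero by the claim above, hence equals $-c^2$ for some $c \neq 0$, and dividing by $c^2$ gives $-1$. Indeed, if $(\mathbb{F}_p^\times)^2 + (\mathbb{F}_p^\times)^2 \subseteq (\mathbb{F}_p^\times)^2$, then $(\mathbb{F}_p^\times)^2$ would be closed under addition and contained in a finite group, and hence would be a group itself, which must contain $0 \notin (\mathbb{F}_p^\times)^2$. Therefore, quadratic spaces of a given dimension are classified by the determinant of the form up to squares.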
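Both steps of Case II can be checked numerically. This Python sketch confirms that $\mathrm{diag}(1, 1)$ has no nonzero null vector when $p \equiv 3 \bmod 4$ (but does when $p \equiv 1 \bmod 4$), finds $(a, b)$ with $a^2 + b^2 + 1 = 0$, and verifies that the basis $a v_1 + b v_2$, $b v_1 - a v_2$ turns $\mathrm{diag}(1, 1)$ into $\mathrm{diag}(-1, -1)$, printed as entries $p - 1$.

```python
def find_minus_one_as_two_squares(p):
    """Some (a, b) in F_p^2 with a^2 + b^2 + 1 = 0 (exists for every odd prime p)."""
    return next((a, b) for a in range(p) for b in range(p) if (a * a + b * b + 1) % p == 0)


def plane_is_anisotropic(p):
    """True iff diag(1, 1) has no nonzero null vector over F_p."""
    return all((a * a + b * b) % p for a in range(p) for b in range(p) if (a, b) != (0, 0))


def gram_after_change(p, a, b):
    """M^T diag(1, 1) M mod p, where M has columns (a, b) and (b, -a)."""
    M = [[a, b], [b, -a]]
    return [[sum(M[k][i] * M[k][j] for k in range(2)) % p for j in range(2)] for i in range(2)]


for p in (3, 7, 11, 19, 23):           # p = 3 mod 4
    a, b = find_minus_one_as_two_squares(p)
    print(p, plane_is_anisotropic(p), (a, b), gram_after_change(p, a, b))
```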
Lemma E.1 (Chapter XV, Theorem 10.2 of Ref. [53]). Let Q be a nondegenerate quadratic space. If two subspaces V and U are isomorphic by an isometry $\sigma: V \to U$, then there exists an isometry $\tilde\sigma: Q \to Q$ such that $\tilde\sigma|_V = \sigma$.
Lemma E.2. Let N be a null subspace (on which the symmetric form vanishes) of a nondegenerate quadratic space Q over $\mathbb{F}_p$. Then, Q is isometric to the orthogonal sum of $N^\perp/N$ and a minimal hyperbolic subspace that contains N.
Proof. Applying Lemma E.1 to the identity map σ, we conclude that any orthogonal set of vectors extends to an orthogonal basis. Since the form is nondegenerate, there exists a minimal hyperbolic subspace that includes N (hyperbolic extension), and the symmetric form can be written as $\Lambda' \oplus \lambda$, where λ is hyperbolic and Λ' is nondegenerate. It is then clear that $N^\perp/N$ has the symmetric form Λ'.

Lemma E.4. Let $w_1 = \mathbf{1} \in \mathbb{F}_p^n$ be the all-1 vector, where n is a multiple of $p \geq 3$. Assume $c < (n-2)/2$, and let $w_2, \ldots, w_c$ be null vectors of $\mathbb{F}_p^n$ chosen inductively such that $w_j$ is chosen uniformly at random from $Z(\mathbb{F}_p^n, V_{j-1})$, where $V_{j-1} = \mathrm{span}(w_1, \ldots, w_{j-1})$. Let M be a c-by-n matrix with rows $w_j$.
Consider a fixed n-component vector v with $v \neq 0$ and $v \neq \mathbf{1}$. The probability that $Mv = 0$ is bounded from above by

because $w_j$ has to be orthogonal to $\mathrm{span}(v) + V_{j-1}$, which is null and is a proper superset of $V_{j-1}$. Thus, for any t,

Now assume $v \cdot v \neq 0$. The event that $v \in V_j^\perp$ happens only if $w_j$ is chosen from $v^\perp$. We bound the factors in the decomposition of $\Pr[v \in V_c^\perp]$ into conditional probabilities. The first factor is bounded by 1 trivially. For the other factors, we observe that the dimension of $v^\perp$ is $n - 1$, and a maximal null subspace in $v^\perp$ has dimension $m' \leq m$. Under the conditioning $v \in V_{j-1}^\perp$, the null space $V_{j-1}$ is a subspace of $v^\perp$, and $\#Z(v^\perp, V_{j-1}) = \zeta(n-1, m', k_{j-1}) \leq \zeta(n-1, m, k_{j-1})$. Hence each of these factors is at most $\zeta(n-1, m, k_{j-1})/\zeta(n, m, k_{j-1})$; in bounding it further, we used the assumption that $k_{j-1} \leq c < (n-2)/2 \leq m$.
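Let us turn to the second event, assuming $v \cdot v = 0$. Note that if $v \in V_c$, there is a least j such that $v \in V_j$. So, with $V_0 = \{0\}$,

$\Pr[v \in V_c^\perp \text{ and } v \in V_c] = \sum_{j=1}^{c} \Pr[v \in V_j \text{ and } v \in V_{j-1}^\perp \text{ and } v \notin V_{j-1}]$. (E.17)

We have

$\Pr[v \in V_j \text{ and } v \in V_{j-1}^\perp \text{ and } v \notin V_{j-1}] = \Pr[v \in V_{j-1}^\perp \text{ and } v \notin V_{j-1}] \cdot \Pr[v \in V_j \mid v \in V_{j-1}^\perp \text{ and } v \notin V_{j-1}]$, (E.18)

and we bound the first factor using Eq. (E.14). The second factor is bounded because $w_j$ must belong to $\mathrm{span}(v) + V_{j-1}$. Hence, $\Pr[v \in V_j \text{ and } v \in V_{j-1}^\perp \text{ and } v \notin V_{j-1}] \leq 5(3/5)^{n-j}$. Summing over j as in Eq. (E.17) gives the bound (E.20) on the second event. Summing the probabilities of (E.16) and (E.20), we conclude the proof.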
Fig. B.2 in Appendix B.

Definition 4.3.
An m-by-$n_{\text{outer}}$ parity check matrix M for a classical linear code is said to be $(\tilde d, s)$-sensitive if for any nonzero bit vector v of length $n_{\text{outer}}$ with $|v| \leq \tilde d$, we have $|Mv| \geq s$.

By a union bound, adding the probabilities in Eqs. (4.7) and (4.10), the lemma follows.

Lemma 4.10.
Let n, c, d be such that the inequality (4.11), which contains the factor $(2^{-n+c+1} + 2^{-c+1})$, holds. Then there exists a c-by-n matrix M such that $MM^T = 0$ and such that $Mv \neq 0$ for any $v \neq 0$ of Hamming weight at most d.

Lemma 4.14.
Let C be a classical error-correcting code that encodes k-bit messages into n-bit codewords, and let C have distance d. Let $v_1, \ldots, v_k$ be a basis for the codewords of C. Let M be the n-by-(k + 1) matrix whose columns are the vectors $v_1, \ldots, v_k, w$, where $w = v_1 + \ldots + v_k$. Then all rows of M have even weight, and M is a parity check matrix for a code with $n_{\text{outer}} = k + 1$ bits which is $(\tilde d, s)$-sensitive with $s = d$ and $\tilde d = n_{\text{outer}} - 1$.
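Lemma 4.14 can be tried on a small example. This Python sketch (our illustration) takes a systematic basis of the [7, 4, 3] Hamming code, chosen here for convenience, builds the 7-by-5 matrix $M$, and checks by brute force that its rows have even weight and that it is $(4, 3)$-sensitive, i.e., $(\tilde d, s) = (n_{\text{outer}} - 1, d)$. The all-ones vector of weight 5 is the only nonzero vector in the kernel, so the bound $\tilde d = n_{\text{outer}} - 1$ cannot be raised.

```python
from itertools import product


def lemma_4_14_matrix(basis):
    """n-by-(k+1) matrix whose columns are v_1, ..., v_k and w = v_1 + ... + v_k (mod 2)."""
    w = tuple(sum(coords) % 2 for coords in zip(*basis))
    columns = list(basis) + [w]
    return [list(row) for row in zip(*columns)]


def is_sensitive(M, d_tilde, s):
    """(d~, s)-sensitivity: every nonzero v with |v| <= d~ has |Mv| >= s (brute force)."""
    n_outer = len(M[0])
    for v in product((0, 1), repeat=n_outer):
        if 0 < sum(v) <= d_tilde:
            syndrome_weight = sum(sum(a * b for a, b in zip(row, v)) % 2 for row in M)
            if syndrome_weight < s:
                return False
    return True


# A basis of the [7, 4, 3] Hamming code (systematic form, chosen for illustration).
HAMMING_BASIS = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
M = lemma_4_14_matrix(HAMMING_BASIS)         # 7 checks on n_outer = 5 bits
print(all(sum(row) % 2 == 0 for row in M))   # True: every row has even weight
print(is_sensitive(M, 4, 3))                 # True: (n_outer - 1, d)-sensitive
print(is_sensitive(M, 5, 1))                 # False: the all-ones vector has Mv = 0
```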

Lemma 4.15.
Given integers $\tilde d, w \geq 1$ and $s \geq 2$, there exists an $m \times n_{\text{outer}}$ parity check matrix M that is $(\tilde d, s)$-sensitive, where $m = n_{\text{outer}} \cdot s/w$ and every row of M has weight exactly w.
Fig. B.3 in Appendix B.
Table columns: $d$, $n_T/n_{\text{outer}}$, $n_{\text{outer}}$, Prefactor.

Lemma E.3.
Let Q be a nondegenerate quadratic space of dimension n over $\mathbb{F}_p$. Every maximal null subspace of Q has the same dimension m. Given any null subspace N of dimension $k \leq m$, the number of null vectors of Q that are orthogonal to N is

$\#Z(Q, N) = p^{n-k-1} + p^m - p^{n-m-1} =: \zeta(n, m, k)$. (E.5)

Proof. To prove the first claim, suppose $M, M'$ are maximal null subspaces. If $\dim M \leq \dim M'$, then any linear injection from M to M' is an isometry, which can be extended to an isometry $\tilde\sigma$ of Q by Lemma E.1. Then $\tilde\sigma^{-1}(M')$ is a null superset of M, and hence is M itself since M is maximal. Thus, $\dim M = \dim M'$. Let $Z(Q, N)$ be the set of all null vectors of Q that are orthogonal to N. (Z is not a subspace in general.) Consider $\phi: Z(Q, N) \to Z(N^\perp/N, 0)$, a restriction of the canonical projection map $N^\perp \to N^\perp/N$. The map φ is surjective by definition of Z. If $x, y \in Z(Q, N)$ are mapped to the same element, then $x - y \in N$. This implies that φ maps exactly $\#N$ elements to each element of its image. (Here, # denotes the number of elements of a finite set.) Therefore,

$\#Z(Q, N) = (\#N)\,(\#Z(N^\perp/N, 0))$. (E.6)

Due to the preceding lemma, the dimension of a maximal null subspace of $N^\perp/N$ is $m - k$. Thus, it remains only to prove the lemma when $k = 0$, since

$\#Z(Q, N) = p^k (p^{n-2k-1} - p^{n-m-k-1} + p^{m-k}) = p^{n-k-1} - p^{n-m-1} + p^m$. (E.7)

A definite quadratic space is one in which $w \cdot w = 0$ implies $w = 0$. To count all null vectors, we work in a basis such that the n-by-n symmetric matrix is

$\Lambda = \Lambda' \oplus \Lambda_{2m}$, (E.8)

where Λ' is definite and $\Lambda_{2m}$ is an orthogonal sum of m hyperbolic planes, written in a basis in which a vector $(u, u') \in \mathbb{F}_p^m \oplus \mathbb{F}_p^m$ has norm $u \cdot u'$. In this basis, let us write any vector as $x \oplus (u, u')$. The nullity is then expressed by a quadratic equation in the coordinates,

$x \cdot x + u \cdot u' = 0$. (E.9)

The solutions of this equation are divided into two classes: $x \cdot x = 0$ or $x \cdot x \neq 0$. In the former case, $x = 0$ and $u \cdot u' = 0$.
Given arbitrary u, there is u' such that this equation holds. The number of solutions is $p^m + (p^m - 1)p^{m-1}$. In the latter case, we must have $u \neq 0$, and $u \cdot u' = c = -x \cdot x \neq 0$ is an inhomogeneous equation in u', whose solution always exists. For any given nonzero c, there are thus $(p^m - 1)p^{m-1}$ choices of $(u, u')$. Here x can be any nonzero vector of the definite part, so there are $p^{n-2m} - 1$ choices. In sum, the number of null vectors in an n-dimensional quadratic space Q over $\mathbb{F}_p$ is

$\#Z(Q, 0) = p^m + (p^m - 1)p^{m-1} + (p^{n-2m} - 1)(p^m - 1)p^{m-1} = p^{n-1} - p^{n-m-1} + p^m$. (E.10)

Proof (of Lemma E.4). We will estimate the desired probability by a union bound, considering separately the event that $v \in V_c^\perp$ and $v \notin V_c$, and the event that $v \in V_c^\perp$ and $v \in V_c$. The second event is possible only if $v \cdot v = 0$. By the classification of symmetric forms, a maximal null space of $\mathbb{F}_p^n$ has dimension m such that $n - 2 \leq 2m \leq n$. The assumption that $c < (n-2)/2$ implies that

$k_j := \dim V_j \leq j \leq c \leq m - 1$. (E.12)

Consider the first event, assuming $v \cdot v = 0$. Let $j > 1$. Then

$\Pr[v \in V_j^\perp \text{ and } v \notin V_j \mid v \in V_{j-1}^\perp \setminus V_{j-1}] \leq \dfrac{\zeta(n, m, k_{j-1} + 1)}{\zeta(n, m, k_{j-1})}$
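The counting formula $\zeta(n, m, k)$ of Eq. (E.5), which also enters the ratio just above, can be confirmed by brute force on small quadratic spaces. This Python sketch (our illustration) builds Gram matrices $\mathrm{diag}(\text{definite part}) \oplus m$ hyperbolic planes, with definite parts chosen so that $m$ is the maximal null dimension, and counts null vectors orthogonal to a chosen null subspace $N$.

```python
from itertools import product


def gram(p, definite, m):
    """Gram matrix diag(definite) (+) m hyperbolic planes [[0, 1], [1, 0]] over F_p."""
    n = len(definite) + 2 * m
    G = [[0] * n for _ in range(n)]
    for i, a in enumerate(definite):
        G[i][i] = a % p
    for h in range(m):
        i = len(definite) + 2 * h
        G[i][i + 1] = G[i + 1][i] = 1
    return G


def form(G, x, y, p):
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n)) % p


def count_null_orthogonal(p, G, N_basis=()):
    """Brute-force #Z(Q, N): null vectors of Q orthogonal to span(N_basis)."""
    return sum(
        1 for x in product(range(p), repeat=len(G))
        if form(G, x, x, p) == 0 and all(form(G, x, w, p) == 0 for w in N_basis)
    )


def zeta(p, n, m, k):
    return p ** (n - k - 1) - p ** (n - m - 1) + p ** m


cases = [
    # (p, definite part, m, basis of N, k = dim N)
    (3, [1], 1, [], 0),
    (3, [1, 1], 1, [], 0),
    (5, [1, 2], 1, [], 0),
    (5, [], 2, [], 0),
    (3, [1], 1, [(0, 1, 0)], 1),
    (5, [], 2, [(1, 0, 0, 0)], 1),
    (5, [], 2, [(1, 0, 0, 0), (0, 0, 1, 0)], 2),
]
for p, definite, m, N, k in cases:
    G = gram(p, definite, m)
    n = len(G)
    print(p, n, m, k, count_null_orthogonal(p, G, N), zeta(p, n, m, k))
```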

For t = c, this gives in particular an upper bound on $\Pr[v \in V_c^\perp \text{ and } v \notin V_c]$.

Lemma 3.1 is equivalent to saying that the matrix Λ is non-singular. Any basis change of $S^\perp/S$ induces a congruent transformation $\Lambda \to M^T \Lambda M$, where M is the invertible matrix of the basis change. We consider equivalence classes of Λ under congruent transformations.
Proof of Theorem 4.2. In subsection 4.2, we show that, for any d, there exist families of both hyperbolic and normal inner codes with increasing $n_{\text{inner}}$ such that $k_{\text{inner}}/n_{\text{inner}} \to 1$. To prove this theorem, we will only need the result for normal inner codes. Consider some code from this family with given $k_{\text{inner}}, n_{\text{inner}}$. In subsection 4.3 we show Lemma 4.15, which we reproduce here:

Lemma. Given integers $\tilde d, w \geq 1$ and $s \geq 2$, there exists an $m \times n_{\text{outer}}$ parity check matrix M that is $(\tilde d, s)$-sensitive, where $m = n_{\text{outer}} \cdot s/w$ and every row of M has weight exactly w.

Definition 4.5. A family of quantum error correcting codes with an increasing number of qubits n has good rate if the number of encoded qubits k is Θ(n), and has good distance if the distance d is Θ(n).

Definition 4.6. Given a family of outer codes with increasing $n_{\text{outer}}$, we say that this family has good sensitivity if each code in the family is $(\tilde d, s)$-sensitive for $\tilde d = \Theta(n_{\text{outer}})$ and $s = \Theta(n_{\text{outer}})$.

Definition 4.7. Given a family of outer codes with increasing $n_{\text{outer}}$, we say that this family has good check rate if the parity check matrix is m-by-$n_{\text{outer}}$ with $m = \Theta(n_{\text{outer}})$.
For any fixed d, one can find a family of M with increasing n such that the ratio c/n tends asymptotically to zero and such that Eq. (4.11) is obeyed. Hence, for any distance d, one can find a family of hyperbolic or normal weakly self-dual CSS codes such that the ratio $k_{\text{inner}}/n_{\text{inner}} \to 1$ as $n_{\text{inner}} \to \infty$.
binary parity check matrices M that are sensitive enough. Those from biregular bipartite graphs can be used. Restricting to binary parity check matrices is a technical simplification rather than a conceptually crucial ingredient, though relaxing this condition might involve complicated calculations. The main difference from the qubit protocols is in the inner codes. We only consider analogs of normal codes, where transversal S gates become logical S gates. We show that measurement in the eigenbasis of SX can be implemented fault-tolerantly on an inner code with parameters $[[n_{\text{inner}}, k_{\text{inner}}, d]]_p$ using $4n_{\text{inner}}$ T gates if $p > 3$, or $2n_{\text{inner}}$ T gates if $p = 3$. Therefore, Lemma 4.4 generalizes to odd prime dimensional qudits, where the input T count is $n_{\text{outer}} + 4n_{\text{inner}}m$ if $p > 3$, with m the number of checks in the outer code, or $n_{\text{outer}} + 2n_{\text{inner}}m$ if $p = 3$. We complement the construction in this section with a probabilistic existence proof for a good family of inner codes in the sense of Section 4; see Lemma E.4 and combine it with the proof of Lemma 4.10. We also point out that quantum Reed-Muller codes give a family whose encoding rate approaches 1 for a given distance. Hence, the statement of Theorem 4.1 remains unchanged in either case, $p = 3$ or $p > 3$, but that of Theorem 4.2 for $p > 3$ becomes the following: for odd $d \geq 5$, the number of input magic states per output approaches $1 + 4(d-1)/2 = 2d - 1$ in the large code length limit. The input T count per output is still $d(1 + o(1))$ if $p = 3$. If one wishes to improve the asymptotic input count for $p > 3$, then one has to solve an equation analogous to (D.19).
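The limits $2d - 1$ (for $p > 3$) and $d$ (for $p = 3$) follow from the input T counts above once the number of checks is fixed. This sketch makes that arithmetic explicit under assumptions that are our reading, not stated in this section: the outer code comes from Lemma 4.15 with check weight $w = k_{\text{inner}}$ and sensitivity $s = (d-1)/2$, so $m = n_{\text{outer}} s / k_{\text{inner}}$, and there are $n_{\text{outer}}$ outputs. Exact fractions keep the result free of rounding.

```python
from fractions import Fraction


def inputs_per_output(p, n_inner, k_inner, d):
    """Input T count per output magic state (hypothetical parametrization).

    Assumes m = n_outer * s / w checks with w = k_inner and s = (d - 1) / 2;
    each check costs 4 n_inner T gates (p > 3) or 2 n_inner T gates (p = 3).
    n_outer cancels: (n_outer + c * n_inner * m) / n_outer = 1 + c * n_inner * s / k_inner."""
    t_per_qudit = 2 if p == 3 else 4
    s = Fraction(d - 1, 2)
    return 1 + t_per_qudit * Fraction(n_inner) * s / k_inner


print(inputs_per_output(5, 100, 100, 5))   # 9 = 2d - 1 when k_inner / n_inner = 1
print(inputs_per_output(3, 100, 100, 5))   # 5 = d for p = 3
print(inputs_per_output(5, 10, 8, 5))      # 11: finite codes with k_inner < n_inner cost more
```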