The BQP-hardness of approximating the Jones Polynomial

A celebrated important result due to Freedman, Larsen and Wang states that providing additive approximations of the Jones polynomial at the k'th root of unity, for constant k=5 and k>6, is BQP-hard. Together with the algorithmic results of Freedman et al and Aharonov et al, this gives perhaps the most natural BQP-complete problem known today and motivates further study of the topic. In this paper we focus on the universality proof; we extend the universality result of Freedman et al to k's that grow polynomially with the number of strands and crossings in the link, thus extending the BQP-hardness of Jones polynomial approximations to all values for which the AJL algorithm applies, proving that for all those values, the problems are BQP-complete. As a side benefit, we derive a fairly elementary proof of the Freedman et al density result, without referring to advanced results from Lie algebra representation theory, making this important result accessible to computer science audience. We make use of two general lemmas we prove, the Bridge lemma and the Decoupling lemma, which provide tools for establishing density of subgroups in SU(n). Those tools seem to be of independent interest in more general contexts of proving quantum universality. Our result also implies a completely classical statement, that the_multiplicative_ approximations of the Jones polynomial, at exactly the same values, are #P-hard, via a recent result due to Kuperberg. Since the first publication of those results in their preliminary form (arXiv:quant-ph/0605181v2), the methods we present here were used in several other contexts. This paper is an improved and extended version of the original results, and also includes discussions of the developments since then.


Introduction
What is the computational power of quantum computers? This question is fundamental both from a computer scientist as well as a physicist points of view. This paper attempts to improve our understanding of this question, by studying the only non-trivial BQP-complete problem known to us today: the problem of approximating the Jones polynomial.
The Jones polynomial, discovered in 1985 [4], is a very important knot invariant in topology. Its importance was manifested in connections to numerous areas in mathematics, from the statistical physics model known as the Potts model to the study of DNA folding. Among its many connections, an extremely important one was drawn by Witten in 1989 [5], to quantum mechanics; more precisely, to Topological Quantum Field Theory (TQFT). Witten showed how the Jones Polynomial and other topological invariants of links naturally appear in the Wilson lines of the SU (2) Chern-Simons Topological Quantum Field Theory.
About a decade later, TQFT entered the scene of quantum computation, when Freedman suggested a computational model based on this theory [6]. The works of Freedman, Kitaev, Larsen and Wang [2,1,7] showed an equivalence between the TQFT model and the standard model of quantum computation.
On one hand, they gave an efficient simulation of TQFT by a quantum computer [1]. On the other hand, they showed that quantum computation can simulate TQFT efficiently [2]. These results draw interesting links between quantum computation and the Jones polynomial. The simulation of TQFT by quantum computers implicitly implies the existence of a quantum algorithm for the Jones polynomial approximation problem at the fifth root of unity e i2π/k , via the results of Witten; The simulation of quantum computers by TQFT implicitly implies the that the Jones polynomial approximation problem is BQP-hard. However, these important results were stated in TQFT language, and were not stated or proved explicitly in the above set of works. A clear statement of some of the algorithmic result, in a computational language, was given in [8], but without an explicit algorithm.
Only recently Aharonov, Jones and Landau [3] provided an explicit and efficient quantum algorithm for the problem of approximating the Jones polynomial at those roots of unity. The algorithm uses a combination of mathematical results of over 20 years ago due to Jones. In particular, the main ingredient of the algorithm is a unitary representation of the braid group called the path model representation, in which braids are mapped to operators acting on a Hilbert space, whose basis vectors are indexed by paths on a certain graph. This gave a simple to state algorithm for the problem, bypassing the TQFT language altogether. In the other direction, the universality proof due to [2] can also be made explicit in the standard quantum model language, without referring to TQFT. To do this one needs to encode the space of n qubits into the space of paths; a simple way to do it, encoding one qubit in four steps of a path, was independently discovered by by Kitaev [9], and Wocjan and Yard [10]. The latter also defined a universal set of gates that acts on these paths, and proved that certain additive approximations of the Jones polynomial are BQP-complete.
The above two results together give an explicit proof that the problem of approximating the Jones polynomial at the fifth root of unity, and in fact, for any primitive root of unity e 2πi/k , for constant k > 4, k = 6, 10 is BQP-complete. This is the only non-trivial BQP-complete problem known today 1 .
This fact highlights the importance of this problem in the context of the quantum computational power, and motivates deeper investigation of this topic. One natural direction to try to take is generalizations of the algorithms. Initial steps in this direction were made in [11,10]. Another interesting direction is to study the reasons for BQP-hardness, and its range of applicability. This is the path we take in this paper.
One natural question is the following. It turns out that the algorithms given in the work of Aharonov et. al work not only for constant k, but also for asymptotically growing k's. To be more precise, [3] gives an efficient quantum algorithm to approximate the Jones polynomial of an n strands braid with m crossings at a primitive root of unity e 2πi/k , where the running time of the algorithm is polynomial in m, n and k. The algorithm is therefore efficient even if k grows polynomially with n. On the other hand, we only know that the constant k case is BQP-hard. Therefore, in [3] the following natural question was raised: what is the complexity of approximating the Jones polynomial for polynomially bounded k? It was left open whether it is BQP-hard, doable in BPP, or maybe somewhere in between.
In this paper we resolve this question, and show that it is also BQP-hard: We outline the difficulties in the proof, and our methods to overcome them, below. We hope that this paper makes a significant step towards better understanding of the complexity of the Jones polynomial approximation problems, and of quantum computational complexity in general. In particular, we hope that the techniques we developed here would help in proving BQP-hardness of future quantum algorithms to follow.

Proof Outline
Given an algorithm that calculates the Jones Polynomial of any link at e −2πi/k (for some integer k > 4 and k = 6, 10) in polynomial time in the number of crossings in the link, and a classical Turing machine -we can simulate a Quantum computer efficiently. How is that possible? The key idea here, which is demonstrated in [3], is to use an intimate connection between two, seemingly distinct, worlds: links and unitary matrices. The connection is the so-called "path representation" which is a unitary representation of the braid group. For a fixed k, the kth path model representation of the braid group maps every braid to a unitary operator on a Hilbert space spanned by paths on a certain graph which depends on k. In particular, each generator of the braid group (which is simply one crossings of two adjacent strands) is mapped to a certain (k−dependent) unitary operator on the space of paths. The BQP-hardness proof boils down to showing that a general quantum gate can be approximated efficiently using these unitary operators.
Let us first review the easier constant k case. Here, the main difficulty in the proof is to show that the relevant set of operators is dense in some subgroup of the unitary group. Once this is shown, it is standard to apply the famous Solovay-Kitaev theorem to show that density implies efficiency. In other words, once the subgroup is dense, then Solovay-Kitaev give a method to approximate every gate in the quantum circuit by a short sequence of generators, and universality follows.
But how does one prove the density? The starting point is the fact that Kitaev's four steps encoding encodes two-qubit gates into the unitary group operating on the space spanned by 14 paths. Density thus means that we can approximate any matrix in SU (14), using our k-generators. The idea here is to first restrict attention to some two dimensional subspace, and show density in SU (2). This was essentially done by Jones [12]. We then gradually increase the dimensionality of the space on which we have density, to SU (14), by adding one or more dimensions at a time; to this end we introduce two general and very useful lemmas: the bridge lemma 4.1 and the decoupling lemma 4.2. This completes the density proof of the constant k case; we get an almost self-contained, fairly elementary proof.
We would now like to move to the asymptotically growing k case. Here, however, there is a subtle point in the above line of arguments. Indeed, density still holds. But the step of density implies efficiency fails. The starting point of the Solovay-Kitaev theorem is the construction, using the set of generators, of an epsilon-net in the unitary group, where epsilon is some small enough constant. Such an epsilon-net is easy to construct, given a fixed set of generators that span a dense subgroup -essentially, brute force would do the trick. However, if k is asymptotically growing in n, the set of generators is no longer fixed.
It is no longer clear that the very first step of the Solovay-Kitaev theorem, that of creating the epsilon-net, can be done efficiently.
We give here a very rough sketch of how we overcome this difficulty. We first choose a large enough but fixed k 0 . We now modify the k 0 -generators as follows: we leave their eigenvalues as they were, but change their basis to the basis of eigenvectors of the generators corresponding to the case k → ∞. This change of basis turns out not to affect the fact that the set of generators generates a dense subgroup.
Since the new set of generators is fixed (independent of n), we can find an epsilon-net which consists of these modified k 0 -generators simply by brute-force. The advantage of using the modified generators as the building blocks of our epsilon-net is that the eigenvectors of the modified generators are very close to the eigenvectors of generators corresponding to very large k's. If we want to create an epsilon-net using k-generators for very large k's, we only need to approximate the modified generators, using these k-generators. Since the eigenvectors more or less agree, we only need to approximate the eigenvalues, which can be done by taking the appropriate power of the relevant k-generator. We get an efficient construction of an epsilon-net consisting of k-generators. We can now apply the Solovay-Kitaev theorem using this net.
We now proceed to the detailed proof, but before this, we first need to review some basic facts about the Braid group, and introduce the path model representation. Loosely speaking, a braid is a set of n strands that connect two horizontal bars, such that each strand is tied exactly to one peg on the top bar and one peg on the bottom bar. When drawing the braid schematically on a paper, the strands may pass over and under each other, but at any point they must not be completely horizontal. Braids which can be deformed into each other without tearing any of the strands are considered identical. An illustration of a 4-strand braid is given in Fig. 1.
The set of all braids with n strands forms an infinite and discrete group which is called the braid group B n . The product rule for b 1 b 2 is defined by placing the braid b 1 above the braid b 2 and fusing the bottom of the b 1 strands with the top of the b 2 strands. The identity element is the braid with n straight lines that connect each peg at the bottom bar to its corresponding peg at the upper bar.
In [13], Artin proved that B n admits a finite presentation (the Artin presentation) with n−1 generators {σ i } that satisfy the following constraints:  Pictorially, σ i is a braid that is identical to the unity braid in all strands except for the i and i + 1 strands which cross each other once (the i + 1 → i strand goes over the i → i + 1 strand), connecting the lower i'th peg to the upper i + 1 peg and vice versa. An illustration of σ 2 is given in Fig. 2. It is an easy exercise to verify graphically that the braid generators indeed satisfy (1), (2).

The path representation
The path representation is a unitary representation of the braid group B n that depends on an integer k ≥ 3. The image of every b ∈ B n under this representation is denoted by ρ(b) and it acts on a finite Hilbert space. To understand the structure of this space, we introduce the graph G k which is a set of k −1 sites (vertices) and k − 2 edges that connect them. The sites are ordered from bottom to top one above the other, as described in Fig. 3. To each site we assign a number according to its position; the number 1 is assigned to the bottom site while the number k − 1 is assigned to the top site. We then consider all possible n-steps walks (paths) over the graph G k which start at site 1 and never leave G k . With each To define ρ, it is enough to specify its action on the braid group generators σ i , which we denote by We describe ρ i in terms of partial paths: if p is a bit string that corresponds to a path, then p| i denotes the first i − 1 bits of the string and p| i denotes the last bits of the string starting from the i + 2 bit. For example, if p = "11100101", then p| 3 = "11" and p| 3 = "0101", hence p = p| 3 10 p| 3 . Next, we let z i denote the position on the graph after the path p| i was taken. So for p = "11100101", we have z 1 = 1, z 2 = 2, z 3 = 3, z 4 = 4, z 5 = 3 etc. Finally, we define Notice that definition (4) is for ℓ = 1, . . . , k − 1. For ℓ < 1 or ℓ > k − 1, we set λ ℓ = 0.
Using this notation we define the operators Φ i with i = 1, . . . , n − 1: for every initial/final strings p i , Then ρ i are given by In [3] it is shown that that these operators indeed form a unitary representation of the braid group.
Notice that the operators ρ i are invariant under the same subspaces of the Φ i operators. Specifically, they multiply the vectors |p| i 00 p| i , |p| i 11 p| i by the phase A −1 , while mixing the vectors |p| i 10 p| i , |p| i 01 p| i . Therefore in the standard basis, ρ i breaks into one-dimensional and two-dimensional blocks (but notice that these are different blocks for different operators). The two-dimensional blocks mix the |p| i 10 p| i , |p| i 01 p| i vectors whose corresponding paths end at the same point. It follows that if |p corresponds to a path that ends at ℓ, then ρ i |p can be written as a linear combinations of vectors that correspond to paths that end at the site ℓ. This is true for all ρ i , hence the path representation breaks into representations over subspaces that correspond to paths that end at a particular ℓ. We denote these subspaces by H n,k,ℓ .

From braids to links to Jones Polynomial
As stated in the beginning of this section, the path representation provides a connection between the Jones polynomial of links to unitary matrices. We will now explore this connection in some depth. In particular, we will see that quantum mechanical expressions of the form 0|U |0 , which are sufficient to describe quantum computation, can be given in terms of the Jones polynomial.
We first notice that a braid can be transformed into a link by connecting its open endpoints. Such an operation is called a closure, and here we focus on one particular closure: the plat closure. This closure is defined only for braids with an even number of strands. It is the link that is formed by connecting the top pegs with odd numbers with the peg to their right, and doing the same with the bottom pegs. The plat closure of a braid b ∈ B n is denoted by b pl . Figure 4 shows the plat closure of the 4-strand braid from Fig. 1. It turns out that there is a very strong connection between the path representation of a braid and the Jones polynomial of its plat closure. In [3] it was shown that the Jones Polynomial of the plat closure of every b ∈ B n can be given by a "sandwich" product of the operator ρ(b) with a special vector |α ∈ H n,k . Specifically, let V b pl (·) denote the Jones polynomial of b pl , and |α be the "zig-zag" vector that corresponds to a path in which the first step is up, the second is down, the third is up etc. Then according to [3], the following equality holds: The exact definition of these two quantities is given in [14,3]. The other quantities in the formula are λ ℓ and A which are defined in Eqs. (4,5) respectively, and N and d which are defined by Note that the writhe is a trivial function of a link, as it is basically a sum over all its crossings. The Jones Polynomial, on the other hand, is a very non-trivial function of the link; the best known classical algorithms for its calculation are exponential in the number of crossings.
Notice also that these two quantities are only defined for oriented links, and therefore we must choose some orientation for b pl to calculate them. It does not matter, however, which orientation we pick since is independent of the orientation (in agreement with the LHS of the equation). It is actually the Kauffman bracket b pl (A) of the link b pl -which is also a polynomial of the link. For further details we refer the reader to [14].

Simulating a quantum computer
With the path representation at hand, together with Eq. (12), we can devise a plan to simulate a quantum computer. The idea is straightforward: given a description of a quantum circuit U over n qubits, which is a poly(n) sequence of unitary gates from a universal basis, we find a braid b such that Here the ≃ sign means that we can approximate the RHS by the LHS to any desired accuracy. Thus if we have an oracle that, given b, tells us the Jones polynomial V b pl (e −2πi/k ), then from Eq. (12) we can easily find the LHS of Eq. (15), which approximates 0|U |0 . This is sufficient to simulate quantum computation because of the following reasoning. Suppose we want to know the probability that a given quantum circuit Q outputs 0 in its last qubit when applied on an input string X and then measured. We replace Q by a circuit U with one more qubit, which starts by applying N OT gates on the qubits that are 1 in the input string X, then applies Q, then copies the last qubit of Q to the additional qubit by a CN OT gate, and then applies Q in reverse, and also reverses the initialization of the input qubits to get 0. It is a simple algebra exercise to show that 0|U |0 is equal to the probability to get 0 in the original circuit Q applied to X.
The algorithm for finding the braid b, must be polynomial in n (the number of qubits in U ) if we want our simulation to polynomial. Additionally, if we assume that we can calculate the Jones polynomial in a time which is polynomial in the number of its crossings, then we must also assure that the number of crossings in b is polynomial in n, in order to keep the overall simulation polynomial.

The 4-steps encoding
We would now like to find a braid b that corresponds to the operator U as in Eq. (15). We first simplify the matrices we deal with. It is easy to verify that any quantum circuit can be translated, with a linear overhead, to an equivalent quantum circuit which acts on qubits arranged in a one-dimensional array, and applies gates only on nearest neighbor qubits. We therefore assume that U is given by a product of poly(n) basis elements U = U L · . . . · U 1 , (L = poly(n)) each of them acting non-trivially only on two adjacent qubits.
The next step is to encode U in path space as it is the space on which the operators ρ(b) act. One such convenient encoding represents every qubit by a 4-steps path: Then a string of n encoded qubits is encoded as a 4n-steps path in H 4n,k , and every gate U is encoded as a 2 4n × 2 4n matrix U : Here |i denotes the 4n-steps path that encodes the qubits in the original vector |i . Notice, however, that this is not an arbitrary path in H 4n,k as it returns to the initial site every 4 steps. We denote the subspace that is spanned by all these paths by S.
Next, the product U = U L · . . . · U 1 naturally translates to U = U L · . . . · U 1 and so by finding braids Consequently we would have: In fact, we will not be so ambitious; we will only require that ρ(b i ) ≃ U i on the subspace S, and show that this suffices.
The advantage of using this particular encoding is that, together with the tensorial structure of the qubits, it allows us to concentrate on the "reduced" braid group B 8 instead of the larger group B 4n .
Let us explain exactly what is meant by that. Suppose that we wish to perform an operation on the s, s + 1 encoded qubits of some path |p ∈ S. Then we must use a braid b ∈ B 4n that mixes the 8 strands 4(s − 1) + 1 → 4(s + 1) while being trivial on the rest. However, since |p ∈ S, its path reaches the first site before the 4(s − 1) + 1 and 4(s + 1) + 1 steps. Therefore the three partial paths that are defined by the steps 1 → 4(s − 1), 4(s − 1) + 1 → 4(s + 1) and 4(s + 1) + 1 → 4n are all legitimate paths over the graph G k (i.e., they start and end at the first site and never leave G k ). We denote these partial paths by p 0 ,p and p 1 respectively, and write |p = |p 0 ⊗ |p ⊗ |p 1 . Notice also that |p ∈ H 8,k,1 . We will add a tilde to all vectors and operators that act on that space. In particular, we defineb ∈ B 8 to be the "reduced" version of b, created by the 8 non-trivial strands of b.
It is now easy to verify that This follows from the definition of the generators ρ i in Eqs. (6-9, 10) which only depend on z i -the position of the path after i − 1 steps, and not on the index i itself.
We will devote most of the remainder of the paper to proving this theorem. Before we start, however, let us see how, together with Eq. (20), it can be used to construct the appropriate braid b ∈ B 4n in Eq. (19).
Let U = U L · . . . · U 1 with U i being local two-qubit gates, and let ǫ > 0 be an arbitrarily constant.
For every U i we use the theorem to construct a braidb i ∈ B 8 , with δ = ǫ/L, and extend it into a braid b i ∈ B n by adding identity strands at the appropriate places. Finally, b is taken to be the product of these b i 's. We have Proof: The claim is easily proved by induction. Indeed, assume that and define |β def = U i−1 · · · U 1 |α . It is easy to verify that any encoded gate U sends the subspace S into itself and therefore |β ∈ S. Consequently where the first equality follows from the reduction in Eq. (20) and the second inequality follows from the way in which we constructedb i ∈ B 8 . Then using the induction assumption together with the triangle inequality, we get This shows that the braid b = b L · · · b 1 satisfies α|ρ(b)|α ≃ α|U |α as required by Eq. (15) and Eq. (19). By theorem 3.1, the number of braid generators that are needed to approximate every gate is of the order poly(k, 1/δ), and they can be found in time poly(k, 1/δ). Since we have L = poly(n) gates then the total number of generators is poly(n, k, 1/δ). Finally, as we assume that k = poly(n) then the total number of generators is poly(n, 1/δ). This is exactly the number of crossings in the resultant link since every generator contains exactly one crossing. We have thus proved Theorem 1.1, which states that approximating the Jones polynomial of a plat closure of a braid with m crossings in e 2πi/k , with both m and k polynomially bounded in n -is BQP-complete.
In the next section we will prove theorem 3.
It is therefore an eigenvector of ρ i with eigenvalue The vectors |p| i 01 p| i , |p| i 00 p| i do not exist since they represent a path that leaves the graph. Finally the vector |p| i 11 p| i is nullified by Φ i and is therefore also an eigenvector of ρ i with the eigenvalue A −1 .
Next, consider the z i > 1 case. Here Φ i nullifies the |p| i 00 p| i , |p| i 11 p| i vectors while mixing |p| i 10 p| i , |p| i 01 p| i . Its matrix in that block is given by It has two eigenvalues: 0 and 2 cos θ, and consequently the eigenvalues of ρ i in that block are A −1 and −A −1 e −i2θ -the same eigenvalues of the z i = 1 case. In fact, it is not hard to see that all the ρ i operators are equivalent to each other.
The 2 × 2 matrix that transforms the standard basis elements {|p| i 10, p| i , |p| i 01 p| i } to the eigenvectors basis is Inside that subspace we have The other vectors {|p| i 00 p| i , |p| i 11 p| i }, if exist, are trivial eigenvectors of ρ i with eigenvalue A −1 .
Finally, we note that for k > 5, H 8,k,1 consists of exactly 14 paths 2 and hence it is a 14 dimensional space. These paths are shown graphically in Fig. 5. We identify each of them with a basis vector and label them by the numbers 1, . . . , 14. Using this labeling we write down the block structure of the seven generators ρ 1 , . . . , ρ 7 in Table 1. For each operator, the table lists the non-trivial blocks (i.e., blocks where Φ i does not vanish). These are specified by brackets that surround the vectors that define them.
The one-dimensional blocks correspond to the z i = 1 case while the two-dimensional blocks correspond to the z i > 1 case.

Proving the density
We will now prove the density part of Theorem 3.1. We will show that the seven operators ρ i can approximate any special unitary matrix on H 8,k,1 , provided that k > 4 and k = 6, 10. As it is a 14dimensional space, we are interested in matrices U ∈ SU (14) 3 .
We begin by considering the action of ρ 1 and ρ 2 on this subspace. From    trivial A −1 phase on the rest. In these blocks, the ρ 1 operator is represented by (i), whereas the ρ 2 operator is represented by (i, j). Additionally, the action of the operators on all five blocks is equivalent.
The following theorem assures us that in each such block we may approximate any SU (2) matrix.
Proof: Given in appendix A.
Next, consider what happens when we are also allowed to act with ρ 3 . Looking at Obviously, we can still approximate any SU (2) matrix in the 2 × 2 blocks. The next lemma provides a way to increase the dimensionality of the space on which we have density, in the following way: We start with SU (A) and SU (B), for two orthogonal subspaces A, B. If we also have a bridge, namely, some operator which mixes the two subspaces, then we have density in SU (A ⊕ B). This general lemma is very reminiscent of a lemma which appeared in an early version of [15]. Its proof uses a combination of ideas by Aharonov and Ben-Or [15] and from Kitaev [16]. Proof: Given in appendix A.
The bridge lemma implies that it is also possible to approximate any SU (3) matrix in the 3 × 3 blocks.
As an example, consider the {5, 6, 10} block. From Theorem 4.1 we already know that we are able to approximate any SU (2) transformation on the {5, 6} block, and now we have the transformation ρ 3 that mixes this subspace with the one-dimensional subspace of that is spanned by tenth vector. Lemma 4.1 therefore guarantees that together they can approximate every transformation in SU (3).
In the above reasoning there are two small cavities that are worth mentioning, since they will appear in the rest of the proof. Firstly, the mixing transformation ρ 3 does not belong to SU (3). This, however, is not a real problem, as we can always consider the transformationρ 3 def = cρ 3 with c some phase such [ ρ 1 , ρ 2 ,ρ 3 , ρ 1 , ρ 2 ,ρ 3 ] is dense in SU (3). But the last group is equal to [ ρ 1 , ρ 2 , ρ 3 , ρ 1 , ρ 2 , ρ 3 ] since the group bracket cancels out the phase c and therefore also ρ 1 , ρ 2 , ρ 3 is dense in SU (3).
Secondly, we know we can approximate any transformation in SU (2) while Lemma 4.1 assumes that we can get any transformation in SU (2) precisely. But since the approximation is made of a finite product of operators, all of which can be approximated as accurately as desired by ρ 1 , ρ 2 , ρ 3 , it follows that we can also approximate any transformation in SU (3) to any desired accuracy.
Naturally, the next step is to consider what happens when we are also allowed to act with ρ 4 . From and vice versa.
Proof: Given in appendix A.
It is therefore clear that we are able to approximate any SU (5)  Finally, by using ρ 6 , we mix the five-dimensional block {1, 2, 5, 6, 10} with the nine-dimensional block from above -thereby approximating any transformation in SU (14). This completes the density proof.

Proving the efficiency for the k = const and k = poly(n) cases
Having proved that the operators ρ 1 , . . . , ρ 7 can approximate any matrix U ∈ SU (14), the efficiency of the process for a constant k follows easily. We want to be able to approximate any U ∈ SU (14) up to some error δ using poly(1/δ) operators in poly(1/δ) steps. This, and actually much more, can be achieved using the Solovay-Kitaev algorithm [16]. The algorithm relies on the fact that ρ 1 , . . . , ρ 7 are dense in SU (14), and provides us with a δ-approximation to U that consists of no more than poly(log(δ −1 )) operators in poly(log(δ −1 )) steps! One should be aware, however, that the Solovay-Kitaev algorithm contains an initial step, where an ǫ-net is constructed. This is a finite set of operators that is generated by ρ 1 , . . . , ρ 7 and has the property that every operator in SU (14) is closer than ǫ to at least one of the elements of the net. ǫ is a finite constant which is unrelated to the target accuracy δ, and whose actual value is of the order 10 −2 (see, for example, Ref [17]). The existence of such a net is guaranteed since we know that ρ 1 , . . . , ρ 7 generate a dense set in SU (14); its construction takes a constant time. We note that this constant may be very large since we might need to use brute force. Fortunately, the same ǫ-net can be used in all calculations, and we can treat its construction as a pre-computational stage.
The situation becomes more tricky when we let k be polynomially dependent on n. In such case k, and consequently the operators ρ 1 , . . . , ρ 7 are no-longer constant, and a new ǫ-net has to be reconstructed for every n. Then we can no longer treat the ǫ-net construction as a constant step, and its complexity must be taken into account. The question is therefore whether we can still guarantee that the overall computational cost is polynomially bounded in k and in log(δ −1 )? The answer is yes. The idea is to construct an ǫ-net for some k 0 and then use it to efficiently generate ǫ-nets for any large enough k.
We begin by considering the k → ∞ limit of the ρ i operators. From Sec. 4.1, we recall that in the standard basis these operators decompose into 1 × 1 or 2 × 2 blocks. The diagonalizing matrix of the 2 × 2 blocks is V k (z), given by Eq. (27). Here, and in what follows, we explicitly added the subscript k to V (z) to indicate its dependence on k. In the limit k → ∞, we have θ → 0 and we get We now pick a finite k 0 -say, k 0 = 7 -and construct a new set of generators {ρ 1 ,ρ 2 , . . . ,ρ 7 } by going through the 2 × 2 blocks and replacing each finite V (z i ) by V ∞ (z i ) while keeping the same eigenvalues. It is easy to see that also this set of operators generates a dense set in SU (14). Indeed, the density proof in  (14)). Then for large enough k, by replacing every occurrence ofρ i inÊ by ρ 2m i , with m = O(1/k), we obtain a new net, E k , which is an ǫ-net.

Proof:
Let d be the maximal number of generators that are needed to construct an element inÊ. We wish to be able to approximate anyρ i up to at least ǫ/2d using ρ i . The first thing we take care of is that V k (z) will be close enough to V ∞ (z). We therefore pick an integer K 1 such that for any k > K 1 , Next, we must find a K 2 such that for any k > K 2 , the eigenvalues of ρ 2m i will be close enough toρ i , for some yet to be determined m. This is more conveniently done by defining and approximating the operators P i with Q i . In the end, the factors A(k 0 ), A(k) will cancel out when we plug these operators to the group commutator of each element inÊ. The logic behind these definitions is that these factors cause one of the eigenvalues of both P i and Q i to be exactly one (see Eq. (10)), and therefore we only have to match the remaining eigenvalues. Indeed, the non-trivial eigenvalue of P i is e −iπ(2+k0)/k0 , whereas the non-trivial eigenvalue of Q i is e −4πi/k . We therefore define and let K 2 be such that for every k > K 2 e −iπ(2+k0)/k0 − e −4mπi/k < ǫ/(6d) .
Assume then that k > max(K 1 , K 2 ) and let us estimate the distance between P i and Q m i . This is the maximal distance between the corresponding blocks in the standard basis. In the 1 × 1 blocks both operators have an eigenvalue 1 and therefore the distance is zero. In the 2 × 2 blocks we have and consequently therefore ||P − Q m || < ǫ/(2d).
Let us now return to theÊ-net and create the E k net. Any element inÊ is a commutator of products ofρ i , and therefore remains unchanged if we replaceρ i → P i , because the phase factors cancel out in the commutator. The distance of this product from a product in which we replace P i → Q m i is smaller than ǫ/2 since ||P i − Q m i || < ǫ/(2d) and we have at most d terms in the product. The Q m i 's product is unchanged upon the replacement Q m i → (ρ i ) 2m (again, the phase factors cancel out), thereby creating the E k net. Hence any element inÊ can be approximated up to a distance ǫ/2 by an element of E k . It follows that E k is an ǫ-net.
It follows that we can create an ǫ-net from the operators ρ i , and because m < k, the number of steps that are needed to create this net is bounded by poly(k). The next step would be the application of the Solovay-Kitaev algorithm to approximate any transformation U ∈ SU (14) up to an error δ -and so the overall computational cost is bounded by poly(k, 1/δ) as required.

A Proofs
A.1 Proof of Theorem 4.1

Proof:
Since ρ 1 and ρ 2 are not in SU (2), we will look at their images under the canonical homomorphism U (2) → SU (2) which takes V ∈ U (2) to (det V ) −1/2 V , and prove that these images form a dense set in SU (2). Then using the fact that [SU (2), SU (2)] = SU (2) it will follow that also ρ 1 and ρ 2 generate a dense set in SU (2).
We first use the fact that group which is generated by ρ 1 , ρ 2 is infinite as long as k > 2 and k = 4, 6, 10.
This results was proved by Jones 4 in 1983 and appears in Theorem 5.1 page 262 in ref [12]. It uses the canonical homomorphism between SU (2) and SO(3) and the well known classification of all finite subgroups in SO(3).
To approximate any element in SU (2) to within an ǫ, we pick two matrices in g 1 , g 2 ∈ G such that ||g 1 − g 2 || < ǫ/3 (we can do that since G has an infinite number of elements and SU (2) is compact), and we define g def = g 1 g −1 2 . Then obviously ||g − ½|| < ǫ/3. Consequently, if e ±iλ are the eigenvalues of g then |e ±iλ − 1| < ǫ/3. Also g must be non-commuting with at least one of the matrices ρ 1 or ρ 2 which we shall denote by T .
Let U be the diagonalizing matrix of g: g = U −1 diag{e iλ , e −iλ }U , and define the two continuous families of matrices Then it is easy to see that any matrix V ∈ SU (2) can be presented as the product R(α)S(β)R(γ) for a suitable choice of α, β, γ ∈ Ê (see, for example, Kitaev [16]). But since |e iλ − 1| < ǫ/3 then any member in the families R(·), S(·) can be approximated by multiplications of g and σ up to a distance of ǫ/3, and therefore the multiple R(α)S(β)R(γ) can be approximated to within ǫ.

A.2 Proof of Lemma 4.1 (The Bridge Lemma)
To prove lemma 4.1 we first need to prove the following two lemmas: Lemma A.1 Consider a linear space C which is a direct sum of two subspaces A and B such that dim B > dim A ≥ 1, and let W ∈ SU (C) be some transformation that mixes the two subspaces. Then for every pair of normalized vectors |ψ , |φ ∈ C, we can approximate a transformation T ψ→φ ∈ SU (C) such that T ψ→φ |ψ = |φ , to any desired accuracy using a finite product of transformations from SU (A), SU (B) and W . Proof: Instead of approximating the transformation T ψ→φ for any two vectors |ψ , |φ , we will approximate a transformation T ψ that transforms a particular vector |v * to an arbitrary vector |ψ . Then obviously, We begin by finding a vector |v * ∈ B for which W |v * ∈ B. Such vector must exist since dim B > dim A. Indeed, let |v 1 , . . . , |v n be a basis of B.
Then since dim B > dim A, the |u ′ i vectors are linearly dependent and we can find a non-trivial linear combination such that i c i α i |u ′ i = 0. Then the vector |v * def = c i |v i is in B and W |v * has no projection on A.
Next, we pick a vector |u * ∈ A for which W |u * has some projection on B. Such vector must exist because if W A ⊆ A then by unitarity W A = A. This implies W B = B, again by unitarity -contradicting the assumption that W mixes the two subspaces.
Now let |ψ = α|u 0 + β|v 0 be an arbitrary vector, with |u 0 ∈ A and |v 0 ∈ B. We will now apply a series of unitary operations that will take |ψ closer and closer to |v * . We start by moving |u 0 to |u * and |v 0 to |v * using transformations from SU (A) and SU (B) respectively, yielding the vector As |v ,W |v * ∈ B, we can now apply a transformation from SU (B) that takes αb|v +W |v * to c 2 |v * .
Here c 2 is the norm of αb|v + W 2 |v * , which is usually smaller than one. We obtain Notice that comparing |ψ 2 to |ψ 1 , we see that we managed to move some of the weight from |u * to |v * . We iterate this process. We get |ψ ′ 2 by applying the W 2 transformation on |ψ 2 , and obtain |ψ 3 by moving the B part of |ψ ′ 2 to |v * . After n such iterations we obtain |ψ n = αa n−1 |u * + c n |v * , and since |a| < 1 it is obvious that we exponentially converge to |v * , and in particular we can approximate T ψ (and hence T ψ→φ ) to any desired accuracy using a finite number of transformations.
To continue, we need to be able to move a vector from subspace A to subspace B without affecting the rest of the vectors in subspace A. The following Lemma guarantees that this is possible. By Lemma A.1 we can approximate a transformationT that takes |u n to |v 1 . Consider now all the operators of the form W =T −1 U V ′T with U ∈ SU (A) and V ′ ∈ SU (B ′ ). Clearly, W takes |u n to itself -and therefore leaves invariant the subspace A ′ ⊕ B. We claim that there is at least one such transformation, W (1) , that also mixes the subspaces A ′ and B. If this is indeed the case, then we can repeat the argument for the subspaces A ′ and B, which together with the particular transformation W (1) satisfy the conditions of Lemma A.1. Consequently, we find a transformation W (2) that takes |u n−1 to |v 1 while leaving |u n unchanged and mixing the space that is spanned by |u 1 , . . . , |u n−2 with B.
Repeating this again and again, we are left in the end with a transformation W (n) that transforms |u 1 to |v 1 and is the identity over |u 2 , . . . , |u n . Since this recursion has only n steps, it is clear that at any step we can approximate the mixing transformation W (i) to any desired accuracy using a finite product of operators from SU (A), SU (B) and W .
Let us now see why W (1) must exist. Indeed, if no such transformation exists, then for every two operators U ∈ SU (A) and V ′ ∈ SU (B ′ ) there is no mixing between the subspaces A ′ and B, and therefore there must exist operators U ′ ∈ SU (A ′ ) and V ∈ SU (B) such that Then, for every |u ∈ A and |v ∈ B, which implies that for every U ∈ SU (A) and V ′ ∈ SU (B ′ ), Notice that this equation holds also when we take one of the operators, U or V ′ , to be the identity. We will use this to show thatT A = A -in contradiction with the fact thatT |u 1 = |v 1 .
We first deduce thatT A ′ ⊂ A. To do this we show that for all |u ∈ A ′ ,T |u has no projection on B.
We already knowT |u has zero projection on |v 1 (since |u 1 moves to |v 1 ), so it suffices to show that T |u has no projection on B ′ .
Indeed, by Equation 45 it would suffice to show that V ′T v can be made to be an arbitrary vector in B ′ . This follows from the following reasoning. By the same argument as in the proof of Lemma A.1 there must be a vector |v * ∈ B such thatT |v * ∈ B. Moreover,T |v * must be in SinceT |v * ∈ B ′ , using an arbitrary V ′ ∈ SU (B ′ ), V ′T |v * can be made to be any vector in B ′ . Now pick any |u * ∈ A ′ . ThenT |u * ∈ A, and therefore with an arbitrary transformation U ∈ SU (A), UT |u * can be made to be an arbitrary vector in A. But since for every |v ∈ B we have u * T U |T v = 0 thenT |v has no projection on A and consequently,T B = B. But then sinceT is unitary it follows that T A = A -which is the contradiction we were seeking.
Having proved the last two lemmas, We are now in a position to prove Lemma 4.1: Proof: Let {|u 1 , . . . , |u n } be an orthonormal basis of A and similarly {|v 1 , . . . , |v n } an orthonormal basis of B. We define the following sequence of subspaces for i = 1 · · · n, where B 0 = B.
Let us show how to approximate an arbitrary U ∈ SU (B 1 ), given SU (B), SU (A), and W . From Lemma A.2 we can use SU (B), SU (A) and W to approximate a transformation T that takes |u 1 to |v 1 while leaving the rest of the vectors in A intact. Therefore T ∈ SU (B 1 ). Now pick an eigenvector |ψ ∈ B 1 of U with an eigenvalue e iθ . Using Lemma A.1 with respect to the subspaces span|u 1 , B and the mixing transformation T , we can approximate a transformation W ψ ∈ SU (B 1 ) that takes |ψ to |u 1 .
We first show how to approximate the transformation U 1 = W ψ U W −1 ψ . We notice that U 1 has |u 1 as an eigenvector with an eigenvalue e iθ . Consequently, U 1 leaves the subspace B invariant. Let V 1 be the transformation in SU (B) that satisfies V 1 |v 1 = e iθ |v 1 , V 1 |v 2 = e −iθ |v 2 , and leaves the rest of the basis vectors unchanged. Recalling that T takes |u 1 to |v 1 , we see that T −1 V 1 T has |u 1 as an eigenvector with eigenvalue e iθ and leaves the subspace B invariant. So the only difference between U 1 and T −1 V 1 T is some transformation V 2 ∈ SU (B), and therefore and consequently, Now that we have generated all transformations in SU (B 1 ), we can generate SU (B 2 ) using the very same procedure -except now B 1 plays the role of B and |u 2 plays the role of |u 1 . In the same method we can work our way all up to SU (B n ) = SU (C).

A.3 Proof of Lemma 4.2 (The Decoupling Lemma)
Proof: We define two subgroups H a ⊳ SU (A) and The theorem will be proved once we show that H a = SU (A) and H b = SU (B).
To do that, we first observe that both H a and H b are normal subgroups. It is also straightforward to see that they are closed. Consider, for example, H a in SU (A): assume that {U k } in H a converges to U ∈ SU (A). Then there exist series σ (k) n such that lim n→∞ τ a (σ (k) n ) = U k , Without loss of generality, we may choose the series such that for every n, k, Then the series τ a (σ k k ) → U , and we are guaranteed that τ b (σ k k ) → ½.
We first need to show that this function is well defined for all cosets U H a . This follows from: n } and {σ (2) n } such that with U 1 and U 2 in the coset U H a . We will show that V 1 and V 2 must be in the same coset of H b .
Since H a is normal, ∆ is in H a and we may therefore find a series {σ Then looking at the series σ n , we find that It remains to show that M is 1 − 1, onto, and preserves the action of the group. The onto part follows if we start with a series that converges to some V in V H b and apply the same reasoning as in the first item above. The 1 − 1 part follows if we apply the same reasoning as in the second point above, starting with the V 1 , V 2 in the same coset instead of the opposite direction. The fact that M is a homomorphism follows from the fact that τ is a representation.
Recall now that H a and H b can be either finite groups or equal to their "supergroup". So there are four possibilities: 1. H a is finite and H b = SU (B). Let us now see why the third case is also impossible. To do this, we will show that M is a continuous map. Indeed, assume that U k H a → U H a (here, convergence means that for some representatives of the cosets we have U k → U . It is easy to check that this is well defined). Then let M (U k H a ) = V k H b , M (U H a ) = V H b . We will show that V k H b → V H b . The proof is straightforward, similarly to the proof that H a is closed. pick pick a series of series {σ and without any loss of generality we assume that τ a σ k n − U k < 1/n , τ b σ k n − V k < 1/n .
Then since U k → U , we have τ a (σ k k ) → U , and since M (U H a ) = V H b we can find a sub-series τ b (σ k ℓ k ℓ ) that converges to someṼ ∈ V H b . In order not to overload the notation, let us re-define k to be that sub-series. We now claim V k →Ṽ . Indeed, for each ǫ > 0, we can choose K such that for each k > K, 1/k < ǫ/2 and τ b σ k k −Ṽ < ǫ/2. Then Now H a and H b are closed normal subgroups, and therefore SU (A)/H a and SU (B)/H b are Lie groups themselves (see for example, Theorem 3.64, pp 124, in [19]). Furthermore, since every continuous homomorphism between Lie groups is also smooth (see for example, Theorem 3.39, pp 109, in [19]), we have found a smooth diffeomorphism (1 − 1 homeomorphism) between two differentiable manifolds.
and it is therefore impossible to find a diffeomorphism between the two manifolds.