Quantum Routing with Teleportation

We study the problem of implementing arbitrary permutations of qubits under interaction constraints in quantum systems that allow for arbitrarily fast local operations and classical communication (LOCC). In particular, we show examples of speedups over swap-based and more general unitary routing methods by distributing entanglement and using LOCC to perform quantum teleportation. We further describe an example of an interaction graph for which teleportation gives a logarithmic speedup in the worst-case routing time over swap-based routing. We also study limits on the speedup afforded by quantum teleportation - showing an $O(\sqrt{N \log N})$ upper bound on the separation in routing time for any interaction graph - and give tighter bounds for some common classes of graphs.


I. INTRODUCTION
Common theoretical models of quantum computation assume that 2-qubit gates can be performed between arbitrary pairs of qubits. However, in practice, scalable quantum architectures have qubit connectivity constraints [1,2], which forbid long-range gates. These connectivity constraints are typically represented by a simple graph, where vertices correspond to qubits and edges indicate pairs of qubits that can undergo 2-qubit gates. A quantum architecture with $N$ qubits is thus represented by a graph $G$ with $N$ vertices. Circuits that use all-to-all connectivity must be transformed into new circuits that respect the architecture constraints specified by this graph. Simple transformations introduce polynomial overhead in the worst case, so it is crucial to lower this overhead.
A natural approach to mapping circuits to respect interaction constraints is to permute qubits using routing protocols. Routing refers to the task of permuting packets of information, or tokens, on the vertices of a graph. In quantum routing, tokens are data qubits, to be permuted on the graph specified by the architecture's connectivity constraints. Previous work has used swap gates to perform routing [3,4], and classical routing protocols based on swaps [5][6][7] can naturally be applied to routing quantum data as well.
Faster routing protocols can be obtained by using a wider range of quantum operations. For example, Hamiltonian evolution can obtain a constant-factor speedup over swap-based routing [8]. More detailed comparisons between quantum and classical routing models, and the advantages of the former, can be found in reference [9]. However, these approaches rely on locality-restricted unitary evolution, so the routing time is limited by the propagation speed of quantum information [9,10].
* ddhruv@umd.edu
In this paper, we additionally allow for fast local operations, measurement, and feedback (LOCC). Since this model allows for fast classical communication across long distances, it is not similarly constrained by the propagation speed of quantum information. For example, without prior shared entanglement, quantum teleportation over arbitrary distances can be performed in constant depth by using entanglement swapping [11] in a quantum repeater protocol [12], as shown in Fig. 1. Entanglement can also be distributed using quantum network coding protocols [13]. The ability to perform teleportation in constant depth immediately gives routing speedups over swap-based methods and even over previous unitary quantum routing methods, since teleportation can be used to quickly exchange distant pairs of qubits.
We show that using measurement and feedback to help prepare long-range entanglement can significantly decrease the time required for routing, even without using a large number of ancillas. In particular, we demonstrate the first superconstant speedup for quantum routing over swap-based routing in the setting where $O(N)$ ancillas are allowed, showing a logarithmic speedup for the hardest (i.e., worst-case) permutations. Further, our main result proves the first non-trivial limits on the advantage of teleportation-based routing protocols, via an $O(\sqrt{N \log N})$ upper bound on the speedup over swap-based routing. Finally, we also give a new swap-based algorithm for sparse routing of $k$ qubits on a graph $G$ in time $O(k + \mathrm{diam}(G))$, where $\mathrm{diam}(G)$ is the diameter of $G$ (i.e., the maximum shortest-path distance between any pair of vertices).
FIG. 1. Constant-depth long-range teleportation protocol on a path of 7 qubits. The X and Z gates are classically controlled by the parities of the two sets of measurement results. This protocol can be extended to paths of any length without increasing the circuit depth.
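The claim that the depth does not grow with the path length can be checked numerically. The following is a small NumPy state-vector sketch under our own layout assumptions (the exact gate arrangement in Fig. 1 may differ): the data qubit sits at position 0 of a path with an odd number of qubits, Bell pairs are prepared on (1, 2), (3, 4), ..., and Bell-basis measurements are made on (0, 1), (2, 3), ..., all in parallel. The helper names `apply_1q`, `apply_cnot`, `measure`, and `teleport_along_path` are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def apply_1q(state, gate, q):
    """Apply a single-qubit gate to axis q of an n-qubit state tensor."""
    out = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(out, 0, q)


def apply_cnot(state, control, target):
    """CNOT: flip the target axis on the control = 1 slice of the tensor."""
    out = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    t_axis = target if target < control else target - 1  # control axis is dropped
    out[tuple(idx)] = np.flip(out[tuple(idx)], axis=t_axis).copy()
    return out


def measure(state, q, rng):
    """Projective Z-basis measurement of qubit q; returns (new state, bit)."""
    p1 = np.sum(np.abs(np.take(state, 1, axis=q)) ** 2)
    bit = int(rng.random() < p1)
    out = state.copy()
    idx = [slice(None)] * state.ndim
    idx[q] = 1 - bit
    out[tuple(idx)] = 0.0
    return out / np.linalg.norm(out), bit


def teleport_along_path(psi, n, rng):
    """Move psi from qubit 0 to qubit n-1 of an n-qubit path (n odd).

    Four gate layers plus one round of measurements, independent of n.
    """
    assert n % 2 == 1 and n >= 3
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = psi[0]
    state[(1,) + (0,) * (n - 1)] = psi[1]
    # Layers 1-2: Bell pairs (|00> + |11>)/sqrt(2) on (1, 2), (3, 4), ..., (n-2, n-1).
    for a in range(1, n - 1, 2):
        state = apply_1q(state, H, a)
    for a in range(1, n - 1, 2):
        state = apply_cnot(state, a, a + 1)
    # Layers 3-4: Bell-basis measurement circuits on (0, 1), (2, 3), ..., (n-3, n-2).
    for a in range(0, n - 1, 2):
        state = apply_cnot(state, a, a + 1)
    for a in range(0, n - 1, 2):
        state = apply_1q(state, H, a)
    bits = []
    for q in range(n - 1):
        state, b = measure(state, q, rng)
        bits.append(b)
    # Classically controlled corrections from the parities of the two outcome sets.
    sz = sum(bits[0::2]) % 2   # first qubit of each measured pair -> Z correction
    sx = sum(bits[1::2]) % 2   # second qubit of each measured pair -> X correction
    phi = state[tuple(bits)]   # remaining amplitudes of qubit n-1
    phi = np.linalg.matrix_power(X, sx) @ phi
    phi = np.linalg.matrix_power(Z, sz) @ phi
    return phi


rng = np.random.default_rng(0)
v = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = v / np.linalg.norm(v)
phi = teleport_along_path(psi, 7, rng)
print(abs(np.vdot(psi, phi)) ** 2)  # fidelity; equals 1 up to rounding
```

Because the Bell-measurement circuits act on disjoint pairs, performing them simultaneously is equivalent to teleporting hop by hop, which is why the accumulated Pauli frame reduces to two parities.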
LOCC has previously been used to give low-depth implementations of specific unitaries, such as quantum fanout [14], long-range operations on the surface code [15], and the preparation of a wide range of entangled states [16][17][18][19]. In fact, previous work showed routing speedups by using ancillas [20] and by employing LOCC [21]. Using teleportation, Rosenbaum [21] showed a protocol that implements any permutation in constant depth. However, Rosenbaum's protocol uses $O(N^2)$ qubits to perform permutations on $O(N)$ qubits, so that only a negligible fraction of the qubits are data qubits. Engineering qubits is difficult, so it is preferable to use as many of them as possible as data qubits to enable larger computations. Therefore, in this work we consider a more modest $O(1)$ ancillas per data qubit (i.e., $O(N)$ ancillas in total). The availability of $O(1)$ ancillas per data qubit is natural in some quantum systems, such as NV center qubits [22], quantum dots [23], and trapped ions [24]. To our knowledge, this is the first work to study quantum routing with measurement and feedback in the restricted-ancilla setting. By studying routing in this regime, we make progress on an open question posed by Herbert [20], asking to what extent ancillas can be used to accelerate routing.
Routing is more powerful than state transfer and entanglement distribution [25,26]. For example, routing qubits from locally prepared Bell states can be used to generate long-range entanglement. The upper bounds in our work therefore also apply to these tasks.
Our work may be of interest to experimental efforts in systems that allow for mid-circuit measurements. In particular, the non-locality enabled by measurement and feedback makes large distances between qubits (i.e., large-diameter connectivity graphs) less of a challenge for algorithm implementations. Additionally, knowledge of (teleportation) routing may inform choices of connectivity in systems with these features. In particular, our upper bounds can be used to compare routing overheads on different architectures based on their spectral and isoperimetric properties.
Furthermore, teleportation routing can also provide large advantages for specific permutations, which makes it useful for efficient implementations of algorithms on near-term architectures. This is also relevant to fault-tolerant quantum computation, as a major obstacle to the implementation of promising quantum error-correcting codes, such as qLDPC codes [27,28], is their need for long-range syndrome measurements [29,30]. This can be alleviated by using teleportation to route together distant qubits from each syndrome. Further, protocols to prepare code states and implement logical operations in locality-restricted architectures are constrained by Lieb-Robinson bounds. Recent work [31] has shown how the use of measurements can accelerate such tasks. Routing schemes enabled by teleportation can also be considered on fault-tolerant architectures, for example, to perform logical circuits across surface code patches. More generally, measurement and feedback enable speedups through the ability to implement long-range interactions quickly, which can also make algorithms much easier to run on near-term architectures [14].
Our paper is organized as follows. After introducing the models in Sec. II, we discuss known upper and lower bounds on the routing time for both swap-based and teleportation routing in Sec. III. We also introduce an improved algorithm for sparse routing (i.e., routing of a small subset of tokens) with swaps and ancillas. In Sec. IV, we use teleportation to speed up specific permutations. In Sec. V, we compare teleportation routing to swap-based routing for arbitrary permutations, and we give an example of a logarithmic-factor speedup over swap-based routing. In Sec. VI, we show an $O(\sqrt{N \log N})$ upper bound on the speedup of teleportation routing over swap-based routing for all graphs, and show tighter bounds for some common classes of graphs. Finally, we conclude in Sec. VII with a discussion of the results and some open questions.

II. PRELIMINARIES
We consider architectures consisting of $N$ data qubits connected according to a simple graph $G$ (with vertex set $V(G)$ and undirected edge set $E(G)$), where an edge $(u, v) \in E(G)$ represents a connection between qubits $u, v \in V(G)$, and $|V(G)| = N$. We consider only connected graphs, i.e., graphs in which there is a path from any vertex to any other vertex.
We assume there is a constant number of ancillary qubits per data qubit, which can interact only with their data qubit. Further, we assume that disjoint two-qubit gates can be performed between adjacent qubits in depth 1. Up to a constant overhead, this is equivalent to having fast (instantaneous) ancilla interactions, since any unitary on a data qubit and its ancillas can be decomposed into a constant number of two-qubit gates. As our results are asymptotic, they are insensitive to constant overhead.
Ancillary qubits corresponding to different data qubits are not directly connected. However, gates between ancillary qubits of neighboring vertices can be performed in depth 1 by swapping ancillas with their corresponding data qubits, performing the desired 2-qubit gate between data qubits, and swapping again with the ancillas. This model can be implemented in realistic quantum architectures with attached ancillas [22][23][24] as well as architectures with grid connectivity such as superconducting qubits [1,32]. For example, Fig. 2a shows an architecture where ancillas are interspersed with data qubits on a grid. This can be represented in our model as shown in Fig. 2b. Both models are equivalent and can simulate each other with only constant depth overhead.
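The task of routing involves permuting data qubits on the graph. We use the notation
$$\pi = \begin{pmatrix} 1 & 2 & \cdots & N \\ \pi(1) & \pi(2) & \cdots & \pi(N) \end{pmatrix}$$
to denote a permutation on $N$ vertices, where $\pi(i)$ is the vertex to which we must move the $i$th qubit. We also write $[N] := \{1, 2, \ldots, N\}$. We consider the following models of routing.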
1. Swap routing: In this model, the only allowed gates between adjacent qubits are swap gates.

2. LOCC routing: In this model, we are allowed to perform arbitrary 2-qubit gates on disjoint pairs of qubits in a single time step. Further, in the same time step, we are allowed to perform single-qubit measurements (on data and ancilla qubits) and adaptively apply arbitrary single-qubit gates. We refer to this as fast measurement and feedback.
Gates in later time steps can be applied adaptively, conditioned on all previous measurement results.

3. Teleportation routing: In this model, data qubits can be teleported along disjoint paths to ancilla registers at arbitrary distances in depth 1. Note that the qubits along a teleportation path cannot be involved in any other operations during a round of teleportation. However, teleportation between multiple pairs of qubits can be performed in parallel if there exist paths for each pair with no more than a constant number of intersections per vertex, since we allow a constant number of ancilla qubits per data qubit. This model is a specialization of LOCC routing, as fast measurement and feedback allow us to perform quantum teleportation, transporting a single qubit to any vertex in constant depth. The entanglement required for quantum teleportation is produced using an entanglement swapping protocol [11], as depicted in Fig. 1. In particular, a swap between the two ends of a path can be performed in constant depth, either by teleporting the qubit at each end to the opposite end or by performing gate teleportation [33] of a swap gate.
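The constraint on a single teleportation round can be phrased as a simple check. The Python sketch below is illustrative only: the function name is hypothetical, and the parameter `max_load` is our stand-in for the constant set by the number of ancillas per data qubit (the default of 2 is arbitrary). It verifies that every path follows edges of $G$ and that no vertex lies on more than `max_load` paths.

```python
from collections import Counter


def is_valid_teleport_round(edges, paths, max_load=2):
    """Check one round of teleportation routing on G (hypothetical helper).

    edges:    undirected edges (u, v) of the interaction graph G
    paths:    one vertex sequence per teleported qubit (source ... destination)
    max_load: assumed constant cap on the number of paths sharing a vertex,
              standing in for the number of ancillas per data qubit
    """
    edge_set = {frozenset(e) for e in edges}
    load = Counter()
    for path in paths:
        if len(set(path)) != len(path):        # a path visits each vertex once
            return False
        if any(frozenset((a, b)) not in edge_set for a, b in zip(path, path[1:])):
            return False                       # consecutive vertices must be adjacent
        load.update(path)
    return all(count <= max_load for count in load.values())


line = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(is_valid_teleport_round(line, [[0, 1, 2], [2, 3, 4]], max_load=1))  # False
print(is_valid_teleport_round(line, [[0, 1, 2], [2, 3, 4]], max_load=2))  # True
```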
We are particularly interested in the routing time $\mathrm{rt}(G, \pi)$, which is the minimum circuit depth needed to perform the permutation $\pi$ on the data qubits of $G$. The worst-case routing time of a graph $G$ is
$$\mathrm{rt}(G) := \max_{\pi \in S_N} \mathrm{rt}(G, \pi),$$
where $S_N$ is the symmetric group, i.e., the group of all permutations of $N$ elements. We let $\mathrm{rt}_{\mathrm{tele}}(G)$ denote the routing time in the teleportation model, $\mathrm{rt}_{\mathrm{LOCC}}(G)$ the routing time in the LOCC model, and $\mathrm{rt}_{\mathrm{swap}}(G)$ the routing time in the swap model.
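For very small graphs, $\mathrm{rt}_{\mathrm{swap}}(G, \pi)$ can be computed exactly by breadth-first search over token configurations, where one time step applies any set of disjoint swaps (a matching). This brute-force Python sketch only makes the definition concrete; it is exponential in $N$, and the function names are hypothetical.

```python
from collections import deque
from itertools import permutations


def matchings(edges):
    """All nonempty sets of pairwise disjoint edges; each is one layer of swaps."""
    edges = list(edges)
    found = []

    def extend(start, used, chosen):
        if chosen:
            found.append(tuple(chosen))
        for i in range(start, len(edges)):
            u, v = edges[i]
            if u not in used and v not in used:
                extend(i + 1, used | {u, v}, chosen + [edges[i]])

    extend(0, frozenset(), [])
    return found


def rt_swap(n, edges, pi):
    """Minimum swap depth moving the token at vertex i to vertex pi[i]."""
    start = tuple(range(n))            # config[v] = token currently at vertex v
    goal = [0] * n
    for i in range(n):
        goal[pi[i]] = i
    goal = tuple(goal)
    layers = matchings(edges)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if config == goal:
            return dist[config]
        for layer in layers:
            nxt = list(config)
            for u, v in layer:
                nxt[u], nxt[v] = nxt[v], nxt[u]
            nxt = tuple(nxt)
            if nxt not in dist:
                dist[nxt] = dist[config] + 1
                queue.append(nxt)
    raise ValueError("pi is unreachable; is G connected?")


def rt_swap_worst(n, edges):
    """rt_swap(G): the maximum of rt_swap(G, pi) over all pi in S_N."""
    return max(rt_swap(n, edges, list(p)) for p in permutations(range(n)))


path3 = [(0, 1), (1, 2)]
print(rt_swap(3, path3, [2, 1, 0]))  # 3: reversing P_3 needs three swap layers
print(rt_swap_worst(3, path3))       # 3
```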

III. BOUNDS ON ROUTING TIME
In this section, we discuss known bounds on the routing time for both swap and LOCC routing.

A. Lower bounds
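If a permutation can only be implemented by sending a large number of tokens through a small number of vertices, then any circuit for performing it must have high depth, since each vertex can only hold one token at a time. This gives a natural lower bound on the routing time. To formalize this, we consider the vertex expansion (or vertex isoperimetric number) $c(G)$ of a graph $G$, defined as
$$c(G) := \min_{\substack{X \subset V(G) \\ 0 < |X| \le N/2}} \frac{|\delta X|}{|X|},$$
where $\bar{X} := V(G) \setminus X$ is the complement of $X$, and $\delta X := \{v \in \bar{X} : (u, v) \in E(G) \text{ for some } u \in X\}$ is the vertex boundary of $X$.
Theorem 3.2 (Vertex cut bound). For any connected graph $G$, $\mathrm{rt}_{\mathrm{LOCC}}(G) = \Omega(1/c(G))$.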
Since $\mathrm{rt}_{\mathrm{swap}}(G) \ge \mathrm{rt}_{\mathrm{tele}}(G) \ge \mathrm{rt}_{\mathrm{LOCC}}(G)$, this lower bound applies to swap- and teleportation-based routing as well.
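To make the definition of $c(G)$ concrete, the brute-force Python sketch below enumerates every subset $X$ with $0 < |X| \le N/2$ and evaluates $|\delta X|/|X|$ using the outer vertex boundary defined above. It is exponential in $N$ and meant only for tiny examples; the function name is hypothetical.

```python
from itertools import combinations


def vertex_expansion(n, edges):
    """Brute-force c(G): min of |dX| / |X| over vertex sets X with 0 < |X| <= n/2.

    dX is the outer vertex boundary (vertices outside X with a neighbor in X).
    Runs in time exponential in n, so it is only usable for tiny graphs.
    """
    nbrs = {v: set() for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for X in combinations(range(n), size):
            inside = set(X)
            boundary = set().union(*(nbrs[u] for u in X)) - inside
            best = min(best, len(boundary) / size)
    return best


path4 = [(0, 1), (1, 2), (2, 3)]
# 4 x 4 grid with vertex (r, c) numbered 4 * r + c.
grid4 = ([(4 * r + c, 4 * r + c + 1) for r in range(4) for c in range(3)]
         + [(4 * r + c, 4 * (r + 1) + c) for r in range(3) for c in range(4)])
print(vertex_expansion(4, path4))   # 0.5
print(vertex_expansion(16, grid4))  # at most 2/4, the value of a bisecting cut
```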
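We can also lower bound the swap-based routing time by the diameter of the graph (i.e., the maximum shortest-path distance between any pair of vertices), since swapping two vertices at distance $d$ requires a swap circuit of depth at least $d$.
Theorem 3.3 (Diameter bound). For any connected graph $G$, $\mathrm{rt}_{\mathrm{swap}}(G) \ge \mathrm{diam}(G)$.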
Note that this bound does not apply to teleportation or LOCC routing.

B. Upper bounds
A classical swap algorithm can route any permutation on an $N$-vertex tree in depth $O(N)$ [7]. Since we only consider connected graphs, we can always route on a spanning tree with swaps in depth $O(N)$. We thus have the following upper bound.
Theorem 3.4. For any $N$-vertex connected graph $G$, $\mathrm{rt}_{\mathrm{swap}}(G) = O(N)$.
This bound also implies that $\mathrm{rt}_{\mathrm{tele}}(G) = O(N)$ and $\mathrm{rt}_{\mathrm{LOCC}}(G) = O(N)$.
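For the path graph $P_N$, the simplest tree, an explicit $O(N)$ swap schedule is classical odd-even transposition sort, which always finishes within $N$ rounds. It is shown here only as an illustration of linear-depth swap routing; it is not the general tree algorithm of [7]. The Python sketch (function name hypothetical) returns one list of disjoint adjacent swaps per round.

```python
import random


def odd_even_route(pi):
    """Swap layers on P_N moving the token at vertex i to vertex pi[i].

    Odd-even transposition sort on destinations: round t compares the pairs
    (v, v+1) with v = t mod 2, t mod 2 + 2, ..., so each round is a matching.
    """
    n = len(pi)
    dest = list(pi)            # dest[v] = destination of the token now at vertex v
    layers = []
    for t in range(n):
        layer = []
        for v in range(t % 2, n - 1, 2):
            if dest[v] > dest[v + 1]:
                dest[v], dest[v + 1] = dest[v + 1], dest[v]
                layer.append((v, v + 1))
        layers.append(layer)
    assert dest == list(range(n)), "odd-even transposition sort needs at most n rounds"
    return layers


rng = random.Random(0)
pi = list(range(8))
rng.shuffle(pi)
print(len(odd_even_route(pi)))  # 8 rounds, some of which may be empty
```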
We can prove a tighter bound for sparse routing. Let $\mathrm{rt}_{\mathrm{swap}}(G, k)$ denote the worst-case routing time on $G$ over permutations that move at most $k$ tokens. Using reversals, [9] gives a routing algorithm that takes depth $O(\mathrm{diam}(G) + k^2)$. We improve this result, using swaps with ancillas, to show the following.
Theorem 3.5 (Sparse routing). For any $N$-vertex connected simple graph $G$ and $k \in [N]$, $\mathrm{rt}_{\mathrm{swap}}(G, k) = O(k + \mathrm{diam}(G))$.
Proof sketch. Call all tokens $v$ with $\pi(v) \neq v$ marked.
There are at most $k$ marked tokens. There are three main steps in our algorithm:
1. Hide all unmarked tokens in the ancillas by performing swaps. Route the $k$ marked tokens to span a tree subgraph in time $O(\mathrm{diam}(G))$.
2. Permute the k tokens on the tree subgraph, using the procedure from [7], in time O(k).
3. Reverse the first step, thereby moving the k tokens from the subgraph to the appropriate target locations in time O(diam(G)).Restore the unmarked tokens from the ancillas.
See Appendix A for the full proof.
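The steps above need some connected $k$-vertex subgraph on which to gather the marked tokens. One simple way to obtain one, which we assume here purely for illustration (Appendix A may choose it differently), is to take the first $k$ vertices in breadth-first order from a root: every vertex after the root has its BFS parent earlier in the order, so the prefix spans a subtree. The helpers `bfs_prefix_subtree` and `grid_adj` are hypothetical.

```python
from collections import deque


def bfs_prefix_subtree(adj, root, k):
    """First k vertices in BFS order from root, plus the BFS tree edges among them.

    Every vertex after the root has its BFS parent earlier in the order, so the
    returned vertices span a connected subtree of G (assuming G connected, k <= N).
    """
    parent = {root: None}
    order = [root]
    queue = deque([root])
    while queue and len(order) < k:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent and len(order) < k:
                parent[w] = u
                order.append(w)
                queue.append(w)
    return order, [(parent[w], w) for w in order[1:]]


def grid_adj(n):
    """Adjacency of the n x n grid graph with vertex (r, c) numbered n * r + c."""
    adj = {v: set() for v in range(n * n)}
    for r in range(n):
        for c in range(n):
            v = n * r + c
            if c + 1 < n:
                adj[v].add(v + 1)
                adj[v + 1].add(v)
            if r + 1 < n:
                adj[v].add(v + n)
                adj[v + n].add(v)
    return adj


order, tree = bfs_prefix_subtree(grid_adj(5), root=12, k=7)
print(order, tree)
```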

IV. FASTER PERMUTATIONS WITH TELEPORTATION
The ability to perform teleportation immediately suggests possibilities for speedups over swap-based routing.Swap-based routing must obey the diameter lower bound (Theorem 3.3), so permutations that involve long-range swaps (e.g., between diametrically separated pairs of vertices) should be sped up by teleportation.
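For example, consider the permutation $\pi_{\mathrm{diam}}$ on the path graph $P_N$ that exchanges the qubits at the two ends of the path, and write $\mathrm{adv}(G, \pi) := \mathrm{rt}_{\mathrm{swap}}(G, \pi)/\mathrm{rt}_{\mathrm{tele}}(G, \pi)$ for the teleportation advantage on a permutation $\pi$. By the diameter lower bound, $\pi_{\mathrm{diam}}$ takes depth $\Omega(N)$ with swaps. However, with teleportation it takes depth $O(1)$, showing that $\mathrm{adv}(P_N, \pi_{\mathrm{diam}}) = \Omega(N)$.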
This further generalizes to permutations that require multiple long-range swaps. For example, consider a rainbow permutation $\pi^{\alpha}_{\mathrm{rainbow}}$, as depicted in Fig. 3b. This permutation involves performing $N^{\alpha}$ swaps across a 1D lattice for some $\alpha \in [0, 1]$. With swaps, this takes depth $\Theta(N)$ by the diameter bound and Theorem 3.4, but with teleportation it takes depth $O(N^{\alpha})$, by a procedure that simply teleports each pair into place sequentially. This gives a polynomial advantage: $\mathrm{adv}(P_N, \pi^{\alpha}_{\mathrm{rainbow}}) = \Omega(N^{1-\alpha})$. These permutations allow speedups bounded by the diameter of the graph. Any single teleportation step can be simulated by swaps in depth $O(\mathrm{diam}(G))$ by swapping along the shortest path between the initial qubit and its final destination. Intuitively, one might therefore expect that teleportation routing could achieve at most a diameter-factor speedup. However, there exist some graphs and permutations for which we can obtain even larger speedups: teleportation speedups are not limited by the graph diameter, since teleportation protocols can use multiple longer paths together to avoid intersections.
To illustrate this, consider the wheel graph $W_{N+1}$, as shown in Fig. 4. The $(N+1)$-vertex wheel graph, with rim vertices $1, \ldots, N$ and central vertex $N+1$, has edges
$$E(W_{N+1}) = \{(i, i+1) : i \in [N-1]\} \cup \{(N, 1)\} \cup \{(i, N+1) : i \in [N]\}. \tag{16}$$
The diameter of $W_{N+1}$ is 2. On this graph, consider the permutation $\pi^{l}_{\mathrm{wheel}}$ (shown in red in Fig. 4) that exchanges $l$ pairs of vertices spaced along the "rim" of the wheel (assume $l \mid N$). For swap-based algorithms, this can be done in depth $\min\{3l, N/l - 1\}$ by routing the qubits sequentially through the central vertex or routing them in parallel along the rim, whichever is faster. This is optimal up to constant factors, by the following reasoning. If some data token does not pass through the central vertex, the routing time must be at least $N/l - 1$, which is its travel distance along the rim. On the other hand, if every one of the $2l$ moving tokens passes through the central vertex, then the algorithm must have at least $2l$ steps, since the central vertex holds one token at a time. Therefore
$$\mathrm{rt}_{\mathrm{swap}}(W_{N+1}, \pi^{l}_{\mathrm{wheel}}) = \Theta(\min\{l, N/l\}).$$
However, in the teleportation routing model, this permutation can be performed in constant depth by performing $l$ teleportations in parallel along non-intersecting paths on the wheel rim. Therefore,
$$\mathrm{rt}_{\mathrm{tele}}(W_{N+1}, \pi^{l}_{\mathrm{wheel}}) = O(1), \qquad \mathrm{adv}(W_{N+1}, \pi^{l}_{\mathrm{wheel}}) = \Theta(\min\{l, N/l\}).$$
Choosing $l = \Theta(\sqrt{N})$, which balances the two terms, gives the maximum teleportation advantage $\mathrm{adv}(W_{N+1}, \pi^{l}_{\mathrm{wheel}}) = \Theta(\sqrt{N})$ for this class of permutations, even though $\mathrm{diam}(W_{N+1}) = O(1)$. Teleportation therefore enables super-diametric speedups.
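The wheel example can be reproduced in a few lines. The Python sketch below assumes a specific layout for $\pi^{l}_{\mathrm{wheel}}$ that is consistent with the swap depth quoted above but is our own reading of Fig. 4: the rim is cut into $l$ arcs of $N/l$ vertices, each exchanged pair is the two ends of an arc, and the arc itself is the teleportation path. It checks that the diameter is 2, that the rim paths are vertex-disjoint, and that the best $l$ makes $\min\{3l, N/l - 1\}$ of order $\sqrt{N}$. All function names are hypothetical.

```python
from collections import deque


def wheel(n):
    """Adjacency of W_{n+1}: rim vertices 0..n-1 in a cycle, hub n joined to all."""
    adj = {v: set() for v in range(n + 1)}
    for i in range(n):
        for u, v in ((i, (i + 1) % n), (i, n)):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def diameter(adj):
    """Largest BFS eccentricity over all vertices."""
    def eccentricity(s):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return max(dist.values())
    return max(eccentricity(s) for s in adj)


def rim_paths(n, l):
    """Assumed layout: l rim arcs of n/l vertices; each arc is one teleportation path."""
    assert n % l == 0
    seg = n // l
    return [list(range(j * seg, (j + 1) * seg)) for j in range(l)]


def swap_depth(n, l):
    """Swap depth quoted in the text: min{3l (via the hub), n/l - 1 (along the rim)}."""
    return min(3 * l, n // l - 1)


n = 3600
best_l = max((l for l in range(1, n + 1) if n % l == 0), key=lambda l: swap_depth(n, l))
print(diameter(wheel(12)), best_l, swap_depth(n, best_l))  # 2 36 99
```

Here $\sqrt{3600} = 60$, so the best swap depth of 99 is within a small constant factor of $\sqrt{N}$, while the teleportation depth stays constant.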

V. TELEPORTATION ADVANTAGE
While $\pi_{\mathrm{diam}}$, $\pi^{\alpha}_{\mathrm{rainbow}}$, and $\pi^{l}_{\mathrm{wheel}}$ allow for teleportation speedups, they are not the worst-case permutations on their respective graphs. For example, consider the full reflection on the line graph, i.e., a rainbow permutation with $\alpha = 1$. This permutation requires depth $\Theta(N)$ for both swap- and teleportation-based routing. Similarly, on the wheel graph with an even number of vertices, the permutation $\pi$ with $\pi(i) = i + \lfloor N/2 \rfloor \bmod N$ for all $i \in [N]$ requires depth $\Theta(N)$ for both types of routing as well. Thus, although these graphs have teleportation speedups for specific permutations, there is no asymptotic separation between their swap and teleportation routing numbers.
To compare the relative strength of the teleportation routing model to the swap-based routing model for all permutations, we aim to understand how much teleportation improves worst-case permutations. We measure the relative strength of the teleportation model by the separation in teleportation and swap-based routing numbers,

FIG. 5. The graph L(n).
The black lines show edges between layers, while blue lines show edges within a layer (colored for visibility).
which we define as the worst-case teleportation advantage:
$$\mathrm{adv}(G) := \frac{\mathrm{rt}_{\mathrm{swap}}(G)}{\mathrm{rt}_{\mathrm{tele}}(G)}.$$
Note that this is not the worst-case ratio of routing times for a single specific permutation, i.e., $\mathrm{adv}(G)$ is not necessarily the same as $\max_{\pi} \mathrm{adv}(G, \pi)$. (Indeed, as discussed above, these two quantities differ for the path and wheel graphs.) Instead, $\mathrm{adv}(G)$ can be thought of as the speedup teleportation provides for the general task of routing on a particular graph in the worst case, rather than for implementing a specific permutation. It also allows us to compare different graphs: teleportation routing offers greater worst-case guaranteed speedups on graphs with higher $\mathrm{adv}(G)$. It is not immediately obvious that we should expect $\mathrm{adv}(G)$ to grow with $N$ for any graph. However, we now describe a graph that does offer a worst-case speedup for teleportation. This graph, which we denote by $L(n)$ (with $N = 2^n - 1$ vertices), has $\mathrm{adv}(L(n)) = \Omega(n) = \Omega(\log N)$. The graph $L(n)$ (depicted in Fig. 5) has
$$V(L(n)) = \bigcup_{i=0}^{n-1} V_i, \qquad V_i := \{2^i, 2^i + 1, \ldots, 2^{i+1} - 1\},$$
$$E(L(n)) = \{(u, v) : u \neq v,\ u \in V_i,\ v \in V_j,\ |i - j| \le 1\}.$$
In words, $L(n)$ is a ladder formed by arranging complete graphs $K_{2^k}$ for $k \in \{0, 1, \ldots, n-1\}$ in horizontal layers, and then connecting every vertex in a given layer with every vertex one layer above or below. The total number of vertices in this graph is $\sum_{k=0}^{n-1} 2^k = 2^n - 1$. The diameter of $L(n)$ is $n - 1$, so by the diameter bound (Theorem 3.3), $\mathrm{rt}_{\mathrm{swap}}(L(n)) \ge n - 1$. With teleportation, we show that routing can be performed in depth $O(1)$. The key idea behind the teleportation protocol is that every layer of $L(n)$ has one more vertex than all the layers above it together, since $2^k = (2^k - 1) + 1$. This allows us to identify a unique vertex in each layer corresponding to any vertex from a higher layer. We can then route tokens by simply teleporting along the path formed by the unique vertices from each layer corresponding to the source vertex of the token to be routed.
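This teleportation routing procedure establishes the following.
Theorem. For every $n$, $\mathrm{rt}_{\mathrm{tele}}(L(n)) = O(1)$; consequently, $\mathrm{adv}(L(n)) = \Omega(n) = \Omega(\log N)$.
Proof. For any permutation $\pi \in S_{2^n - 1}$, we construct a set of paths $\{P(u, \pi(u)) \mid u \in V(L(n))\}$ between every vertex and its destination such that each vertex of the graph belongs to at most four paths in the set.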
Label every vertex in the graph with an address of at most $n$ bits as follows. To every vertex in the subgraph $K_{2^i}$ (corresponding to layer $i + 1$ of the ladder), assign a unique integer $u$ in the range $[2^i, 2^{i+1} - 1]$. (Since the layer is a complete graph, the order within a layer is arbitrary.) Equivalently, we may refer to vertex $u$ by its binary representation $b(u)$, which is an $(i+1)$-bit string with a leading 1, i.e., of the form $b(u) = (1\ldots)$.
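For any vertex $u \in V$, define $r(u, i)$ to be the vertex whose address is $(1\,0^{i-1}\,b(u))$, i.e., the address of $u$ prefixed by a 1 followed by $i - 1$ zeros, so that $r(u, i)$ lies $i$ layers below $u$. Define $r(u, 0) = u$. Note that $r(u, i + 1)$ is adjacent to $r(u, i)$ and lies in the layer immediately below $r(u, i)$. Now, given two vertices $u, v$ separated by a distance $d$, define a canonical path $P(u, v) = P(v, u)$ as the sequence of vertices $(u, r(u, 1), \ldots, r(u, d - 1), v)$, where we assume $u < v$ without loss of generality. If $d = 1$, then $P(u, v) = (u, v)$. We now show that for any permutation $\pi \in S_{2^n - 1}$, the set of canonical paths $\{P(u, \pi(u)) \mid u \in V(L(n))\}$ intersects any vertex at most four times. Indeed, every interior vertex of $P(u, v)$ (with $u < v$) has the form $w = r(u, i)$ with $i \ge 1$, and $b(u)$ is recovered from $b(w) = (1\,0^{i-1}\,b(u))$ by deleting the leading 1 and the run of zeros after it, so $w$ determines $u$. Since $u$ is the smaller endpoint of at most two paths in the set (namely $P(u, \pi(u))$ and $P(\pi^{-1}(u), u)$), each vertex is an interior vertex of at most two paths, and it is an endpoint of at most two more.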
Finally, construct one Bell pair for every edge in every canonical path P (u, π(u)), using distinct local ancillas for every pair.The number of Bell pairs shared at any vertex is at most 6 = O(1), requiring 6 local ancillas per vertex.Using the standard repeater protocol (Fig. 1) along each canonical path, one can then carry out simultaneous teleportation of all data qubits to their destination vertices v → π(v) in constant depth.Therefore, any permutation of the qubits can be implemented in depth O(1).
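The path-load claims in this proof can be tested directly on small instances. The Python sketch below builds the canonical paths from the addressing scheme above (vertices are the integers $1, \ldots, 2^n - 1$ and the layer of $u$ is its bit length), checks that consecutive vertices are adjacent in $L(n)$, and reports the largest number of paths through a vertex and the largest number of Bell-pair halves at a vertex, which the argument bounds by 4 and 6. Skipping fixed points of $\pi$ (which need no teleportation) is our choice, and all function names are hypothetical.

```python
import random


def layer(u):
    """Layer of vertex u in L(n): u in [2^i, 2^(i+1) - 1] lies in layer i + 1."""
    return u.bit_length()


def adjacent(u, v):
    """Edges of L(n): distinct vertices in the same or neighboring layers."""
    return u != v and abs(layer(u) - layer(v)) <= 1


def r(u, i):
    """Vertex with address 1 0^(i-1) b(u), i.e. i layers below u; r(u, 0) = u."""
    return u if i == 0 else (1 << (u.bit_length() + i - 1)) | u


def canonical_path(u, v):
    """P(u, v) = (u, r(u, 1), ..., r(u, d - 1), v) with u < v."""
    u, v = min(u, v), max(u, v)
    d = max(1, layer(v) - layer(u))   # same or adjacent layers: distance 1
    return [u] + [r(u, i) for i in range(1, d)] + [v]


def loads(n, pi):
    """Max number of paths through a vertex, and max Bell-pair halves at a vertex."""
    paths_through, halves = {}, {}
    for u in range(1, 2 ** n):
        if pi[u] == u:
            continue                          # fixed tokens are not teleported
        path = canonical_path(u, pi[u])
        assert all(adjacent(a, b) for a, b in zip(path, path[1:]))
        for pos, w in enumerate(path):
            paths_through[w] = paths_through.get(w, 0) + 1
            # endpoints hold one half of a Bell pair, interior vertices two
            halves[w] = halves.get(w, 0) + (1 if pos in (0, len(path) - 1) else 2)
    return max(paths_through.values(), default=0), max(halves.values(), default=0)


def random_permutation(n, rng):
    """A uniformly random permutation of V(L(n)) = {1, ..., 2^n - 1} as a dict."""
    vertices = list(range(1, 2 ** n))
    images = vertices[:]
    rng.shuffle(images)
    return dict(zip(vertices, images))


rng = random.Random(0)
print(loads(6, random_permutation(6, rng)))
```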

VI. BOUNDING THE TELEPORTATION ADVANTAGE
In the previous section, we described a graph with logarithmic teleportation advantage.In this section, we examine limits on the teleportation advantage.In order to understand the power of teleportation in general, we specifically aim to bound the maximum teleportation advantage over all graphs with a fixed number of vertices.This quantity measures the maximum speedup teleportation can provide on worst-case permutations for any graph.
We also show tighter bounds on the advantage for some common classes of graphs. We immediately have an upper bound on the advantage from Theorem 3.4. Since any teleportation algorithm must have depth $\Omega(1)$, and a swap algorithm can implement any permutation in depth $O(N)$, we have
$$\max_{G:\,|V(G)| = N} \mathrm{adv}(G) = O(N).$$
We now show a tighter bound.

A. Advantage for general graphs
Combining Theorem 3.2 and Theorem 3.3, we have
$$\mathrm{rt}_{\mathrm{swap}}(G) = \Omega\left(\max\left\{\mathrm{diam}(G),\ \frac{1}{c(G)}\right\}\right).$$
We now consider the relationship between $\mathrm{diam}(G)$ and $1/c(G)$. Intuitively, increasing the diameter while keeping $N$ constant 'stretches' the graph, tightening bottlenecks. This causes $c(G)$ to decrease. Similarly, eliminating bottlenecks in the graph requires adding more edges across cuts, thereby increasing the connectivity of the graph and reducing the diameter. We thus expect that graphs with higher diameter will have higher $1/c(G)$, and graphs with small $1/c(G)$ will have small diameter. We can express this relation more precisely as follows.
Lemma 6.1. For any connected simple graph $G$ with $N$ vertices,
$$\mathrm{diam}(G) = O\left(\frac{\log N}{\log(1 + c(G))}\right).$$
Proof. See Appendix B.
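One standard way to see why a bound of this form holds is a ball-growth argument. The derivation below is our own sketch under the definition of $c(G)$ above (outer vertex boundary, $0 < |X| \le N/2$), not necessarily the proof given in Appendix B. Here $B_r(v)$ denotes the set of vertices within distance $r$ of $v$, and $r^\star$ is a symbol we introduce.

```latex
\begin{align*}
|B_r(v)| \le N/2 \;&\Longrightarrow\; |B_{r+1}(v)| = |B_r(v)| + |\delta B_r(v)| \ge (1 + c(G))\,|B_r(v)|, \\
r^\star := \left\lfloor \frac{\log(N/2)}{\log(1 + c(G))} \right\rfloor + 1
\;&\Longrightarrow\; |B_{r^\star}(u)| > N/2 \ \text{and}\ |B_{r^\star}(v)| > N/2 \quad \text{for all } u, v, \\
B_{r^\star}(u) \cap B_{r^\star}(v) \neq \emptyset
\;&\Longrightarrow\; \operatorname{dist}(u, v) \le 2 r^\star
\;\Longrightarrow\; \operatorname{diam}(G) = O\!\left(\frac{\log N}{\log(1 + c(G))}\right).
\end{align*}
```

The second line uses $|B_0(v)| = 1$: as long as a ball has at most $N/2$ vertices it grows by a factor of at least $1 + c(G)$ per step, so it must exceed $N/2$ vertices after $r^\star$ steps, and two sets that each contain more than half of the vertices must intersect.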
One might expect graphs with large diameter to allow large speedups, since the diameter lower bound only applies to swap routing.However, as illustrated by Lemma 6.1, graphs with large diameter also have tight bottlenecks, and therefore, by Theorem 3.2, are not likely to permit large speedups.
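We now show our main results bounding the advantage. Our main technical result bounds the advantage in terms of the diameter of the graph.
Lemma 6.2. For any connected graph $G$ with $N$ vertices and any permutation $\pi$,
$$\mathrm{rt}_{\mathrm{swap}}(G, \pi) = O\left(\left(\sqrt{N} + \mathrm{diam}(G)\right) \mathrm{rt}_{\mathrm{tele}}(G, \pi)\right).$$
We note that this bound also applies to the separation between swap and teleportation routing for any permutation, and not just the worst-case separation.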
Proof.We construct a swap-based protocol that can simulate a single round of teleportation in depth O( √ N + diam(G)), thereby upper bounding the teleportation advantage.
A single round of a teleportation protocol performs teleportation along a set of paths.These paths must intersect no more than a constant number of times per vertex, since there are only a constant number of ancillas per vertex.
For all paths from the teleportation protocol of length at most $\sqrt{N}$, we swap along the paths in parallel. Since each vertex only has a constant number of paths going through it, a qubit can move through every vertex in constant depth. Therefore, these swaps can be performed in depth $O(\sqrt{N})$. For an $N$-vertex graph, the number of paths of length at least $l$ that intersect at most a constant number of times per vertex is $O(N/l)$, since together they contain at most $O(N)$ vertex occurrences. Therefore, since each long path corresponds to a single token, after routing along all paths of length at most $\sqrt{N}$, we have $O(\sqrt{N})$ tokens left to route. By Theorem 3.5, this can be done in depth $O(\sqrt{N} + \mathrm{diam}(G))$. We can thus simulate each teleportation round in depth $O(\sqrt{N} + \mathrm{diam}(G))$, which completes the proof.
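The threshold $\sqrt{N}$ in this proof comes from balancing its two phases. Writing $\ell$ for a general length threshold and $C$ for the constant bound on the number of paths per vertex (both symbols are introduced here for the sketch), the depth $T(\ell)$ of one simulated round behaves as follows.

```latex
\begin{align*}
\#\{\text{paths with at least } \ell \text{ vertices}\} &\le \frac{C N}{\ell}, \\
T(\ell) &= \underbrace{O(\ell)}_{\text{short paths, swapped in parallel}}
 + \underbrace{O\!\left(\frac{C N}{\ell} + \operatorname{diam}(G)\right)}_{\text{long-path tokens, Theorem 3.5}}, \\
\ell + \frac{N}{\ell} \ge 2\sqrt{N}, \ \text{with equality at } \ell = \sqrt{N}
&\;\Longrightarrow\; T(\sqrt{N}) = O\!\left(\sqrt{N} + \operatorname{diam}(G)\right).
\end{align*}
```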
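Combining our results, we now have a bound on the advantage for any graph.
Theorem 6.3. For any connected graph $G$ with $N$ vertices, $\mathrm{adv}(G) = O(\sqrt{N \log N})$.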
Proof.First, combining Theorem 3.4 and Theorem 3.2, we have Combining this bound with the bound from Lemma 6.2, we have (30) We know that c(G) > 0. Using the fact that log(x) ≥ 1 − 1/x for x > 0, we have where in the last equality we used c(G) ≤ 1. Applying this to Lemma 6.1 and Eq. ( 30), we have (32) Recall the definition of the maximum teleportation advantage from Eq. ( 25): Therefore, As c(G) varies, the two bounds in the minimum vary inversely.The first bound, from Eq. ( 29), is monotonically the first bound is smaller, while when c(G) ∼ log N √ N , the second bound is smaller.The largest minimum of the two bounds is thus obtained when they are equal.
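The minimum of the two bounds is thus maximized when $c(G) = \Theta(\sqrt{(\log N)/N})$. Note that even if no graph with this value of $c(G)$ exists, any other value of $c(G)$ gives a right-hand side of Eq. (32) that is no larger. With $c(G) = \sqrt{(\log N)/N}$, both terms are $O(\sqrt{N \log N})$, and we obtain
$$\max_{G:\,|V(G)| = N} \mathrm{adv}(G) = O(\sqrt{N \log N}),$$
as claimed.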
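As a quick numerical sanity check of the balancing step, the Python sketch below maximizes $\min\{N c, \sqrt{N} + \log N / c\}$ over a fine logarithmic grid of $c \in [1/N, 1]$, setting the constants hidden in Eq. (32) to 1 (an assumption made only for this illustration). The maximizer should track $\sqrt{(\log N)/N}$ and the maximum should track $\sqrt{N \log N}$ up to a modest constant factor. The function name is hypothetical.

```python
import numpy as np


def balanced_bound(n, points=200_001):
    """Grid-maximize min{n*c, sqrt(n) + log(n)/c} over c in [1/n, 1].

    The constants hidden in the O(.) of Eq. (32) are set to 1 (an assumption).
    Returns the maximizing c and the value of the minimum there.
    """
    c = np.logspace(-np.log10(n), 0.0, points)
    bound = np.minimum(n * c, np.sqrt(n) + np.log(n) / c)
    i = int(np.argmax(bound))
    return c[i], bound[i]


for n in (10**2, 10**4, 10**6):
    c_best, value = balanced_bound(n)
    print(n, c_best / np.sqrt(np.log(n) / n), value / np.sqrt(n * np.log(n)))
```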
This bound applies to any graph, and is thus independent of the diameter of the graph. Therefore, this result shows that in graphs with diameter $\omega(\sqrt{N \log N})$, we cannot obtain a routing time separation between teleportation- and swap-based routing that is proportional to the diameter.
Next we show tighter bounds for a few common families of graphs.

B. Grids
For $d$-dimensional grids (i.e., $P_n^{\square d}$, the $d$-fold Cartesian product of the path graph $P_n$, with $N = n^d$ vertices), the vertex cut bound (Theorem 3.2) gives
$$\mathrm{rt}_{\mathrm{LOCC}}(P_n^{\square d}) = \Omega\left(\frac{1}{c(P_n^{\square d})}\right) = \Omega(n), \tag{36}$$
where $c(P_n^{\square d}) \le 2/n$ follows from considering a hyperplane that bisects the grid along one dimension. From [5], the swap routing time of a $d$-dimensional grid is $O(d N^{1/d}) = O(dn)$. For constant $d$, this saturates the cut bound in Eq. (36). Therefore, there is no worst-case speedup from either teleportation or full LOCC, i.e., $\mathrm{adv}(P_n^{\square d}) = \Theta(1)$.

C. Expander graphs
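We show that the advantage for spectral expander graphs is at most $\mathrm{poly}(\log N)$. The (normalized) Laplacian of a graph $G$ is defined as
$$L_{uv} := \begin{cases} 1 & \text{if } u = v, \\ -\dfrac{1}{\sqrt{d_u d_v}} & \text{if } (u, v) \in E(G), \\ 0 & \text{otherwise}, \end{cases}$$
where $d_v$ is the degree of vertex $v$. The matrix $L$ is symmetric and positive semidefinite, so we can order its eigenvalues as $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{N-1} \le 2$. We write $\lambda(G)$ for $\lambda_1$ of the Laplacian of $G$. Spectral expander graphs are graphs of bounded degree with $\lambda(G) = \Omega(1)$. For a comprehensive introduction to spectral graph theory, consult [6].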
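The normalized Laplacian and $\lambda(G)$ are easy to compute numerically for small graphs. The NumPy sketch below (function names hypothetical) builds $L = I - D^{-1/2} A D^{-1/2}$, which matches the entrywise definition above for graphs without isolated vertices, and returns the second-smallest eigenvalue. For the cycle $C_N$, $\lambda(C_N) = 1 - \cos(2\pi/N) \to 0$, so cycles are not expanders, while the complete graph has $\lambda(K_N) = N/(N-1)$.

```python
import numpy as np


def normalized_laplacian(n, edges):
    """L = I - D^{-1/2} A D^{-1/2} (assumes no isolated vertices)."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def spectral_gap(n, edges):
    """lambda(G): the second-smallest eigenvalue of the normalized Laplacian."""
    return np.linalg.eigvalsh(normalized_laplacian(n, edges))[1]  # ascending order


for n in (10, 50, 200):
    cycle = [(i, (i + 1) % n) for i in range(n)]
    print(n, spectral_gap(n, cycle), 1 - np.cos(2 * np.pi / n))
```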
To bound the advantage for spectral expander graphs, we first use the following upper bound on the swap-based routing number. Let $d^* := \max_{v \in V} d_v / \min_{v \in V} d_v$ denote the degree ratio of a graph.

Theorem 6.4 ([9]). For any graph $G$ and permutation $\pi$,
Combining this result with the lower bound of Theorem 3.2, we immediately obtain an upper bound on $\mathrm{adv}(G)$ in terms of $\lambda(G)$ and $d^*$. Thus graphs with $\lambda(G) = \Omega(1)$ and $d^* = O(1)$ (such as spectral expanders) have at most a polylogarithmic advantage.

D. Hypercubes
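The swap-based routing time for a $d$-dimensional hypercube $Q_d$ (with $N = 2^d$ vertices) is [5,34]
$$\mathrm{rt}_{\mathrm{swap}}(Q_d) = \Theta(d) = \Theta(\log N).$$
Now, we will show that $c(Q_d) = O(1/\sqrt{d})$. In a hypercube, Hamming balls (i.e., sets of all points with Hamming weight $\le r$ for some integer $r$) have the smallest vertex boundary of all sets of a given size [35]. Taking as $X$ the Hamming ball of radius just below $d/2$ (radius $\lfloor (d-1)/2 \rfloor$, so that $|X| \le N/2$), we have $|X| = \Theta(2^d)$ and $|\delta X| = \binom{d}{\lfloor (d-1)/2 \rfloor + 1} = \Theta(2^d/\sqrt{d})$, so
$$c(Q_d) \le \frac{|\delta X|}{|X|} = O\left(\frac{1}{\sqrt{d}}\right).$$
By Theorem 3.2, $\mathrm{rt}_{\mathrm{tele}}(Q_d) = \Omega(\sqrt{d})$. Teleportation thus offers at most an $O(\sqrt{\log N})$ advantage on hypercubes.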
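The $1/\sqrt{d}$ scaling of the Hamming-ball cut can be checked with exact binomial sums. The Python sketch below uses radius $\lfloor (d-1)/2 \rfloor$ so that $|X| \le 2^{d-1}$, as the definition of $c(G)$ requires; the outer boundary of such a ball is exactly the set of words of weight one more than the radius. The function name is hypothetical. The printed products $\sqrt{d}\,|\delta X|/|X|$ stay between small constants as $d$ grows.

```python
from math import comb, sqrt


def hamming_ball_ratio(d):
    """|dX| / |X| for the Hamming ball X of radius floor((d-1)/2) around 0^d in Q_d."""
    radius = (d - 1) // 2                       # keeps |X| <= 2^d / 2
    size = sum(comb(d, k) for k in range(radius + 1))
    boundary = comb(d, radius + 1)              # all words of weight radius + 1
    return boundary / size


for d in (10, 11, 50, 51, 200, 201):
    print(d, round(sqrt(d) * hamming_ball_ratio(d), 3))
```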

E. Other graphs
The cyclic butterfly graph $B_r$ has been proposed as a constant-degree interaction graph that allows for fast circuit synthesis [3,36]. Each of the $N = r 2^r$ vertices is labelled $(w, i) \in \{0, 1\}^r \times [r]$. Vertices $(w, i)$ and $(v, i + 1 \bmod r)$ are connected if $w = v$ or if $w$ and $v$ differ only in the $i$th bit. The cyclic butterfly has diameter $O(\log N)$, degree 4, and $\mathrm{rt}_{\mathrm{swap}}(B_r) = O(\log N)$ [36].
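We now show that the $O(\log N)$ protocol is optimal even for teleportation routing on the cyclic butterfly graph, so $\mathrm{adv}(B_r) = O(1)$. Fix a bit position $j$ and bipartition the vertices into sets $X$ and $\bar{X}$, where $X$ consists of all vertices $(w, i)$ whose label $w$ has bit $j$ equal to 0, and $\bar{X}$ consists of all vertices whose label has bit $j$ equal to 1. For this partition, $|X| = r 2^{r-1}$ and $|\delta X| = 2^r$, so $c(B_r) \le 2/r$. Since $r = \Theta(\log N)$, $c(B_r) = O(1/\log N)$, so by Theorem 3.2, $\mathrm{rt}_{\mathrm{tele}}(B_r) = \Omega(\log N)$ and hence $\mathrm{adv}(B_r) = O(1)$.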
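The cut used above can be verified by direct construction. The Python sketch below builds $B_r$ with levels and bit positions indexed from 0 (a labelling convention we choose here), confirms that every vertex has degree 4 for $r \ge 3$, and counts $|X|$ and the outer boundary $|\delta X|$ for the cut on bit $j$. The function names are hypothetical.

```python
def cyclic_butterfly(r):
    """Adjacency of B_r: (w, i) ~ (w, i+1 mod r) and (w, i) ~ (w XOR 2^i, i+1 mod r)."""
    adj = {(w, i): set() for w in range(2 ** r) for i in range(r)}
    for w in range(2 ** r):
        for i in range(r):
            nxt = (i + 1) % r
            for w2 in (w, w ^ (1 << i)):        # straight edge and cross edge
                adj[(w, i)].add((w2, nxt))
                adj[(w2, nxt)].add((w, i))
    return adj


def bit_cut(adj, j):
    """X = vertices whose label w has bit j equal to 0; returns (|X|, |dX|)."""
    X = {v for v in adj if not (v[0] >> j) & 1}
    boundary = {u for v in X for u in adj[v]} - X
    return len(X), len(boundary)


print(bit_cut(cyclic_butterfly(4), j=0))  # (32, 16): ratio 16/32 = 2/r with r = 4
```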
The complete graph $K_N$ has $\mathrm{rt}_{\mathrm{swap}}(K_N) = O(1)$, and therefore $\mathrm{adv}(K_N) = \Theta(1)$.
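Finally, graphs with poor expansion properties, in particular those with vertex expansion $c(G) = O(\mathrm{poly}(\log N)/N)$, have at most a polylogarithmic advantage, since $\mathrm{adv}(G) = O(N\, c(G))$ by Eq. (29).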

VII. DISCUSSION
In this paper, we have used quantum teleportation to speed up the task of permuting qubits on graphs. We have shown examples of specific types of permutations that can be sped up by teleportation. Further, we have shown an example of a graph that exhibits a worst-case teleportation routing speedup of order $\log N$. Our main technical result (Theorem 6.3) is a general upper bound of $O(\sqrt{N \log N})$ on the worst-case routing speedup. We also show that many practical architectures cannot implement arbitrary interactions with low overhead, even with fast LOCC (unlike previous work, which only considered unitary evolution). Such a negative result provides useful constraints for the design of quantum devices, suggesting that designing new architectures may prove fruitful.
Our work leaves open whether there exists a graph with $\mathrm{adv}(G) = \omega(\log N)$. Such a graph cannot be a spectral expander graph, as per Theorem 6.4. From Lemma 6.1, we know that a graph with large diameter will have poor expansion properties (small $c(G)$) and therefore will not have a large teleportation advantage, as per Eq. (29). Some candidate graphs for a superlogarithmic teleportation advantage are those with $c(G) \approx \sqrt{(\log N)/N}$. Such graphs may come closer to achieving a teleportation advantage given by the upper bound of Theorem 6.3.
Furthermore, we believe that there should exist a tighter upper bound than Theorem 6.3 on the maximum teleportation advantage for any graph.This is one particularly interesting direction in resolving the advantage of a teleportation protocol over swaps.There could be more sophisticated methods that give tighter bounds by exploiting parallelism.A possible approach to tightening this bound would be to show a swap protocol that performs routing from multiple teleportation rounds in parallel, since swap paths need not obey the strict conditions of teleportation paths (namely, allowing only a constant number of path intersections per vertex).
We have primarily focused on the teleportation model of routing. However, teleportation routing is a special case of the more general LOCC model of routing. We currently do not know whether the full power of LOCC can provide a super-constant speedup over teleportation routing. This is analogous to another open question, namely whether routing with arbitrary 2-qubit gates (or even with arbitrary bounded 2-qubit Hamiltonians) can provide a super-constant speedup over swap-based routing [9].
Herbert [20] posed the question of to what extent ancillas can be used to reduce the routing depth. Rosenbaum [21] showed an $O(1)$-depth routing protocol on $N$ qubits with $O(N^2)$ ancillas (i.e., an advantage of $O(N)$), while systems without ancillas cannot perform LOCC or teleportation routing and therefore cannot exhibit any speedup. We have investigated an intermediate regime and have shown that a linear number of ancillas cannot allow for speedups greater than $O(\sqrt{N \log N})$. It remains an open question to further investigate the space-time tradeoff between the number of ancilla qubits and the routing time.
We assume noiseless circuits, but in the presence of noise the performance of teleportation protocols depends directly on the fidelity of the resource Bell pairs. We are primarily interested in ways to use teleportation for routing, and Bell pairs are necessary for this process. Our current teleportation routing model does not distinguish between routing over long or short paths, but a more comprehensive model of routing could prioritize shorter paths, as they will be less error-prone without error correction. Alternatively, we could use a purification protocol [37] to prepare high-fidelity Bell pairs at the cost of additional ancillas and overhead, or we could encode our state in an error-correcting code [38] to suppress the error rate when operating between nodes. In a quantum network, we can use protocols generalizing entanglement swapping from Bell basis measurements to $n$-qubit GHZ states to improve the performance of repeater protocols in lossy quantum networks [39]. We can also prepare a high-fidelity Bell pair by performing multiple repeater protocols along different paths in parallel [40] or by using multiplexers on each edge [41].
A more general task than routing is to perform unitary synthesis, i.e., to decompose a particular unitary into 2-qubit gates that can be applied on our locality-constrained qubits. It remains an open question to understand how much unitary synthesis can be sped up by using LOCC with a linear number of ancillary qubits. Previous work has shown an $\Omega(N)$ speedup for implementing fanout [14] and preparing GHZ and W states [16], and an $\Omega(\sqrt{N})$ speedup for preparing toric code states [16], which takes time $\Omega(\sqrt{N})$ without LOCC [42]. Previous work has also shown how measurements of cluster states can be used to efficiently prepare long-range entanglement [18] and states with exotic topological order [19]. In principle, LOCC could provide superlinear speedups for unitary synthesis, as we currently have no upper bounds on the advantage for arbitrary unitaries.
Algorithm A.1: Advance a train
Input: Train T from vertices 0 to l − 1 on a path. Vertex l has a null token. The data token on vertex i is data(i), and the token on the corresponding ancilla is ancilla(i).
1 parallel for i = 1, 3, ...:
2   swap data(i) with ancilla(i)
3 parallel for i = 0, 2, 4, ...:
Definition A.1 (Null token). A null token is a dummy token that can be routed anywhere. In quantum routing, all ancillas are initialized with a null token in state |0⟩.
Proof of Lemma A.3. Suppose we want to move a train of length l towards some vertex r. We define the head of a train as the token on the vertex closest to r, and the tail as the token on the vertex furthest from r. Consider the path subgraph spanned by the vertices the train lies on as well as the vertices of the shortest path from the head to r. Let the tail lie on vertex 0 and the head on vertex l − 1. We use Algorithm A.1 to advance the train such that after 5 time steps, the tail of the train is at vertex 1 and the head at vertex l. This procedure is depicted in Fig. 6.
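To make the swap schedule of Algorithm A.1 (lines 1-8) concrete, here is a minimal classical sketch that tracks token labels rather than quantum states. It assumes exactly one ancilla per vertex and uses Python's None for the null token of Definition A.1; the slot names ("d", i) and ("a", i) and the function advance_train are illustrative choices, not from the source. Each call to layer counts as one time step and checks that the swaps in that step act on disjoint qubits.

```python
def advance_train(tokens):
    """Simulate Algorithm A.1 on a path with vertices 0..l.

    tokens: the train, listed from tail (vertex 0) to head (vertex l - 1).
    None stands in for the null token |0>; each vertex i has one data slot
    ("d", i) and one local ancilla slot ("a", i) (an assumed layout).
    Returns (data, ancillas, depth) after the five swap layers.
    """
    l = len(tokens)
    state = {}
    for i in range(l + 1):
        state[("d", i)] = tokens[i] if i < l else None  # vertex l starts null
        state[("a", i)] = None                           # ancillas start null
    depth = 0

    def layer(swaps):
        # One time step: all swaps in a layer must act on disjoint slots.
        nonlocal depth
        touched = [slot for pair in swaps for slot in pair]
        assert len(touched) == len(set(touched)), "overlapping swaps in one layer"
        for x, y in swaps:
            state[x], state[y] = state[y], state[x]
        depth += 1

    odd = range(1, l + 1, 2)   # odd vertices 1, 3, ... (up to l)
    even = range(0, l, 2)      # even vertices 0, 2, ... that hold train tokens

    # Lines 1-2: park every odd-indexed token in its local ancilla.
    layer([(("d", i), ("a", i)) for i in odd])
    # Lines 3-5: even tokens step forward, then trade places with the parked odd tokens.
    layer([(("d", i), ("d", i + 1)) for i in even])
    layer([(("d", i + 1), ("a", i + 1)) for i in even])
    # Lines 6-8: odd tokens step forward (where a next vertex exists),
    # then the even tokens parked one vertex ahead are unparked.
    layer([(("d", i), ("d", i + 1)) for i in odd if i + 1 <= l])
    layer([(("d", i), ("a", i)) for i in odd])

    data = [state[("d", i)] for i in range(l + 1)]
    ancillas = [state[("a", i)] for i in range(l + 1)]
    return data, ancillas, depth


print(advance_train(["t0", "t1", "t2"]))
# ([None, 't0', 't1', 't2'], [None, None, None, None], 5)
```

For any train length l ≥ 1, the five layers leave token k on vertex k + 1 and return every ancilla to the null token, which is the depth-5 advance used in the proof.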
We now define a token cluster.
Definition A.4 (Token cluster).A token cluster is a set of trains such that each train contains a token on a vertex that is adjacent to a vertex with a token from another train in the token cluster.
Token clusters move according to Algorithm A.2, as depicted in Fig. 7. Once a train joins a token cluster, it remains connected to and part of that token cluster.
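Definition A.4 leaves the grouping of trains into clusters implicit. One way to compute it is sketched below, under our own reading (an assumption, not stated in the source): once step 2 of Algorithm A.2 has joined all adjacent clusters, each token cluster is a connected component of the vertices holding non-null tokens, since each train already lies along a path. The adjacency-dict representation and the name token_clusters are hypothetical.

```python
def token_clusters(adj, occupied):
    """Group vertices holding non-null tokens into token clusters.

    adj: dict mapping each vertex to an iterable of its neighbours.
    occupied: set of vertices whose data qubit holds a non-null token.
    Assumption: a cluster is a connected component of the subgraph
    induced by the occupied vertices.
    """
    seen = set()
    clusters = []
    for start in sorted(occupied):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search restricted to occupied vertices
            u = stack.pop()
            component.append(u)
            for w in adj[u]:
                if w in occupied and w not in seen:
                    seen.add(w)
                    stack.append(w)
        clusters.append(sorted(component))
    return clusters


# A 2x3 grid: vertices 0-1-2 on top, 3-4-5 below, vertical edges 0-3, 1-4, 2-5.
grid = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
print(token_clusters(grid, {0, 3, 2}))  # [[0, 3], [2]]
```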
We now prove Theorem 3.5, which we reproduce here for clarity.
… marked tokens, and let the remaining tokens be unmarked tokens. Our algorithm involves three phases.
Phase 1: First, we swap the unmarked tokens into local ancilla qubits and store them there for the duration of routing. Every vertex that initially held an unmarked token now holds a null token.
Next, we select some vertex r of G arbitrarily (in practice, selecting r to be at the center of the graph may provide constant-factor speedups). We now move all marked tokens towards vertex r by swapping along shortest paths, until the tokens span a set of vertices forming a tree connected to r. The tokens are moved in parallel, and when their paths intersect, the tokens move as trains, as per Lemma A.3. When the paths of multiple trains intersect, they form a token cluster, which moves as in Algorithm A.2.
Any given train is at distance at most diam(G) from r at the start of Phase 1. In every round of Algorithm A.2, a train either advances by 1 vertex towards r or is part of a token cluster in which another train closer to r advances. Therefore, every token cluster becomes connected to r in depth O(diam(G)), since in every token cluster at least one train must reach r in depth O(diam(G)). In particular, within depth O(diam(G)), all non-null tokens span a tree containing r and have thus merged into a single token cluster.
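The choice of the root r in Phase 1 and the diam(G) bound can be illustrated with breadth-first search. This sketch assumes a connected graph stored as a dict of neighbour lists; bfs_tree and choose_root are hypothetical helpers, not from the source. BFS from r gives each vertex's distance to r and a next hop along a shortest path toward r, so every token starts within the eccentricity of r, which is at most diam(G). Picking a vertex of minimum eccentricity (a graph center) lowers this to the radius, which is still at least diam(G)/2, consistent with the constant-factor speedup noted for choosing r at the center.

```python
from collections import deque


def bfs_tree(adj, r):
    """BFS from r: dist[v] is the distance to r, parent[v] the next hop toward r."""
    dist, parent = {r: 0}, {r: None}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                parent[w] = u
                queue.append(w)
    return dist, parent


def choose_root(adj):
    """Return (r, radius, diameter), with r a vertex of minimum eccentricity.

    Assumes a connected graph given as a dict of neighbour lists.
    """
    ecc = {v: max(bfs_tree(adj, v)[0].values()) for v in adj}
    r = min(ecc, key=ecc.get)
    return r, ecc[r], max(ecc.values())


# Path graph on 7 vertices, 0 - 1 - ... - 6.
path7 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
r, radius, diameter = choose_root(path7)
print(r, radius, diameter)  # 3 3 6
dist, parent = bfs_tree(path7, r)
print(parent[0], dist[0])   # 1 3
```

On this path, rooting at an endpoint would leave a token up to 6 hops away, while the center keeps every token within 3 hops.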
Phase 2: Now we have k vertices spanning a tree T. Suppose token v is mapped to the vertex t(v) in T after Phase 1. Note that the token u that was originally at π(v) must also be a marked token, and therefore must now lie in T. We route the tokens on T according to a permutation π′ such that
$$\pi'(t(v)) := t(\pi(v)) \qquad \text{(A2)}$$
for all t(v) ∈ V(T), in depth 2k [5].
Phase 3: We now simply perform Phase 1 in reverse. During Phase 1, the marked token at u was mapped to t(u). Therefore, after Phase 3, the token at t(u) is mapped to vertex u. Hence, the following mapping is applied to all vertices with marked tokens:
$$u \xrightarrow{\text{Phase 1}} t(u) \xrightarrow{\text{Phase 2}} \pi'(t(u)) = t(\pi(u)) \xrightarrow{\text{Phase 3}} \pi(u). \qquad \text{(A3)}$$
The combined depth of the three phases is at most O(k + diam(G)).
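The bookkeeping behind Eqs. (A2) and (A3) can be checked mechanically. The sketch below assumes the permutation π and a Phase 1 placement t are given as dicts; t is an arbitrary injective example rather than the output of an actual routing run, and route_via_tree is a hypothetical name. It builds π′ from Eq. (A2), applies t, π′, and the inverse of t in turn, and returns where each marked token ends up.

```python
def route_via_tree(pi, t):
    """Compose the three phases and return each marked token's destination.

    pi: dict, a permutation of the vertices (token at v must end at pi[v]).
    t:  dict, a hypothetical Phase 1 placement sending each marked token v
        to a distinct tree vertex t[v].
    """
    marked = {v for v in pi if pi[v] != v}
    assert set(t) == marked and len(set(t.values())) == len(t)
    # Eq. (A2): pi'(t(v)) := t(pi(v)); pi(v) is marked whenever v is,
    # because pi is a permutation, so t(pi(v)) is always defined.
    pi_prime = {t[v]: t[pi[v]] for v in marked}
    assert set(pi_prime.values()) == set(pi_prime)  # pi' permutes the tree vertices
    t_inverse = {x: v for v, x in t.items()}        # Phase 3 undoes Phase 1
    # Phase 1: v -> t(v); Phase 2: t(v) -> pi'(t(v)); Phase 3: t(u) -> u.
    return {v: t_inverse[pi_prime[t[v]]] for v in marked}


pi = {0: 2, 1: 1, 2: 5, 3: 3, 4: 4, 5: 0}  # tokens 0, 2 and 5 are marked
t = {0: 3, 2: 4, 5: 1}                      # hypothetical tree vertices after Phase 1
print(route_via_tree(pi, t))
```

For any valid inputs, each marked token v ends at π(v), which is exactly the composite mapping of Eq. (A3).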

FIG. 2. (a) A grid architecture with ancillas (blue) interspersed between data qubits (black). The red ovals indicate which ancilla corresponds to each data qubit. (b) An equivalent architecture in our model.

FIG. 6. Advancing a train of tokens, as in Algorithm A.1. Blank vertices hold state |0⟩. The dashed lines represent connections with the local ancilla qubits.

4   swap data(i) with data(i + 1)
5   swap data(i + 1) with ancilla(i + 1)
6 parallel for i = 1, 3, ...:
7   swap data(i) with data(i + 1)
8   swap data(i) with ancilla(i)
Algorithm A.2: Token cluster movement
Input: Set of token clusters; vertex r.
1 In each token cluster, advance the train with head closest to r by 1 using Algorithm A.1.
2 If any two token clusters are adjacent, join them as a single token cluster.
3 If the head of any train T1 is adjacent to the tail of another train T2, join them as a single train with the head of T2 and the tail of T1.
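On a single path directed toward r (a simplifying assumption; Algorithm A.2 itself runs on general graphs), line 3 of Algorithm A.2 reduces to fusing trains whose occupied intervals abut. The sketch represents each train by a hypothetical (tail, head) pair of vertex indices that increase toward r; merge_trains is an illustrative name, not from the source.

```python
def merge_trains(trains):
    """Apply line 3 of Algorithm A.2 to trains lying on one path toward r.

    Each train is a (tail, head) pair of vertex indices on the path, with
    indices increasing toward r, so tail <= head; trains must not overlap.
    """
    merged = []
    for tail, head in sorted(trains):
        if merged and tail == merged[-1][1] + 1:
            # The head of the previous train T1 is adjacent to the tail of this
            # train T2: the joined train keeps T1's tail and T2's head.
            merged[-1] = (merged[-1][0], head)
        else:
            merged.append((tail, head))
    return merged


print(merge_trains([(2, 4), (0, 1), (7, 8)]))  # [(0, 4), (7, 8)]
```

Sorting by tail lets a single left-to-right pass handle chains of abutting trains, e.g. three single-token trains on consecutive vertices collapse into one train.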

Definition A.2 (Train). A train is a set of non-null tokens along a path subgraph of G. We now show how a train can advance, i.e., translate by 1 along its length.
Lemma A.3. A train can advance in depth 5.