Polynomial Equivalence of Complexity Geometries

This paper proves the polynomial equivalence of a broad class of definitions of quantum computational complexity. We study right-invariant metrics on the unitary group -- often called `complexity geometries' following the definition of quantum complexity proposed by Nielsen -- and delineate the equivalence class of metrics that have the same computational power as quantum circuits. Within this universality class, any unitary that can be reached in one metric can be approximated in any other metric in the class with a slowdown that is at-worst polynomial in the length and number of qubits and inverse-polynomial in the permitted error. We describe the equivalence classes for two different kinds of error we might tolerate: Killing-distance error, and operator-norm error. All metrics in both equivalence classes are shown to have exponential diameter; all metrics in the operator-norm equivalence class are also shown to give an alternative definition of the quantum complexity class BQP. My results extend those of Nielsen et al., who in 2006 proved that one particular metric is polynomially equivalent to quantum circuits. The Nielsen et al. metric is incredibly highly curved. I show that the greatly enlarged equivalence class established in this paper also includes metrics that have modest curvature. I argue that the modest curvature makes these metrics more amenable to the tools of differential geometry, and therefore makes them more promising starting points for Nielsen's program of using differential geometry to prove complexity lowerbounds.


Introduction & Summary
In this paper we prove the polynomial equivalence of a broad class of definitions of quantum complexity. This class includes many different complexity geometries that may differ exponentially at short distances but at longer distances are all shown to be polynomially equivalent; the class also includes the orthodox gate-counting definition that involves approximating unitaries by compiling circuits from gates and then counting the gates.
A complexity geometry is a right-invariant Riemannian metric on the unitary group, characterized by a penalty schedule (see Sec. 2). We will delineate the class of complexity geometries that give rise to polynomial equivalence. We will show that within this equivalence class any unitary $U \in \mathrm{U}(2^N)$ that can be reached in one metric with a path of length $L$ can be approximated in any other metric in the class with a path whose length is at worst polynomial in $L$ and $N$ and inverse-polynomial in the permitted error.
In the context of computational complexity, the study of right-invariant metrics was pioneered by Nielsen and collaborators [1][2][3][4][5]. In 2006, Nielsen, Dowling, Gu, & Doherty [2] proved that one particular complexity geometry, the so-called 'cliff metric', which exponentially punishes motion in any tangent direction that has terms that touch more than two qubits at once (for a review and rigorous definition, see Sec. 2), gives rise to a definition of complexity that is polynomially equivalent to the standard gate-counting definition,
$$\text{length of path in cliff metric to get close} \;\sim\; \text{number of gates to get close}. \tag{1}$$
In this paper we will extend this result to less draconian penalty schedules. We will describe sufficient conditions on a penalty schedule $\mathcal{I}(\sigma_I)$ such that
$$\text{length of path in cliff metric to get close} \;\sim\; \text{length of path in } \mathcal{I}(\sigma_I) \text{ metric to get close}. \tag{2}$$
From the point of view of geometry, this will be interesting because it will prove the long-distance polynomial equivalence of different right-invariant metrics on the unitary group, even though these metrics differ exponentially at short scales, advancing the 'effective geometry' program laid out in [6]. From the point of view of complexity, this will be interesting because it will provide an alternative definition of the set of tasks that can be efficiently performed on a quantum computer, and this alternative definition has primitive operations that are substantially more permissive than either the gate definition or the cliff-metric definition. Furthermore, many of the complexity geometries that are shown to be in the equivalence class are considerably easier to work with than the cliff metric (and to deploy the tools of differential geometry against [7]) because, unlike the extremely highly curved cliff metric, these geometries have only modest curvature.

Summary of results
Let's describe our headline results. We will write down sufficient conditions on the complexity geometry in order for the polynomial equivalence Eq. 2 to hold. Our theorems will only apply to complexity geometries for which the penalty metric (defined in Sec. 2) is diagonal in the generalized Pauli basis $\sigma_I$ (also defined in Sec. 2). For such metrics the 'penalty schedule' (also, like all terms in this summary, defined in Sec. 2) assigns a 'penalty factor' $\mathcal{I}(\sigma_I)$ to each of the $4^N$ basis directions $\sigma_I$ in the tangent space to $\mathrm{U}(2^N)$; this is called a penalty factor because a larger value of $\mathcal{I}(\sigma_I)$ makes the direction $\sigma_I$ more expensive to evolve in. In order to prove the polynomial equivalence, we will need to prove that the complexity geometries can efficiently emulate gates, and that gates can efficiently emulate the complexity geometries.

Complexity geometries that can efficiently emulate gates
The first direction of the equivalence will be straightforward. In order that the complexity geometries are guaranteed to be able to efficiently emulate gates, a sufficient condition is
    for 2-local basis directions $\sigma_I$ (i.e. basis directions that act non-trivially on only 2 qubits) the penalty factor $\mathcal{I}(\sigma_I)$ is at most polynomially big. (3)
(We will show in Sec. 7.5.1 that this condition can be significantly relaxed.)

Complexity geometries that can be efficiently emulated by gates
The other direction of the equivalence will require more work. In order that gates are guaranteed to be able to efficiently emulate the complexity geometries, we will need the penalty schedule to be sufficiently punitive, meaning it assigns a small value of $\mathcal{I}$ to only a small number of basis directions $\sigma_I$. To make this precise, we will have to be precise about what we mean in Eq. 2 by 'close'. We will consider two different definitions of 'close', leading to different sets of sufficient conditions.
1. Killing close. If we demand that we get polynomially close as measured in the Killing metric (defined in Sec. 2.1), the sufficient condition for polynomial equivalence is
    for any polynomially large $\bar{\mathcal{I}}$, there are only polynomially many basis directions $\sigma_I$ with $\mathcal{I}(\sigma_I) < \bar{\mathcal{I}}$. (4)
2. Operator-norm close. If we make the stronger demand that we get polynomially close as measured by the operator-norm distance (defined in Sec. 2.4) then a sufficient condition for polynomial equivalence is
    for any polynomially small value, we can make the harmonic sum of the penalty factors $\sum_{\sigma_I} \mathcal{I}(\sigma_I)^{-1}$ smaller than that value by omitting from the sum at most polynomially many of the $\sigma_I$s. (5)
   Another sufficient condition for operator-norm polynomial equivalence is
    for any polynomially large $\bar{\mathcal{I}}$, there are only polynomially many basis directions $\sigma_I$ with $\mathcal{I}(\sigma_I) < 2^N \bar{\mathcal{I}}$. (6)
We evaluate these sufficient conditions against some example penalty schedules in Table 1.
When fed the same input state, two unitaries that are close in operator norm will give output states that are close in inner-product. A corollary of our results therefore is that, given the power to implement any polynomial-length path through a complexity geometry that satisfies condition 3 and either condition 5 or condition 6, the set of decision problems that can be solved with high probability for every input is exactly the quantum complexity class 'BQP'.
Table 1: Summary of results for some example penalty schedules. For all these examples, the penalty schedule $\mathcal{I}(\sigma_I)$ is a function only of the weight $k$ of the direction $\sigma_I$, see Sec. 2.2 (though our theorems are more general). A ✓ means that if there exists a polynomial-length path in that penalty metric, there must also exist a circuit that approximates it with a number of gates that is polynomial in the length, the number of qubits, and the targeted error. For the first column, the error is measured in the Killing metric (or equivalently in the normalized Frobenius norm $\|\text{error}\|_{\bar F}$, see Sec. 2.4); for the second column, the error is measured in the operator norm $\|\text{error}\|_{\rm op}$.

Defining distance functions on the unitary group
In this paper we will consider distance functions on $\mathrm{U}(2^N)$, the group of purity-preserving linear functions on $N$ qubits. As well as satisfying all the required axioms (symmetry, the triangle inequality, etc.), all these distance functions will in addition be 'right invariant', meaning that
right-invariance: $\text{distance}(U_1, U_2) = \text{distance}(U_1 U_R, U_2 U_R)$ for every unitary $U_R$. (7)
Let's define the distance functions now.

Killing metric (bi-invariant Riemannian)
The simplest and most symmetrical nontrivial Riemannian distance function on the unitary group is the Killing metric. The infinitesimal distance between two nearby unitaries $U$ and $U + dU$ is defined as
$$ds^2 = \sum_{I,J} \overline{\mathrm{Tr}}\big[i\,dU\,U^\dagger \sigma_I\big]\,\delta_{IJ}\,\overline{\mathrm{Tr}}\big[i\,dU\,U^\dagger \sigma_J\big]. \tag{8}$$
Let's explain what these terms mean. First, the bar over the Tr indicates we have normalized the trace so that it gives the average (not the sum) of the diagonal terms; when applied to a $2^N \times 2^N$ matrix this means
$$\overline{\mathrm{Tr}} = 2^{-N}\,\mathrm{Tr}. \tag{9}$$
The symbol $\delta_{IJ}$ is the Kronecker delta. The sum runs over all $4^N$ of the generalized Paulis $\sigma_I$, which are defined as
$$\sigma_I = \sigma_{a_1} \otimes \sigma_{a_2} \otimes \cdots \otimes \sigma_{a_N}, \qquad \sigma_{a_j} \in \{\mathbb{1}, \sigma_x, \sigma_y, \sigma_z\}. \tag{10}$$
The generalized Paulis give a complete basis for the tangent space of the unitary group; this basis is orthonormal since $\overline{\mathrm{Tr}}[\sigma_I \sigma_J] = \delta_{IJ}$. The 'weight' or '$k$-locality' of a generalized Pauli $\sigma_I$ is defined as the number of qubits on which it acts nontrivially. In other words, a $k$-local $\sigma_I$ picks $k$ of the terms in the tensor product Eq. 10 to be SU(2) Pauli matrices ($\sigma_x$, $\sigma_y$, or $\sigma_z$), and $N - k$ terms to be $\mathbb{1}$. The number of exactly $k$-local generalized Paulis is
$$\mathcal{N}_k = \binom{N}{k}\,3^k. \tag{11}$$
Using the generalized Paulis, we can decompose the instantaneous tangent Hamiltonian, which Schrödinger's equation $-i\dot{U} = HU$ tells us is given by $H \equiv -i\dot{U}U^\dagger$, as
$$H = \sum_I h_I\,\sigma_I. \tag{12}$$
Eq. 8 implies that the length of a small step that applies this Hamiltonian for a time $\delta$ is
$$ds = \delta\,\sqrt{\sum_I h_I^2}. \tag{13}$$
So far we have defined the length of infinitesimal paths. The length of a general differentiable path through the space of unitaries is defined by breaking the path up into infinitesimal segments and summing the lengths of the segments. Finally the Killing distance $s(U_1, U_2)$ between two unitaries is given by the length of the shortest path that connects them. As discussed in Appendix A.1, this distance is never bigger than $\pi$, which is the distance between two antipodal unitaries (e.g. $\mathbb{1}$ and $-\mathbb{1}$).
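The counting of $k$-local generalized Paulis can be checked by brute force for a small number of qubits. The sketch below is illustrative only: it enumerates every Pauli string, tallies the strings by weight, and compares the tally with the closed form $\binom{N}{k}3^k$. The function names are not from the paper.

```python
from collections import Counter
from itertools import product
from math import comb


def count_by_weight(n_qubits):
    """Enumerate all 4**n_qubits Pauli strings and tally them by weight."""
    tally = Counter()
    for string in product("IXYZ", repeat=n_qubits):
        weight = sum(1 for letter in string if letter != "I")
        tally[weight] += 1
    return tally


def n_k(n_qubits, k):
    """Closed form: choose the k non-trivial qubits, then one of 3 Paulis on each."""
    return comb(n_qubits, k) * 3 ** k


N = 4
tally = count_by_weight(N)
for k in range(N + 1):
    print(k, tally[k], n_k(N, k))
```

Summing the closed form over all weights recovers the $4^N$ basis directions of the tangent space.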
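The Killing distance function is 'bi-invariant', meaning it is invariant under both left-multiplication and right-multiplication,
$$s(U_1, U_2) = s(U_L U_1 U_R,\; U_L U_2 U_R). \tag{14}$$
One consequence of bi-invariance is that errors at-worst add,
$$s(U_1 U_2, \tilde{U}_1 \tilde{U}_2) \le s(U_1 U_2, \tilde{U}_1 U_2) + s(\tilde{U}_1 U_2, \tilde{U}_1 \tilde{U}_2) = s(U_1, \tilde{U}_1) + s(U_2, \tilde{U}_2). \tag{15}$$
(Here the first step uses the triangle inequality and the second step uses the bi-invariance.) Applying this equation to the unitaries built from two different Hamiltonians gives
$$s\Big(\mathcal{P}\exp\Big[i\int_0^t H_1(t')\,dt'\Big],\; \mathcal{P}\exp\Big[i\int_0^t H_2(t')\,dt'\Big]\Big) \le \int_0^t dt'\,\sqrt{\overline{\mathrm{Tr}}\big[(H_1(t') - H_2(t'))^2\big]}, \tag{16}$$
where "$\mathcal{P}$" is Dyson's path-ordering operator, necessary because the Hamiltonian at one time may not commute with the Hamiltonian at a different time.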

Complexity geometry
The complexity metrics generalize the Killing metric by replacing $\delta_{IJ}$ with a positive symmetric 'penalty matrix' $\mathcal{I}_{IJ}$, so that the line-element becomes
$$ds^2 = \sum_{I,J} \overline{\mathrm{Tr}}\big[i\,dU\,U^\dagger \sigma_I\big]\,\mathcal{I}_{IJ}\,\overline{\mathrm{Tr}}\big[i\,dU\,U^\dagger \sigma_J\big]. \tag{17}$$
(For a pedagogical introduction see [8]; for other recent work see [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27].) This metric assigns a length to infinitesimal paths through the space of unitaries; the length of a noninfinitesimal path is the sum of the lengths of its segments; and the distance between two unitaries is the length of the shortest path that connects them. This expression is manifestly still right-invariant but is in general not left-invariant. In this paper, we will prove theorems only about metrics for which the penalty matrix $\mathcal{I}_{IJ}$ is diagonal in the generalized-Pauli basis,
diagonal penalty matrix: $\mathcal{I}_{IJ} = \delta_{IJ}\,\mathcal{I}(\sigma_I)$. (18)
(We discuss the necessity of this condition in Sec. 7.5.1.) The diagonal component $\mathcal{I}(\sigma_I)$ is known as the 'penalty factor' since it stretches the tangent direction $\sigma_I$ so that motion in that direction is more expensive (or cheaper for $\mathcal{I}(\sigma_I) < 1$). The length assigned to the small step generated by applying the Hamiltonian of Eq. 12 for an infinitesimal time $\delta$ is
$$ds = \delta\,\sqrt{\sum_I \mathcal{I}(\sigma_I)\,h_I^2}. \tag{19}$$
It follows directly from the metric Eq. 17 that when one penalty schedule is harder than another for every $\sigma_I$, the harder schedule always gives longer distances,
$$\mathcal{I}_1(\sigma_I) \ge \mathcal{I}_2(\sigma_I)\ \ \forall\,\sigma_I \quad\Rightarrow\quad \text{distance}_{\mathcal{I}_1}(U_1, U_2) \ge \text{distance}_{\mathcal{I}_2}(U_1, U_2). \tag{20}$$
Applying this formula to compare a complexity geometry with largest penalty factor $\mathcal{I}_{\max}$ to the scaled Killing metric that has $\mathcal{I}(\sigma_I) = \mathcal{I}_{\max}$ in every direction tells us that
$$\text{distance}_{\mathcal{I}}(U_1, U_2) \le \sqrt{\mathcal{I}_{\max}}\; s(U_1, U_2) \le \pi\,\sqrt{\mathcal{I}_{\max}}. \tag{21}$$
Example metrics. The theorems we will prove will apply to all penalty metrics that are diagonal in the generalized Pauli basis (i.e. that have the form in Eq. 18), but we will illustrate the theorems with three metrics of special interest. For all three, the penalty factor $\mathcal{I}(\sigma_I)$ assigned to a given generalized Pauli is a function only of the direction's $k$-locality. We can completely specify such penalty metrics by specifying the value of the penalty $\mathcal{I}_k$ for each of the possible values of the weight $k$ (i.e. for every whole number between 0 and $N$). The three families of special metrics are characterized by the parameters $\mathcal{I}_{\rm cliff}$, $\alpha$ and $x$:
'cliff metric': $\mathcal{I}_k = 1$ for $k \le 2$, and $\mathcal{I}_k = \mathcal{I}_{\rm cliff}$ for $k \ge 3$ (22)
'binomial metric': $\mathcal{I}_k = \mathcal{N}_k^{\,\alpha}$, the $\alpha$th power of the number of $k$-local directions (23)
'exponential metric': $\mathcal{I}_k$ exponential in $k$, with base $x$ (24)
For all three metrics, 1- and 2-local directions are cheap, and large-$k$ directions are expensive; the metrics differ in how gradually the cost increases as a function of $k$. The cliff metric was the original metric considered in [1,2]. The exponential metric was discussed in [11], where it was pointed out that its modest sectional curvature makes it a good candidate to reproduce the correct 'switchback effect' seen in quantum chaos and holographic black holes. The binomial metric also has modest sectional curvature, and was discussed in [6] as a possible candidate for the 'critical metric' to which all harder metrics flow in the infrared.
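To make the Killing-close criterion Eq. 4 concrete, the following sketch counts how many basis directions have a penalty factor below a threshold, for the cliff and binomial schedules. It assumes the cliff schedule takes the value 1 for $k \le 2$ and $\mathcal{I}_{\rm cliff}$ otherwise, and that the binomial schedule is the $\alpha$th power of the number of $k$-local directions; the values of $N$, the threshold, and $\mathcal{I}_{\rm cliff}$ are hypothetical choices made purely for illustration.

```python
from math import comb


def n_k(n, k):
    """Number of exactly k-local generalized Paulis on n qubits."""
    return comb(n, k) * 3 ** k


def cliff(n, k, i_cliff):
    """Assumed cliff schedule: cheap up to weight 2, i_cliff above."""
    return 1.0 if k <= 2 else i_cliff


def binomial(n, k, alpha):
    """Assumed binomial schedule: alpha-th power of the number of k-local directions."""
    return float(n_k(n, k)) ** alpha


def cheap_directions(n, schedule, i_bar):
    """Number of basis directions whose penalty factor is below i_bar."""
    return sum(n_k(n, k) for k in range(n + 1) if schedule(n, k) < i_bar)


N = 20
I_BAR = 1.0e6  # hypothetical threshold
cliff_count = cheap_directions(N, lambda n, k: cliff(n, k, 4.0 ** n), I_BAR)
binomial_count = cheap_directions(N, lambda n, k: binomial(n, k, 1.0), I_BAR)
print(cliff_count, binomial_count)
```

For the binomial schedule every weight class that survives the cut contains fewer than $\bar{\mathcal{I}}^{1/\alpha}$ directions, and there are at most $N+1$ classes, so the count is at most $(N+1)\,\bar{\mathcal{I}}^{1/\alpha}$, which is polynomial whenever $\bar{\mathcal{I}}$ is.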

Gate complexity
The gate distance between two unitaries is the number of gates in the smallest circuit that connects them [28]. In the version of gate complexity we will consider in this paper, we will take the primitive operations to be the action of general two-qubit gates. At each step we are allowed to pick any two qubits, and act on those two qubits with any two-qubit unitary, i.e. any element of $\mathrm{U}(4)$. In this way we can build up any element of $\mathrm{U}(2^N)$ exactly with no more than about $4^N$ gates [29],
$$C_{\rm gates} \lesssim 4^N. \tag{25}$$
$C_{\rm gates}(U_1, U_2)$ is defined as the number of gates in the smallest circuit that implements $U_2^\dagger U_1$.
Table 2: Properties of the five kinds of distance functions we consider in this paper. (Columns: triangle inequality, right-invariant, left-invariant, continuous, Riemannian.)
It follows directly from the definitions that the cliff metric gives a distance function that is no larger than ($\pi$ times) the number of gates in the smallest circuit,
$$\text{distance}_{\rm cliff}(U_1, U_2) \le \pi\, C_{\rm gates}(U_1, U_2). \tag{26}$$
This is because any element of $\mathrm{U}(4)$ can be generated with a time-independent $\overline{\mathrm{Tr}}\,H^2 = 1$ Hamiltonian acting only on those 2 qubits for a time at most $\pi$.
(In this paper, we have chosen a definition of gate complexity in which there is a continuum of primitive operations: any element of $\mathrm{U}(4)$ is an allowed gate. However, this set of gates is known to be polynomially equivalent, polynomial in both the complexity and in the targeted error, to a more restrictive definition of complexity that allows only a discrete set of primitive gates, such as CNOT plus Hadamard plus a random single-qubit phase. This is because of the Solovay-Kitaev theorem [30,31].)

Bi-invariant matrix norms
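We can also define a distance between two unitaries via a matrix norm. The distance function is defined as the norm of the difference of the two unitaries,
$$\text{distance}(U_1, U_2) = \|U_1 - U_2\|. \tag{27}$$
The matrix norms we will consider will all be bi-invariant, which means
$$\|A\| = \|U_L\, A\, U_R\|. \tag{28}$$
Bi-invariant distance functions have the important property that errors compose (as we already saw with the Killing distance in Eq. 15),
$$\|U_1 U_2 - \tilde{U}_1 \tilde{U}_2\| \le \|U_1 - \tilde{U}_1\| + \|U_2 - \tilde{U}_2\|. \tag{29}$$
This follows from combining the triangle inequality with bi-invariance. For unitaries generated by Hamiltonians, it follows that for any bi-invariant norm
$$\Big\|\mathcal{P}\exp\Big[i\int_0^t H_1(t')\,dt'\Big] - \mathcal{P}\exp\Big[i\int_0^t H_2(t')\,dt'\Big]\Big\| \le \int_0^t dt'\,\|H_1(t') - H_2(t')\|. \tag{30}$$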

Frobenius norm
The Frobenius norm of a matrix $A$ is defined as
$$\|A\|_F \equiv \sqrt{\mathrm{Tr}[A^\dagger A]}. \tag{31}$$
Because of the cyclic property of the trace, the Frobenius distance is bi-invariant. It will also be helpful to define the normalized Frobenius norm
$$\|A\|_{\bar F} \equiv \sqrt{\overline{\mathrm{Tr}}[A^\dagger A]}. \tag{32}$$
For $\mathrm{U}(2^N)$ the normalized trace is given by Eq. 9 as $\overline{\mathrm{Tr}} = 2^{-N}\,\mathrm{Tr}$ so the norms are related by $\|A\|_{\bar F} = 2^{-N/2}\,\|A\|_F$. The Frobenius norm is the square root of the sum of eigenvalues of $A^\dagger A$, whereas the normalized Frobenius norm is the square root of the average.
The normalized Frobenius norm of the Hamiltonian of Eq. 12 is given by
$$\|H\|_{\bar F} = \sqrt{\overline{\mathrm{Tr}}[H^2]} = \sqrt{\sum_I h_I^2}. \tag{33}$$
The normalized-Frobenius-norm length of the infinitesimal path generated by $H$ for an infinitesimal time $\delta$ is
$$\|dU\|_{\bar F} = \|e^{iH\delta}U - U\|_{\bar F} = \delta\,\sqrt{\sum_I h_I^2}. \tag{34}$$
Comparing with Eq. 13, we recognize this as being the same as the Killing length,
$$\|dU\|_{\bar F} = ds. \tag{35}$$
At longer separations, Eq. 30 tells us that the normalized Frobenius distance is no greater than (and is typically less than) the Killing distance,
$$\|U_1 - U_2\|_{\bar F} \le s(U_1, U_2). \tag{36}$$
Both distance functions satisfy the triangle inequality, but only for the Killing distance is there always an intermediate point that saturates the triangle inequality. The Frobenius-norm distance between $U_1$ and $U_2$ is typically shorter than the sums of the lengths of the path-segments for every connecting path, and is therefore not Riemannian.
The relationship between the two distance functions can be understood by considering the simplest case: $\mathrm{U}(1)$. As depicted in Fig. 1, the group $\mathrm{U}(1)$ is a circle. The Killing distance between two unitaries is the length around the circumference of this circle, whereas the Frobenius distance is the length of the chord through the circle. These two distances agree for infinitesimal separations; for non-infinitesimal separations the Frobenius-norm distance is shorter. While the Killing distance $s$ is longer than the normalized Frobenius-norm distance, we will prove in Appendix A.1 that it's never that much longer,
$$\|U_1 - U_2\|_{\bar F} \le s(U_1, U_2) \le \frac{\pi}{2}\,\|U_1 - U_2\|_{\bar F}. \tag{37}$$
Looking at the $\mathrm{U}(1)$ case in Fig. 1, we see that the lowerbound is saturated for neighboring unitaries and the upperbound is saturated for antipodal unitaries. What Eq. 37 means in practice is that we can bound the Killing distance by bounding the normalized Frobenius-norm distance, which will be useful since the Frobenius norm is often easier to work with.
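The chord-versus-arc picture for $\mathrm{U}(1)$ is easy to check numerically. The sketch below compares the arc length (Killing distance) with the chord length (Frobenius distance) between $1$ and $e^{i\theta}$ for a few sample angles, and prints their ratio, which should lie between 1 (neighboring points) and $\pi/2$ (antipodal points). The function names and sample angles are illustrative choices.

```python
import cmath
import math


def killing_distance(theta):
    """Arc length between 1 and exp(i*theta) on the U(1) circle."""
    wrapped = math.remainder(theta, 2.0 * math.pi)  # map into [-pi, pi]
    return abs(wrapped)


def frobenius_distance(theta):
    """Chord length |exp(i*theta) - 1|."""
    return abs(cmath.exp(1j * theta) - 1.0)


samples = [0.001, 0.5, 1.0, 2.0, 3.0, math.pi]
for theta in samples:
    arc = killing_distance(theta)
    chord = frobenius_distance(theta)
    print(f"theta={theta:.3f}  chord={chord:.6f}  arc={arc:.6f}  ratio={arc / chord:.6f}")
```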

Operator norm
The operator norm of a matrix $A$ is defined by
$$\|A\|_{\rm op} \equiv \max_{|\psi\rangle \neq 0} \sqrt{\frac{\langle\psi|A^\dagger A|\psi\rangle}{\langle\psi|\psi\rangle}}. \tag{38}$$
The operator norm is the square root of the largest eigenvalue of $A^\dagger A$; two unitaries can only be operator-norm close if they have close to the same effect on every state, including the worst-case state. It follows immediately from their definitions that for any matrix $A$,
$$\|A\|_{\bar F} \le \|A\|_{\rm op} \le \|A\|_F. \tag{39}$$
We can write a possibly tighter upperbound on $\|H\|_{\rm op}$ when $H$ is a Hamiltonian that only has support on at most $\mathcal{N}$ of the generalized Paulis. The subadditivity of the norm tells us that
$$\|H\|_{\rm op} = \Big\|\sum_I h_I \sigma_I\Big\|_{\rm op} \le \sum_I |h_I|. \tag{40}$$
This inequality will be saturated if, for example, the only nonzero generalized Paulis are those drawn from the $2^N$ operators composed exclusively of $\mathbb{1}$ and $\sigma_z$ operators, but the inequality is typically not saturated and is never saturated for $\mathcal{N} > 2^N$. Combining Eq. 40 with the Cauchy-Schwarz inequality and with the definition Eq. 33 gives
$$\|H\|_{\rm op} \le \sum_I |h_I| \le \sqrt{\mathcal{N}}\,\sqrt{\sum_I h_I^2} = \sqrt{\mathcal{N}}\,\|H\|_{\bar F}. \tag{41}$$
In summary, our two bounds Eqs. 39 and 41 together give
$$\|H\|_{\bar F} \le \|H\|_{\rm op} \le \min\big(\sqrt{\mathcal{N}},\, 2^{N/2}\big)\,\|H\|_{\bar F}. \tag{42}$$
Table 3: The operator norm $\|H\|_{\rm op}$ is always between $\|H\|_{\bar F}$ and $\|H\|_F$. The operator norm will be at the bottom end of this range when the eigenvalues of $H$ all have equal magnitude, for example when $H$ is any single generalized Pauli. By contrast the operator norm will be at the top end of this range when the eigenvalues are very unequal, so that $H$ changes a few states a lot but most states only a little or not at all; for example the control-control-control-...-control-control-$(\mathbb{1} + \sigma_z)$ Hamiltonian is an equal superposition over $2^N$ generalized Paulis. Note that for a general superposition of generalized Paulis, $\|H\|_{\rm op}$ will typically be strictly less than $\sum_I |h_I|$.
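The two ends of the range between the normalized Frobenius norm and the operator norm can be illustrated with Hamiltonians built only from strings of $\mathbb{1}$ and $\sigma_z$, whose eigenvalues can be written down directly. The sketch below is restricted to such diagonal Hamiltonians purely for tractability; the helper names and the choice $N = 6$ are illustrative. It compares a single weight-$N$ Pauli (bottom of the range) with the projector onto $|0\ldots0\rangle$, which is an equal superposition over all $2^N$ such strings (top of the range).

```python
import math
from itertools import product


def spectrum(n_qubits, coeffs):
    """Eigenvalues of H = sum_I h_I sigma_I for strings of identity and sigma_z.

    coeffs maps a tuple of 0/1 flags (1 = sigma_z on that qubit) to h_I.
    """
    eigenvalues = []
    for bits in product((0, 1), repeat=n_qubits):
        value = 0.0
        for flags, h in coeffs.items():
            sign = (-1) ** sum(b * f for b, f in zip(bits, flags))
            value += sign * h
        eigenvalues.append(value)
    return eigenvalues


def norms(n_qubits, coeffs):
    """Return (operator norm, normalized Frobenius norm) of the diagonal H."""
    eig = spectrum(n_qubits, coeffs)
    op = max(abs(e) for e in eig)
    frob_bar = math.sqrt(sum(e * e for e in eig) / len(eig))
    return op, frob_bar


N = 6
single = {(1,) * N: 1.0}  # one weight-N Pauli
projector = {flags: 2.0 ** -N for flags in product((0, 1), repeat=N)}  # |0..0><0..0|

for name, coeffs in (("single Pauli", single), ("projector", projector)):
    op, frob_bar = norms(N, coeffs)
    l1 = sum(abs(h) for h in coeffs.values())
    print(f"{name}: op={op:.4f}  frob_bar={frob_bar:.4f}  sum|h|={l1:.4f}")
```

In the projector example the number of supporting Paulis is $2^N$, so both upper bounds in Eq. 42 coincide and are saturated.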

Approximating geodesics with gates
In this section we will consider paths of length $L$ through a complexity geometry with arbitrary penalty schedule $\mathcal{I}(\sigma_I)$. We will show how to approximate such paths within a given error tolerance using 2-local gates, and upperbound the number $C_{\rm gates}$ of 2-local gates required. For a broad class of penalty schedules, we will show that the required number of gates is at most polynomial in $L$ and the error. This generalizes the result of Nielsen, Dowling, Gu, & Doherty [2], who proved the polynomial equivalence of the gate definition of complexity and the complexity geometry for the special case of the 'cliff schedule'. Our general proof strategy will roughly follow that of Ref. [2], albeit having to contend with a number of complications that arise from the fact that the penalty schedules we are considering may assign exponentially smaller penalty factors than those assigned by the cliff schedule.
Consider the minimal path through the complexity geometry that constructs $U_{\rm target}$. This path will be characterized by a (typically time-dependent) Hamiltonian $H(t) = \sum_I h_I(t)\,\sigma_I$. Without loss of generality we can normalize the time so that $H(t)$ has unit trace, $\overline{\mathrm{Tr}}[H(t)^2] = \sum_I h_I(t)^2 = 1$. Since the minimal path is a geodesic, the 'difficulty' $\Gamma$ of this path, defined by $\Gamma^2 \equiv \sum_I \mathcal{I}(\sigma_I)\,h_I(t)^2$, will be a constant of motion. The 'difficulty' gives the constant of proportionality between the length of the path as measured in the Killing metric, $s = t$, and the length of the same path as measured in the complexity geometry, $L = \Gamma s = \Gamma t$.

Approximation 1: prune expensive operators
Our first approximation will be to prune the Hamiltonian of its most expensive components.
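We will drop all terms that have a penalty factor greater than some critical value $\bar{\mathcal{I}}$,
approximation 1: $H(t) = \sum_I h_I(t)\,\sigma_I \;\to\; H_P(t) \equiv \sum_{\mathcal{I}(\sigma_I) \le \bar{\mathcal{I}}} h_I(t)\,\sigma_I$.
Excision introduces error. Let's upperbound that error.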

Killing error from pruning
According to Eq. 16, the Killing error from pruning is upperbounded by Eqs. 44 and 45 tell us that very expensive operators must also be very small, since Combining these equations, the total Killing error is no more than
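The claim that very expensive operators must also be very small can be checked numerically. The sketch below assumes the normalization $\sum_I h_I^2 = 1$ and the difficulty $\Gamma^2 = \sum_I \mathcal{I}(\sigma_I)\,h_I^2$ (my reading of the surrounding definitions), and uses randomly generated penalties and coefficients purely for illustration. Under those assumptions the squared weight of the pruned terms satisfies $\sum_{\mathcal{I}(\sigma_I) > \bar{\mathcal{I}}} h_I^2 \le \Gamma^2/\bar{\mathcal{I}}$, because every pruned term contributes more than $\bar{\mathcal{I}}\,h_I^2$ to $\Gamma^2$.

```python
import math
import random

random.seed(0)

# Hypothetical data: 200 directions with penalties spread over six decades.
num_directions = 200
penalties = [10.0 ** random.uniform(0.0, 6.0) for _ in range(num_directions)]
raw = [random.gauss(0.0, 1.0) / math.sqrt(p) for p in penalties]
norm = math.sqrt(sum(x * x for x in raw))
h = [x / norm for x in raw]  # enforce sum_I h_I^2 = 1

gamma_sq = sum(p * x * x for p, x in zip(penalties, h))  # assumed definition of Gamma^2

for i_bar in (1e2, 1e3, 1e4, 1e5):
    pruned_weight = sum(x * x for p, x in zip(penalties, h) if p > i_bar)
    bound = gamma_sq / i_bar
    print(f"I_bar={i_bar:.0e}  pruned weight={pruned_weight:.3e}  Gamma^2/I_bar={bound:.3e}")
```

Taking the square root gives an instantaneous Killing-norm error of at most $\Gamma/\sqrt{\bar{\mathcal{I}}}$; under the same assumptions, integrating over a path of Killing length $t = L/\Gamma$ would give at most $L/\sqrt{\bar{\mathcal{I}}}$.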

Operator-norm error from pruning
In Sec. A.2.2, we will prove that the total operator-norm error from pruning is upperbounded by (There are two upperbounds because there are two upperbounds in Eq. 42.) The operator-norm error is larger than the Killing error Eq. 50.

Approximation 2: average Hamiltonian
To approximate the path generated by $H_P(t)$, we will adopt the standard quantum simulation strategy [32] of divide and conquer. First we will divide the path into $S$ equal segments, each of complexity-geometry length $L/S$ and inner-product length $\delta \equiv L/(\Gamma S)$, and then we will conquer each segment in turn by approximating it within our target error. Rather than following every twist and turn of $H_P(t)$, which would be prohibitively expensive, we will instead, for each segment, just apply the average Hamiltonian within that segment. For a segment running from $t_0$ to $t_0 + \delta$,
approximation 2: $\mathcal{P}\exp\Big[i\int_{t_0}^{t_0+\delta} H_P(t)\,dt\Big] \;\to\; \exp\Big[i\int_{t_0}^{t_0+\delta} H_P(t)\,dt\Big]$.
These two unitaries agree at $O(\delta)$ but disagree at $O(\delta^2)$. Since we will eventually make $\delta$ small but non-infinitesimal, we must also be careful about terms higher order in $\delta$. We'll exercise this care in the appendix, and quote the results here.

Killing error from averaging
In Appendix A.3.1 we will show that the per-segment Killing error caused by averaging is upperbounded by

Operator-norm error from averaging
In Appendix A.3.2 we will show that the per-segment operator-norm error caused by averaging is upperbounded by

Approximation 3: Trotterize unitary
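The final approximation will be to chop up the average Hamiltonian into its constituent generalized Paulis and implement each Pauli in turn. This is known as Trotterization.
We will use the first-order Trotter approximation
$$\exp\Big[i\delta \sum_I \bar{h}_I\,\sigma_I\Big] \;\to\; \prod_I \exp\big[i\delta\,\bar{h}_I\,\sigma_I\big],$$
where $\bar{h}_I$ denotes the coefficient of $\sigma_I$ in the average Hamiltonian of the segment. These agree at $O(\delta)$ but disagree at $O(\delta^2)$.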
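The second-order disagreement of the first-order Trotter approximation can be seen in the smallest possible example. The sketch below takes a single qubit with Hamiltonian $a\sigma_x + b\sigma_z$ (the coefficients are hypothetical), compares the exact step $e^{i\delta(a\sigma_x + b\sigma_z)}$ with the Trotterized step $e^{i\delta a\sigma_x}e^{i\delta b\sigma_z}$ using closed-form $2\times2$ matrices, and measures the error in the normalized Frobenius norm. Halving $\delta$ should reduce the error by roughly a factor of four.

```python
import cmath
import math


def exact_step(a, b, delta):
    """exp(i*delta*(a*sigma_x + b*sigma_z)) in closed form."""
    r = math.hypot(a, b)
    c, s = math.cos(delta * r), math.sin(delta * r)
    return [[c + 1j * s * b / r, 1j * s * a / r],
            [1j * s * a / r, c - 1j * s * b / r]]


def trotter_step(a, b, delta):
    """exp(i*delta*a*sigma_x) times exp(i*delta*b*sigma_z)."""
    c, s = math.cos(delta * a), math.sin(delta * a)
    phase = cmath.exp(1j * delta * b)
    return [[c * phase, 1j * s / phase],
            [1j * s * phase, c / phase]]


def frobenius_bar(m1, m2):
    """Normalized Frobenius norm of m1 - m2."""
    total = sum(abs(x - y) ** 2 for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))
    return math.sqrt(total / len(m1))


a, b = 0.6, 0.8  # hypothetical coefficients
errors = {}
for delta in (0.1, 0.05, 0.025):
    errors[delta] = frobenius_bar(exact_step(a, b, delta), trotter_step(a, b, delta))
    print(f"delta={delta}  error={errors[delta]:.3e}  error/delta^2={errors[delta] / delta**2:.4f}")
```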

Killing error from Trotterization
In Appendix A.4.1 we will show that the per-segment Killing error caused by Trotterization is upperbounded by

Operator-norm error from Trotterization
In Appendix A.4.2 we will show that the per-segment operator-norm error caused by Trotterization is upperbounded by

Total error from combining segments
Errors at-worst add for both the Killing distance (Eq. 15) and the operator norm (Eq. 29). The total error is therefore no more than the sum of the error contributions for each segment times the number of segments $S = L/(\Gamma\delta)$. So long as we keep $\delta < N^{-1/2}$, the results in Secs. 3.1-3.3 imply the error is upperbounded by Eqs. 60 and 61. To keep the first contribution in budget, we must keep $\delta$ small; to keep the second contribution in budget, we must keep $\bar{\mathcal{I}}$ large.

Lemma: making monomial unitaries
We have taken the path and broken it up into segments, and then broken those segments up into 'monomial' unitaries each generated by a single generalized Pauli. Now let's show that we can make any of those monomial unitaries with a small number of gates, even if the monomial has high weight. If $\sigma_K$ is a $k$-local generalized Pauli operator, then any 'monomial' unitary of the form $e^{i\sigma_K z}$ can be synthesized exactly using no more than $2k - 3$ two-local gates,
lemma: $C_{\rm gates}\big[e^{i\sigma_K z}\big] \le 2k - 3$. (62)
Let's prove that now. Without loss of generality, we can fix which of $\sigma_x$, $\sigma_y$, $\sigma_z$ appears on each qubit of the monomial. (This is without loss of generality because all monomial operators are related by a change of single-qubit basis, e.g. changing $\sigma_z$ to $\sigma_x$ on the first qubit, changing $\sigma_z$ to $\sigma_y$ on the second qubit, etc. But this is a symmetry of our definition of gate complexity. In Sec. 2.3, we defined the primitive gates to be any two-qubit unitary, i.e. any element of $\mathrm{U}(4)$, so we can effect any change of single-qubit basis by just bundling it into the last gate that touches that qubit.) We will show how to write $e^{i\sigma_K z}$ with $2k - 3$ gates. First, note the identity: to change yaw by $2z$, we can roll $90^\circ$, pitch down by $2z$, and then roll back [33]. This identity can be applied to any three generators that have the same algebra as SU(2), so it also holds with generalized Paulis $\sigma_I$ and $\sigma_J$ in place of the single-qubit Paulis. This allows us to recursively build up high-weight monomials,
$$e^{i\sigma_K z} = e^{i(\sigma_z)_1(\sigma_x)_2\frac{\pi}{4}}\; e^{i(\sigma_y)_2(\sigma_y)_3\frac{\pi}{4}}\; e^{i(\sigma_x)_3(\sigma_x)_4\frac{\pi}{4}} \cdots$$
This $2k - 3$ gate circuit exactly compiles the monomial unitary and proves Eq. 62. Since $2k - 3 < 2N$, it is certainly the case that for all monomial unitaries of any weight,
$$C_{\rm gates}\big[e^{i\sigma_K z}\big] < 2N.$$
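The recursive construction can be checked numerically for small weights. The sketch below is not the paper's specific circuit; it uses the conjugation identity $e^{iA\pi/4}\,e^{iBz}\,e^{-iA\pi/4} = e^{i(iAB)z}$, which holds for any two anticommuting Pauli strings $A$ and $B$, to build a weight-3 monomial from 3 two-qubit gates and a weight-4 monomial from 5 two-qubit gates, matching the count $2k - 3$. The particular Pauli strings, the angle, and the helper names are illustrative choices.

```python
import math
from functools import reduce

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def pauli(*factors):
    """Tensor product of single-qubit factors, first factor on the first qubit."""
    return reduce(kron, factors)


def exp_pauli(p, angle):
    """exp(i*angle*p), valid because every Pauli string squares to the identity."""
    c, s = math.cos(angle), math.sin(angle)
    n = len(p)
    return [[(c if i == j else 0.0) + 1j * s * p[i][j] for j in range(n)] for i in range(n)]


def circuit(*gates):
    return reduce(matmul, gates)


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


z = 0.37
q = math.pi / 4

# Weight 3 from 2*3 - 3 = 3 gates, each acting on two neighboring qubits.
a3, b3 = pauli(I2, X, Z), pauli(X, Y, I2)
built3 = circuit(exp_pauli(a3, q), exp_pauli(b3, z), exp_pauli(a3, -q))
err3 = max_abs_diff(built3, exp_pauli(pauli(X, Z, Z), -z))

# Weight 4 from 2*4 - 3 = 5 gates: conjugate the weight-3 circuit once more.
c4, a4, b4 = pauli(I2, I2, X, Z), pauli(I2, X, Z, I2), pauli(X, Y, I2, I2)
built4 = circuit(exp_pauli(c4, q), exp_pauli(a4, q), exp_pauli(b4, z),
                 exp_pauli(a4, -q), exp_pauli(c4, -q))
err4 = max_abs_diff(built4, exp_pauli(pauli(X, Z, Y, Z), -z))

print(err3, err4)
```

Each further conjugation raises the weight by one at the cost of two extra gates, which is where the count $2k - 3$ comes from. In this sketch the sign of the angle flips at each step; that can be absorbed by choosing the sign of $z$ in the central gate.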

Main results: bounding the gate complexity
The number of gates to synthesize a single segment follows from the strategy in Sec. 3.5. The total cost, Eq. 69, is the cost-per-segment times the number of segments $S = L/(\Gamma\delta)$.

The gate complexity of getting Killing close
Our error budget is set by Eq. 60.To stay within this budget, we will ensure that each of the two terms in the sum in Eq. 60 is less than half the permitted error; apportioning half the budget to each term and then plugging into Eq.69 gives This first inequality (together with the fact that unless L > Γs(error) the two unitaries to be connected are already within s(error) at the start) ensures that δ < N −1/2 and therefore self-consistently ensures the validity of Eq. 60.Since Γ 2 ≥ min I I(σ I ) = 1 for every geodesic, this tells us that whenever there is a complexity geometry path of length L, there is also a circuit that arrives within Killing distance s(error) that uses a number of two-local gates no more than where N I is the number of σ I with I(σ (71) A consequence of this inequality is the criterion for polynomial equivalence, Eq. 4.

The gate complexity of getting operator-norm close
Our error budget is set by Eq. 61.Let's first use the bound from the first term in the minimum in Eq. 61.Apportioning half the budget to each of the two terms in the sum and then plugging into Eq.69 gives Using Γ ≥ 1, and recalling that N I is defined as the number of σ I with I(σ I ) ≤ I, where I is big enough that A consequence of this inequality is the criterion for polynomial equivalence, Eq. 5. We can also derive the bound from the second term in the minimum in Eq. 61.This gives the sometimes-tighter-sometimes-looser where N I is the number of σ I with I(σ (74) A consequence of this inequality is the criterion for polynomial equivalence, Eq. 6.
Comparing Eq. 71 with Eqs. 73-74 we see that it is easier to get Killing close, reflecting the fact that the Killing distance is generally shorter than the operator-norm distance.

Bounds for example metrics
Let's illustrate our results by taking the general bounds Eqs. 71, 73, and 74 and applying them to the example metrics Eqs. 22-24. We will also have the trivial bound from Eq. 25,
$$C_{\rm gates} \lesssim 4^N. \tag{75}$$

Bounds for Killing close
The number of gates required to get Killing close is bounded by Eq. 71. (As discussed around Eq. 37, being Killing close is equivalent to being close in the normalized Frobenius-norm distance.)

Cliff metric, Killing close
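The cliff metric is $\mathcal{I}_k = 1$ for $k \le 2$ and $\mathcal{I}_k = \mathcal{I}_{\rm cliff}$ for $k \ge 3$. For the cliff metric, the number of cheap directions is $\mathcal{N}_2 + \mathcal{N}_1 + \mathcal{N}_0$, so Eq. 71 gives the bound Eq. 77. The combination of Eqs. 75 and 77 tells us that so long as $\mathcal{I}_{\rm cliff}$ is exponentially large (meaning that $\mathcal{I}_{\rm cliff} > \alpha^N$ for some $\alpha > 1$) then
    exponential $\mathcal{I}_{\rm cliff}$ → Eq. 77 holds for all polynomial values of $L$ and $s(\text{error})$ (78)
    → $C_{\rm gates}$ is a polynomial function of $L$ and $s(\text{error})$ for all values of $L$ and $s(\text{error})$. (79)
For $\mathcal{I}_{\rm cliff} > 4^N$, Eq. 77 holds for all $L$ and $s(\text{error})$.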

Binomial metric, Killing close
For the binomial metric the penalty factor of a k-local direction is (the αth power of) the number of k-local directions, Our bound Eq. 71 gives where This tells us that So long as α > 0, this gives polynomial equivalence for any L and s(error).

Exponential metric, Killing close
For the exponential metric, the penalty factor is exponential in the k-locality, Our bound Eq. 71 gives where k is big enough that (84) This is not polynomial equivalence, since the bound on the number of gates scales like $L^{\log N/\log x}$, and so as $N$ increases the exponent of $L$ grows without bound. However, because $\log N$ grows so slowly, for all $x > 1$ this gives what is known as quasi-polynomial equivalence.
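The quasi-polynomial scaling can be seen by counting cheap directions directly. The sketch below assumes, purely for illustration, that the exponential schedule assigns penalty $x^k$ to weight $k$ (the paper's exact normalization may differ), and uses hypothetical values of $x$ and of the threshold. Directions are cheap up to weight $k_{\max} \approx \log\bar{\mathcal{I}}/\log x$, and the number of such directions grows like $N^{k_{\max}}$. Since $N^{\log\bar{\mathcal{I}}/\log x} = \bar{\mathcal{I}}^{\,\log N/\log x}$, the exponent of the threshold grows with $\log N$.

```python
from math import comb


def cheap_directions_exponential(n, x, i_bar):
    """Count directions with x**k < i_bar, assuming weight-k penalty x**k."""
    return sum(comb(n, k) * 3 ** k for k in range(n + 1) if x ** k < i_bar)


X_BASE = 4.0    # hypothetical base of the exponential schedule
I_BAR = 1000.0  # hypothetical threshold; x**k < I_BAR for k <= 4

counts = {n: cheap_directions_exponential(n, X_BASE, I_BAR) for n in (10, 20, 40, 80)}
for n, count in counts.items():
    print(n, count)
```

With the threshold fixed, doubling $N$ multiplies the count by roughly $2^4$ in this example; raising the threshold raises the power of $N$.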

Bounds for operator-norm close
The number of gates required to get operator-norm close is bounded by Eqs. 73 and 74.

Cliff metric, operator-norm close
The cliff metric is For the cliff metric, the strongest bound comes from Eq. 74, which gives The combination of Eqs.75 and 86 tells us that so long as 2 For 2 −N I cliff > 4 N , Eq. 86 holds for all L and ||error|| op .

Binomial metric, operator-norm close
For the binomial metric the penalty factor of a k-local direction is (the αth power of) the number of k-local directions, For the binomial metric, the strongest bound comes from Eq. 73.Eq. 72 tells us that to control the pruning error we need For α ≥ 1 we have Thus for α > 1, Eq. 73 tells us that polynomial L and ||error|| op means polynomial C gates , We have no polynomial bound for α ≤ 1.

Exponential metric, operator-norm close
For the exponential metric, the penalty factor is exponential in the k-locality, Eq. 72 tells us that to control the pruning error we need The sum is exponentially large for x = O(1) unless Nk is exponentially large, so our method does not put tight bounds on the gate complexity of following polynomial-length paths in the exponential metric if we insist on getting operator-norm close.

State complexity
So far in this paper we have been discussing the complexity of unitary operators.One can also define the relative complexity of pairs of quantum states.The relative complexity of a pair of states is defined as the complexity of the least complex unitary that connects them, Analogously one can define a state-space complexity geometry (see e.g.Sec. 3 of [8]).
Starting with a complexity geometry on unitaries, the distance between a pair of states is defined as the distance from the origin of the least distant unitary that connects them. A penalty matrix $\mathcal{I}_{IJ}$ thus defines a deformed metric on Hilbert space. It follows from Eq. 26 that the relative gate complexity of a pair of states is lower bounded by their separation in the state-space complexity geometry of the cliff metric, or of any metric less punitive than the cliff metric.

To develop a useful upper bound, we're going to have to tolerate error. If we are aiming for $|\psi_a\rangle = U_a|\psi_0\rangle$ but we instead hit $|\psi_b\rangle = U_b|\psi_0\rangle$, then knowing that $||U_a - U_b||_F$ is small (or equivalently that $U_a$ and $U_b$ are close in the Killing metric) does not guarantee that the inner product $\langle\psi_a|\psi_b\rangle$ will be close to one. This is because small $||U_a - U_b||_F$ only guarantees that $\langle\psi|U_a^\dagger U_b|\psi\rangle$ is close to one when averaged over all input states $|\psi\rangle$, and not necessarily for our particular input state $|\psi_0\rangle$. Instead we need to demand that $U_a$ and $U_b$ are close in the operator norm. The operator norm being small does guarantee that $U_a|\psi\rangle$ and $U_b|\psi\rangle$ are close even for the worst-case input: this follows from the definition of the operator norm, Eq. 38, and is the content of the inequality Eq. 97. Using this inequality, bounds on the number of gates required to get operator-norm close to a unitary become bounds on the number of gates required to get inner-product close to a state.

The first upperbound on the cost of getting operator-norm close was Eq. 73. Combining this with Eq. 97 upperbounds the size of the circuit needed to transition between any pair of states that are connected by a complexity-geometry path of length $L$: there must exist a circuit, of size at most the bound given in Eq. 98 (with the cutoff $\mathcal{I}$ chosen big enough to satisfy the condition stated there), that, when applied to the first state, gets within inner-product error $\epsilon$ of the second state. (Recall that $\mathcal{N}_{\mathcal{I}}$ is defined as the number of principal directions $\sigma_I$ with $\mathcal{I}(\sigma_I) \leq \mathcal{I}$.) The second upperbound comes from Eq. 74 and gives Eq. 99.

All penalty schedules that are sufficiently punitive to satisfy either of the sufficient conditions Eqs. 5 or 6 are sufficiently punitive that any pair of states that can be connected by a state-space complexity-geometry path of at most polynomial length $L[|\psi_1\rangle, |\psi_2\rangle]$ can also be approximated to within any polynomially small inner-product error $\epsilon$ by a circuit with at most polynomially many gates.
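The gap between average-case and worst-case closeness can be seen in a small numerical sketch. It assumes that the normalized Frobenius norm is $\sqrt{\mathrm{Tr}(A^\dagger A)/2^N}$ and uses a hypothetical pair of unitaries that differ by a sign on a single basis state; the function names are illustrative and are not taken from the paper.

```python
import numpy as np

def normalized_frobenius(a):
    # Assumed normalization: sqrt(Tr(A^dagger A) / d), so that unitaries have norm 1.
    d = a.shape[0]
    return np.sqrt(np.trace(a.conj().T @ a).real / d)

def operator_norm(a):
    # Largest singular value of a.
    return np.linalg.norm(a, 2)

n_qubits = 8
d = 2 ** n_qubits

# Hypothetical example: U_b flips the sign of one basis state, U_a is the identity.
u_a = np.eye(d)
u_b = np.eye(d)
u_b[0, 0] = -1.0

# The unlucky input state is exactly the basis state on which they differ.
psi0 = np.zeros(d)
psi0[0] = 1.0

overlap = np.vdot(u_a @ psi0, u_b @ psi0).real
frob = normalized_frobenius(u_a - u_b)
op = operator_norm(u_a - u_b)

# Worst-case guarantee implied by the operator norm for normalized states:
# Re<psi_a|psi_b> >= 1 - ||U_a - U_b||_op^2 / 2.
worst_case_bound = 1.0 - op ** 2 / 2.0

print(f"normalized Frobenius distance: {frob:.4f}")
print(f"operator-norm distance:        {op:.4f}")
print(f"overlap on the unlucky input:  {overlap:.4f}")
print(f"operator-norm lower bound:     {worst_case_bound:.4f}")
```

Here the normalized Frobenius distance shrinks like $2/\sqrt{2^N}$ as qubits are added, while the overlap on the unlucky input stays at $-1$; only the operator norm registers the failure.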

Implications for diameter of complexity geometry
The "diameter" of a space is defined as the greatest separation between any pair of points. Let's use our results to lowerbound the diameter of the complexity geometry.
One question that has arisen repeatedly in the literature is which complexity geometries have a diameter that is exponential in the number of qubits $N$. It follows implicitly from the work of Nielsen et al. [2] that both the cliff metric for the unitary group and the cliff metric for state space have exponential diameters. Then in Ref. [7] I proved that a much broader class of metrics on the unitary group have exponential diameters. In the analysis below I will show that Eq. 71 also implies an exponential diameter for a broad class of complexity geometries on the unitary group (though as we will see a slightly smaller class than was proved to have exponential diameter in [7]); I will also show that Eq. 98 can be used to establish, for the first time in the literature, an exponential diameter for a broad class of complexity geometries on state space.
In the rest of this paper, all equalities and inequalities are exact. In this section we will only ask about the part of the answer that is exponential in $N$, and so will ignore all factors sub-exponential in $N$. For this reason, we will write '$\sim$' and '$\lesssim$' not '$=$' and '$<$'.

Diameter of complexity geometry of unitary group
The main results of this paper, given in Sec. 3.6, are inequalities that upperbound the number of gates $C_{\rm gates}$ needed to approximate any unitary that can be reached by a path of length $L$. Alternatively, if we already know $C_{\rm gates}$, the inequalities lowerbound $L$. Using this, let's lowerbound the $L$ needed to approximate a typical unitary.
The unitary group on $N$ qubits, $U(2^N)$, is $4^N$-dimensional and a simple counting argument [29] tells us the number of gates needed to get close to a typical element is $\sim 4^N$ (Eq. 100). (Here and in the rest of this section we write "$\sim$" not "$=$" because we have neglected all sub-exponential factors.) This is true no matter whether we measure 'closeness' in the Killing metric or in the operator norm, so to get the tightest bound on $L_{\rm typical}$ we should use the Killing bound. Putting e.g. $s(\text{error}) = 10^{-2}$ into Eq. 71 and dropping all subexponential terms gives an upperbound on the gate count in terms of $L$ and the cutoff $\mathcal{I}$. Combining this with Eqs. 21 and 100 gives, for each value of $\mathcal{I}$, a lowerbound on the length of the complexity path required to reach a typical operator. To get the tightest bound we can maximize over the choice of $\mathcal{I}$, which gives Eq. 103.

The greatest separation between a pair of unitaries in the complexity geometry, the 'diameter', must be at least as big as the typical separation, so this result also bounds the diameter. Applying this to the three example metrics, Eqs. 22-24, shows that for all three the diameter is exponential in the number of qubits $N$ (assuming exponential $\mathcal{I}_{\rm cliff}$, $\alpha > 0$, and $x > 1$).

The bound on $L_{\rm typical}$, Eq. 103, is able to establish an exponential diameter even for some complexity metrics for which we were unable to prove polynomial equivalence in Sec. 3. For example, Eq. 103 establishes that the 'delayed cliff' metric defined by $\mathcal{I}_{k<N/4} = 1$ and $\mathcal{I}_{k\geq N/4} = 4^N$ has exponential diameter, even though for this metric $C_{\rm gates}$ is not always a polynomial function of $L$ and $s(\text{error})$. This is because to prove that the diameter is exponentially large we do not need to establish polynomial equivalence for all unitaries, only for typical unitaries.
In Ref. [7] I lowerbounded the diameter of the complexity geometry using a completely different approach that made no reference to gates, error budgets, or $\epsilon$-balls. This bound was almost identical to Eq. 103, except with the $\mathcal{N}_{\mathcal{I}}^{-3/2}$ tightened to $\mathcal{N}_{\mathcal{I}}^{-1}$. In Sec. 7.5, I will speculate on how the method in this paper might be improved to also yield $\mathcal{N}_{\mathcal{I}}^{-1}$. It is intriguing that these two dramatically different methods give almost the same bound.

Diameter of complexity geometry of Hilbert space
Because of the minimization called for in Eq. 95, it does not follow that just because the complexity geometry of the unitary group has a large diameter, the corresponding Hilbert-space complexity geometry must too. As a close analogy, consider the relative computational complexity of pairs of classical bit strings. To implement a generic Boolean function needs exponentially many gates, but any specific pair of bit strings can be connected with at most $N$ NOT gates. Similarly, we will see that there are some penalty schedules for which we can prove the unitary group has exponential diameter but for which we cannot prove the state-space complexity geometry has exponential diameter.
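The classical analogy can be made concrete with a short sketch: the NOT gates that connect two bit strings are exactly the positions at which the strings differ, so there are never more than $N$ of them. The function names below are illustrative.

```python
def not_gates_connecting(a, b):
    """Return the bit positions on which a NOT gate must act to turn a into b."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def apply_not_gates(a, positions):
    """Flip the bits of string a at the given positions."""
    bits = list(a)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

start, target = "0110", "1100"
gates = not_gates_connecting(start, target)
print(gates, apply_not_gates(start, gates))
```

The number of gates is the Hamming distance between the two strings, which is at most $N$ regardless of how complex a generic Boolean function on $N$ bits may be.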
The Hilbert space of $N$ qubits is $2^N$-dimensional and a simple counting argument tells us that the number of gates needed to get from one typical state to inner-product close to another is $\sim 2^N$ (Eq. 107). (We write "$\sim$" not "$=$" because we have again neglected all sub-exponential factors.) Our first bound on the diameter of Hilbert space comes from Eq. 98. Putting e.g. $\epsilon = 10^{-2}$ into Eq. 98, combining with Eq. 107, and then dropping all subexponential terms gives a lowerbound on the diameter, Eq. 108, that is maximized over the cutoff $\mathcal{I}$. The second bound comes from Eq. 99. Putting e.g. $\epsilon = 10^{-2}$ into Eq. 99, combining with Eq. 107, and then dropping all subexponential terms gives a second lowerbound of the same max-min form.

Applying these bounds to the cliff metric Eq. 22 and the binomial metric Eq. 23 shows that, for both of these metrics, the sufficient conditions for exponential diameter (exponentially large $2^{-N}\mathcal{I}_{\rm cliff}$, and $\alpha > 1$ respectively) are the same as the sufficient conditions for polynomial equivalence of polynomial lengths, Eqs. 5 and 6. By contrast, for no $O(1)$ value of $x$ does the exponential metric meet sufficient conditions Eqs. 5 or 6, but nevertheless for $x > 10.27\ldots$ our bounds show that the diameter of the state-space complexity geometry is exponentially large: for $x > 10.27\ldots$, Eq. 108 gives an exponentially large lowerbound on the diameter.

We have just proved a theorem about the diameter of the complexity geometry. Even though the theorem itself knows only about the complexity metric, the proof leans heavily on the Killing metric. It is curious that the proof needed to make use of this crutch, and curious that this crutch is so physically unmotivated. Had we just been presented with the complexity metric, with no mention of the Killing metric, how would we have known that inventing a fictitious metric would prove so useful?

Discussion
In this paper we have proved that a broad class of complexity geometries have a computational power that is polynomially equivalent to quantum circuits. A summary of our results is given in Sec. 1.1.

Overview of method
Let's review how we proved these equivalences. One direction of the polynomial equivalence is straightforward: it is easy to show that the complexity geometries are no less powerful than circuits. This is because any quantum circuit composed of two-local gates describes a zig-zagging path through the unitary group that moves only in one- and two-local directions. Since moving in one- and two-local directions is cheap for all the complexity geometries in our universality class, it is cheap to follow the zig-zag path exactly (see Eq. 26). Error need not be tolerated.
The nontrivial direction, and the principal result of this paper, is proving that the complexity geometries are no more powerful than quantum gates. The problem is that if we want to use gates to exactly follow every twist and turn of a geodesic through the complexity geometry, it is prohibitively and usually infinitely expensive. Let's describe the aspects of this problem, and how they are solved by our approximations.
A geodesic through the complexity geometry is generated by a Hamiltonian $H(t) = \sum_I h_I(t)\sigma_I$. By contrast, for circuits the primitive operation is acting with a gate, which can enact any element of $U(4)$ on any pair of qubits. Each gate costs 1, no matter which element of $U(4)$ we use. As we saw in Sec. 3.5, by composing these gates we can implement any monomial unitary $e^{i\sigma_I\delta}$, for any generalized Pauli $\sigma_I$ and any $\delta$, at modest cost.
The most severe problem with trying to use gates to exactly follow a geodesic is the 'microstep' problem. In the complexity geometry, the cost to implement a small step $e^{i\sigma_I\delta}$ scales like $\delta$, and so is cheap for small $\delta$. Geodesics can change direction with no penalty. Conversely, the number of gates required to implement $e^{i\sigma_I\delta}$ is never less than one, no matter how small the nonzero value of $\delta$. There is no discount for small $\delta$. This means that to exactly follow a geodesic described by a time-dependent Hamiltonian requires an infinite number of gates. This forces us to abandon attempting to follow the geodesic exactly, and instead settle for approximating the geodesic as best we can with zig-zag segments. These segments have to be short for the approximation to be accurate, but shorter segments mean more gates.
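The trade-off between segment length and gate count can be illustrated with a single-qubit toy example. This is a minimal sketch, not the construction of Sec. 3: it assumes the time-independent Hamiltonian $H = X + Z$, approximates $e^{iHt}$ by a first-order product of monomials $e^{iX\delta}e^{iZ\delta}$, and counts one unit of cost per monomial.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_exp(sigma, theta):
    # exp(i * theta * sigma) for any sigma that squares to the identity.
    return np.cos(theta) * I2 + 1j * np.sin(theta) * sigma

def exact(t):
    # exp(i t (X + Z)); (X + Z)/sqrt(2) squares to the identity.
    axis = (X + Z) / np.sqrt(2)
    return pauli_exp(axis, np.sqrt(2) * t)

def zigzag(t, segments):
    # First-order product formula: alternate X and Z monomials in each segment.
    delta = t / segments
    step = pauli_exp(X, delta) @ pauli_exp(Z, delta)
    return np.linalg.matrix_power(step, segments)

def op_error(t, segments):
    return np.linalg.norm(exact(t) - zigzag(t, segments), 2)

errors = {n: op_error(1.0, n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"segments={n:5d}  monomials={2 * n:5d}  operator-norm error={err:.2e}")
```

In this toy case each tenfold increase in the number of monomials buys roughly a tenfold reduction in error, which is the tension described above: accuracy is purchased with gates.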
But the time-dependence of $H(t)$ is not the only problem. Even when the Hamiltonian that generates the geodesic is independent of time, gates still struggle to keep up. This is due to the 'Pythagoras' problem. When gates implement a monomial $e^{i\sigma_I\delta}$, they move in a cardinal direction in the tangent space. But when geodesics move through the unitary group, they follow Hamiltonians that may be superpositions over all $4^N$ of the generalized Pauli basis directions, $H = \sum_I h_I\sigma_I$. A geodesic that moves diagonally will be much shorter than a gate-path that follows a Manhattan grid. Since the grid is $4^N$-dimensional, the Pythagorean penalty could be as large as $2^N$.

This forces us to cut down the number of directions. We divide directions into 'expensive' and 'cheap'. For the expensive directions (that cost more than some value $\mathcal{I}$), we simply prune them out of the Hamiltonian. Excision introduces error, but at fixed geodesic-length $L$ and when $\mathcal{I}$ is sufficiently large the error must be small, since the Hamiltonian cannot have a large component in expensive directions without the geodesic becoming long. For the cheap directions we simply eat the square-root slowdown (and budget for the error caused by the monomials not commuting). Next we optimize the dividing line between cheap and expensive, $\mathcal{I}$, to give the tightest bounds. This is altogether a more delicate balancing act than what Nielsen et al. had to do for the cliff metric [2], since in that case there was a small number of cheap directions and the rest were all exponentially expensive, whereas our more gradual schedules must contend with the proliferation of mid-market penalty factors that are both moderately inexpensive and vexatiously numerous.
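The size of the Pythagorean penalty follows from comparing the Manhattan and Euclidean lengths of a maximally diagonal step. The sketch below assumes a unit vector with equal components along all $4^N$ basis directions; the function name is illustrative.

```python
import math

def pythagorean_penalty(n_qubits):
    """Ratio of Manhattan length to Euclidean length for an equal superposition
    over all 4**n_qubits generalized-Pauli directions."""
    dim = 4 ** n_qubits
    component = 1.0 / math.sqrt(dim)            # equal components, unit Euclidean norm
    manhattan = dim * component                  # sum of |components|
    euclidean = math.sqrt(dim * component ** 2)  # equals 1 by construction
    return manhattan / euclidean

for n in (1, 2, 5, 10):
    print(n, pythagorean_penalty(n), 2 ** n)
```

The ratio is $\sqrt{4^N} = 2^N$, which is why restricting to the $\mathcal{N}_{\mathcal{I}}$ cheap directions, and so paying only a $\sqrt{\mathcal{N}_{\mathcal{I}}}$ slowdown, is essential.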

Complexity geometries and BQP
The complexity class 'bounded-error quantum polynomial time' (BQP) is the set of decision problems that can be solved in polynomial time on a quantum computer, with low probability of error for every input. This is known to be equivalent to the set of decision problems that can be solved using a uniform family of quantum circuits with polynomially many gates (again requiring low probability of error for every input).
Let's collect three relevant results. First, we saw in Eq. 26 that complexity geometries can efficiently simulate circuits exactly (so long as $\mathcal{I}_2$ is at-worst polynomially large). Second, we saw that circuits can efficiently simulate complexity geometries that satisfy Eqs. 5 or 6, so long as we tolerate small operator-norm errors. Finally, in Sec. 5 we saw that tolerating small operator-norm errors will lead to low probability of error even for the worst-case inputs.
These results tell us that the ability to enact every polynomial-length path through a complexity geometry that satisfies Eq. 3 and Eqs. 5 or 6 confers the ability to solve a class of decision problems with high probability for every input if and only if those decision problems are in BQP. Any complexity geometry that satisfies Eq. 3 and Eqs. 5 or 6 thus gives an alternative definition of BQP.
(We will also need to dispense with two technicalities. First, the definition of BQP requires not only that there be a family of circuits that implements the unitary, but also that the family of circuits be 'uniform', meaning that there be an efficient classical algorithm that compiles the circuit. This condition is satisfied by our simulation algorithm: the procedure in Sec. 3 not only proved the existence of a circuit that approximates a given geodesic through the complexity geometry, it also gave an explicit polynomial-time recipe for constructing it. Second, the definition of BQP allows us to act the circuit not only upon the $N$-qubit input state, but also on an additional set of 'ancilla' qubits. These qubits are all initialized at zero, and can be thought of as a quantum workspace. One might worry that the ability to use these ancilla qubits makes the circuit definition more powerful than the complexity geometry definition, since the circuit now describes a path through the space of unitaries on the extended Hilbert space that also includes the ancilla qubits. However, no polynomial-size circuit can reach more than a polynomial number of ancillas, so we can use the same method that led to Eq. 26 to construct a polynomial-length complexity geometry path through the extended Hilbert space that exactly emulates the circuit.)

The role of the binomial metric
Recall the binomial schedule, Eq. 23. For $\alpha = 1$ the penalty factor of a $k$-local basis direction $\sigma_I$ is equal to the total number of $k$-local basis directions, $\mathcal{N}_k$. The $\alpha = 1$ binomial metric has shown up in three of our calculations, in each case playing a special role. Let's review them here.

The first place the binomial metric plays a special role is in the sufficient condition Eq. 5 for polynomial equivalence between the complexity metric and the gate definition of complexity with operator-norm error. To control the operator-norm pruning error we need to be able to make $\sum_I \mathcal{I}(\sigma_I)^{-1}$ less than any polynomially small value by omitting from the sum at most polynomially many basis directions. As discussed around Eq. 91, this leads to a critical value $\alpha = 1$. For $\alpha > 1$ the metric satisfies the sufficient condition; for $\alpha \leq 1$ there is no guarantee that there are not polynomially long paths to unitaries that cannot be operator-norm approximated without superpolynomially many gates. (As discussed in Sec. 7.5.1, it seems likely that by making use of the structure of the commutation relations we could also show the $\alpha = 1$ metric is in the equivalence class.)

The second place the binomial metric plays a special role is in Eq. 161. This is the sufficient condition that guarantees that being close in the complexity geometry implies being close in the operator norm. Again the quantity $\sum_I \mathcal{I}(\sigma_I)^{-1}$ is important, and for essentially the same reason, though this time we only require that $\sum_I \mathcal{I}(\sigma_I)^{-1}$ be at most polynomially large. The binomial metric with $\alpha \geq 1$ satisfies the sufficient condition, whereas for $\alpha < 1$ there may be paths that are short in complexity distance but not short in operator-norm distance.
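The role of $\alpha = 1$ as the critical value can be seen by evaluating the inverse-penalty sum directly. The sketch below rests on two assumptions that are not restated in this section: that the number of $k$-local generalized Paulis is $\mathcal{N}_k = \binom{N}{k}3^k$, and that the binomial schedule takes the form $\mathcal{I}_k = \mathcal{N}_k^{\alpha}$ (which reproduces $\mathcal{I}_k = \mathcal{N}_k$ at $\alpha = 1$). The function names are illustrative.

```python
from math import comb

def num_k_local(n_qubits, k):
    # Assumed count of generalized Paulis acting nontrivially on exactly k qubits.
    return comb(n_qubits, k) * 3 ** k

def tail_inverse_penalty_sum(n_qubits, alpha, k_min):
    """Sum of 1/I over all basis directions of weight >= k_min,
    assuming the schedule I_k = N_k ** alpha."""
    return sum(
        num_k_local(n_qubits, k) ** (1 - alpha)
        for k in range(k_min, n_qubits + 1)
    )

N = 20
for alpha in (0.5, 1, 2):
    print(alpha, tail_inverse_penalty_sum(N, alpha, k_min=3))
```

Under these assumptions, omitting only the one- and two-local directions leaves a tail that is tiny for $\alpha = 2$, equal to $N - 2$ for $\alpha = 1$ (each weight contributes exactly one), and very large for $\alpha = 1/2$, matching the three regimes described above.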
The final place the $\alpha = 1$ binomial metric plays a special role is in Ref. [7]. In that paper I lowerbounded the diameter of the complexity geometry. (The lowerbound was proved using a technique from differential geometry, and not using the quantum-simulation methods we deployed in Sec. 6.1 of this paper to derive a weaker bound on the same quantity.) According to the lowerbound in [7], if $\mathcal{N}$ of the directions are cheap and the rest are very expensive, the total Killing distance required to hit every unitary scales like $2^N \mathcal{N}^{-1/2}$. This means the complexity distance needed to hit every unitary using only the exactly $k$-local directions is lowerbounded by $2^N \mathcal{I}_k/\mathcal{N}_k$. If this lowerbound is tight, then the $\mathcal{I}_k = \mathcal{N}_k$ metric is the critical 'load balanced' schedule for which the cost to synthesize a typical unitary using the exactly $k$-local directions is independent of $k$.

Equivalence class of complexity geometries
Let's take a geometrical perspective. From the point of view of geometry, the relevant result in this paper is not that right-invariant geometries that satisfy certain sufficient conditions are polynomially equivalent to the gate definition of complexity; it is rather that these right-invariant geometries are polynomially equivalent to each other.
In a previous paper [6], my collaborators and I conjectured that the long-distance geometry of right-invariant metrics should be markedly insensitive to short-distance parameters. Even if two spaces look very different at short scales, they may nevertheless give rise at longer scales to the same emergent effective geometry. We made an analogy with the UV/IR decoupling that happens in the theory of renormalization in quantum field theory. In our paper we gave simple examples of this phenomenon at work, and made quantitative conjectures for the large-separation distance function for high-dimensional right-invariant metrics on Lie groups.

The results of this paper provide further support for these conjectures. The short-distance properties of the metric, like the curvatures and the small-separation distance functions, are given directly by the penalty factors. The long-distance properties of the metric, like the large-separation distance functions, are also formally determined by the penalty factors, but in a more convoluted fashion that shields the 'IR' from the details of the 'UV'. What we have shown in this paper is that even when two metrics differ exponentially in their short-distance properties (e.g. the binomial metric and the cliff metric differ exponentially in their assignment of penalty factors to moderate-weight directions) they may nevertheless be in the same polynomial equivalence class for approximate synthesis at long distances.

Prospects for improving bounds
Let's consider the prospects for tightening the bounds we have derived in this paper.

Incremental improvements
The most straightforward and least consequential improvements to Eqs. 71, 73, & 74 would be to try to improve the exponents of $L$, $s(\text{error})$, $||\text{error}||_{\rm op}$, and $\mathcal{N}_{\mathcal{I}}$, while still tolerating bi-invariant error and while still settling for polynomial rather than linear equivalence. One approach might be to replace the first-order Trotterization used in Sec. 3.3 with a higher-order Suzuki-Trotter formula; another might be to deploy some more state-of-the-art quantum simulation techniques such as those in Refs. [34-38].
However, even within the confines of first-order Trotterization there is reason to be hopeful that the $\sqrt{N}$ scaling of the normalized Frobenius norm of the Trotter error in Eq. 58 might be improved. As discussed in Appendix B, the $\sqrt{N}$ scaling of the $O(\delta^2)$ error term is caused by the possibility of constructive interference between the various commutators. But using our freedom to choose the order of the factors within the Trotter product, we can arrange for the interference to be on-net destructive. If this result, laid out in Appendix B, could be extended to all orders in $\delta$, then the bound on the per-segment Trotter error described in Eq. 58 would improve on its current $N^{1/2}\delta^2$ scaling. This conjectured improvement would in turn improve the upperbound in Eq. 71 to $C_{\rm gates} \lesssim \mathcal{N}_{\mathcal{I}} L^2/s(\text{error})$. This upperbound would be particularly intriguing because it would imply that the lowerbound on the diameter of the complexity geometry we could derive using the technique of Sec. 6.1 would be identical to that proved using a completely different technique in [7].

Another set of improvements would be to make use of the structure of the commutation relations amongst the $k$-local generalized Paulis. In our analysis so far, we more-or-less made worst-case assumptions about the commutation relations. But for penalty schedules in which the cheap directions are the $k$-local directions with small $k$, we can make use of the fact that the commutation relations are highly structured. (As an example of the structure, the number of independent commutators of the $\mathcal{N}_k$ $k$-local directions is much smaller than $\mathcal{N}_k^2$.) This should allow us to improve the bounds both for getting Killing close and for getting operator-norm close. For example, consider the derivation of the bound on the operator-norm pruning error in Sec. 3.1.2. We used the inequality $||\sum_I h_I\sigma_I||_{\rm op} \leq \sum_I ||h_I\sigma_I||_{\rm op}$. But that inequality is generally not tight. As discussed in Sec. 2.4.2, the operator norm is often smaller than the sum of its parts. By making use of the structure of the commutation relations between $k$-local operators we could sharpen our bound on the critical schedule in Sec. 4.2.2, although there is a limit to how much less punitive the critical metric could be shown to be in this way.

Relatedly, we can relax our condition Eq. 3 that, to belong to the equivalence class, penalty schedules must assign two-local directions modest penalty factors. The motivation for this condition was to guarantee the 'only if' direction of the equivalence: to show that complexity geometries can easily emulate gates. But it's overkill. It is sufficient to require that there is some set of cheap generalized Paulis whose nested commutators generate the entire algebra. With that set, we can use the technique of Sec. 3.5 to make any gate. What this means is that we can permute the penalty factors amongst the generalized Paulis and, subject only to the condition that there always be a cheap subalgebra whose nested commutators generate the entire algebra, the metric will remain in the same polynomial equivalence class. Such a permutation does not need to respect any notion of $k$-locality (although it must be a permutation rather than a more general rotation: it does need to keep the penalty matrix diagonal in the generalized-Pauli basis, see below).

As an illustration of this, consider that there are actually two natural tensor decompositions of $U(2^N)$. One is the decomposition into $N$ qubits; the other is the decomposition into $2N$ Majoranas. These two decompositions are related by the Jordan-Wigner transformation. The Jordan-Wigner transformation is a permutation that maps diagonal penalty matrices to diagonal penalty matrices, but does not respect $k$-locality, so that an operator that touches only a few qubits may touch many Majoranas, or vice versa. Just as we defined e.g. an 'exponential metric' with respect to qubits, which punishes generalized Paulis proportional to the exponential of the number of qubits they touch, so too could we define a Majorana version of the 'exponential metric' that punishes generalized Paulis proportional to the exponential of the number of Majoranas. A consequence of our analysis is that these two metrics are in the same quasi-polynomial equivalence class, subject to tolerating Killing error, despite rendering completely different directions cheap. Similarly, when we tolerate operator-norm error, a parallel argument to the one that leads to Eq. 91 tells us that the critical Majorana metric is $\mathcal{I}_k = \binom{2N}{k}$ and that any penalty schedule parametrically more expensive than this will be in the same polynomial equivalence class as quantum circuits.
Finally, let us comment on the restriction that the penalty matrix $\mathcal{I}_{IJ}$ must be diagonal in the generalized-Pauli basis. There is no prospect of being able to fully relax this condition: we will not be able to make a useful sufficient condition for polynomial equivalence that is a function only of the spectrum of $\mathcal{I}_{IJ}$, with no reference to what basis $\mathcal{I}_{IJ}$ is diagonal in. To see the obstruction, consider a penalty matrix that, while generally having large eigenvalues, makes it cheap to move in a few Haar-random directions. These cheap directions will be approximately equal superpositions over all $4^N$ of the generalized Paulis. No known simulation technique makes it cheap to approximate motion in a Haar-random direction using gates, and a counting argument tells us that no simulation technique could possibly work in the typical case (and vice versa: the Haar-random directions will not themselves be able to easily simulate gates). It's not that the 'off-diagonal' penalty matrix gives a more powerful model of computation; it's rather that the powers of the two models of computation are simply incommensurate.

Errors as measured in complexity geometry do not compose
In the bounds proved in this paper, we measured the length of the path $L$ in the complexity geometry metric, but measured the error in a different metric (operator norm, Frobenius norm, or Killing). One might aspire to prove a stronger theorem in which we measure both the length of the path and the targeted error in the complexity geometry. Unfortunately, I will now show that we cannot prove such a bound with a simple upgrade of the method of this paper, nor indeed using any strategy of divide-and-conquer.
In a divide-and-conquer strategy (such as the one we deployed in Sec. 3) we break the path into many small segments, bound the error of each small segment, and then bound the total error by adding up the segment errors. The 'bound the segment error' step is not the problem: for gradual penalty schedules (for which the penalty factor only gets large at large weight), we can bound the complexity-geometry error of each segment by noticing that the Taylor expansion for $\delta$ (see e.g. Eq. 134) is also an expansion in the weight of the nested commutator. Instead the problem is the 'add up the segment errors' step. Since the complexity geometry is only right-invariant and not also left-invariant, there is no guarantee that the complexity-geometry error of a composite path is bounded by the sum of the complexity-geometry errors of its segments. The divide-and-conquer strategy will not work because composing two steps, both of which have small error, may nevertheless lead to a combined path that is unacceptably erroneous.

Beyond divide-and-conquer
Finally, and most ambitiously, we might aspire to completely eliminate the need to tolerate error, and perhaps even establish a linear bound between some definitions of complexity. Certainly the existence of a robust linear relationship between a set of definitions of complexity would be welcome news for the holographic complexity program, which via the AdS/CFT correspondence ascribes physical meaning (the size of the dual wormhole) to the numerical value of the complexity [39-42]. The more robust this numerical value can be shown to be, the more robust the foundation of holographic complexity. If this is possible, it will require a radical departure from the methods of this paper, and require transcending not only Suzuki-Trotter but also the entire strategy of divide-and-conquer.

This is probably not possible for gates. We have seen that it is challenging to follow complexity geometry paths with gates, because gates move through the unitary group only in clumsy zig-zagging quantum leaps. With such poor fine-motor skills, gates probably cannot yield bounds vastly superior to the ones we have already derived, so something qualitatively similar to Eqs. 71, 73, & 74 is about as good as it gets.
Where the situation is more promising is for one complexity geometry simulating another. As recounted in Sec. 7.4, in a previous paper [6] my collaborators and I conjectured that one right-invariant metric on a high-dimensional Lie group can simulate another with much greater accuracy, and much lower overhead, than the analysis of Sec. 3 would predict. As well as some suggestive examples, we also gave a couple of concrete pieces of evidence that such an improved simulation strategy should exist.
First, the 'ball-box theorem' of Gromov [43] says, in our language, that for any right-invariant metric on the unitary group, so long as the nested commutators of the cheap directions generate the entire algebra, then even if the other directions are infinitely expensive we can still economically fill out the entire neighborhood of a point, hitting anywhere within the neighborhood with zero error. This fact is completely invisible at any finite order in the Suzuki-Trotter expansion.
Second, in our paper we carefully examined the complexity geometry of a single qubit [7,8], the squashed three-sphere, and found not only that we could hit every point with zero error, but also that the incremental cost of eliminating error got smaller and smaller the farther away we were aiming. This behavior makes no sense in the context of a Suzuki-Trotter expansion, or more generally in the context of any strategy of divide-and-conquer that involves dividing the path into small segments, simulating each in turn, and then adding up the errors from each segment. Any divide-and-conquer strategy will predict that, just as we see with the formulas we derived in Sec. 3, the greater the separation the harder it is to control the error. But it makes perfect sense in the context of a conjectured more holistic compilation strategy that from the beginning sees all the way to the very end of the simulation, not just the end of the segment, and is able to plan far enough in advance to make micro-adjustments to the trajectory en route that leverage the greater lever-arm provided by the greater distance to the destination. Such a global strategy would make essential use of the 'wiggle room' provided by the fact that Riemannian paths can change direction without penalty, and so there is an infinite dimensionality of small deformations to the path we can make as we seek to eliminate error.
This phenomenon can be expected to be particularly pronounced for high-dimensional spaces. While high-dimensional spaces have a greater number of directions in which to err, they permit a greater number of small deformations that can be deployed to fix those errors, and in Ref. [7] we argued that as a statistical matter the latter consideration should dominate. (We saw a foreshadowing of this phenomenon in the discrete context of Sec. 7.5.1, where we were able to harness the exponential number of degenerate deformations of the first-order Trotter formula to force the errors to destructively interfere.)

There are a number of tantalizing hints that point to the existence of a holistic simulation strategy that is qualitatively superior to any currently known. Unfortunately, all these hints are non-constructive. The path may exist, but I have no idea how to find it.

Curvature and complexity
Years ago, Nielsen laid out a program [1] to use the tools of differential geometry to prove complexity lowerbounds. His idea was that by recasting quantum complexity in geometric terms, we might harness the great machinery of that branch of mathematics, and bypass some of the barriers that impede more conventional approaches. Unfortunately, progress has been slow. Let me finish this paper with some speculation on why progress has been slow, and on why there is reason to hope it might not remain slow. An essential role will be played by Riemannian curvature.
The complexity geometry is curved. The largest sectional curvatures arise when two cheap directions $\sigma_I$ and $\sigma_J$ commute to an expensive direction [44]. This tells us that the cliff metric, Eq. 22, is very curved indeed. Two (cheap) 2-local directions that commute to a (maximally expensive) 3-local direction give a sectional curvature that is exponentially big. The cliff metric is so highly curved that while it is still technically Riemannian it is difficult to exploit its Riemannian-ness. Perhaps this explains why the Nielsen program has made little progress. The tools of differential geometry have a hard time getting purchase when the curvature is exponential.
But not all complexity geometries have this problem. Less abrupt penalty schedules like the exponential schedule and the binomial schedule, for which the penalty factors grow slowly with weight, have modest curvature [11]. The commutator of a $k$-local direction and an $m$-local direction can never be more than $(k+m-1)$-local, so any penalty schedule that has $\mathcal{I}_{k+m-1} \lesssim \mathcal{I}_k \mathcal{I}_m$ will have modest sectional curvature for every section.
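The criterion $\mathcal{I}_{k+m-1} \lesssim \mathcal{I}_k\mathcal{I}_m$ can be checked schedule by schedule. The sketch below is illustrative only and assumes simple forms for the two schedules, since Eqs. 22 and 24 are not restated here: a cliff schedule with $\mathcal{I}_1 = \mathcal{I}_2 = 1$ and $\mathcal{I}_{k\geq 3} = \mathcal{I}_{\rm cliff}$, and an exponential schedule $\mathcal{I}_k = x^k$. It reports the worst ratio $\mathcal{I}_{k+m-1}/(\mathcal{I}_k\mathcal{I}_m)$, used here as a proxy for the curvature criterion rather than as the curvature itself.

```python
def worst_commutator_ratio(penalty, n_qubits):
    """Largest value of I_{k+m-1} / (I_k * I_m) over all weights k, m,
    capping the commutator weight at n_qubits."""
    worst = 0.0
    for k in range(1, n_qubits + 1):
        for m in range(1, n_qubits + 1):
            weight = min(k + m - 1, n_qubits)
            worst = max(worst, penalty(weight) / (penalty(k) * penalty(m)))
    return worst

N = 10

def cliff(k, i_cliff=4.0 ** N):
    # Assumed cliff schedule: one- and two-local cheap, everything else I_cliff.
    return 1.0 if k <= 2 else i_cliff

def exponential(k, x=4.0):
    # Assumed exponential schedule: I_k = x ** k.
    return x ** k

print("cliff:      ", worst_commutator_ratio(cliff, N))
print("exponential:", worst_commutator_ratio(exponential, N))
```

Under these assumptions the cliff schedule violates the criterion by the full factor $\mathcal{I}_{\rm cliff}$ (from two 2-local directions commuting to a 3-local one), whereas the exponential schedule satisfies it with room to spare.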
The results of this paper enlarge the equivalence class of complexity geometries known to be polynomially equivalent to the gate-counting definition of complexity. Previously, the class was only known to contain geometries, like the cliff metric, that have extremely high curvature. The enlarged class now includes geometries, like the binomial metric, that have modest curvature. Perhaps the tools of differential geometry that failed when applied to the cliff metric might now meet with more success.

with the smallest value of $\sqrt{\mathrm{Tr}H^2}\,t$. If the eigenvalues of $Ht$ for this shortest geodesic are $\{\lambda_1, \lambda_2, \ldots, \lambda_{2^N}\}$, then the Killing distance is Since $H$ generates the minimal geodesic, we must have for larger values of $\lambda_i$ we have gone the long way round the U(1) circle for that eigenvalue (see Fig. 1) and we can shorten the path by instead going the short way. This confirms that the most distant unitary from the identity $1$ is the antipodal $-1$, which has $\lambda_i = \pi$ for all eigenvalues and sits at a distance $s = \pi$. Now let's consider the normalized Frobenius-norm distance, defined in Sec. 2.4.1, (118) We wish to bound the ratio between the Killing distance and the normalized Frobenius-norm distance, Bi-invariance then implies Eq. 37.

A.2 Bounding the error from pruning
Approximation 1, Eq. 47, was to drop all terms that have a penalty factor greater than some critical value I, This will introduce error, which we will now bound. Eq. 30 says that for any bi-invariant norm To bound the right-hand side of this expression, we will use Eq. 45,

A.2.1 Pruning error for Frobenius norm
The normalized-Frobenius pruning error is upperbounded by Eq. 122 as We can use Eq. 123 to write Combining these equations gives Alternatively we could have derived the same inequality by applying Eq. 36 to Eq. 50.

A.2.2 Pruning error for operator norm
There are two different upperbounds we can place on the operator-norm pruning error, for the same reason there are two upperbounds in Eq. 42.
First bound. The first bound comes from applying subadditivity to Eq. 122 to get $\|\mathcal{P}e^{i\int H(t)dt} - \mathcal{P}e^{i\int H_P(t)dt}\|_{op} \le \int dt\,\ldots$ Let's upperbound the right-hand side using Eq. 123. To maximize $\sum_I |h_I|$ at fixed $\sum_I \mathcal{I}(\sigma_I) h_I^2$, the worst-case scenario is maximize for some Lagrange multiplier $c$. This worst-case scenario gives (129) Eliminating $c$, this implies Combining this with Eq. 127 gives

Second bound. The second bound comes from applying Eq. 39 to Eq. 126 to give

A.3 Bounding the error from averaging
Noting that $1 < 2y^{-2}(e^y - 1 - y) < 2$ for $0 < y < 1$, this implies Using Eq. 120, we can turn this from a bound on the Frobenius-norm error to a bound on the Killing error,

A.3.2 Averaging error for operator norm
The operator norm is submultiplicative, so using Eq. 41, we have that Eq. 134 becomes Noting once again that $1 < 2y^{-2}(e^y - 1 - y) < 2$ for $0 < y < 1$, this implies We could also have derived the same formula by applying Eq. 41 to Eq. 137.
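The elementary inequality used twice above follows from the Taylor series of the exponential. A short derivation, included here only as a check:

```latex
2y^{-2}\left(e^{y} - 1 - y\right)
  = 2\sum_{n=2}^{\infty} \frac{y^{\,n-2}}{n!}
  = 1 + \frac{y}{3} + \frac{y^{2}}{12} + \cdots
% Every coefficient is positive, so for y > 0 the expression is increasing in y
% and strictly greater than its y -> 0 limit, which is 1.
% At y = 1 it equals 2(e - 2) \approx 1.437 < 2, so for 0 < y < 1:
1 < 2y^{-2}\left(e^{y} - 1 - y\right) < 2(e-2) < 2 .
```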

A.4 Bounding the error from Trotterization
In this subsection we will be interested in bounding the norm of the Trotter error, There are two complications in bounding this expression: we must ensure that at non-infinitesimal $\delta$, the higher-order terms do not make a dominant contribution; and we must deal with the possibility that the $O(\delta^2)$ terms might constructively interfere. These questions have received extensive study [45][46][47][48][49][50][51], but for our purposes, at the cost of not fully optimizing our estimates, we will use only the result (quoted in e.g. Eq. 2 of [49]) that for any submultiplicative bi-invariant matrix norm,

A.4.1 Trotter error for Killing distance
For a given $\sigma_I$, the commutator $[\sigma_I, \sigma_J]$ will be orthogonal to the commutator $[\sigma_I, \sigma_{J'}]$ if $J \neq J'$, so there will be no interference in Eq. 145 and the terms inside the norm add in quadrature, Using that for the normalized Frobenius error $\|[\sigma_I, \sigma_J]\|_F \le 2$, together with Eq. 41, this gives $\mathrm{err}_{\rm trot.}$
Using Eq. 120, we can turn this from a bound on the normalized Frobenius error to a bound on the Killing error,

A.4.2 Trotter error for operator norm
The Trotter error in the operator norm is bounded by Eq. 145 as

B How does Trotter error scale with N?
Consider Trotterizing the time-independent Hamiltonian $H = \sum_I^N h_I \sigma_I$. How does the Frobenius norm of the Trotter error scale with $N$? The difference between the first-order Trotter product and the Hamiltonian evolution we are aiming for is When there is interference, rather than adding in quadrature the terms may just add (for constructive interference) or cancel (for destructive interference). If the number of different terms that give rise to the same commutator is $n$, and if all those terms constructively interfere, then this could multiply the Frobenius norm by a factor of as much as $\sqrt{n}$. How big can $n$ be? There are about $N^2$ terms in the sum Eq. 150, but they cannot all be the same since Taking that into account, the worst-case scenario is $n = N$, which gives the $\sqrt{N}$ deterioration from the no-interference answer to our actual bound Eq. 148.
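The origin of the $\sqrt{n}$ factor can be seen in a two-line comparison. In this sketch the operators $A_1, \ldots, A_n$ and their common norm $a$ are illustrative symbols introduced here, not notation from the main text.

```latex
% Assumption for illustration: n operators A_1, ..., A_n with \|A_i\|_F = a.
% No interference (mutually orthogonal terms): the norms add in quadrature.
\Big\| \sum_{i=1}^{n} A_i \Big\|_F^2 = \sum_{i=1}^{n} \|A_i\|_F^2 = n\,a^2
\quad\Longrightarrow\quad
\Big\| \sum_{i=1}^{n} A_i \Big\|_F = \sqrt{n}\,a .
% Fully constructive interference (all terms equal): the norms add linearly.
\Big\| \sum_{i=1}^{n} A_i \Big\|_F = n\,a = \sqrt{n}\,\big(\sqrt{n}\,a\big) .
```

The constructive case exceeds the quadrature case by exactly $\sqrt{n}$, which is the deterioration quoted above when $n = N$.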
There are a number of ways we might seek to beat down this scaling. One way would be to argue that you would have to be fantastically unlucky [51] for all the signs to conspire such that all the interference is constructive. Another way would be to argue that given the commutator structure of the generalized Paulis it is actually impossible to be so unlucky.
We'll take a different approach. Rather than rely on luck, we're going to take our fate into our own hands. We're going to use our freedom to re-order the terms in the Trotter product to make sure that the interference is on-net destructive. We will arrange the signs so that interference now works for us, and guarantee that the $O(\delta^2)$ error term is even smaller than if there were no interference at all.
In constructing the Trotter product, we could have put the monomial factors $e^{i h_I \sigma_I \delta}$ in any order. The number of possible orders is $N!$. The effect of changing the ordering will be to change the signs of some of the terms in the sum in Eq. 150. Changing the sign of a given term will flip between constructive and destructive interference.
Let's write down an iterative procedure for arranging a favorable ordering. We will be handed each component monomial $e^{i h_I \sigma_I \delta}$ in turn, and then we will do one of two things: we will either place the monomial at the very back of the Trotter product, or we will place it at the very front. We will make this choice based on which makes the interference most destructive. Specifically, we will choose to greedily minimize the Frobenius error $\|\frac{1}{2}\sum_{I>J} h_I h_J [\sigma_I, \sigma_J]\|_F$, where the sum is taken only over the terms we have already placed in the Trotter product (and not those still waiting to be placed).
Let's see how this works. The first two monomials we can place in either order, let's choose $e^{i h_a \sigma_a \delta} e^{i h_b \sigma_b \delta}$. At this stage the $O(\delta^2)$ error has just a single contribution, $H_{a,b} \equiv \frac{1}{2} h_a h_b [\sigma_a, \sigma_b]$. We are now handed the third monomial. With only three Paulis in play there's still no interference (because of Eq. 152), so we can place the third monomial anywhere. Let's put it at the back, giving $e^{i h_a \sigma_a \delta} e^{i h_b \sigma_b \delta} e^{i h_c \sigma_c \delta}$. This makes an additional contribution to the $O(\delta^2)$ error of $H_{ab,c}$ It is when we are handed the fourth monomial that things finally get interesting. We will either place the fourth monomial at the very back, $e^{i h_a \sigma_a \delta} e^{i h_b \sigma_b \delta} e^{i h_c \sigma_c \delta} e^{i h_d \sigma_d \delta}$, or at the very front, $e^{i h_d \sigma_d \delta} e^{i h_a \sigma_a \delta} e^{i h_b \sigma_b \delta} e^{i h_c \sigma_c \delta}$. (One could also imagine placing it in the middle, but we won't need to make use of that possibility.) The new contribution to the $O(\delta^2)$ error

By iterating this procedure for each monomial in turn (adding it either to the very front or the very back, guided by which gives the most destructive interference) we arrive at optimal order: $\frac{1}{2}\,\ldots$ Thus by judicious choice of ordering we can make the $O(\delta^2)$ Frobenius error for the first-order Trotter expansion be parametrically smaller in $N$ than the worst-case ordering, and indeed can ensure that it is always smaller than if there were no interference at all.
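The front-or-back procedure can be sketched numerically. The block below is an illustration under stated assumptions rather than the paper's own code: it works with Hermitian terms $h_I \sigma_I$ and drops the overall factors of $i$ and $\delta^2$, it draws random coefficients for all non-identity Pauli strings on three qubits, and the helper names (`greedy_order`, `second_order_error`, `frob`) are hypothetical. It tracks the second-order error $\frac{1}{2}\sum_{i<j}[T_i, T_j]$ for terms in product order, and compares the greedy result with the sum in quadrature of the per-step contributions.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    out = np.array([[1.0 + 0j]])
    for ch in label:
        out = np.kron(out, PAULIS[ch])
    return out

def frob(a):
    """Normalized Frobenius norm sqrt(Tr(A^dag A) / dim)."""
    return np.sqrt(np.trace(a.conj().T @ a).real / a.shape[0])

def comm(a, b):
    return a @ b - b @ a

def second_order_error(terms):
    """(1/2) sum_{i<j} [T_i, T_j] for terms listed in product order."""
    err = np.zeros_like(terms[0])
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            err = err + 0.5 * comm(terms[i], terms[j])
    return err

def greedy_order(terms):
    """Place each term at the front or the back, whichever shrinks the error."""
    order = [terms[0]]
    placed_sum = terms[0].copy()
    err = np.zeros_like(terms[0])
    quadrature = 0.0
    for t in terms[1:]:
        step = 0.5 * comm(placed_sum, t)  # contribution if t goes at the back
        quadrature += frob(step) ** 2
        if frob(err + step) <= frob(err - step):
            err, order = err + step, order + [t]   # back: +step
        else:
            err, order = err - step, [t] + order   # front: -step
        placed_sum = placed_sum + t
    return order, err, np.sqrt(quadrature)

rng = np.random.default_rng(0)
labels = ["".join(p) for p in itertools.product("IXYZ", repeat=3)][1:]
terms = [rng.normal() * pauli_string(lab) for lab in labels]
order, err, quad_bound = greedy_order(terms)
print(frob(err), quad_bound, frob(second_order_error(terms)))
```

Because each step chooses the sign that makes the cross term non-positive, the greedy error never exceeds the quadrature value. The last printed number is the error for the unsorted input order, shown for comparison.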
To make this enhanced bound on the Frobenius error rigorous, we'd need to show that the $O(\delta^3)$ terms do not dominate the $O(\delta^2)$ term. Such an enhanced bound would be particularly intriguing for what it would imply about the diameter of the complexity geometry. Combining the enhanced bound with the method of Sec. 6.1 (and folding the approximations of Secs. 3.2 and 3.3 into a single step) would give an improved lowerbound on the diameter of complexity geometries, and this improved lowerbound would exactly match what we derived by a completely different (differential geometry) technique in [7].
(Note that our sorting procedure does not expend all of our freedom to re-order, since it can only reach $2^N$ out of the $N!$ possible orderings. This additional unused freedom might be of assistance in generalizing this method to higher-order Suzuki-Trotter formulas.)

C When does close in complexity geometry imply close in operator norm?
Let's prove a further lemma. This lemma was not needed to establish the results of this paper, but may be useful in attempting to improve them. We will derive a sufficient condition on the penalty schedule such that two unitaries being close in the complexity geometry implies that they are also close in the operator norm. Let's consider the minimal geodesic through the complexity geometry. This will be generated by some Hamiltonian $H(t)$ Eliminating $c$, and using bi-invariance, gives the bound This tells us that polynomially small complexity distance implies polynomially small operator-norm distance so long as the penalty schedule is sufficiently punitive, sufficient condition: This condition is satisfied by the cliff metric Eq. 22 with $\mathcal{I}_{\rm cliff} > 4^N$ and the binomial metric Eq. 23 with $\alpha \ge 1$. As discussed in Sec. 7.3, this is yet another place in which the $\alpha = 1$ binomial schedule shows up as a critical metric. We can also lower bound the operator-norm distance. Combining Eqs. 37, 39, and 45 gives The lowerbound is saturated by those monomial unitaries $e^{i\sigma_I\delta}$ that are generated by a short step in the most expensive generalized-Pauli directions, $\mathcal{I}(\sigma_I) = \mathcal{I}_{\max}$.
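Table (entries not recovered). Column headings: "polynomial C gates can get Killing close" and "polynomial C gates can get $\|\cdot\|_{op}$ close". Row labels include: "bi-invariant $\mathcal{I}_k = 1$ (Killing metric)" and "hard cliff".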


Figure 1: The group U(1) is a circle. The Killing distance from the identity to $U = e^{it}$ is the distance around this circle, $s(1, e^{it}) = t$. The normalized Frobenius-norm distance is the length of the chord, $\|1 - e^{it}\|_F = 2\sin(t/2)$.
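The arc-versus-chord comparison in Fig. 1 extends eigenvalue by eigenvalue to a diagonal unitary. The following is a minimal sketch, assuming the dimension-normalized conventions implied by the caption and by the statement that $-1$ sits at distance $\pi$ (a root-mean-square over the eigenphases); the function names are hypothetical. Since $2x/\pi \le 2\sin(x/2) \le x$ on $[0, \pi]$, the ratio of the two distances should lie between $2/\pi$ and $1$.

```python
import math
import random

def killing_distance(phases):
    """Normalized Killing distance from 1 to diag(e^{i*phase}), phases in [-pi, pi].

    Assumes the root-mean-square normalization, so that s(1, e^{it}) = t for U(1).
    """
    return math.sqrt(sum(p * p for p in phases) / len(phases))

def frobenius_distance(phases):
    """Normalized Frobenius distance ||1 - U||_F: root-mean-square chord length."""
    return math.sqrt(sum((2 * math.sin(p / 2)) ** 2 for p in phases) / len(phases))

# Single eigenvalue: reproduces the arc length t and chord 2 sin(t/2) of Fig. 1.
t = 1.0
print(killing_distance([t]), frobenius_distance([t]))

# Random eigenphases: the chord/arc ratio stays between 2/pi and 1.
random.seed(1)
phases = [random.uniform(-math.pi, math.pi) for _ in range(16)]
ratio = frobenius_distance(phases) / killing_distance(phases)
print(ratio)
```

The lower end of the range is reached at the antipodal point, where every eigenphase equals $\pi$ and the chord has length 2.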
as discussed in Sec. 2.4.2, the inequality $\|\sum_I h_I \sigma_I\|_{op} \le \sum_I \|h_I \sigma_I\|_{op}$ is saturated if we only include the $\binom{N}{k}$ commuting k-local generalized Paulis that are exclusively composed of $1$s and $\sigma_z$s (with no $\sigma_x$s or $\sigma_y$s).)

$= \sum_I h_I(t)\sigma_I$. The length is
$L = \int dt \sqrt{\sum_I \mathcal{I}(\sigma_I) h_I(t)^2}$. (157)
On the other hand Eq. 30 tells us that the operator norm is upperbounded by
$\|\mathcal{P}e^{i\int dt H(t)} - 1\|_{op} \le \int dt\, \|H(t)\|_{op} = \int dt\, \|\sum_I h_I(t)\sigma_I\|_{op} \le \int dt \sum_I |h_I(t)|$. (158)
How large can $\sum_I |h_I|$ be at fixed $\sum_I \mathcal{I}(\sigma_I) h_I^2$? For some Lagrange multiplier $c$, the maximum is given by $h_I = c/\mathcal{I}(\sigma_I) \rightarrow \sum_I \mathcal{I}(\sigma_I) h_I(t)^2 = \ldots$

Approximation 2 was to average the Hamiltonian within each segment, Eq. 53, replacing $H_P(t)$ with $H_{\rm av.} \equiv \delta^{-1}\int_0^\delta H_P(t)dt$. We can upperbound the norm error this introduces using the Dyson expansion $\mathrm{err}_{\rm av} \equiv \exp i\int_0^\delta \ldots$

Let's discuss the normalized Frobenius norm of the $O(\delta^2)$ term. The complication is going to be interference. Interference happens when more than one term in the sum gives rise to the same operator, that is, when $[\sigma_I, \sigma_J]$ is not orthogonal to $[\sigma_K, \sigma_L]$. If there were no interference, then the terms in the sum would just add in quadrature. Using $\|[\sigma_I, \sigma_J]\|_F \le 2$, we would have an expression without multiplicative factors of $N$,

The new contribution to the $O(\delta^2)$ error will be $H_{abc,d} \equiv [h_a\sigma_a + h_b\sigma_b + h_c\sigma_c, h_d\sigma_d]$ if we placed $e^{i h_d \sigma_d \delta}$ at the back, or $-H_{abc,d}$ if we placed $e^{i h_d \sigma_d \delta}$ at the front. The norm-squared of the error is
$\|H_{a,b} + H_{ab,c} \pm H_{abc,d}\|_F^2 = \|H_{a,b} + H_{ab,c}\|_F^2 + \|H_{abc,d}\|_F^2 \pm 2\mathrm{Tr}[(H_{a,b} + H_{ab,c})H_{abc,d}]$. (154)
If the last term contributes positively we have constructive interference; if it contributes negatively we have destructive interference. We get to choose, and we choose destruction. With this choice we have
$\min_\pm \|H_{a,b} + H_{ab,c} \pm H_{abc,d}\|_F^2 \le \|H_{a,b}\|_F^2 + \|H_{ab,c}\|_F^2 + \|H_{abc,d}\|_F^2$.