Efficient Quantum Circuits for Diagonal Unitaries Without Ancillas

The accurate evaluation of diagonal unitary operators is often the most resource-intensive element of quantum algorithms such as real-space quantum simulation and Grover search. Efficient circuits have been demonstrated in some cases but generally require ancilla registers, which can dominate the qubit resources. In this paper, we point out a correspondence between Walsh functions and a basis for diagonal operators that gives a simple way to construct efficient circuits for diagonal unitaries without ancillas. This correspondence reduces the problem of constructing the minimal-depth circuit within a given error tolerance, for an arbitrary diagonal unitary $e^{if(\hat{x})}$ in the $|x\rangle$ basis, to that of finding the minimal-length Walsh-series approximation to the function $f(x)$. We apply this approach to the quantum simulation of the classical Eckart barrier problem of quantum chemistry, demonstrating that high-fidelity quantum simulations can be achieved with few qubits and low depth.

Quantum computation within the circuit model relies on the ability to construct efficient sequences of elementary quantum operations, or gates, that produce a faithful representation of the unitary operators appearing in quantum algorithms. We consider the situation where the unitary of interest is diagonal. Some important algorithms where this applies are quantum simulation of quantum dynamics [1][2][3][4], quantum optimization [5], and Grover search [6]. An arbitrary diagonal unitary on n qubits can be implemented exactly using $2^{n+1} - 3$ one- and two-qubit gates [7]. However, one is usually interested in circuits that approximate the unitary to within some error tolerance, ε. In order to be of practical value, such a circuit must be efficient: the number of one- and two-qubit gates should scale no worse than $O(\mathrm{poly}(n, 1/\epsilon))$ [8]. Efficient circuits for diagonal unitaries have been demonstrated, but with the requirement of ancilla qubits. In the real-space quantum simulation algorithm [1,2], for example, studies indicate that ancilla registers often dominate the qubit resources [3,9]. Due to limitations in the coherence time and number of qubits in any future practical implementation of quantum computing, it is desirable to decrease these resources as much as possible.
In this paper, we provide a constructive algorithm that significantly reduces the qubit resources through the use of a correspondence between Walsh functions [10] and a basis for diagonal operators. Our construction builds on earlier work that established a connection between the Walsh-Hadamard transform and the circuit required to implement a diagonal unitary [11,12]. These authors showed that an n-qubit diagonal unitary $e^{if(\hat{x})}$ can be implemented exactly using a circuit with $2^n - 1$ z-axis rotation operators with rotation angles proportional to the discrete Walsh transform coefficients of f(x). The circuit depth was found to be $2^{n+1} - 3$ [12]. In this analysis f(x) was a real-valued function of an n-bit string, taking $2^n$ discrete values.
Here, we are concerned with approximate implementation of diagonal unitaries, as described above. In particular, we are interested in the important case where f(x) is a real-valued function of a continuous variable x, rather than an n-bit string. We show that this can be accomplished using circuits of a similar type to those in [11,12]. By identifying elementary operators whose eigenvalues in the x-basis are Walsh functions, we show that an arbitrary diagonal unitary may be approximated efficiently on a finite register by using a Walsh-Fourier series approximation for f(x). Since the Walsh basis is the only basis with this property, it follows that it is the natural basis for representing arbitrary diagonal unitaries. In this paper, we quantify the error resulting from such approximations, and describe how to do them optimally.
We first consider approximating f(x) with a partial Walsh-Fourier series containing $2^k$ terms, with $k \le n$, where n is the number of qubits. Since the bound on the error in this approximation is inversely proportional to the number of terms [13], the resulting gate sequence is efficient. Next, we address the problem of finding the shortest possible gate sequence that approximates the diagonal unitary $e^{if(\hat{x})}$ with error ε. This problem reduces to finding the minimal-length Walsh series that approximates f(x) to within ε. This is in general an integer programming problem [14], but its solution can be found to a good approximation by discarding the coefficients of the Walsh-Fourier series for f that fall below a certain bound [13][14][15]. This can lead to a significant reduction in circuit depth beyond the partial series approximation.
As a simple yet practical demonstration of these ideas, we describe a 1D implementation of the real-space quantum simulation algorithm for a single particle tunneling through an Eckart barrier [16]. This problem is a benchmark in classical computational methods of quantum chemistry for simulating quantum dynamics. This example illustrates that high-fidelity quantum simulations without ancillas can be achieved with few qubits and low depth.

Walsh functions and operators
In this section we identify the mapping between Walsh functions and a basis for diagonal operators. We begin with some definitions.

Walsh functions
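The Paley-ordered Walsh functions are defined on the continuous interval $0 \le x < 1$ as [13]
$$w_j(x) = (-1)^{\sum_{i=1}^{n} j_i x_i}, \qquad (1)$$
where $j_i$ is the ith bit in the binary expansion $j = \sum_{i=1}^{n} j_i 2^{i-1}$ and $x_i$ is the ith bit in the dyadic expansion $x = \sum_{i=1}^{\infty} x_i 2^{-i}$. The Walsh functions are orthonormal,
$$\int_0^1 w_j(x)\, w_l(x)\, dx = \delta_{jl}. \qquad (2)$$
Sampled at the $N = 2^n$ points $x_k = k/N$, they define the discrete Walsh-Fourier transform of a function with samples $f_k = f(x_k)$ and its inverse,
$$a_j = \frac{1}{N} \sum_{k=0}^{N-1} f_k\, w_j(x_k), \qquad (3)$$
$$f_k = \sum_{j=0}^{N-1} a_j\, w_j(x_k). \qquad (4)$$
In terms of the bits of j, k, and x, we have $w_j(x_k) = (-1)^{\sum_{i=1}^{n} j_i k_{n+1-i}} = (-1)^{\sum_{i=1}^{n} j_i x_i}$, where $k = \sum_{i=1}^{n} k_i 2^{i-1}$.
To complete the analogy with Fourier series, we recall that orthonormal functions arise as the irreducible representations of symmetry groups [17]. For trigonometric functions, the relevant group is that of translations. For Walsh functions up to order $2^n$, it is the group $\mathbb{Z}_2^{\otimes n}$, which is formed by a basis for diagonal operators on n qubits. These are the Walsh operators introduced below.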
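The definition of the Paley-ordered Walsh functions can be checked numerically. The following is a minimal Python sketch, not taken from the original work; the function name `walsh_paley` and the bit conventions (the ith bit of j counted from the least significant end, the ith dyadic bit of x counted from the most significant end) are assumptions made for illustration.

```python
import numpy as np

def walsh_paley(j, x):
    """Paley-ordered Walsh function w_j(x) on 0 <= x < 1.

    Assumed convention: j = sum_i j_i 2**(i-1), x = sum_i x_i 2**(-i),
    and w_j(x) = (-1)**(sum_i j_i * x_i).
    """
    x = np.asarray(x, dtype=float)
    exponent = np.zeros(x.shape, dtype=int)
    i = 1
    while j >> (i - 1):                               # loop over the bits of j up to its MSB
        j_i = (j >> (i - 1)) & 1                      # i-th bit of j, least significant first
        x_i = np.floor(x * 2**i).astype(int) & 1      # i-th dyadic bit of x
        exponent += j_i * x_i
        i += 1
    return 1 - 2 * (exponent % 2)                     # (-1)**exponent as +1 / -1

# Sample the first N = 8 Walsh functions on the grid x_k = k / N.
N = 8
x_grid = np.arange(N) / N
W = np.array([walsh_paley(j, x_grid) for j in range(N)])
```

On this grid the rows of `W` are mutually orthogonal, which is the discrete counterpart of the orthonormality of the continuous functions.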

Walsh operators
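The state of an n-qubit register in a quantum computer is typically expressed as a superposition, $|\psi\rangle = \sum_{k=0}^{N-1} c_k |k\rangle$, of $N = 2^n$ states in the computational basis [6], defined as
$$|k\rangle = |b_1\rangle \otimes |b_2\rangle \otimes \cdots \otimes |b_n\rangle, \qquad (5)$$
where $b_1 b_2 \cdots b_n$ is the n-bit binary string for k, most significant bit first. Functions f(x) of a continuous variable, $x \in [0, L)$, may be represented in this way if they are discretely sampled. Here we will assume a constant sampling interval, and define the sampling points $x_k = kL/N$. To simplify the discussion, we use units such that L = 1, so that the ith qubit stores the ith bit of the dyadic expansion of $x_k$. The results for general L are obtained by replacing w(x) by w(x/L). We will also use the notation $|k\rangle$, $|x_k\rangle$, and $|x\rangle$ interchangeably, dropping the subscript k on x when there is no loss of clarity.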
Let $\hat{Z}_i$ denote the Pauli Z operator acting on the ith qubit, $\hat{Z}_i = \hat{1} \otimes \cdots \otimes \hat{Z} \otimes \cdots \otimes \hat{1}$, with $\hat{Z}$ in the ith position. We define the Walsh operator of order j on n qubits as
$$\hat{w}_j = \prod_{i=1}^{n} (\hat{Z}_i)^{j_i}, \qquad (6)$$
where $j_i$ is the ith bit of j. Its eigenvalues in the x-basis are the Walsh functions, $\hat{w}_j |x\rangle = w_j(x) |x\rangle$. For the case of Rademacher functions, this relationship was pointed out by Sornborger [18], who observed that the eigenvalue of a single Pauli Z gate acting on the ith qubit in (5) is a binary-valued function of x with period $2^{1-i}$. The locations of the $\hat{Z}$ operators in $\hat{w}_j$ correspond to the positions of the 1's in the bit-reversed binary string for j. For example, the Walsh operator with j = 6 on n = 3 qubits is $\hat{w}_6 = \hat{1} \otimes \hat{Z} \otimes \hat{Z}$. The gate representation of $\hat{w}_6$ is shown in figure 2. The general Walsh operator requires O(n) gates for its implementation: a single Z gate and up to 2n controlled NOTs.
Using (4), any diagonal operator on n qubits may be expanded as a sum of $N = 2^n$ Walsh operators, $\hat{f} = \sum_{j=0}^{N-1} a_j \hat{w}_j$. Since the Walsh operators commute, the corresponding unitary factorizes exactly as
$$\hat{U} = e^{i\hat{f}} = \prod_{j=0}^{N-1} e^{i a_j \hat{w}_j}. \qquad (7)$$
Each term in the product, $\hat{U}_j = e^{i a_j \hat{w}_j}$, is implemented by a single z-axis rotation conjugated by CNOTs. The circuit for $\hat{U}$ is given by successively applying the circuits for $\hat{U}_j$. Figure 3 shows two equivalent ways of implementing one such term, specifically $\hat{U}_7$. As seen in this figure, the gate configuration is not unique. We adopt the convention in figure 3 where the CNOTs are always targeted on the highest order qubit possible. Then a precise rule for constructing the circuit for $\hat{U}_j$ can be given in terms of the binary expansion of j: a rotation gate, $R_z(-2a_j)$, is placed on the qubit corresponding to the most significant non-zero bit (MSB) of j. Then CNOTs are placed on either side, targeted on the same qubit as the rotation gate, and controlled on the qubits corresponding to the 1's other than the MSB in the binary expansion of j. This rule will be used in the next section to construct an optimal circuit for $\hat{U}$.
Equation (7) easily generalizes to more than one dimension. For a d-dimensional system represented by d registers of n qubits each, the single Walsh operators will be replaced by tensor products of up to d Walsh operators over the different registers. The exact number depends on the number of variables in the function f.
For applications to quantum simulation, this does not significantly increase the gate complexity as interaction potentials are generally few-body. Since products of Walsh operators are also Walsh operators, the expansions have the same form as (7) with $N = 2^{dn}$. The utility of (7) is that it relates the circuit depth of $\hat{U}$ to the number of coefficients in the Walsh series for f. If some of these coefficients are zero or may be neglected, this leads to a reduction in the circuit depth for implementing $\hat{U}$. We will examine such cases below, but first we discuss methods to find optimal-depth circuits given a Walsh series for f, as well as to calculate the circuit depth based on the number of non-zero coefficients $a_j$.
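The expansion of a diagonal operator in Walsh operators, and the resulting product form of the unitary, can be verified directly on a small register. The sketch below is illustrative only: the helper names, the test function $f(x) = x^2$, and the convention that qubit 1 stores the most significant bit of x are assumptions, and only the diagonals of the operators are stored.

```python
import numpy as np
from functools import reduce

Z_DIAG = np.array([1.0, -1.0])   # diagonal of Pauli Z
I_DIAG = np.array([1.0, 1.0])    # diagonal of the identity

def walsh_operator_diag(j, n):
    """Diagonal of w_j = prod_i Z_i^(j_i), with qubit 1 holding the leading bit of x."""
    factors = [Z_DIAG if (j >> (i - 1)) & 1 else I_DIAG for i in range(1, n + 1)]
    return reduce(np.kron, factors)

def walsh_coefficients(f_samples):
    """Return the Walsh coefficients a_j of the samples and the matrix of Walsh diagonals."""
    N = f_samples.size
    n = N.bit_length() - 1
    W = np.array([walsh_operator_diag(j, n) for j in range(N)])
    return W @ f_samples / N, W

n = 4
N = 1 << n
x = np.arange(N) / N
f = x**2                                   # hypothetical test function
a, W = walsh_coefficients(f)

# Product of the commuting factors exp(i a_j w_j), compared with exp(i f).
U_product = np.ones(N, dtype=complex)
for j in range(N):
    U_product *= np.exp(1j * a[j] * W[j])
U_exact = np.exp(1j * f)
```

Because every factor is diagonal, the product is exact rather than a Trotter-type approximation; any error in a practical circuit comes only from dropping terms of the series.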

Optimal circuit constructions
Applying the rule of the previous section to each Walsh operator in turn gives a circuit (omitting the j = 0 term, as it will only contribute a global phase, and hence will not affect the final result of any algorithm) for the unitary in (7). We use the compact notation $R_j \equiv R_z(-2a_j)$. For n = 3 qubits, the general circuit found in this way is shown in figure 4, with vertical dashed lines separating the different Walsh operators. However, as Bullock and Markov have shown [7], this circuit construction is not optimal. They find that it is possible to reduce the gate count to $2^{n+1} - 3$ and prove that this is optimal within a factor of two. In this section, we show that putting the Walsh operators in sequency order automatically produces the optimal circuit. In addition, we describe how to calculate the gate count for an arbitrary number of Walsh functions $N' < N$, where $N = 2^n$. This gate count scales as $O(N')$.
We begin by recalling the relationship between the gate sequence for a Walsh operator $\hat{w}_j$ and the binary expansion of its index, j. This gate sequence is given by placing a rotation gate, $R_z(-2a_j)$, on the qubit corresponding to the most significant bit of j, and CNOTs on either side targeted on the same qubit and controlled on the qubits corresponding to the other non-zero bits of j. Since two identical CNOTs (CNOTs with the same targets and controls) cancel, it follows that the CNOTs between the rotation gates in adjacent Walsh operators are controlled on the non-zero bits of the bitwise XOR between their indices. For example, given a circuit with $\hat{w}_6$ followed by $\hat{w}_7$, the XOR of the indices is $110 \oplus 111 = 001$, so only a single CNOT, controlled on the qubit corresponding to the least significant bit, remains between the two rotation gates.
In order to minimize the number of CNOTs between rotation gates in a circuit containing all $2^n - 1$ Walsh operators, the operators must be ordered in such a way that adjacent indices have the minimal number of binary transitions between them. Such an ordering is given by the Gray code [19], where the number of binary transitions between adjacent indices is exactly one. Walsh functions sorted in this way are called sequency ordered [19]. This is also the order of increasing number of zero crossings.
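For consistency with the rest of the paper, we keep the indices of Walsh functions and operators in Paley order, and do not relabel them in sequency order. In addition, we partition the Walsh functions into sets $G_i$ according to their common MSBs, so that all rotation gates within a set act on the same qubit. The resulting circuit contains $2^n - 1$ rotation gates and $2^n - 2$ CNOTs, for a total of $2^{n+1} - 3$ gates, which is the optimal gate count found by Bullock and Markov [7].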
To illustrate the procedure, we give an example with n = 3 qubits. First we reorder the binary strings corresponding to the indices j (except j = 0) in Gray code. This is given by 001, 011, 010, 110, 111, 101, 100. Taking the bitwise XOR of adjacent strings within each set $G_i$ then identifies the control qubit of each CNOT. For the four-element set, taken from top to bottom, this gives CNOTs controlled on qubits 2, 3, 2, and 3, respectively. These go to the left of each rotation gate. In this way, we reduce the initial non-optimal circuit in figure 4 to the optimal circuit in figure 5.
While this method can be used to generate the optimal circuit containing all $N - 1 = 2^n - 1$ Walsh functions, it is not optimal when applied directly to the case of $N' < N - 1$ Walsh functions, since adjacent elements in the sets $G_i$ will now contain multiple binary transitions. In this case, we can use the following commutation relations between CNOTs to simplify the circuit further. Letting $C^i_j$ denote a CNOT with control i and target j, and $R_z^{(i)}(\theta)$ a z-axis rotation on qubit i,
$$[R_z^{(i)}(\theta),\, C^i_j] = 0, \qquad [C^i_j,\, C^k_j] = 0, \qquad [C^i_j,\, C^i_k] = 0, \qquad C^i_j\, C^j_k = C^i_k\, C^j_k\, C^i_j.$$
The first equation states that a Z gate commutes with the control of any CNOT. The second and third equations state that CNOTs with common targets but different controls, or common controls but different targets, commute. The final equation describes the case when the target of one CNOT is the control of another. Then commuting the two introduces an additional CNOT that is controlled by the same qubit as the first and targeted on the same qubit as the second. Using these rules, we find that in most cases the gate count for a circuit with $N' < N$ Walsh operators on $n = \log_2(N)$ qubits can be reduced to $O(N')$ gates.
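The Gray-code construction and its gate count can be checked with a short classical simulation. The sketch below is an illustration under stated assumptions rather than the construction used for the figures: the function names are hypothetical, qubit i is taken to carry bit i of the index j (so the rotations of the set with MSB m act on qubit m), and the closing CNOT of each set is placed after the last rotation instead of before the first, which gives the same count.

```python
import numpy as np

def walsh(j, k, n):
    """w_j(x_k) for x_k = k / 2**n (assumed convention: j_i is the i-th least
    significant bit of j, x_i the i-th most significant bit of k)."""
    s = sum(((j >> (i - 1)) & 1) * ((k >> (n - i)) & 1) for i in range(1, n + 1))
    return 1 - 2 * (s % 2)

def gray_ordered_circuit(a, n):
    """Gate list for prod_{j>=1} exp(i a_j w_j), Gray-ordering each MSB set."""
    gates = []
    for m in range(1, n + 1):                    # m = MSB position; rotations act on qubit m
        prev = 0
        for t in range(1 << (m - 1)):
            g = t ^ (t >> 1)                     # reflected Gray code on the m-1 lower bits
            if g != prev:                        # exactly one bit changed: one CNOT
                gates.append(("CNOT", (g ^ prev).bit_length(), m))
            gates.append(("RZ", m, -2.0 * a[(1 << (m - 1)) | g]))
            prev = g
        if prev:                                 # restore the target qubit
            gates.append(("CNOT", prev.bit_length(), m))
    return gates

def circuit_phases(gates, n):
    """Phase picked up by each basis state |x_k> when the gate list is applied."""
    phases = np.zeros(1 << n)
    for k in range(1 << n):
        bits = [0] + [(k >> (n - i)) & 1 for i in range(1, n + 1)]   # bits[i] = x_i
        start = list(bits)
        for name, p, q in gates:
            if name == "CNOT":                   # p = control, q = target
                bits[q] ^= bits[p]
            else:                                # RZ(theta) = diag(e^{-i theta/2}, e^{+i theta/2})
                phases[k] += -q / 2 if bits[p] == 0 else q / 2
        assert bits == start                     # all CNOTs cancel overall
    return phases

n = 3
N = 1 << n
rng = np.random.default_rng(0)
f = rng.normal(size=N)                           # arbitrary real diagonal
W = np.array([[walsh(j, k, n) for k in range(N)] for j in range(N)])
a = W @ f / N
gates = gray_ordered_circuit(a, n)
phases = circuit_phases(gates, n)
```

For n = 3 the list contains 7 rotations and 6 CNOTs, and the accumulated phases reproduce f up to the global phase $a_0$.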

Efficient circuits for diagonal unitaries
In this section, we consider approximating f(x) with a partial Walsh-Fourier series. If n is fixed, this leads to an efficient circuit for $\hat{U}$. Otherwise it gives the minimum n necessary to represent $\hat{U}$ within the given error, ε. Following [6], we define the error in implementing the operator $\hat{V}$ instead of $\hat{U}$ as $E(\hat{U}, \hat{V}) \equiv \|\hat{U} - \hat{V}\| = \max_{|\psi\rangle} \|(\hat{U} - \hat{V})|\psi\rangle\|$, where $\|\hat{A}\|$ is the spectral norm of the operator $\hat{A}$. The maximum is taken over all normalized wavefunctions $|\psi\rangle$. As discussed in the introduction, the circuit $\hat{U}_\epsilon$ approximating the operator $\hat{U}$ with error ε is efficient if it can be implemented using $O(\mathrm{poly}(n, 1/\epsilon))$ gates. For the partial Walsh-Fourier series $f_k(x)$ containing $2^k$ terms, the bound on the error is inversely proportional to the number of terms [13], so the number of gates scales as $O(1/\epsilon)$ and is a constant independent of n as long as $k \le n$. This proves that the operator $e^{if_k(\hat{x})}$ is an efficient gate sequence for $e^{if(\hat{x})}$ for any $n \ge k$.
Although efficient circuits for diagonal unitaries can be constructed using partial Walsh-Fourier series, the circuit depth can often be reduced further by minimizing the number of Walsh functions used in the series for f(x). To be precise, consider the problem of finding a Walsh series $f_\epsilon(x)$ that approximates f(x) to within ε with the smallest possible number of Walsh coefficients. This is an integer programming problem, whose solution can be found numerically given f(x) and ε. However, Yuen has shown that simply discarding the terms of the Walsh-Fourier series for f(x) below an appropriate bound gives close to optimal results [14]. This is a much simpler procedure, and we apply it in the example in the next section. The solution gives the non-zero Walsh coefficients $a_j$ as well as the minimum number of grid points, $2^n$, needed to represent the resulting function. This information can then be combined with the circuit optimization methods described in the previous section to obtain a minimal-depth and minimal-width circuit for $\hat{U}$.
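The link between the error in the exponent and the error in the unitary can be made explicit. The short derivation below is a sketch added for clarity; here $f_\epsilon$ stands for any real approximation to f, and the only property used is that both operators are diagonal in the $|x\rangle$ basis, so the spectral norm is the largest modulus of a diagonal entry.

```latex
\begin{aligned}
\left\| e^{i f(\hat{x})} - e^{i f_\epsilon(\hat{x})} \right\|
  &= \max_{x} \left| e^{i f(x)} - e^{i f_\epsilon(x)} \right| \\
  &= \max_{x} \, 2 \left| \sin\!\left( \frac{f(x) - f_\epsilon(x)}{2} \right) \right| \\
  &\le \max_{x} \left| f(x) - f_\epsilon(x) \right| .
\end{aligned}
```

A Walsh series that reproduces f to within ε at every grid point therefore yields a unitary within ε of the target in spectral norm, with near equality when the error is small.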

Quantum simulation example: Eckart barrier
As a practical example of the ideas above, we analyze the quantum simulation of tunneling through an Eckart barrier by numerically implementing the real-space algorithm of Wiesner and Zalka [1,2]. The Eckart barrier problem is a benchmark in classical computational methods of quantum chemistry for simulating quantum dynamics and transition states of chemical reactions. The solution to the scattering problem can be used for calculating chemical reaction rates [3].

Real-space algorithm
We evaluate the time evolution of a quantum system,
$$|\psi(t)\rangle = e^{-i\hat{H}t} |\psi(0)\rangle, \qquad (15)$$
using the real-space, or first quantized, representation of the wavefunction in terms of position eigenstates, $|\psi\rangle = \int \psi(x_1, \ldots, x_d)\, |x_1, \ldots, x_d\rangle\, dx_1 \cdots dx_d$. Each $x_i$ is discretized, and represented on the quantum computer in the computational basis as in (5). Using d registers of n qubits each, the basis states corresponding to a grid of $2^{dn}$ points can be represented. Equation (15) is evaluated using the first-order Trotter formula [1,3,20,21]. Assuming a time-independent Hamiltonian $\hat{H} = \hat{K}(\hat{p}) + \hat{V}(\hat{x})$,
$$e^{-i\hat{H}t} \approx \left[ \hat{F}^{\dagger}\, e^{-i\hat{K}\delta t}\, \hat{F}\, e^{-i\hat{V}\delta t} \right]^{t/\delta t}, \qquad (16)$$
where $t/\delta t$ is an integer called the Trotter number. The $\hat{F}$ operators are quantum Fourier transforms (QFTs) and are inserted to diagonalize the kinetic energy operators $\hat{K}$. The potential energy $\hat{V}$ is already diagonal in the position representation.
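A single Trotter step of the real-space algorithm can be emulated classically with a fast Fourier transform standing in for the QFT. The following is a minimal sketch under stated assumptions: units with ħ = 1, a unit mass, periodic boundary conditions, and hypothetical values for the box length, the wavepacket parameters, and the time step, none of which are taken from the simulations reported here.

```python
import numpy as np

def trotter_step(psi, V, dx, dt, mass=1.0):
    """One first-order split-operator step: potential phase, then kinetic phase."""
    p = 2.0 * np.pi * np.fft.fftfreq(psi.size, d=dx)   # momentum grid (hbar = 1)
    psi = np.exp(-1j * V * dt) * psi                   # potential propagator, diagonal in x
    psi_p = np.fft.fft(psi)                            # classical stand-in for the QFT
    psi_p = np.exp(-1j * p**2 / (2.0 * mass) * dt) * psi_p   # kinetic propagator, diagonal in p
    return np.fft.ifft(psi_p)

# Hypothetical setup: 7-qubit grid, barrier in the middle of the box.
n = 7
N = 1 << n
L = 400.0
dx = L / N
x = -L / 2 + dx * np.arange(N)
V = 1.0 / np.cosh(0.05 * x)                            # A = 1, a = 0.05

# Gaussian wavepacket moving towards the barrier (assumed parameters).
x0, sigma, p0 = -100.0, 10.0, 1.0
psi0 = np.exp(-(x - x0) ** 2 / (4 * sigma**2)) * np.exp(1j * p0 * x)
psi0 = psi0 / np.sqrt(np.sum(np.abs(psi0) ** 2) * dx)

psi1 = trotter_step(psi0, V, dx, dt=0.1)
```

Both factors are pure phases in their respective bases, so the step preserves the norm of the wavefunction; replacing `V` by a truncated Walsh series changes only the first phase factor.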

Error analysis
It is generally not possible to evaluate the diagonal unitary kinetic and potential propagators in (16) exactly. At the very least, there will be sampling error in going from the continuous x to the discrete $x_k$ representation. This contribution to the total error is in addition to the Trotter error from splitting the propagator into non-commuting parts. Letting $\hat{U}$ denote the operator on the right-hand side of (16), the total simulation error satisfies
$$\| e^{-i\hat{H}t} - \hat{U} \| \le E_G + \alpha\, t\, \delta t, \qquad (17)$$
where $E_G$ denotes the gate error in evaluating the kinetic and potential propagators, $\alpha\, t\, \delta t$ is the first-order Trotter error, and α is a problem-specific constant.
As we have seen, the gate error in evaluating a diagonal unitary is bounded by the absolute error in the exponent. For the potential energy propagator, approximating V(x) with a function $V_\epsilon(x)$ gives $\| e^{-i\hat{V}\delta t} - e^{-i\hat{V}_\epsilon \delta t} \| \le \epsilon_V\, \delta t$ for each Trotter step. Letting $\epsilon_V$ be the error in V(x) and $\epsilon_K$ be the error in K(p), the total gate error for the algorithm satisfies $E_G \le (\epsilon_V + \epsilon_K)\, t$. The total error in evaluating $\hat{U}$ therefore satisfies $\| e^{-i\hat{H}t} - \hat{U} \| \le (\epsilon_V + \epsilon_K + \alpha\, \delta t)\, t$.
Since the diagonal unitaries can be implemented efficiently, and the QFT requires poly(n) gates, the entire algorithm is efficient, and requires a number of gates polynomial in n, $1/\epsilon_V$, $1/\epsilon_K$, and $t/\delta t$. The parameters δt, $\epsilon_V$, and $\epsilon_K$ may be varied to obtain the shortest gate sequence for the simulation given a combined total error tolerance. Here, we only consider the problem of finding the shortest gate sequence for a single Trotter step. This corresponds to finding the shortest Walsh series for the approximate potential and kinetic energies given $\epsilon_V$ and $\epsilon_K$.

Simulation
The Eckart barrier is defined as $V(x) = A\,\mathrm{sech}(ax)$ [16], and is plotted in figure 6 for A = 1, a = 0.05. Also shown is a plot of a 19-term Walsh series for this potential that is accurate to 10%. This series was constructed by including a subset of coefficients from the full Walsh-Fourier series, starting from the largest, then the next largest, and so on, until the function was reproduced within the required 10% accuracy. We used a $2^n$-term Walsh-Fourier transform with n = 13 to approximate the infinite series. This introduces a discretization error of about 0.1%. The 19 largest coefficients included in the approximate Walsh series are those with Paley indices 1, 2, 4, 7, 8, 11, 13, 14, 16, 19, 21, 22, 25, 32, 35, 37, 38, 64, and 67. The smallest power of 2 that is greater than or equal to every index is $2^7 = 128$. This approach gives the minimal set of Walsh-Fourier coefficients, and is usually very close to the fully optimized solution found when the magnitudes of the coefficients are allowed to vary [14]. We find that only 7 qubits are necessary to represent the potential to 10% accuracy, with the given set of parameters. If n > 7, only the qubits corresponding to the seven most significant digits in the register will be used. This illustrates the resource savings possible if n is large.
Although useful for illustrating the approach, the classical algorithm we described for finding the best subset of Walsh-Fourier coefficients to approximate a function is not efficient, since it requires first calculating a high-dimensional Walsh-Fourier transform. In fact it is not necessary to do this. For a given k, efficient methods exist for finding the best k-term Walsh-Fourier series approximation to a given function without calculating the entire transform [22].
Had we opted to use a partial Walsh-Fourier series (keeping all $2^n$ coefficients for some integer n) to approximate the Eckart barrier, we would also find that $n \ge 7$ is required to obtain better than 10% accuracy. (The discretization error with n = 7 is 7.8%, and with n = 6 is 15.6%.) The efficient circuit produced in this way requires a total of $2^7 - 3 = 125$ gates, of which $2^6 = 64$ are rotation gates. (The Eckart barrier is an even function. Therefore half the Walsh coefficients are zero.) In contrast, the truncation described in the previous paragraph gives a circuit with a total of approximately 50 gates, of which 19 are rotation gates. This is more than a factor of two improvement.
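The largest-coefficient-first truncation can be sketched in a few lines. The code below is an illustration only and is not expected to reproduce the 19 indices quoted above: the box $[-200, 200)$, the 10-qubit reference grid, and the reading of "10% accuracy" as a maximum absolute error of 10% of the barrier height are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def walsh_matrix(n):
    """Matrix W[j, k] = w_j(x_k) of Paley-ordered Walsh functions on x_k = k / 2**n."""
    N = 1 << n
    j = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    s = np.zeros((N, N), dtype=int)
    for i in range(1, n + 1):
        s += ((j >> (i - 1)) & 1) * ((k >> (n - i)) & 1)   # j_i * x_i
    return 1 - 2 * (s % 2)

def greedy_walsh_truncation(f, n, tol):
    """Add Walsh terms in order of decreasing |a_j| until max|f - f_approx| <= tol * max|f|."""
    N = 1 << n
    W = walsh_matrix(n)
    a = W @ f / N                                   # discrete Walsh-Fourier coefficients
    order = np.argsort(-np.abs(a))                  # largest magnitude first
    target = tol * np.max(np.abs(f))
    approx = np.zeros(N)
    kept = []
    for j in order:
        if np.max(np.abs(f - approx)) <= target:
            break
        approx += a[j] * W[j]
        kept.append(int(j))
    return sorted(kept), approx

# Hypothetical grid: 2**10 points on [-200, 200), barrier with A = 1, a = 0.05.
n = 10
N = 1 << n
x = -200.0 + 400.0 * np.arange(N) / N
V = 1.0 / np.cosh(0.05 * x)
kept, V_approx = greedy_walsh_truncation(V, n, tol=0.10)
n_qubits = max(kept).bit_length()                   # qubits touched by the truncated series
```

The largest retained Paley index fixes the number of qubits that the circuit acts on, which is how a truncated series can need fewer qubits than the register provides.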
We performed numerical simulations of equation (16) for the Eckart barrier with multiple error tolerances on the potential. The wavefunction was initialized to a Gaussian wavepacket traveling towards the barrier. Since there is a known polynomial-time algorithm for the kinetic energy propagator, we applied the Walsh series approximation to the potential energy propagator only.
Figure 6. The Eckart barrier. In blue is the exact function and in green is a 19-term partial Walsh-Fourier series, which is accurate to 10%. The largest Paley index of these terms is 67. Therefore at least seven qubits are needed to implement this approximation.
For the present example, (17) drastically overestimates the total error, since it is a bound over all wavefunctions. To quantify the error in the simulation for the particular initial states under consideration, we found it more convenient to use the fidelity, defined in terms of the overlap between the approximate wavefunction and a reference wavefunction. For the reference, we used a 10-qubit simulation with maximum possible resolution (including all Walsh operators) and 1000 time steps. By numerically analyzing the scaling of the error with the number of time steps, we found this number of time steps gives a Trotter error of less than 1%.

Conclusion
We showed that Walsh functions correspond to a basis for diagonal operators, and used this Walsh operator basis to prove that efficient circuits can be constructed for diagonal unitaries. We also described how the truncated Walsh-Fourier series for a function f(x) leads to an approximately minimal-depth circuit for the diagonal unitary $e^{if(\hat{x})}$ given an error tolerance on f. This circuit has a gate count that scales proportionally to the number of Walsh functions in the series for f(x). We applied this approach to the quantum simulation of tunneling through an Eckart barrier, demonstrating that high-fidelity quantum simulations without ancillas can be achieved with few qubits and low depth.