Efficient quantum circuits for dense circulant and circulant-like operators

Circulant matrices are an important family of operators, which have a wide range of applications in science and engineering-related fields. They are, in general, non-sparse and non-unitary. In this paper, we present efficient quantum circuits to implement circulant operators using fewer resources and with lower complexity than existing methods. Moreover, our quantum circuits can be readily extended to the implementation of Toeplitz, Hankel and block circulant matrices. Efficient quantum algorithms to implement the inverses and products of circulant operators are also provided, and an example application in solving the equation of motion for cyclic systems is discussed.


Introduction
Quantum computation exploits the intrinsic nature of quantum systems in a way that promises to solve problems otherwise intractable on conventional computers. At the heart of a quantum computer lies a set of qubits whose states are manipulated by a series of quantum logic gates, namely a quantum circuit, to provide the ultimate computational results. A quantum circuit provides a complete description of a specified quantum algorithm, whose computational complexity is determined by the number of quantum gates required. However, quantum computation does not always outperform classical computation. In fact, there are many known $N$-dimensional matrices that cannot be decomposed as a product of fewer than $N-1$ two-level unitary matrices [1], and thus cannot be implemented more efficiently on a quantum computer. An essential research focus in quantum computation is to explore which kinds of linear operations (either unitary or non-unitary) can be efficiently implemented using a series of elementary quantum gates (i.e. two-level unitary matrices) and measurements.
Remarkable progress has been made in such an endeavour, most notably the discovery of Shor's quantum factoring algorithm [2] and Grover's quantum search algorithm [3].
Significant breakthroughs in the area also included the development of efficient quantum algorithms for Hamiltonian simulation, which is central to the studies of chemical and biological processes [4][5][6][7][8][9][10]. Recently, Berry, Childs and Kothari presented an algorithm for sparse Hamiltonian simulation achieving near-linear scaling with the sparsity and sublogarithmic scaling with the inverse of the error [10]. Using the Hamiltonian simulation algorithm as an essential ingredient, Harrow, Hassidim and Lloyd [11] showed that for a sparse and well-conditioned matrix $A$, there is an efficient algorithm (known as the HHL algorithm) that provides a quantum state proportional to the solution of the linear system of equations $Ax = b$.
However, as proven by Childs and Kothari [12], it is impossible to perform a generic simulation of an arbitrary dense Hamiltonian $H \in \mathbb{C}^{N\times N}$ in time $O(\mathrm{poly}(\|H\|, \log N))$, where $\|H\|$ is the spectral norm, but it is possible for certain nontrivial classes of Hamiltonians. It is then natural to ask under what conditions we can extend the sparse Hamiltonian simulation algorithm and the HHL algorithm to the realm of dense matrices. In this paper, we utilise the "unitary decomposition" approach developed by Berry, Childs and Kothari [9] to implement dense circulant Hamiltonians in time $O(\mathrm{poly}(\|H\|, \log N))$. Combining this with the HHL algorithm, we can also efficiently implement the inverse of dense circulant matrices and thus solve systems of circulant matrix linear equations.
Furthermore, we provide an efficient algorithm to implement circulant matrices $C$ directly, by decomposing them into a linear combination of unitary matrices. We then apply the same technique to implement block circulant matrices, Toeplitz and Hankel matrices, which have significant applications in physics, mathematics and engineering [13][14][15][16][17][18][19][20][21][22][23]. For example, we can simulate classical random walks on circulant, Toeplitz and Hankel graphs [24,25]. In fact, any arbitrary matrix can be decomposed into a product of Toeplitz matrices [26]. If the number of Toeplitz matrices required is of order $O(\mathrm{poly}(\log N))$, we obtain an efficient quantum circuit.
This paper is organised as follows. In Sec. 2, we present an algorithm to implement circulant matrices, followed by discussions on block circulant matrices, Toeplitz and Hankel matrices in Sec. 3. In Sec. 4 and Sec. 5, we provide a method to simulate circulant Hamiltonians and to implement the inverse of circulant matrices. In Sec. 6, we describe a technique to efficiently implement products of circulant matrices. In the last section, we provide an example application in solving the equation of motion for vibrating systems with cyclic symmetry.

Implementation of Circulant Matrices
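A circulant matrix has each row right-rotated by one element with respect to the previous row, defined as

$$C = \begin{pmatrix} c_0 & c_1 & \cdots & c_{N-1} \\ c_{N-1} & c_0 & \cdots & c_{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{pmatrix},$$

using an $N$-dimensional vector $c = (c_0, c_1, \dots, c_{N-1})$ [27]. In this paper we will assume $c_j$ to be non-negative for all $j$, which is often the case in practical applications. We also assume that the spectral norm (the largest eigenvalue) $\|C\| = \sum_{j=0}^{N-1} c_j$ of the circulant matrix $C$ equals 1 for simplicity.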
Note that $C$ can be decomposed into a linear combination of efficiently realizable unitary matrices as follows,

$$C = \sum_{j=0}^{N-1} c_j V_j, \qquad V_j = \sum_{k=0}^{N-1} |(k-j) \bmod N\rangle\langle k|. \qquad (2)$$

Such a linear combination of unitary matrices can be dealt with by the unitary decomposition approach introduced by Berry et al. [9]. For completeness, we restate their method as Lemma 1 given below.

Lemma 1. Let $M = \sum_j \alpha_j W_j$ be a linear combination of unitaries $W_j$ with $\alpha_j \ge 0$ for all $j$ and $\sum_j \alpha_j = 1$. Let $O_\alpha$ be any operator that satisfies $O_\alpha |0^m\rangle = \sum_j \sqrt{\alpha_j}\,|j\rangle$, where $m$ is the number of qubits used to represent $|j\rangle$, and $\mathrm{select}(W) = \sum_j |j\rangle\langle j| \otimes W_j$. Then

$$(O_\alpha^\dagger \otimes I)\,\mathrm{select}(W)\,(O_\alpha \otimes I)\,|0^m\rangle|\psi\rangle = |0^m\rangle M|\psi\rangle + |\Psi^\perp\rangle,$$

where $(|0^m\rangle\langle 0^m| \otimes I)\,|\Psi^\perp\rangle = 0$.

Lemma 1 can be directly applied to implement the circulant matrix $C$, as shown in Fig. 1. Since $\mathrm{select}(V)\,|j\rangle|k\rangle = |j\rangle|(k-j) \bmod N\rangle$, it can be implemented using quantum adders [28][29][30][31][32][33], which requires $O(\log^2 N)$ one- or two-qubit gates. We assume for simplicity that $N = 2^L$, where $L$ is an integer. A measurement result of $|0^L\rangle$ in the first register generates the required state $C|\psi\rangle$ in the second register. The probability of this measurement outcome is $O(\|C|\psi\rangle\|^2)$. With the help of amplitude amplification [34] this can be further improved, requiring only $O(1/\|C|\psi\rangle\|)$ applications of the circuit in Fig. 1. The amplitude amplification procedure also requires the same number of applications of $O_\psi$, where $O_\psi|0^L\rangle = |\psi\rangle$, and its inverse in order to reflect quantum states about the initial state $|0^L\rangle|\psi\rangle$. If $O_\psi$ is unknown, amplitude amplification is not applicable and we will need to repeat the measuring process in Fig. 1 $O(1/\|C|\psi\rangle\|^2)$ times, during which $O(1/\|C|\psi\rangle\|^2)$ copies of $|\psi\rangle$ are required.

It is worth noting that with the assumption $c_j \ge 0$, $C$ is unitary if and only if $C = V_j$ for some $j$. In other words, a non-trivial circulant matrix is non-unitary and therefore the oblivious amplitude amplification procedure [35] cannot be applied.

The complexity in Theorem 2 is inversely proportional to $\sqrt{p}$, where $p = \|C|\psi\rangle\|^2$ depends on the quantum state to be acted upon. Specifically, $C = F\Lambda F^\dagger$, where $F$ is the Fourier matrix with $F_{kj} = e^{2\pi i jk/N}/\sqrt{N}$ and $\Lambda$ is a diagonal matrix of eigenvalues given by $\Lambda_k = \sum_{j=0}^{N-1} c_j e^{2\pi i jk/N}$. Since the spectral norm $\|C\|$ of the circulant matrix $C$ equals one, we have $p = \langle\psi| F\Lambda^\dagger\Lambda F^\dagger |\psi\rangle \ge 1/\kappa^2$, where $\kappa$ is the condition number, defined as the ratio between the largest and smallest absolute values of the eigenvalues of $C$ [11]. Therefore, our algorithm is bound to perform well when $\kappa = O(\mathrm{poly}(\log N))$. In the ideal case where $\kappa = 1$ and $p = 1$, the vector $c$ is a unit basis vector in which only one element equals one and the others are zero. Even when $\kappa$ is large, our algorithm is efficient when the input quantum state after the Fourier transform lies in the subspace whose corresponding eigenvalues are large. To take an extreme but illustrative example, when 2 , normally close to one.
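The construction in this section can be checked numerically for a small $N$. The sketch below is only an illustration, not the circuit of Fig. 1: it builds $C = \sum_j c_j V_j$ as dense NumPy matrices, uses a Householder reflection as one possible (assumed) realisation of the oracle $O_c$ with $O_c|0^L\rangle = \sum_j \sqrt{c_j}|j\rangle$, and confirms that projecting the first register onto $|0^L\rangle$ leaves $C|\psi\rangle$ in the second register. The helper names `shift`, `circulant` and `lcu_unitary` are hypothetical.

```python
import numpy as np

def shift(N, j):
    """V_j = sum_k |(k - j) mod N><k| as an N x N permutation matrix."""
    V = np.zeros((N, N))
    for k in range(N):
        V[(k - j) % N, k] = 1.0
    return V

def circulant(c):
    """C = sum_j c_j V_j; row 0 equals c and each later row is right-rotated."""
    N = len(c)
    return sum(c[j] * shift(N, j) for j in range(N))

def lcu_unitary(c):
    """(O_c^dag x I) select(V) (O_c x I), with a Householder choice of O_c."""
    N = len(c)
    a = np.sqrt(np.asarray(c, dtype=float))   # target amplitudes sqrt(c_j)
    e0 = np.eye(N)[0]
    v = e0 - a
    # Real symmetric reflection sending |0> to a (identity if a is already |0>).
    O = np.eye(N) if np.allclose(v, 0) else np.eye(N) - 2 * np.outer(v, v) / (v @ v)
    select = sum(np.kron(np.diag(np.eye(N)[j]), shift(N, j)) for j in range(N))
    OI = np.kron(O, np.eye(N))
    return OI.T @ select @ OI

c = np.array([0.5, 0.2, 0.2, 0.1])            # non-negative, sums to 1
N = len(c)
C = circulant(c)
W = lcu_unitary(c)

psi = np.array([1.0, 2.0, 0.0, -1.0])
psi /= np.linalg.norm(psi)
out = W @ np.kron(np.eye(N)[0], psi)          # act on |0^L>|psi>
post = out[:N]                                # first register projected onto |0^L>
p = np.linalg.norm(post) ** 2                 # success probability ||C|psi>||^2

lam = N * np.fft.ifft(c)                      # Lambda_k = sum_j c_j e^{2 pi i jk/N}
kappa = np.abs(lam).max() / np.abs(lam).min()
F = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
```

Here `post` agrees with `C @ psi`, the columns of `F` are eigenvectors of `C` with eigenvalues `lam`, and `p` is bounded below by `1 / kappa**2`, in line with the discussion above.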

Block Circulant Matrices
Some block circulant matrices with special structures can also be implemented efficiently in a similar fashion. We assume the blocks are $N$-dimensional matrices and $L = \log N$ in the following discussions.
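Firstly, when each block is a unitary operator up to a constant factor (i.e. $C_j = c_j U_j$), we have a unitary block (UB) matrix,

$$C_{UB} = \begin{pmatrix} c_0U_0 & c_1U_1 & \cdots & c_{N-1}U_{N-1} \\ c_{N-1}U_{N-1} & c_0U_0 & \cdots & c_{N-2}U_{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ c_1U_1 & c_2U_2 & \cdots & c_0U_0 \end{pmatrix} = \sum_{j=0}^{N-1} c_j\, V_j \otimes U_j.$$

If the set of blocks $\{U_j\}_{j=0}^{N-1}$ can be efficiently implemented, then by simply replacing $\mathrm{select}(V) = \sum_j |j\rangle\langle j| \otimes V_j$ with $\sum_j |j\rangle\langle j| \otimes V_j \otimes U_j$, we can efficiently implement the block circulant matrix $C_{UB}$ using the same algorithm discussed in Sec. 2, as illustrated in Fig. 2(a). Specifically, when the blocks $\{U_j\}_{j=0}^{N-1}$ are one-dimensional, we can implement complex-valued circulant matrices with efficiently computable phase. For example, for $U_j = (e^{i\theta j})$, $j = 0, 1, \dots, N-1$, circulant matrices with the parameter vector $c = (c_0, e^{i\theta}c_1, \dots, e^{i(N-1)\theta}c_{N-1})$ can be implemented efficiently. Moreover, if $\theta = \pi$, the circulant matrix with parameter vector $c = (c_0, -c_1, \dots, (-1)^{N-1}c_{N-1})$, whose negative elements lie on odd-numbered sites, is efficiently implementable.

Another important family is block circulant matrices with circulant blocks (CB), which have found a wide range of applications in algorithms, mathematics, etc. [20][21][22][23]. It is defined as follows

$$C_{CB} = \begin{pmatrix} C_0 & C_1 & \cdots & C_{N-1} \\ C_{N-1} & C_0 & \cdots & C_{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ C_1 & C_2 & \cdots & C_0 \end{pmatrix},$$

where $C_j$ is a circulant matrix specified by an $N$-dimensional vector $c_j = (c_{j0}, c_{j1}, \dots, c_{j,N-1})$. It can be decomposed as follows

$$C_{CB} = \sum_{j=0}^{N-1}\sum_{j'=0}^{N-1} c_{jj'}\, V_j \otimes V_{j'}.$$

Given an oracle $O_c|0^{2L}\rangle = \sum_{j,j'=0}^{N-1} \sqrt{c_{jj'}}\,|j\rangle|j'\rangle$, we can implement $C_{CB}$ using the quantum circuit shown in Fig. 2(b), which adopts a combination of two quantum subtractors.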
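As a small numerical illustration of the CB case, the following sketch assumes the decomposition $C_{CB} = \sum_{j,j'} c_{jj'} V_j \otimes V_{j'}$ with $V_j = \sum_k |(k-j) \bmod N\rangle\langle k|$ and checks that the resulting matrix is block circulant with circulant blocks. The weight table `cmat` and the helper names are hypothetical and chosen only for this example.

```python
import numpy as np

def shift(N, j):
    """V_j = sum_k |(k - j) mod N><k|."""
    V = np.zeros((N, N))
    for k in range(N):
        V[(k - j) % N, k] = 1.0
    return V

def circulant(c):
    """Circulant matrix whose first row is c."""
    N = len(c)
    return sum(c[j] * shift(N, j) for j in range(N))

def block_circulant_cb(cmat):
    """sum_{j,j'} cmat[j, j'] * V_j (x) V_{j'} for an N x N weight table."""
    N = cmat.shape[0]
    return sum(cmat[j, jp] * np.kron(shift(N, j), shift(N, jp))
               for j in range(N) for jp in range(N))

N = 4
rng = np.random.default_rng(0)
cmat = rng.random((N, N))
cmat /= cmat.sum()                    # non-negative weights summing to 1

C_cb = block_circulant_cb(cmat)
blocks = [circulant(cmat[j]) for j in range(N)]   # C_0, ..., C_{N-1}

def block(m, k):
    """N x N block of C_cb at block-row m and block-column k."""
    return C_cb[m * N:(m + 1) * N, k * N:(k + 1) * N]
```

In this layout `block(m, k)` equals `blocks[(k - m) % N]`, so the first block row is $(C_0, C_1, \dots, C_{N-1})$ and each subsequent block row is right-rotated.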

Toeplitz and Hankel Matrices
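A Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant, which can be written explicitly as

$$T = \begin{pmatrix} t_0 & t_{-1} & \cdots & t_{-(N-1)} \\ t_1 & t_0 & \cdots & t_{-(N-2)} \\ \vdots & \vdots & \ddots & \vdots \\ t_{N-1} & t_{N-2} & \cdots & t_0 \end{pmatrix},$$

specified by $2N-1$ parameters. We focus on the situation where $t_j \ge 0$ for all $j$ as in Sec. 2. Clearly, when $t_{-(N-i)} = t_i$ for all $i$, $T$ is a circulant matrix. Although a Toeplitz matrix is not circulant in general, any Toeplitz matrix $T$ can be embedded in a circulant matrix [17,41], defined by

$$C_T = \begin{pmatrix} T & B_T \\ B_T & T \end{pmatrix},$$

where $B_T$ is another Toeplitz matrix defined by

$$B_T = \begin{pmatrix} 0 & t_{N-1} & \cdots & t_1 \\ t_{-(N-1)} & 0 & \cdots & t_2 \\ \vdots & \vdots & \ddots & \vdots \\ t_{-1} & t_{-2} & \cdots & 0 \end{pmatrix}.$$

As a result, we use this embedding to implement Toeplitz matrices because

$$C_T \begin{pmatrix} \psi \\ 0 \end{pmatrix} = \begin{pmatrix} T\psi \\ B_T\psi \end{pmatrix}.$$

Therefore, by implementing $C_T$, we obtain a quantum state proportional to $|0\rangle T|\psi\rangle + |1\rangle B_T|\psi\rangle$. Then we do a quantum measurement on the single qubit (in the second register in Fig. 3) to obtain the quantum state $T|\psi\rangle$. The success rate is $\|T|\psi\rangle\|^2$ according to Theorem 2 under the normalization condition that $\sum_{j=-(N-1)}^{N-1} t_j = \sum_{j=0}^{2N-1} c_j = 1$, where $c$ is the parameter vector of $C_T$. With the help of amplitude amplification, only $O(1/\|T|\psi\rangle\|)$ applications of the circuit in Fig. 3 are required.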
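The embedding can be verified numerically. The sketch below assumes the index convention $T_{jk} = t_{j-k}$ and takes the first row of the $2N$-dimensional circulant $C_T$ to be $(t_0, t_{-1}, \dots, t_{-(N-1)}, 0, t_{N-1}, \dots, t_1)$; both the convention and the helper names are assumptions made for illustration.

```python
import numpy as np

def toeplitz(t, N):
    """T[j, k] = t_{j-k}; t stores t_{-(N-1)}, ..., t_{N-1} in a length 2N-1 array."""
    return np.array([[t[(j - k) + N - 1] for k in range(N)] for j in range(N)])

def embedding_vector(t, N):
    """First row of C_T: (t_0, t_-1, ..., t_-(N-1), 0, t_(N-1), ..., t_1)."""
    tv = lambda d: t[d + N - 1]
    return np.array([tv(-d) for d in range(N)] + [0.0]
                    + [tv(N - i) for i in range(1, N)])

def circulant(c):
    """Circulant matrix with first row c, each row right-rotated."""
    n = len(c)
    return np.array([[c[(k - m) % n] for k in range(n)] for m in range(n)])

N = 4
rng = np.random.default_rng(1)
t = rng.random(2 * N - 1)
t /= t.sum()                                  # normalisation: sum of t_j is 1

T = toeplitz(t, N)
C_T = circulant(embedding_vector(t, N))
B_T = C_T[N:, :N]                             # lower-left block

psi = rng.random(N)
psi /= np.linalg.norm(psi)
out = C_T @ np.concatenate([psi, np.zeros(N)])   # C_T acting on |0>|psi>
success = np.linalg.norm(out[:N]) ** 2           # probability of measuring |0>
```

The upper half of `out` equals `T @ psi` and the lower half equals `B_T @ psi`, so measuring the extra qubit in $|0\rangle$ leaves a state proportional to $T|\psi\rangle$.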
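A Hankel matrix is a square matrix in which each ascending skew-diagonal from left to right is constant, which can be written explicitly as

$$H = \begin{pmatrix} h_{-(N-1)} & h_{-(N-2)} & \cdots & h_0 \\ h_{-(N-2)} & h_{-(N-3)} & \cdots & h_1 \\ \vdots & \vdots & \ddots & \vdots \\ h_0 & h_1 & \cdots & h_{N-1} \end{pmatrix},$$

specified by $2N-1$ non-negative parameters. A permutation matrix $P = \sigma_x^{\otimes L}$ transforms a Hankel matrix into a Toeplitz matrix. It can be easily verified that $T = HP$ and $H = TP$, in which $t_j = h_j$ for all $j$.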
Therefore, by inserting the permutation $P$ before the implementation of $T$, the circuit in Fig. 3 can be used to implement $H$, and the success rate is $\|H|\psi\rangle\|^2$ under the normalization condition that $\sum_{j=-(N-1)}^{N-1} h_j = \sum_{j=0}^{2N-1} c_j = 1$. With the help of amplitude amplification, only $O(1/\|H|\psi\rangle\|)$ applications are required.
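The relation between Hankel and Toeplitz matrices is easy to check for a small case. The sketch below assumes the convention $T_{jk} = t_{j-k}$ and builds $P = \sigma_x^{\otimes L}$ explicitly; it is meant only as an illustration of $H = TP$ and $T = HP$, with hypothetical helper names.

```python
import numpy as np
from functools import reduce

def toeplitz(t, N):
    """T[j, k] = t_{j-k}; t stores t_{-(N-1)}, ..., t_{N-1} in a length 2N-1 array."""
    return np.array([[t[(j - k) + N - 1] for k in range(N)] for j in range(N)])

L = 3
N = 2 ** L
X = np.array([[0.0, 1.0], [1.0, 0.0]])        # Pauli sigma_x
P = reduce(np.kron, [X] * L)                  # sigma_x on every qubit

rng = np.random.default_rng(2)
t = rng.random(2 * N - 1)
T = toeplitz(t, N)
H = T @ P                                     # reverse the column order of T
```

Because $P$ maps $|k\rangle$ to $|N-1-k\rangle$, the entry `H[j, k]` depends only on `j + k`, which is the defining property of a Hankel matrix, and applying `P` again recovers `T`.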
In comparison with existing algorithms, such as that described in [41], the above described quantum circuit provides a better way to realize circulant-like matrices, requiring fewer resources and with lower complexity. For example, only $2\log N$ qubits are required to implement $N$-dimensional Toeplitz matrices, which is a significant improvement over the algorithm presented in [41] via sparse Hamiltonian simulations. More importantly, this is an exact method and its complexity does not depend on an error term. It is also not limited to sparse circulant matrices $C$ as in [41]. Moreover, implementation of non-unitary matrices, such as circulant matrices, is not only of importance in quantum computing, but also a significant ingredient in quantum channel simulators [42,43], because the set of Kraus operators in the quantum channel $\rho \to \sum_i K_i \rho K_i^\dagger$ is normally non-unitary [1]. The simplicity of our circuit increases its feasibility in experimental realizations.

Circulant Hamiltonians
Hamiltonian simulation is expected to be one of the most important undertakings for quantum computation. It is therefore important to explore the possibility of efficient implementation of circulant Hamiltonians due to their extensive applications. Particularly, the implementation of $e^{-iCt}$ is equivalent to the implementation of continuous-time quantum walks on a weighted circulant graph [44,45]. Moreover, simulation of Hamiltonians is also an important part in the HHL algorithm to solve linear systems of equations [11].
A number of algorithms have been shown to be able to efficiently simulate sparse Hamiltonians [4][5][6][7][8][9][10], including the unitary decomposition approach [9]. We show that this approach can be extended to the simulation of dense circulant Hamiltonians. As we know, circulant matrices are diagonalizable and $e^{-iCt} = Fe^{-i\Lambda t}F^\dagger$. Hence, there is a direct method to implement $e^{-iCt}$ [25] when its diagonal elements $\{\Lambda_k\}_{k=0}^{N-1}$ are already known. However, this method is generally not extensible when $\{c_j\}_{j=0}^{N-1}$ are inputs. In this section, we will focus on the simulation of Hermitian circulant matrices, for which $e^{-iCt}$ is unitary. For completeness, we first describe briefly the unitary decomposition approach and then discuss how it can be used to efficiently simulate dense circulant Hamiltonians. To simulate $U = e^{-iCt}$ within a total error $\epsilon$, we divide the evolution time $t$ into $r$ segments with $U_r = e^{-iCt/r}$, which can be approximated as $\tilde{U} = \sum_{k=0}^{K} \frac{1}{k!}(-iCt/r)^k$. It can be proven that if we choose $K = O\left(\frac{\log(r/\epsilon)}{\log\log(r/\epsilon)}\right) = O\left(\frac{\log(t/\epsilon)}{\log\log(t/\epsilon)}\right)$, then $\|U_r - \tilde{U}\| \le \epsilon/r$ and the total error is within $\epsilon$.
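Since $C = \sum_{j=0}^{N-1} c_j V_j$ as given by Eq. 2, we have

$$\tilde{U} = \sum_{k=0}^{K}\ \sum_{j_1,\dots,j_k=0}^{N-1} \frac{(t/r)^k}{k!}\, c_{j_1}\cdots c_{j_k}\, (-i)^k\, V_{j_1}\cdots V_{j_k}.$$

According to Lemma 1, let

$$O_\alpha|0^{K+KL}\rangle = \frac{1}{\sqrt{s}} \sum_{k=0}^{K} \sqrt{\frac{(t/r)^k}{k!}}\; |1^k0^{K-k}\rangle \sum_{j_1,\dots,j_k=0}^{N-1} \sqrt{c_{j_1}\cdots c_{j_k}}\; |j_1\rangle\cdots|j_k\rangle\,|0^L\rangle^{\otimes (K-k)}, \qquad (13)$$

where $|1^k0^{K-k}\rangle$ is the unary encoding of $k$. Here $s$ is the normalization coefficient and we choose $r = t/\ln 2$ so that

$$s = \sum_{k=0}^{K} \frac{(t/r)^k}{k!} \approx e^{t/r} = 2.$$

Then we have

$$(O_\alpha^\dagger \otimes I)\,\mathrm{select}(W)\,(O_\alpha \otimes I)\,|0^{K+KL}\rangle|\psi\rangle = \frac{1}{s}\,|0^{K+KL}\rangle\tilde{U}|\psi\rangle + |\Psi^\perp\rangle,$$

where $(|0^{K+KL}\rangle\langle 0^{K+KL}| \otimes I)\,|\Psi^\perp\rangle = 0$. It has been shown in Ref. [9] that after one step of the oblivious amplitude amplification procedure [35], $U_r = e^{-iCt/r}$ can be simulated within error $\epsilon/r$. The oblivious amplitude amplification procedure avoids the repeated preparations of $|\psi\rangle$ so that $\tilde{U}|\psi\rangle$ can be obtained using only one copy of $|\psi\rangle$. The total complexity depends on the number of gates required to implement $\mathrm{select}(W)$ and $O_\alpha$.

Figure 4: The quantum circuit to implement one segment of circulant Hamiltonians.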
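The truncation step can be illustrated numerically. The sketch below is a classical check rather than a quantum simulation: for a small Hermitian circulant matrix with $\|C\| = 1$ and one segment of length $t/r = \ln 2$, it compares the truncated Taylor series $\tilde{U}$ with the exact $e^{-iCt/r}$ and evaluates the normalisation $s$. The example vector `c`, the truncation order `K = 8` and the helper names are assumptions chosen for illustration.

```python
import numpy as np
from math import log

def circulant(c):
    """Circulant matrix with first row c, each row right-rotated."""
    n = len(c)
    return np.array([[c[(k - m) % n] for k in range(n)] for m in range(n)])

def exact_segment(C, tau):
    """e^{-i C tau} for Hermitian C via eigendecomposition."""
    w, U = np.linalg.eigh(C)
    return (U * np.exp(-1j * w * tau)) @ U.conj().T

def taylor_segment(C, tau, K):
    """Return sum_{k<=K} (-i C tau)^k / k! and s = sum_{k<=K} tau^k / k!."""
    term = np.eye(C.shape[0], dtype=complex)
    total = term.copy()
    coeff, s = 1.0, 1.0
    for k in range(1, K + 1):
        term = term @ (-1j * tau * C) / k     # next Taylor term
        coeff *= tau / k                      # its scalar weight tau^k / k!
        total += term
        s += coeff
    return total, s

c = np.array([0.4, 0.2, 0.1, 0.1, 0.2])       # c_j = c_{N-j}, so C is Hermitian
C = circulant(c)
tau = log(2)                                  # segment length t/r = ln 2
K = 8
U_tilde, s = taylor_segment(C, tau, K)
err = np.linalg.norm(exact_segment(C, tau) - U_tilde, 2)
tail = 2.0 - s                                # e^{ln 2} - s bounds the error
```

Since $\|C\| = 1$, the spectral-norm error `err` is bounded by the scalar tail `2 - s`, and `s` is already very close to 2 at this modest `K`.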

Theorem 3 (Simulation of Circulant Hamiltonians).
There exists an algorithm performing $e^{-iCt}$ on an arbitrary quantum state $|\psi\rangle$ within error $\epsilon$, using $O\left(t\,\frac{\log(t/\epsilon)}{\log\log(t/\epsilon)}\right)$ calls of controlled-$O_c$$^1$ and its inverse, as well as $O\left(t(\log N)^2\,\frac{\log(t/\epsilon)}{\log\log(t/\epsilon)}\right)$ additional one- and two-qubit gates.
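Proof. We first consider the number of gates used to implement $O_\alpha$ in Eq. 13. It can be decomposed into two steps. The first step is to create the normalized version of the state $\sum_{k=0}^{K} \sqrt{(t/r)^k/k!}\;|1^k0^{K-k}\rangle$. Next we focus on the implementation of $\mathrm{select}(W)$, which performs the transformation

$$|1^k0^{K-k}\rangle\,|j_1,\dots,j_k,0,\dots\rangle\,|\psi\rangle \to (-i)^k\,|1^k0^{K-k}\rangle\,|j_1,\dots,j_k,0,\dots\rangle\,V_{j_1}\cdots V_{j_k}|\psi\rangle$$

by applying $K$ quantum subtractors between $|j\rangle_{j=j_1,j_2,\dots,j_k,0,\dots}$ and $|\psi\rangle$. $K$ phase gates on each of the first $K$ qubits multiply the amplitude by $(-i)^k$. Therefore $\mathrm{select}(W)$ can be decomposed into $O(K\log^2 N)$ one- or two-qubit gates. In summary, $O(K)$ calls of controlled-$O_c$ and its inverse as well as $O(K\log^2 N)$ additional one- and two-qubit gates are sufficient to implement one segment $e^{-iCt/r}$. The total complexity to implement $r$ segments is therefore $O(tK)$ calls of controlled-$O_c$ and its inverse as well as $O(tK\log^2 N)$ additional one- and two-qubit gates, where $K = O\left(\frac{\log(t/\epsilon)}{\log\log(t/\epsilon)}\right)$.

$^1$ By controlled-$O_c$, we mean the operation $|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes O_c$, which applies $O_c$ when the control qubit is $|1\rangle$.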
Note that we assumed the spectral norm $\|C\| = 1$. To explicitly put it in the complexity in Theorem 3, we can simply replace $t$ by $\|C\|t$.

Inverse of Circulant Matrices
We now show that the HHL algorithm can be extended to solve systems of circulant matrix linear equations.
Proof. The procedure of the HHL algorithm works as follows [11]:

1. Apply the oracle $O_\psi$ to create the input quantum state $|\psi\rangle$: $|\psi\rangle = \sum_{j=0}^{N-1} b_j|u_j\rangle$, where $\{|u_j\rangle\}_{j=0}^{N-1}$ are the eigenvectors of $C$.

2. Run phase estimation of the unitary operator $e^{i2\pi C}$: $\sum_{j=0}^{N-1} b_j|u_j\rangle \to \sum_{j=0}^{N-1} b_j|u_j\rangle|\Lambda_j\rangle$, where $\Lambda_j$ are the eigenvalues of $C$ and $\Lambda_j \le 1$.

3. Perform a controlled rotation on an ancillary qubit: $\sum_{j=0}^{N-1} b_j|u_j\rangle|\Lambda_j\rangle\left(\sqrt{1 - \frac{1}{\kappa^2\Lambda_j^2}}\,|0\rangle + \frac{1}{\kappa\Lambda_j}\,|1\rangle\right)$, where $\kappa$ is the condition number defined in Sec. 2 to make sure that $1/(\kappa\Lambda_j) \le 1$ for all $j$.

4. Undo the phase estimation and then measure the ancillary qubit. Conditioned on getting 1, we have an output state $\propto \sum_{j=0}^{N-1} (b_j/\Lambda_j)|u_j\rangle$ and the success rate $p = \sum_{j=0}^{N-1} |b_j/(\kappa\Lambda_j)|^2 = \Omega(1/\kappa^2)$.

Error occurs in Step 2 in Hamiltonian simulation and phase estimation. The complexity scales sublogarithmically with the inverse of error in Hamiltonian simulation as in Theorem 3 and scales linearly with it in phase estimation [1]. The dominant source of error is phase estimation. Following from the error analysis in Ref. [11], a precision $O(\kappa/\epsilon)$ in phase estimation results in a final error $\epsilon$. Taking the success rate $p = \Omega(1/\kappa^2)$ into consideration, the total complexity would be $\tilde{O}(\kappa^2/\epsilon)$, with the help of amplitude amplification [34].
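The amplitude bookkeeping in the four steps can be emulated classically for a small example. The sketch below assumes ideal (error-free) phase estimation and a Hermitian circulant matrix with $\|C\| = 1$; it simply rescales each eigen-component $b_j$ by $1/(\kappa\Lambda_j)$ and compares the result with a direct linear solve. The function name `hhl_emulation` and the example data are hypothetical.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first row c, each row right-rotated."""
    n = len(c)
    return np.array([[c[(k - m) % n] for k in range(n)] for m in range(n)])

def hhl_emulation(C, psi):
    """Ideal-case emulation: returns (normalised output, success rate, kappa)."""
    w, U = np.linalg.eigh(C)                  # eigenvalues Lambda_j, eigenvectors |u_j>
    kappa = np.abs(w).max() / np.abs(w).min()
    b = U.conj().T @ psi                      # step 1: |psi> = sum_j b_j |u_j>
    amp = b / (kappa * w)                     # step 3: amplitude attached to ancilla |1>
    p = float(np.sum(np.abs(amp) ** 2))       # step 4: probability of measuring 1
    out = U @ amp
    return out / np.linalg.norm(out), p, kappa

c = np.array([0.4, 0.2, 0.1, 0.1, 0.2])       # symmetric, so C is Hermitian
C = circulant(c)
psi = np.array([1.0, 0.0, 2.0, -1.0, 0.5])
psi /= np.linalg.norm(psi)

state, p, kappa = hhl_emulation(C, psi)
reference = np.linalg.solve(C, psi)
reference /= np.linalg.norm(reference)
```

The post-selected state matches the normalised solution of $Cx = \psi$, and the success rate lies between $1/\kappa^2$ and 1, as stated in Step 4.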

Products of Circulant Matrices
Products of circulant matrices are also circulant matrices, because a circulant matrix can be decomposed into a linear combination of $\{V_j\}_{j=0}^{N-1}$ that constitute a cyclic group of order $N$ (we have $V_jV_k = V_{(j+k) \bmod N}$). Suppose $C^{(1,2)} = C^{(1)}C^{(2)}$ is the product of two circulant matrices $C^{(1)}$ and $C^{(2)}$, which has a parameter vector $c^{(1,2)}$, where

$$c^{(1,2)}_j = \sum_{k=0}^{N-1} c^{(1)}_k\, c^{(2)}_{(j-k) \bmod N},$$

and $c^{(1)}$ and $c^{(2)}$ are the parameter vectors of $C^{(1)}$ and $C^{(2)}$ respectively. Clearly, when the spectral norms of $C^{(1)}$ and $C^{(2)}$ are one, the spectral norm of $C^{(1,2)}$ is also one. Classically, calculating the parameters $c^{(1,2)}$ would take up $O(N)$ space. However, in the quantum case, we will show that $O_{c^{(1,2)}}$, encoding $c^{(1,2)}$, can be prepared using one $O_{c^{(1)}}$ and one $O_{c^{(2)}}$. It means that the oracle for a product of circulant matrices can be efficiently prepared when its factor circulants are efficiently implementable, as illustrated in Fig. 5.
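Theorem 5. There exists an algorithm creating the oracle $O_{c^{(1,2)}}$, which satisfies

$$O_{c^{(1,2)}}|0^{3L}\rangle = \sum_{j=0}^{N-1} \sqrt{c^{(1,2)}_j}\;|j\rangle|\Phi_j\rangle,$$

where $|\Phi_j\rangle$ is a unit quantum state dependent on $j$, using one $O_{c^{(1)}}$, one $O_{c^{(2)}}$ and $O(\log^2 N)$ additional one- and two-qubit gates.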
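Proof. We need $2L$ ancillary qubits divided into two registers to construct the oracle for the product of two circulant matrices. We start by applying $O_{c^{(1)}}$ and $O_{c^{(2)}}$ on the last two registers, obtaining

$$|0^L\rangle \sum_{j_1,j_2=0}^{N-1} \sqrt{c^{(1)}_{j_1}c^{(2)}_{j_2}}\;|j_1\rangle|j_2\rangle.$$

In order to encode $c^{(1,2)}_j$ in the quantum amplitudes, we once again apply quantum adders. By performing the transformation

$$|0^L\rangle|j_1\rangle|j_2\rangle \to |j\rangle|j_1\rangle|j_2\rangle,$$

where $j \equiv (j_1 + j_2) \bmod N$, which can be achieved using two quantum adders, we obtain the state

$$\sum_{j=0}^{N-1} \sqrt{c^{(1,2)}_j}\;|j\rangle|\Phi_j\rangle,$$

because the amplitude of $|j\rangle$ equals

$$\sqrt{\sum_{j_1+j_2\equiv j} c^{(1)}_{j_1}c^{(2)}_{j_2}} = \sqrt{c^{(1,2)}_j}.$$

This algorithm can be easily extended to implementing oracles for products of $d$ circulants, in which $d$ oracles of factor circulants and $dL$ ancillary qubits are needed. Though the oracle described in Theorem 5 may not be useful in all quantum algorithms, due to the additional $|\Phi_j\rangle$ in Eq. 19, it is applicable in Sec. 2 and Sec. 4 according to Lemma 6 (the generalized form of Lemma 1) described below. It implies that this technique could also be useful in other algorithms related to circulant matrices.

Lemma 6. Let $M = \sum_j \alpha_j W_j$ be a linear combination of unitaries $W_j$ with $\alpha_j \ge 0$ for all $j$ and $\sum_j \alpha_j = 1$. Let $O_\alpha$ be any operator that satisfies $O_\alpha|0^m\rangle = \sum_j \sqrt{\alpha_j}\,|j\rangle|\Phi_j\rangle$ ($m$ is the number of qubits used to represent $|j\rangle|\Phi_j\rangle$) and $\mathrm{select}(W) = \sum_j |j\rangle\langle j| \otimes I \otimes W_j$. Then

$$(O_\alpha^\dagger \otimes I)\,\mathrm{select}(W)\,(O_\alpha \otimes I)\,|0^m\rangle|\psi\rangle = |0^m\rangle M|\psi\rangle + |\Psi^\perp\rangle,$$

where $(|0^m\rangle\langle 0^m| \otimes I)\,|\Psi^\perp\rangle = 0$.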
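The amplitude argument in the proof can be checked directly for small $N$. The sketch below assumes the parameter vector of the product is the cyclic convolution of $c^{(1)}$ and $c^{(2)}$, and stores the three-register state as an array indexed by $(j, j_1, j_2)$; it is a classical illustration with hypothetical helper names, not an implementation of the quantum adders.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first row c, each row right-rotated."""
    n = len(c)
    return np.array([[c[(k - m) % n] for k in range(n)] for m in range(n)])

def product_parameters(c1, c2):
    """c12_j = sum_k c1_k * c2_{(j - k) mod N} (cyclic convolution)."""
    N = len(c1)
    return np.array([sum(c1[k] * c2[(j - k) % N] for k in range(N))
                     for j in range(N)])

def product_oracle_state(c1, c2):
    """Amplitudes indexed [j, j1, j2] after adding j1 and j2 into register j."""
    N = len(c1)
    state = np.zeros((N, N, N))
    for j1 in range(N):
        for j2 in range(N):
            state[(j1 + j2) % N, j1, j2] = np.sqrt(c1[j1] * c2[j2])
    return state

rng = np.random.default_rng(3)
N = 4
c1 = rng.random(N); c1 /= c1.sum()
c2 = rng.random(N); c2 /= c2.sum()

c12 = product_parameters(c1, c2)
state = product_oracle_state(c1, c2)
weights = (state ** 2).sum(axis=(1, 2))       # squared norm of each |j> branch
```

Each branch $|j\rangle$ carries total weight $c^{(1,2)}_j$, so its amplitude is $\sqrt{c^{(1,2)}_j}$ multiplied by a unit vector $|\Phi_j\rangle$ on the two ancillary registers.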

Application: Solving Cyclic Systems
Vibration analysis of mechanical structures with cyclic symmetry has been a subject of considerable studies in acoustics and mechanical engineering [16,19]. Here we provide an example where the above proposed quantum scheme can outperform classical algorithms in solving the equation of motion for vibrating and rotating systems with certain cyclic symmetry.
The equation of motion for a cyclically symmetric system consisting of $N$ identical sectors, as shown in Fig. 6, can be written as where q and f are  Assume all sectors have the same mass ($M \propto I$) and there is zero damping ($D = 0$). If the system is under the so-called traveling wave engine order excitation, the equation of motion can be simplified as [16]: where the traveling wave is characterised by $f_j = fe^{i2\pi nj/N}$ for the external force vector $f$, $n$ is the order of excitation, and $\Omega$ is the angular frequency of the excitation. We search for solutions of the form $q = q_0e^{in\Omega t}$, which leads to Since $K - n\Omega I$ is a circulant matrix, we can use Theorem 4 to calculate It is important to consider the conditions under which Theorem 4 works:

1. $K - n\Omega I$ is Hermitian. This is generally true for symmetric cyclic systems, where the coupling between $q_j$ and $q_{j+d}$ and the coupling between $q_j$ and $q_{j-d}$ are physically the same for any sector $j$ and distance $d$.
2. $K - n\Omega I$ has non-negative (or non-positive) entries. Although this is not true in general, Theorem 4 will work under a slight modification. We observe that the off-diagonal elements of $K - n\Omega I$ are always negative because the coupling force between two connecting sectors is always in the opposite direction to their relative motion.
• If the diagonal elements of $K - n\Omega I$ are also negative, then no modification to the proposed procedure is necessary.
• If the diagonal elements of $K - n\Omega I$ are positive, we replace $V_0$ with $-V_0$ in Eq. 2, while keeping $V_j$ ($j \ne 0$) unchanged. Then $-(K - n\Omega I) = -c_0V_0 + \sum_{j=1}^{N-1} c_jV_j$ would be a matrix whose off-diagonal elements are positive and diagonal elements are negative. It means that in the quantum circuits, we need to replace $\mathrm{select}(V) = \sum_{j=0}^{N-1} |j\rangle\langle j| \otimes V_j$ with $-|0\rangle\langle 0| \otimes V_0 + \sum_{j=1}^{N-1} |j\rangle\langle j| \otimes V_j$.

3. The condition number $\kappa$ of $K - n\Omega I$ is small. This is true when the couplings among sectors are relatively weak, that is, when $|K_0 - n\Omega| \gg K_1$, where $K_0$ characterises the coupling between a sector and the exterior and $K_1$ characterises the coupling among sectors.
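For orientation, the three conditions can be examined on a small classical example. The sketch below uses a hypothetical dense circulant matrix `A` as a stand-in for $K - n\Omega I$ (positive diagonal, negative couplings that decay with distance); the numerical values are invented for illustration and are not taken from any physical system. It checks the sign-modified decomposition with $-V_0$, the Hermitian property, the condition number, and a classical reference solution for a traveling-wave force.

```python
import numpy as np

def shift(N, j):
    """V_j = sum_k |(k - j) mod N><k|."""
    V = np.zeros((N, N))
    for k in range(N):
        V[(k - j) % N, k] = 1.0
    return V

N = 8
# Hypothetical first row of K - n*Omega*I: positive diagonal, negative couplings.
a = np.array([1.0, -0.10, -0.05, -0.02, -0.01, -0.02, -0.05, -0.10])
A = sum(a[j] * shift(N, j) for j in range(N))

c = np.abs(a)                                      # non-negative weights c_j
signs = np.where(np.arange(N) == 0, -1.0, 1.0)     # -V_0, and +V_j for j != 0
minus_A = sum(signs[j] * c[j] * shift(N, j) for j in range(N))

lam = N * np.fft.ifft(a)                           # eigenvalues of A
kappa = np.abs(lam).max() / np.abs(lam).min()

n = 3                                              # assumed order of excitation
f = np.exp(2j * np.pi * n * np.arange(N) / N)      # f_j = f e^{i 2 pi n j / N}, f = 1
q0 = np.linalg.solve(A, f)                         # dense classical solve
q0_fft = np.fft.ifft(np.fft.fft(f) / lam)          # same solve via diagonalisation
```

Here `minus_A` reproduces `-A`, which is the matrix the modified $\mathrm{select}(V)$ would implement, and the two classical solutions agree; they serve only as a reference against which a quantum output state could be compared.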
If all three conditions are satisfied, we have an exponential speed-up compared to classical computation. Note that the output $q_0$ is stored in quantum amplitudes, which cannot be read out directly. However, further computation steps can efficiently provide practically useful information about the system from the vector $q_0$, for example the expectation value $q_0^\dagger M q_0$ for some linear operator $M$ or the similarity between two cyclic systems $\langle q_0|q_0'\rangle$ [11]. It is also worth noting that the proposed algorithm, in contrast to previous quantum algorithms [5-9, 11, 41], works for dense matrices $K - n\Omega I$. It means that the cyclic systems need not be subject to nearest-neighbour coupling.

Conclusion
In this paper, we present efficient quantum algorithms for implementing circulant (as well as Toeplitz and Hankel) matrices and block circulant matrices with special structures, which are not necessarily sparse or unitary. These matrices have practically significant applications in physics, mathematics and engineering-related fields. The proposed algorithms provide exponential speed-up over classical algorithms, requiring fewer resources ($2\log N$ qubits) and having lower complexity ($O(\log^2 N/\|C|\psi\rangle\|)$) in comparison with existing quantum algorithms. Consequently, they perform better in quantum computing and are more feasible for experimental realisation with current technology.
Besides the implementation of circulant matrices, we find that the HHL algorithm can be applied to circulant matrices to implement their inverse, by adopting the Taylor series approach to efficiently simulate circulant Hamiltonians. Owing to the special structure of circulant matrices, we prove that they are one of the types of dense matrices that can be efficiently simulated. Being able to implement the inverse of circulant matrices opens a door to solving a variety of real-world problems, for example, solving cyclic systems in vibration analysis. Finally, we show that it is possible to construct oracles for products of circulant matrices using the oracles for their factor circulants, a technique that is useful in related algorithms.

Figure 1: Quantum circuit to implement a circulant matrix.

Figure 2: The quantum circuit to implement block circulant matrices with special structures.

Figure 3: The quantum circuit to implement a Toeplitz matrix. In this figure, $O_c|0^{L+1}\rangle = \sum_{j=0}^{2N-1} \sqrt{c_j}\,|j\rangle$.

In implementing $O_\alpha$ (proof of Theorem 3), the first step takes $O(K)$ consecutive one-qubit rotations on each qubit. We then apply $K$ sets of controlled-$O_c$ to transform $|0^L\rangle$ into $\sum_{j=0}^{N-1} \sqrt{c_j}\,|j\rangle$ when the control qubit is $|1\rangle$. We therefore need $O(K)$ calls of controlled-$O_c$ and $O(K)$ additional one-qubit gates to implement $O_\alpha$.

Figure 6: Topology diagram of an $N$-sector cyclic system. (a) A general cyclic system with coupling between any two sectors, which can be solved using Theorem 4. (b) A cyclic system with nearest-neighbour coupling, which can be solved using the HHL algorithm [11].
Theorem 2 (Implementation of Circulant Matrices). There exists an algorithm creating the quantum state $C|\psi\rangle$ for an arbitrary quantum state $|\psi\rangle = \sum_{k=0}^{N-1} \psi_k|k\rangle$, using $O(1/\|C|\psi\rangle\|)$ calls of $O_c$, $O_\psi$ and their inverses, as well as $O(\log^2 N/\|C|\psi\rangle\|)$ additional one- or two-qubit gates.