Generalized Quantum Signal Processing

Quantum Signal Processing (QSP) and Quantum Singular Value Transformation (QSVT) currently stand as the most efficient techniques for implementing functions of block encoded matrices, a central task that lies at the heart of most prominent quantum algorithms. However, current QSP approaches face several challenges, such as the restrictions imposed on the family of achievable polynomials and the difficulty of calculating the required phase angles for specific transformations. In this paper, we present a Generalized Quantum Signal Processing (GQSP) approach, employing general SU(2) rotations as our signal processing operators, rather than relying solely on rotations in a single basis. Our approach lifts all practical restrictions on the family of achievable transformations, with the sole remaining condition being that $|P|\leq 1$, a restriction necessary due to the unitary nature of quantum computation. Furthermore, GQSP provides a straightforward recursive formula for determining the rotation angles needed to construct the polynomials in cases where $P$ and $Q$ are known. In cases where only $P$ is known, we provide an efficient optimization algorithm capable of identifying in under a minute of GPU time, a corresponding $Q$ for polynomials of degree on the order of $10^7$. We further illustrate GQSP simplifies QSP-based strategies for Hamiltonian simulation, offer an optimal solution to the $\epsilon$-approximate fractional query problem that requires $O(\frac{1}{\delta} + \log(\large\frac{1}{\epsilon}))$ queries to perform where $O(1/\delta)$ is a proved lower bound, and introduces novel approaches for implementing bosonic operators. Moreover, we propose a novel framework for the implementation of normal matrices, demonstrating its applicability through the development of a new convolution algorithm that runs in $O(d \log{N} + \log^2N)$ 1 and 2-qubit gates for a filter of lengths $d$.


I. INTRODUCTION
In recent years, significant advancements in quantum algorithm theory have revealed a powerful, overarching insight: the majority of prominent quantum algorithms, such as Hamiltonian simulation [1][2][3][4], quantum search [5][6][7], factoring [8], and quantum walks [7,9], can be fundamentally reduced to the central task of implementing a matrix function of a Hamiltonian, f (H), this was shown in [10,11].Over the years, numerous techniques have been developed for constructing such functions, including phase estimation methods such as the HHL algorithm [12], linear combination of unitaries (LCU) [4], and quantum signal processing (QSP) [2,[13][14][15][16][17].Among these, QSP stands out as the most versatile approach to date, capable of approximating a wide range of matrix functions through eigenvalue or singular value transformations of H, while requiring a minimal number of ancilla qubits.The basic idea of QSP is to build a polynomial approximation of the desired function by assuming oracular access to a unitary U which encodes H. QSP has been demonstrated to yield asymptotically optimal Hamiltonian simulation algorithms [2].QSP's applicability was then further extended to the case of non-square matrices by the QSVT technique [11].Together, QSP and QSVT have provided an abstract formalism that facilitates the efficient implementation of a wide range of linear algebraic operations and transformations, creating a new language with greater expressivity for the construction of quantum algorithms.
Despite its success, QSP still suffers from a number of severe limitations.Most notable of them being the restrictions it places on the family of polynomials that can be built using it.For instance, in order to build an arbitrary polynomial using the original QSP one has to split the polynomial into four parts by first breaking the polynomial into its real and imaginary parts, and then further breaking down each of those into their respective even and odd components.One would then combine all parts together using linear combination of unitaries (LCU) [4], and furthermore perform a variant of amplitude amplification [18] on the resulting circuit.This requires that we triple the number of operations needed to perform Hamiltonian simulation relative to what we would need if we could implement Hamiltonian dynamics directly.Lastly, while the original QSP guarantees the existence of a set of parameters leading to polynomials satisfying the specified restrictions, finding those parameters has proved to be a complicated (if computationally efficient) task [14,19].In this paper, we present an algorithm, which we term Generalized Quantum Signal Processing (GQSP), that overcomes all the above limitations leading to a more general framework for the central task of implementing functions of Hamiltonians.
Our GQSP algorithm provides us with the ability to build polynomials of unitary matrices using a single ancilla qubit with the only restriction being that its norm is not greater than 1 on the complex unit circle as demonstrated in Corollary 5. Since GQSP does not restrict us to a fixed parity for the polynomial, it gives us the ability to approximate functions of a Hamiltonian H without the need for LCU that appears in some applications of QSP [11].Furthermore, in Section IV we demonstrate that computing the rotation angles required to build a given polynomial is much more efficient and conceptually simpler within this framework.This extends the scope of QSP along two different axes.We then utilize this technique in Section V to develop a conceptually simplified formulation of qubitization-based Hamiltonian simulation, and give a provably optimal algorithm for the fractional query problem as special cases of what we call Phase Functions.In Section VI we introduce a powerful idea that extends the concept of Fourier decomposition to normal matrices using polynomials of a special unitary matrix.More precisely, we will demonstrate that any normal matrix can be written as a polynomial of the root of unity unitary in its eigenbasis.We then explore the utility of this idea by giving a general framework for synthesizing diagonal and convolution matrices.But first, we give a review of the original quantum signal processing algorithm in Section II which sets the stage for our method introduced in Section III.

II. REVIEW OF QUANTUM SIGNAL PROCESSING
The quantum signal processing algorithm constructs a function of H by interleaving applications of a signal operator with signal processing operators.The signal operator encodes H using controlled applications of U = e iH , where an ancillary qubit acts as the control.The signal processing operations, on the other hand, consist of single-qubit operations performed on the ancillary qubit.There are two conventions that are commonly used for the formulation of quantum signal processing.One where the signal operator is expressed in the σ x basis and the signal processing is done in the σ z basis, and another where the signal operator is expressed in the σ z basis and the signal processing is done in the σ x basis.The signal operator in the σ z basis is defined as: We can express the signal operator in the σ x basis by applying a Hadamard gate to the both sides of the ancilla qubit: This shows that, at a high level, both of these formalisms are conceptually identical although there are distinctions between the forms of the polynomials that can be constructed in both bases.The results of QSP are often stated in the A σx formalism as: (II) P has parity d mod 2 and Q has parity (d -1) mod 2.
In practice, one often only cares about building a particular polynomial P , so the question becomes, for what P s there exists a Q such that they satisfy the conditions of Theorem 1?It turns out the restrictions above can be quite limiting.For example, the 3rd condition requires |P (±1)| = 1.This limitation can be overcome by applying Hadamards on the ancilla before and after the QSP sequence, effectively converting into the A σz formalism.In this case, it can be shown that we can implement any real polynomial with parity d mod 2 such that deg(P ) ≤ d, and Quantum circuit for the original QSP approach where the angles ϕ are chosen to enact a polynomial transformation of U .
Theorem 2 (Quantum Signal Processing in σ z Basis).∀P ∈ R[x], with deg(P ⇐⇒ Here T = {x ∈ C : |x| = 1} is again the complex unit circle. The above formalism indeed mitigates some previously discussed limitations.However, the requirement that the polynomial P must be real and possess definite parity necessitates the combination of four separate instances of QSP using Linear Combination of Unitaries (LCU) to construct an arbitrary complex polynomial with indefinite parity.In the following section, we address these constraints with a novel approach.We propose a more generalized formulation of Quantum Signal Processing by allowing our signal processing operators to be arbitrary SU(2) rotations, effectively lifting the "realness" condition on the polynomials.Furthermore, we eliminate the parity restriction by simplifying our signal operator.Combined together, our reformulation successfully eliminates all practical restrictions on P , which we will elaborate on and demonstrate in the next section.

III. GENERALIZED QUANTUM SIGNAL PROCESSING
Our generalized quantum signal processing approach seeks to remove the limitations inherent in traditional QSP, thereby providing a more powerful mechanism for constructing functions of Hamiltonians.Our approach generalizes QSP by using SU(2) rotations rather than X-rotations on the control ancilla similar to [20,21]; however, our approach eschews the need to use U † and avoids the use of Laurent polynomials in the analysis.Specifically, we use the convention that the signal operator is a 0-controlled application of U = e iH : Next, instead of performing rotations in a single basis, we allow for signal processing operations to be arbitrary SU(2) rotations of the ancillary qubit: R(θ, ϕ, λ) = e i(λ+ϕ) cos(θ) e iϕ sin(θ) Then by interleaving the above 2 operations as demonstrated in Figure 2, we can block encode polynomial transformations of the unitary matrix U without further assumptions.Note that below we have no assumptions made on the parity of the polynomial transformations unlike those made by the Theorem 1. Lastly, notice that we only need to specify a single λ since we can absorb the other instances into the instance of ϕ before them.
Quantum circuit for GQSP wherein the angles θ and ϕ are chosen to enact a polynomial transformation of U .Here R(θ, ϕ, λ) is a general single qubit rotation as specified in (6). ⇐⇒ Here T = {x ∈ C : |x| = 1} is again the complex unit circle.
Proof.Both directions are proven by induction on d.To show the forward direction, we need to show inductively that for all {θ j } and {ϕ j } and d both of the claimed restrictions are met.The base case of d = 0 is trivial since: R(θ 0 , ϕ 0 , λ) = e i(λ+ϕ) cos(θ)I e iϕ sin(θ)I e iλ sin(θ)I − cos(θ)I Next let d ∈ N >0 and assume the induction hypothesis holds for d − 1 producing P (U ) and Q(U ).Then: Thus: Since P and Q have degree ≤ d − 1, then P and Q must have degree ≤ d.Furthermore, given the fact that all operations are unitary, property (2) trivially holds.This completes the proof for the forward direction.
The proof for the reverse direction involves showing by induction that any pair of polynomials satisfying the conditions of Theorem 3 can be constructed through an appropriate choice of rotation angles for the single qubit unitaries in Fig. 2. The base case of d = 0 is again trivial since P and Q would both be constants whose norm squared adds to 1. Any such pair can be written as P = e i(λ+ϕ) cos(θ), Q = e iϕ sin(θ) and implemented by R(θ 0 , ϕ 0 , λ).Next let d ∈ N >0 , assume P and Q satisfy conditions (1) and (2) from Theorem 3, and the induction hypothesis holds for d − 1.Specifically, we assume from our induction hypothesis that for any P and Q of degree at most d − 1 satisfying: Thus, it suffices to find θ d and ϕ d such that P and Q are of degree at most d − 1 satisfying | P | 2 + | Q| 2 = 1 on T, where P and Q are given by: Then: First notice that regardless of our choice of θ d and ϕ d , we have: = ∥P (e it )∥ 2 + ∥Q(e it )∥ 2 = 1 Hence, the norm condition is again satisfied trivially by the unitary nature of our operations.Thus, it remains to show P and Q are of degree at most d − 1.Let a i and b i be the coefficients of the i-degree term in P and Q respectively.By our assumption, we have that the target polynomials P and Q satisfy the norm condition, or equivalently for any real-valued t: Next, we need to argue about the structure of the coefficients in this expansion.To begin we note that the inner product between the two parts from P satisfies: Repeating the same expansion for the b coefficients leads to the observation that for all t: We then note that the t-dependent sum must generically be zero because of the fact that the exponentials form an orthogonal function basis as seen by the Fourier transform as noted by the fact that where δ is the Dirac-delta function.Thus because the functions form an orthogonal function space that does not contain the constant function, we cannot form the constant function out of a linear combination of non-constant exponentials, and the only way that we can satisfy ( 23) for all real-valued t is if: Let s, s ′ ≥ 0 be the smallest degrees present in P and Q respectively, then notice that (26) implies deg(P ) which also implies s = s ′ .Hence using (26) we get: Then by ( 27): Then it is easy to see after substituting our choices of θ d and ϕ d into ( 15) and ( 16) the s degree terms of P and Q in P cancel (no negative degree terms are introduced) and d degree terms of P and Q in Q also cancel.Thus, Theorem 3 specifies the necessary and sufficient conditions for constructible pairs of polynomials P and Q using our method.However, in practice, one often only cares about building a particular polynomial P , so the question becomes: "For what P s there exists a Q such that they satisfy the conditions of theorem 3?" In the following theorem we show that, as long as |P | 2 ≤ 1 on T, there exists such a Q.
Before proving Theorem 4, notice that this leads to the following corollary, which states that we can build any arbitrary (appropriately scaled) polynomial P of U using our technique.
Corollary 5 is one of the most significant results in this paper, as it clearly establishes the superior expressivity of our method over traditional QSP when contrasted with 2.
Proof of Theorem 4.

=⇒: Let
on T, and define: Then T is a non negative trigonometric polynomial with terms from −deg(P ) to deg(P ) that satisfies T ≤ 1. Define H = 1 − T as another non negative trigonometric polynomial of degree deg(P ).It remains to show there exists Q a polynomial of degree deg(P ) such that Let d = deg(P ), then we can re-write H as: For some polynomial R of degree 2d.Since H = H * , we have for all |z| = 1: Using the identity theorem for analytic functions, this equality must hold for all z ∈ C − {0}, making R a self-inversive polynomial.Therefore, roots of R are either on the unit circle, or grouped in pairs (w i , 1 Notice (e iθ − w i )(e iθ − 1 , so by grouping the roots (w i , 1 ) together, we get: Where m is the number of such pairs.We can then write: where G is a polynomial of degree m and Ĥ is a non negative trigonometric polynomial with roots only on the unit circle, that maps the unit circle into [0, ∞).Just like H, we can re-write Ĥ as: For some polynomial R of degree 2d − 2m.Extending Ĥ to an analytic function on a small annulus including the unit circle, we can use the local form of the analytic function near the zeros.This implies a curve passing through a zero of R is sent either to a curve through 0, or a line segment ending at 0 around a small neighborhood of the zero depending on the parity of the zero.However, since Ĥ has all roots on the unit circle, then the unit circle itself passes through all zeros and is mapped into [0, ∞), which implies all zeros have even multiplicity.Hence, we can re-write R as:

R = Ĝ2
(39) For some polynomial Ĝ of degree d − m.Then on the complex unit circle, we have: Thus: Then letting Q = G Ĝ we have: and: which proves the forward direction of the proof.⇐=: This is trivial since |Q(e iθ )| 2 ≥ 0.
So far we have only discussed the construction of polynomials of positive degree, however, the framework presented here can also be used to implement polynomials with any ratio of positive and negative degrees by making use of a secondary signal operator which we'll define as: Particularly, if the set of rotation angles ⃗ θ, ⃗ ϕ, and λ lead to the polynomials P (x) = If and only if: For P ′ (e ix ) = e −ikx P (e ix ) and Proof of Theorem 6.
First notice: Hence we have Since I ⊗ U † commutes with both A and R(θ, ϕ, 0) we then have: In this section, we gave a detailed description of our generalized QSP technique and proved its first advantage over traditional QSP.Namely, we showed that our method overcomes the restrictions placed on the family of polynomials that can be built using QSP, which eliminates the need for LCU and oblivious amplitude amplification (OAA).In the next section, we will demonstrate the second advantage of our formalism over traditional QSP, which is, the computation of the rotation angles required to build a given polynomial is much more efficient and conceptually simpler within our framework.
Algorithm1 Pseudocode for the computation of the phase factors in GQSP.

IV. CALCULATING ROTATION ANGLES
In this section, we first provide a simple method to calculate the parameters of our algorithm given desired polynomials P and Q.This is a significant advance because existing approaches for computing the rotation angles in QSP are quite involved.This approach on the other hand is elementary and as a result, allows the approach in this paper to be more easily understood by students and more easily deployed in practice.We then provide an extremely efficient optimization algorithm for finding a corresponding Q given a desired P since in practice one is often only interested in implementing a particular polynomial P .We also provide astonishing numerical results that showcase the efficiency of our optimization method.Notably, we're able to find Q for randomly chosen polynomials P of up to 2 24 ∼ 16.8 million degrees in less than 40 seconds on an A100 GPU.This is a remarkable advancement given that previous methods have only been able to find rotation angles for polynomials of up ∼ 10 4 degrees [14].As we can see, this algorithm is very simple and easy to understand, however, the algorithm assumes P and Q satisfy the requirements of theorem 3, which is not a trivial task.
In theorem 4 we showed that as long as ∀x ∈ T, |P (x)| 2 ≤ 1, there exists a Q such that together with P they satisfy the conditions of 3. One way to find such a Q is through root finding by following the idea laid out in the proof of 4.However, root finding algorithms can be computationally expensive for polynomials of large degree, so here we lay out a simple and cheap optimization method for finding Q.Let P (e it ) = d n=0 a n e int , and Q(e it ) = d n=0 b n e int .In other words, P and Q are the discrete-time Fourier transformations of the coefficients {a n } n and {b n } n .To be more rigorous P and Q are the Fourier transforms of impulse trains: Where a n = b n = 0 for n > d or n < 0. The conditions of 3 also require: Then taking the inverse Fourier transform of both sides of the above equation, and applying the convolution theorem we get: Where δ is the unit impulse, and ⋆ is the convolution operator.The above implies that if we're given P and Q as arrays of coefficients the following must hold: Here reverse(•) refers to the reversal of an array wherein the last element is first and vice versa.Therefore, we can set up our optimization problem as: The above objective function can be evaluated in time O(d log d) using FFT-based convolution algorithms.This almost linear runtime of the objective function makes the optimization extremely efficient as demonstrated in 3. Notably, it lets us find the coefficients of Q for random polynomials of up to degree 2 24 ∼ 1.68 × 10 7 in less than 40 seconds on an A100 GPU.This is a significant improvement over the degree 10,000 polynomials that can be achieved using existing QSP approaches via the state-of-the-art techniques of [14].For completeness, we have provided our code1 for the above optimization algorithm used to generate 3.

V. PHASE FUNCTIONS:
A major application of quantum signal processing involves constructing phase functions of input unitaries.Specifically, these approaches give a way to implement a transformation of an input phase via: This sort of transformation is the root of the exponential improvements in accuracy found for the quantum linear systems algorithm and also is the central idea behind fixed point amplitude amplification as well as recent QSP based approaches to error mitigation.The following theorem will provide us with the building blocks to achieve such transformations in the phase.This idea is widely used within simulation and other areas [2,5,11]; however our approach provides a substantial simplification in cases where the fourier series used (equivalently a Laurent polynomial in the phase [10]) is of mixed parity.We provide a host of examples of this reasoning below and show how our generalization of phase estimation can be used to simplify the solution to the following problems.

A. Application to Simulation
The first and most obvious application of our framework is Hamiltonian simulation with the same query complexity as that of qubitization [3].As a first step, we need to note that the function that we wish to implement is the exponential of a cosine.The complexity of this is given below Theorem 7. Given a unitary matrix U = e iH , we can implement an ϵ-approximation of e it sin H and e it cos H for t ∈ R using O(t + log( 1 ϵ )/ log log( 1 ϵ )) controlled-U and 2-qubit operations while using a single ancillary qubit.Proof.This can be easily implemented through the use of the Jacobi-Anger expansions: Where J n (x) is the n th Bessel function of the first kind.The coefficients of the infinite sum are known to decrease exponentially fast, specifically Thus for any fixed t ≥ 0 we can choose n ′ ≥ t in which case the function is exponentially decreasing for all n ≥ n ′ .Then given that we choose n ′ to lead to exponential decay we can find for any ϵ > 0 that there exists n ′ ∈ O(t + log(1/ϵ)/ log log(1/ϵ)) such that for all n ≥ n ′ J n (t) ≤ ϵ.Allowing us to build e it sin H and e it cos H with accuracy ϵ with a polynomial of only O(t + log( 1 ϵ )/ log log( 1 ϵ )) degree.Corollary 8. Let H = j α j U j be a Hermitian matrix where α j ≥ 0 and U j ∈ U (2 n ).Further, let PREPARE : |0⟩ → j √ α j |j⟩/α and SELECT|j⟩|ψ⟩ = |j⟩U j |ψ⟩ be unitary matrices in U (2 m ) and U (2 n+m ) and finally let W = −(1 − 2PREPARE|0⟩⟨0|PREPARE † )SELECT.Under these assumptions we can, for any t > 0 and ϵ > 0, construct a unitary matrix V ∈ U (2 n+m+1 ) that block-encodes e −iHt within error ϵ using O(αt + log(1/ϵ)/ log log(1/ϵ)) applications of W .
Proof.Existing work [2,11] has shown that for each eigenvector |λ j ⟩ of H such that H|λ j ⟩ = λ j |λ j ⟩ there exists two eigenvectors of W that can be conveniently expressed within the span of PREPARE|0⟩|λ j ⟩ and W PREPARE|0⟩|λ j ⟩ such that we define the orthogonal component to PREPARE|0⟩|λ j ⟩ to be |ϕ j ⟩ and also define the action of W restricted to this two dimensional subspace is W λj (taking the former vector to correspond to |0⟩ and the latter to |1⟩) Next note that for this unitary we have that the eigenvalues of this operation are e ±i arccos(λj /α) and let us denote the eigenvectors of this to be |ϕ ± j ⟩.Then setting A = |0⟩⟨0| ⊗ W + |1⟩⟨1| ⊗ I ⊗n we can apply Corollary 5 to perform for any degree d polynomial P such that max ∥P (U )∥ ≤ 1 In our case we wish to choose P (U ) = e −iαtH .We can then use the result of Theorem 7 to find an expansion within error ϵ using O(αt + log(1/ϵ)/ log log(1/ϵ)) queries to the oracle W .We can then see that by making this transformation we can map W λj → e −iαt cos(arccos(λj /α)) 0 0 e −iαt cos(− arccos(λj /α)) = e −iλj t 0 0 e −iλj t := F λj . (66) Now let us consider the input vector PREPARE|0⟩|λ j ⟩ = a|ϕ + j ⟩ + b|ϕ − j ⟩.Then we have that the transformed block encoding obeys which is logically equivalent under local isometries to the evolution of the system.As this operation works for any two-dimensional matrix W λj it is straightforward to note that the same procedure must also work by linearity on j W λj as well.Specifically, let F = ⊕ j F λj be the operation that we get when applying the above transformation to the entire Hilbert space of H and the ancillary qubit.Let |ψ⟩ = j a j |λ j ⟩ then In order to estimate the cost of this procedure, let us consider the number of queries made to PREPARE and SELECT.
An invocation of the W requires three queries.From Theorem 7 we can perform the singular value transformation with a number of applications of W that is in O(αt + log(1/ϵ)/ log log(1/ϵ)).The claimed result then follows by multiplication of this cost by the O(1) cost of implementing W .
Next we use the fact that this polynomial can be constructed using Theorem 3 we see that a degree O(log( 1 ϵ )) polynomial can be implemented using O(log(1/ϵ)) controlled unitary operations and as for our circuits there are O(1) two-qubit gate applied in the circuit of Theorem 3 per controlled unitary and thus the result holds.The above algorithm is known to be optimal as improvements upon this scaling would lead to an algorithm that could compute the parity function using fewer than O(N ) queries to the underlying quantum bits [1].Importantly, however, our approach can be used to simplify the logic of Hamiltonian simulation by removing the need to combine polynomials of different parity.
Next, we will use e it sin H and e it cos H as our building blocks to build other phase functions.To this aim, we invoke one of the main technical results of [ [22], Lemma 37] about approximating smooth functions by Fourier series.Theorem 9. Let δ, ϵ ∈ (0, 1) and f : R where M = max(2⌈log( 4∥a∥1 ϵ ) 1 δ ⌉, 0) A notable property of the prior result is that the bounds on the Fourier series do not depend on the degrees of the polynomials terms.This can however be expected since the terms that have large degree make negligible contributions due to the restricted domain x ∈ [−1 + δ, 1 − δ], and therefore we can drop them without loss of accuracy.Note that these results have been traditionally used in conjunction with LCU methods to implement arbitrary functions of a generator [11].However, here the main difference between the LCU results and ours is that our approach does not require scaling by the inverse 1-norm of the coefficients ⃗ c or the additional poly-logarithmic memory.The 1-norm improvement of our algorithm is because LCU is a more general algorithm that builds a linear combination of arbitrary unitaries and not just the powers of the same unitary, because of this LCU puts more restrictions on the probability of success.As a straightforward example of this approach, let us consider the case of fractional queries to a unitary matrix.

B. Application to Fractional Queries
Fractional queries are a generalization of the following problem listed by Scott Aaronson as one of "The ten most annoying questions in quantum computing": given a unitary U , can we implement √ U = e i H 2 ?[23] Or more generally U t = e itH for t ∈ (0, 1)?Sheridan et al. [24] first gave an algorithm to implement the fractional power of a unitary matrix that runs in O(max 1 δ , 1 ϵ log( 1 ϵ )) which was then improved upon exponentially in terms of the error dependence by [11] to run in O( 1δ log( 1 ϵ )).Sheridan et al. [24] also proved a lower bound on this problem, which shows that the δ dependence of this algorithm is actually optimal.We provide an improvement of this algorithm below which leads to an additive logarithmic factor in 1/ϵ rather than a multiplicative logarithmic factor.Hence, our algorithm here not only improves upon previous techniques, but it is in fact provably optimal up to double logarithmic factors.
Theorem 10.Let U = e iH be a unitary operator, let t, ϵ ∈ (0, 1), and assume σ min (H) ≥ δ where σ min (H) is the minimum singular value of H.We can then implement an ϵ-approximation of U t = e itH with O 1 δ + log( 1 ϵ ) uses of controlled-U and 2-qubit gates using O(log(1/δ)) ancilla qubits.Further, no algorithm is possible that solves the algorithm using o( 1 δ ).Proof.First notice that for x ∈ [0, 2π]: We hope to use 9 to build e it arccos(y) , with y = cos(x) and make the appropriate phase shifts in the second half of the domain to build e itx .However, notice that y = cos(x) ∈ [−1, 1] which blows up the number of terms in 9.
So the idea here is to write a more fine-grained piecewise representation of e itx such that y , hence getting rid of the δ dependence in 9, so the approximation part of the algorithm runs in O(log(1/ϵ)).We would still need to distinguish the states at the branch-cut which depends on δ, however, this is a separate part of the algorithm and that is why we end up with a O 1 δ + log( 1 ϵ ) runtime instead of the previous best algorithm which achieves a O 1 δ log( 1 ϵ ) .We can write this more fine-grained piecewise representation of e itx as: In order to implement an ϵ-approximation of U t = e itH , we first perform a δ-precise phase estimation, and use the first 3 qubits storing the phase estimation results to figure out which 8 th of the unit circle we are in, and apply the corresponding transformation based on (71).Since by assumption the smallest eigenvalue of H in terms of norm is ≥ δ, no eigenvector will be miss categorized at the branch cut.Furthermore, it doesn't matter if we miss categorize the eigenvalues at the other boundary points since the piecewise function in ( 71) is continuous at all other boundaries, and thus applying the neighboring piece will still result in a correct transformation.Then since a δ-precise phase estimation has query complexity O 1 δ , it remains to perform an ϵ-approximations of e it arcsin(sin(x)) and e it arccos(cos(x)) using O log( 1 ϵ ) controlled-U and 2-qubit gates.
First let us look at the Taylor series of e it arcsin(x) .One can see that the 1-norm of the coefficients of the Taylor series of t arcsin(x) is |t| arcsin(1) = |t| π 2 .Therefore, for t ∈ [ −2 π , 2 π ] we get that the 1-norm of the Taylor series of e it arcsin(x) is ≤ e 1 = e.Also notice that ∥ sin(x)∥ and ∥ cos(x)∥ are both ≤ 1 √ 2 on the domains where they're being used in (71), hence eliminating the dependence on δ for theorem 9. Therefore, for t ∈ [ −2 π , 2 π ], we can write: For M ∈ O log( 1 ϵ ) .Hence it suffices to ϵ M -approximate each term e iπm 2 sin(x) , which by 7 requires a polynomial of degree at most Notice that the sum of all the polynomials can be written as a single polynomial, hence at the end we only require a single polynomial of degree O(log( 1 ϵ )).Furthermore, arccos(x) = π 2 − arcsin(x), thus building e it arccos(cos(x)) will have the same runtime as e it arcsin(sin(x)) .
Then since For t ∈ [−1, 1], U t can then be implemented by applying U t 2 twice, U t = e itH can be implemented with complexity O 1 δ + log( 1 ϵ ) for all t ∈ [−1, 1].To show optimality, it is known from [24] that o(1/δ) scaling is impossible for fractional queries.This shows that the only remaining question is whether the ϵ scaling is optimal.
C. Application to Square-Roots / Bosonic Simulation Next let us consider building a series expansion for the square root Phase Function of a unitary operator.Without loss of generality, every unitary matrix can be expressed as e iH for Hermitian H and our aim is to examine as an example a method for constructing e it √ H .The reasons why we would want to do this are many, but the simplest example involves implementing the exponential of a bosonic operator of the form e i(a † +a)t where a = j≥0 √ j|j +1⟩⟨j|.Conventionally this operator is implemented using an arithmetic circuit [25][26][27], which although polylogarithmic in depth [28], involves substantial constant factors and requires many ancillary qubits to carry out the reversible circuitry needed for the logic.can be built using a polynomial of degree log( M ϵ ) of U and an initial δ-precise phase estimation.Thus, we can build the above approximation using a polynomial of degree O(log( M ϵ ) + M ) and an initial δ-precise phase estimation.Since We can implement e it The integral is a constant (∼ 3.7), thus: Our result then follows by combining (87) and (76) In this section, we explored applications of our Generalized Quantum Signal Processing framework within the context of Phase Functions to implement unitary operators.We gave a conceptually simplified formulation of the Hamiltonian simulation algorithm within the qubitization formalism with a factor 2 reduction in the number of queries to the walk operator in cases where independent queries to U and U † are needed (although in many simulation examples both can be implemented using a single query [30]).We further proposed an enhanced and provably optimal algorithm for implementing fractional queries of unitary operators.Lastly, we gave a novel approach for implementing square root Phase Functions of unitary matrices with applications in the implementation of bosonic operators.In the next section, we apply our method to implementing non-unitary operations by extending the concept of Fourier Decomposition to normal matrices.
Proof.Utilizing Theorem 13 we first build the polynomial in the computational basis: using O(d log N ) 1 and 2-qubit gates, and then apply the change of basis matrices to get: Then since implementing Q requires O(χ) 1 and 2-qubit gates, our total cost will be O(d log N + χ).
We now give an example of this in the Fourier basis (i.e.Q = QF T ) to build convolution operators.

B. Synthesizing Convolution Matrices
In this subsection, we explore the synthesis of convolution matrices, which play a critical role in signal processing, image filtering, and numerous other applications.Recent work has considered the application of linear combinations of unitaries to implement circulant matrices [32] but QSP based methods have not yet been fully explored.Our approach to synthesizing circulant matrices is based on GQSP and uses, in particular, the relationship between these matrices and circular convolutions.Specifically, a circulant matrix can be completely defined by a single vector, ⃗ c.This vector forms the first column of the matrix, and the remaining columns are each cyclic permutations of ⃗ c, each with an offset equal to the respective column index.A circulant matrix is shown below: For a circulant matrix C, given by the vector |c⟩ = N −1 j=0 c j |j⟩, and another vector |ψ⟩ = N −1 j=0 x j |j⟩, we can perform a convolution of |ψ⟩ by |c⟩ simply by multiplying |ψ⟩ with the matrix C.This operation is critical for many signal processing applications: The resulting (ψ * c) j of the convolution is defined as: A well-known characteristic of circulant matrices is that they can be defined by an associated polynomial of the cyclic permutation matrix P.This polynomial association is particularly beneficial when constructing circuits for quantum operations, as it allows for a straightforward definition and synthesis of the required matrix: The cyclic permutation matrix P is defined as: We can equivalently express the cyclic permutation matrix P as: The operator P is a cyclic adder which can be diagonalized using the Quantum Fourier Transform (QFT): Then by Theorem 15, we get the following lemma which provides bounds on the circuit size needed to construct the circulant matrix in a 1− and 2−qubit gate library.Note that the dependence on an error target ϵ is absent here because of the assumed form of the polynomial decomposition and also because rotation synthesis is not needed in this (continuous) gate set.A powerful application of the previous result is solving systems of equations that are expressed as discrete convolutions.For instance, consider a system of equations represented as Such equations are significant because we often have a known filter function that is applied to an input and we would like to have an efficient method for inverting such a transformation to find the input that is partially obscured by the convolution.To see how this can be attained, we can see from the discrete convolution theorem that the original convolutional equation can be re-expressed as a linear system via is equivalent to: where C is a circulant matrix.If C is invertible, we can then build C using Lemma 16 and invert it using the quantum matrix inversion algorithm to solve the system of equations.The cost of doing so within error ϵ is then Õ(d 2 κ 2 log 4 (N ) log(1/ϵ))) where κ is the condition number of the circulant matrix C using the inversion method of [33].This constitutes a potential exponential speedup for computing moments of the vector x over the naïve method of solving for x using matrix inversion and computing the mean from the result.Our approach allows us then to directly synthesize such matrices through the polynomial series definition of circulant matrices after diagonalization through the Quantum Fourier transform.

VII. CONCLUSION
This paper introduces a substantial advancement to quantum signal processing -the Generalized Quantum Signal Processing (GQSP) method.Unlike traditional QSP frameworks, our method employs a pair of rotations instead of solely relying on either Z− or X−rotations for signal processing operations.This strategic modification enables us to move beyond the limitations of the original QSP framework.
Another essential contribution of our GQSP method is the significant simplification it offers in the computation of phase angles compared to existing methods.In instances where both P and Q are known, we introduce a straightforward recursive formula for the angles.This substantial pedagogical improvement addresses one of the key challenges in teaching QSP methods, as the traditional techniques for finding the polynomial function in QSP can be difficult to convey.Our approach simplifies this complex aspect of QSP, making it much more accessible.Additionally, we introduced an efficient optimization algorithm for computing phase angles when only P is known, but Q is not.Our tests showed that our method can compute polynomials of degree greater than 10 7 in under a minute, an impressive computational efficiency when compared to the 10 4 degree polynomials that can be achieved using existing QSP approaches via the state-of-the-art techniques of [14].
In this paper, we explored several applications of our GQSP methodology.We presented an optimized algorithm for quantum fractional queries, along with a simplified technique for performing Hamiltonian simulation using qubitization.We proposed methods for calculating phase functions, such as the square root, and unveiled new methodologies for synthesizing circulant matrices and performing convolution operations.
As we look forward, our work reveals several potential areas for further exploration.A primary question is how to adapt the principles of our approach to multivariable QSP [15].Also, our focus on QSP suggests the potential of applying similar concepts to quantum singular value transformation for transforming block-encoded non-square matrices.In essence, this paper represents a significant progression in the QSP/QSVT framework.The exploration and understanding of the broad range of opportunities offered by these techniques promise to be a primary focus of research in quantum algorithms in the years to come.
would result in one side of (26) being 0 and the other being a non-zero value.It's easy to see when deg(P ) > deg(Q) or deg(P ) < deg(Q) setting θ d equal to 0 or π 2 respectively will result in the desired P and Q.Therefore, without loss of generality, we assume deg(P ) = deg(Q) = d.

d
n=0 a n e inx and Q(x) = d n=0 b n e inx , using our original signal operator A, then we can construct polynomials P ′ (x) = e −ikx P (x) = d−k n=−k a n+k e inx and Q ′ (x) = e −ikx Q(x) = d−k n=−k b n+k e inx for k ≤ d, using the same set of rotation angles by replacing any k instances of A with the complementary signal operator A ′ .Theorem 6 (Polynomials With Negative Powers).∀d, k ∈ N, ∀ ⃗ θ, ⃗ ϕ ∈ R d+1 , λ ∈ R and k ≤ d we have:

Suppose
P and Q are given to us as a 2×d matrix S where the first row contains the coefficients of P and the second row contains coefficients of Q.Now we'll use the induction step in the proof of 3 to formulate a simple recursive algorithm by setting θ d = tan −1 ( |b d | |a d | ) and ϕ d = Arg( a d b d ), calculating P and Q, and calling the function recursively.Calculating P and Q in this matrix representation corresponds to multiplying S by R(θ d , ϕ d , 0), and shifting the top row of the matrix which would correspond to multiplying by A † in our original formulation.The pseudo-code for this algorithm is shown in 1.

FIG. 3 .
FIG.3.Computational time required to complete the optimization as a function of the degree of the polynomial on CPU vs GPU.We can see that even on a CPU, we have an almost linear scaling of computational resources due to the O(n log n) scaling of FFT-based convolution.Furthermore, the plot showcases the superior scalability of GPUs for the optimization problem.

Theorem 11 .
Let U = e iH be a finite-dimensional unitary operator, let δ, ϵ ∈ (0, 1), t ∈ R, and assume σ min (H) ≥ δ where σ min (H) is the minimum singular value of H. Then we can implement an ϵ-approximation of e it √ H using O 1 δ log( 1 ϵ ) + |t| uses of controlled-U and 2-qubit gates using O(log 1/δ) ancilla qubits.Proof.In order to build e it √ H , we will build e i √ 2πt √ y+1 using Theorem 9 and sub-in y = x 2π − 1 to ensure y ∈ [−1 + O(δ), 1 − O(δ)].Subbing this into (69) we get: into the coefficient and re-writing the sum gives us: Corollary 8 shows that an ϵ M -approximation of e iHj 4

Lemma 16 .
Given an N × N circulant matrix C = d n=0 c n P n , we can build C (normalized) using only O(d log N + log 2 N ) 1 and 2-qubit gates.This lemma enables us to reach a valuable conclusion: Theorem 17.Given |ψ⟩ = N −1 j=0 x j |j⟩ and a filter F = {a k } d k=−d , we can convolve ψ with F : |ψ * F ⟩ = N −1 m=0 (ψ * F ) m |m⟩, where (ψ * F ) m = d k=−d a k x [m−k mod N ] (107) using only O(d log N + log 2 N ) 1 and 2-qubit gates.