A Universal Quantum Circuit Design for Periodical Functions

We propose a universal quantum circuit design that can estimate any one-dimensional periodic function from its Fourier expansion. The quantum circuit contains $N$ qubits to store the information on the $N$ Fourier components and $M+2$ auxiliary qubits, with $M = \lceil{\log_2{N}}\rceil$, for control operations. The desired output is measured on the last qubit $q_N$, with a time complexity of $O(N^2\lceil \log_2N\rceil^2)$. We illustrate the approach by constructing the quantum circuit for the square wave function, with accurate results obtained by direct simulations using the IBM-QASM simulator. The approach is general and can be applied to any periodic function.


Introduction
The field of quantum information and quantum computing has advanced in both software and hardware in the past few years. The achievement of the 72-qubit quantum chip, Sycamore, with a programmable superconducting processor [1] heralded a remarkable triumph towards the quantum supremacy experiment [2]. On the other hand, the photonic quantum computer, Jiuzhang [3], demonstrated quantum computational advantages with Boson sampling using photons. The blooming of hardware development by IBM, Google, IonQ and many others has provoked tremendous enthusiasm for developing quantum algorithms that utilize near-term quantum devices and for pursuing applications in various fields of science and engineering. A growing body of recent research focuses on quantum optimization [4,5], solving linear systems of equations [6,7,8], electronic structure calculations [9,10,11,12,13,14,15], quantum encryption [16,17], the variational quantum eigensolver (VQE) [18,19] for various problems [20,21,22] and open quantum dynamics [23,24,25,26,27]. Quantum machine learning has further explored and implemented quantum software that could show advantages over the corresponding classical versions [28,29,30,31,32,33,34,35].
However, difficulties inevitably arise when attempting to include nonlinear functions in quantum circuits. For example, the very existence of nonpolynomial activation functions guarantees that multilayer feedforward networks can approximate any function [36]. Even so, nonlinear activation functions do not immediately correspond to the mathematical framework of quantum theory, which describes system evolution with linear operations and probabilistic observation. Conventionally, generating these nonlinearities with a simple quantum circuit has been found to be extremely difficult. The alternative approach is to make a compromise, imitating the nonlinear functions with repeated measurements [37,38,39], or with the assistance of the Quantum Fourier Transformation [40] (QFT [41,42]). How to simulate an arbitrary function, especially a nonlinear function, with a quantum circuit is an important issue to be addressed.
In this paper, we propose a universal quantum circuit design that can generate arbitrary finite continuous periodic 1-D functions, even nonlinear ones such as the square wave function, from the given Fourier expansion. The output information is all stored in the last qubit, which can be measured for the function estimation or used as an intermediate state for subsequent computation, as in the case of estimating the nonlinear activation function between layers in quantum machine learning methods. We present the details of the quantum circuit design in the first section, followed by a numerical simulation of the circuit imitating the square wave function on IBM-QASM. The final section contains the complexity analysis and further applications.

Design of the quantum circuit
Consider a periodic 1-D function $F_N(x)$ that can be expanded as a Fourier series with $N$ nontrivial components, where $T$ is the period; for simplicity we set $T = \pi$. To construct the quantum circuit that estimates the output function $F_N(x)$, we need $N$ qubits to store the input information, all of which are initially prepared in the state $|\psi(x)\rangle = \cos x|0\rangle + \sin x|1\rangle$. Additionally, there are $M+2$ auxiliary qubits, with $M = \lceil\log_2 N\rceil$ qubits denoted $q'_1, \cdots, q'_M$ and the other two denoted $q''_1, q''_2$. All the auxiliary qubits are initially set to the $|0\rangle$ state. Thus, the input state can be written as
$$|\Psi_{in}(x)\rangle_{q',q'',q} = |0\rangle^{\otimes M}_{q'} \otimes |0\rangle^{\otimes 2}_{q''} \otimes |\psi(x)\rangle^{\otimes N}_{q}$$
where the subscripts indicate the three different registers.
Fig.(1a) illustrates the structure of the quantum circuit design for the output function $F_N(x)$, while the detailed evolution is demonstrated in fig.(1b). There are two main modules in the quantum circuit. The first one contains $U_{pre}$ acting on the auxiliary qubits $q'$, converting them from $|0\rangle$ to the state $|\psi_f\rangle$, and Hadamard gates acting on $q''$, converting them to the state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. The intermediate state $|\psi_f\rangle$ can be described as
$$|\psi_f(\gamma)\rangle = \sum_{n=0}^{2^M-1} \sqrt{\gamma_n}\,|n\rangle$$
where $\sum_{n=0}^{2^M-1} \gamma_n = 1$ and $\gamma_n \ge 0$. Details about the design of $U_{pre}$ and $\gamma_n$ can be found in the supplementary materials. The succeeding module is formed by $N$ controlled unitary operations, where $q'$ are the control qubits for the target qubits $q''$ and $q$. Denote the unitary operations as $U_n$, with the general structure shown in fig.(1c). Initially, Hadamard gates are applied on $q''$, and a rotation-Y gate is applied on the first qubit $q_1$. These three qubits then act as control qubits, while $q_2$ is the target in the following operation. Next, qubits $q_2, q_3, \cdots, q_n$ are connected as a chain with simple controlled rotation-Y gates. Finally, a swap gate between $q_n$ and $q_N$ is included, ensuring that all the necessary information is stored in the last qubit $q_N$.
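The register layout and the single-qubit input state described above can be sketched numerically. The following is a minimal illustration, not the authors' code: the function names `register_sizes`, `ry` and `psi` are hypothetical, and the sketch assumes the usual convention $R_y(\theta) = e^{-i\theta Y/2}$, so that $R_y(2x)|0\rangle = \cos x|0\rangle + \sin x|1\rangle$.

```python
import numpy as np


def register_sizes(n_terms: int) -> dict:
    """Qubit counts for an N-term expansion: N input qubits q,
    M = ceil(log2 N) auxiliary qubits q', and two auxiliary qubits q''."""
    if n_terms < 1:
        raise ValueError("n_terms must be a positive integer")
    m = (n_terms - 1).bit_length()  # integer form of ceil(log2 N)
    return {"q": n_terms, "q_prime": m, "q_double_prime": 2,
            "total": n_terms + m + 2}


def ry(theta: float) -> np.ndarray:
    """Real 2x2 matrix of R_y(theta) = exp(-i * theta * Y / 2) (assumed convention)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def psi(x: float) -> np.ndarray:
    """Amplitudes of |psi(x)> = cos x |0> + sin x |1>, obtained as R_y(2x)|0>."""
    return ry(2 * x) @ np.array([1.0, 0.0])


sizes = register_sizes(7)  # N = 7, as in the square-wave example
state = psi(0.3)           # amplitudes (cos 0.3, sin 0.3)
```

For $N = 7$ this gives $M = 3$ and $3 + 2 = 5$ auxiliary qubits, in line with the count quoted for the square wave example.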
For simplicity, in the operation $U_n$, we define $w^k_{0,1}$ and $v^k_{0,1}$ as […] and $\alpha_k$, $\beta_k$ when $k \ge 2$ as […] $\beta_k = \arctan 2(\sin 2v^{k+1} \ldots$ […], while we have $\alpha = 1/2$ and $\beta = 2w^1_1$ when $k = 1$. To estimate $F_N(x)$, we need to ensure that $C\sum_{n=1}^{N} a_n \cos(2nx \ldots$ […] where $C$ is a nonzero constant ensuring that $|CF_N(x)| \le \frac{1}{2}$, and the superscript $m$ of $\alpha^m$, $\beta^m_k$ indicates that they belong to the operation $U_m$. The right-hand side of eq.(7) is the probability of obtaining the outcome $|1\rangle$ when measuring $q_N$ after running the whole quantum circuit (more details can be found in the supplementary materials).
Figure 1: Sketch of the quantum circuit estimating $F_N(x)$. The structure of the whole circuit is shown in fig.(1a). There are two main modules. The first one contains $U_{pre}$ acting on the auxiliary qubits $q'$, and Hadamard gates acting on $q''$. The succeeding module is formed by $N$ controlled unitary operations denoted as $U_n$. The corresponding evolution is demonstrated in fig.(1b), where $q'$ (blue color) are control qubits. $q'$ are converted to the state $|\psi_f(\gamma)\rangle$ under the operation $U_{pre}$, where $\gamma$ is determined by $F_N$. The detailed sketch of $U_n$ is shown in fig.(1c). Initially, Hadamard gates are applied on $q''$, and a rotation-Y gate is applied on the first qubit $q_1$. These three qubits then act as control qubits, while $q_2$ is the target in the following operation. Next, qubits $q_2, q_3, \cdots, q_n$ are connected as a chain with simple controlled rotation-Y gates. Finally, a swap gate between $q_n$ and $q_N$ is included, ensuring that all the necessary information is stored in the last qubit $q_N$.
Thus, subsequent to the whole operation, the output state will be
$$\sqrt{\tfrac{1}{2} - CF_N(x)}\,|\phi_0\rangle|0\rangle_{q_N} + \sqrt{\tfrac{1}{2} + CF_N(x)}\,|\phi_1\rangle|1\rangle_{q_N}$$
where $|\phi_{0,1}\rangle$ describe the output state of all qubits and auxiliary qubits except the last one, $q_N$. Hence, the information stored in $q_N$ is essential and sufficient. After measuring $q_N$, the probability of obtaining $|1\rangle$ gives an estimate of $F_N(x)$. It is also appropriate to apply succeeding operations on $q_N$, regarding it as an intermediate state for further computation.
The same quantum circuit structure works when the input is in a superposition state, namely
$$|\Psi^s_{in}(\mathbf{x})\rangle = \sum_{l=0}^{L} c_l\,|\Psi_{in}(x_l)\rangle_{q',q'',q} \otimes |\Phi_l\rangle_{Q}$$
describing a vector $\mathbf{x}$ that contains $L+1$ components, where $q'$, $q''$, $q$ are entangled with some other qubits, namely those in $Q$. The superscript $s$ represents superposition, the subscripts $q'$, $q''$, $q$ and $Q$ indicate the different registers, and $\sum_{l=0}^{L} |c_l|^2 = 1$. $\{|\Phi_l\rangle\}$ is a complete orthonormal set of the subspace spanned by $Q$, ensuring that $\langle\Phi_l|\Phi_{l'}\rangle = \delta_{ll'}$. After the whole operation, the output state is given by […] If we only focus on the subspace spanned by $q_N$ and $Q$, […] Then, if $q_N$ is measured, the probability of obtaining $|1\rangle$ will be $\frac{1}{2} + C\sum_{l=0}^{L} |c_l|^2 F_N(x_l)$. This property leads to potential applications in quantum algorithm development, for example, the design of nonlinear activation between layers in quantum machine learning.

Implementation: Simulation for the square wave function
In this section we demonstrate the quantum circuit design with a simple example: simulation of the square wave function. Consider the Fourier expansion of the square wave function $f(x) = \mathrm{sign}(\sin 2x)$; the sum of the first 7 terms is given by
$$F(x) = \frac{4}{\pi}\left(\sin 2x + \frac{1}{3}\sin 6x + \frac{1}{5}\sin 10x + \frac{1}{7}\sin 14x\right)$$
where $F(x)$ is an odd function with $a_{2,4,6} = 0$. The qubits $q_{1,\cdots,7}$ are required to carry the input information, all of which are initially prepared in the state $|\psi(x)\rangle$. Additionally, we need 5 auxiliary qubits, denoted $q''_{1,2}$ and $q'_{1,2,3}$. […] and $a'_n$, $b'_n$ satisfy $C\sum_{n=1}^{j} a'_n \cos(2nx + b'_n) = \sum_{n=1}^{N} a_n \cos(2nx \ldots$ […] Then for $n > 1$ we set […] Eq.(13) ensures that $\gamma_{2,4,6}$ are all zero, so that we do not need to construct $U_{2,4,6}$ in the circuit. We stress that the above setting is not optimal, especially when one prefers a greater value of $|C|$ over a shallow circuit.
Fig.(2a) is a scheme of the whole operation. The operations in the blue block form $U_{pre}$, which converts $q'$ into the state $|\psi_f(\gamma)\rangle$ in eq.(3). As shown in the green block, $U_1$ is a single rotation-Y gate acting on the last qubit $q_7$ directly. The other $U_j$ act on $q''_{1,2}$, $q_{1,2,\cdots,j}$ and $q_7$. For illustration, we only plot the details of $U_3$, as shown in the yellow block. Based on eq.(7), eq.(11) can be rewritten as […] The contribution of each single operation $U_j$ alone is shown in fig.(2b). The sum of all their contributions is shown in fig.(2c), which is the output result. $P_1$ is the probability of obtaining the outcome $|1\rangle$ when measuring the last qubit $q_7$ after the whole operation. Fig.(2d) is a sketch of the connectivity structure. Each node represents a single qubit. If two qubits are connected via a blue curve, then there is at least one 2-qubit gate acting on them. All auxiliary qubits $q'$ and $q''$ are connected with each other.
Meanwhile, all qubits $q$ are connected to all of the auxiliary qubits. For any $n$ with $1 \le n \le N-1$, there is also a connection between $q_n$ and its neighbor $q_{n+1}$. The last qubit $q_N$ is connected to all other qubits.
Figure 2: The quantum circuit estimating the 1-D square wave function. Fig.(2a) is a sketch of the quantum circuit estimating the square wave function. The operations in the blue block form $U_{pre}$, which converts $q'$ into the state $|\psi_f(\gamma)\rangle$ in eq.(3). Because $\gamma_{2,4,6}$ are all zero, $U_{2,4,6}$ do not appear in the circuit. As shown in the green block, $U_1$ is a single rotation-Y gate acting on the last qubit $q_7$ directly. The other $U_j$ act on $q''_{1,2}$, qubits $q_{1,2,\cdots,j}$ and $q_7$. Owing to limited space, we only plot the details of $U_3$, as shown in the yellow block. Numerical simulation results are included in fig.(2b,2c). $P_1$ is the probability of obtaining $|1\rangle$ when measuring the last qubit $q_7$ after the whole operation. $x$ is the variable in the input state $|\psi(x)\rangle$. The contribution of each single operation $U_j$ alone is shown in fig.(2b). Amplitudes are not included when plotting fig.(2b). In fig.(2c), the blue curve represents the sum of the contributions of all $U_{1,3,5,7}$, which is also the expected result when measuring $q_7$. The red curve is the original shape of the square wave function. Fig.(2d) is a sketch of the connectivity structure. Each node represents a single qubit. If two qubits are connected via a blue curve, then there is at least one 2-qubit gate acting on them.
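As a classical point of comparison for the red and blue curves of fig.(2c), the truncated expansion can be evaluated directly. This is a minimal sketch that assumes the standard Fourier series of $\mathrm{sign}(\sin 2x)$, namely $\frac{4}{\pi}\sum_{n\ \mathrm{odd}} \frac{\sin 2nx}{n}$; the function names are illustrative and nothing here simulates the quantum circuit itself.

```python
import numpy as np


def square_wave(x):
    """Target function f(x) = sign(sin 2x), with period pi."""
    return np.sign(np.sin(2 * np.asarray(x, dtype=float)))


def partial_sum(x, n_max: int = 7):
    """Fourier partial sum up to order n_max; even orders vanish for this odd function."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(1, n_max + 1, 2):  # only n = 1, 3, 5, 7 contribute
        total = total + (4 / np.pi) * np.sin(2 * n * x) / n
    return total


# Sample points on the positive plateau, away from the jumps at 0 and pi/2.
xs = np.linspace(0.2, np.pi / 2 - 0.2, 50)
approx = partial_sum(xs)
exact = square_wave(xs)
```

Plotting `approx` against `exact` over a full period shows the familiar overshoot near the jumps, which is a property of the truncated series rather than of the circuit.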
Further, we also implemented $U_{3,5,7}$ independently on the IBM QASM simulator, as shown in fig.(3), where for each dot we collect data from 8192 repeated measurements. The results from the IBM QASM simulator fit well with the theoretical predictions corresponding to $U_3$ and $U_5$, as shown in fig.(3b) and fig.(3c). More details of the simulations can be found in the supplementary materials.
Figure 3: Red lines represent the theoretical prediction, and blue dots (or diamonds, triangles) represent the results from the IBM QASM simulator. $P$ is the probability of obtaining the state $|0\rangle$ or $|1\rangle$ when measuring the last qubit. (Generally, we prefer to get the state $|1\rangle$ when calculating a positive component in the expansion, and $|0\rangle$ when calculating a negative one.) For each dot we collect data from 8192 repeated measurements.
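To give a feel for the statistical scatter expected from 8192 repeated measurements of a single qubit, the sampling step can be mimicked classically. This is only a sketch under the assumption of independent, identically distributed shots; it draws from a binomial distribution with NumPy and does not call the IBM QASM simulator. The names `estimate_p1` and `standard_error` are hypothetical.

```python
import numpy as np


def estimate_p1(p1_exact: float, shots: int = 8192, seed: int = 0) -> float:
    """Fraction of shots giving |1>, drawn from a binomial distribution."""
    rng = np.random.default_rng(seed)
    ones = rng.binomial(shots, p1_exact)
    return ones / shots


def standard_error(p: float, shots: int = 8192) -> float:
    """Binomial standard error of the estimated probability."""
    return float(np.sqrt(p * (1 - p) / shots))


p_hat = estimate_p1(0.3)         # one simulated data point
worst_case = standard_error(0.5)  # largest possible standard error at 8192 shots
```

With 8192 shots the standard error is at most $0.5/\sqrt{8192}$, roughly half a percent, which sets the scale of scatter one would expect around the theoretical curves.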

Time complexity
We compare the time complexity for three situations. In the first situation we consider classical inputs, while in the others we consider unknown quantum states as inputs. One can either estimate $x$ from the input and calculate $F_N(x)$ classically, or apply the method presented in this article to estimate the outputs.
Situation I. Estimate $F_N(x)$ with a classical input $x$: To estimate a periodic function $F_N(x)$ within error $\epsilon$, $O(\frac{1}{\epsilon})$ measurements are required [43]. Initially, $N$ $R_y$ rotation gates are required for mapping $x$ into the quantum state $|\psi(x)\rangle^{\otimes N}$. Moreover, we need $M+2$ auxiliary qubits, where the $M = \lceil\log_2 N\rceil$ qubits serve as control qubits. Consider the basic unit $U_n$. When $n > 2$, there are 4 controlled-controlled rotation gates, $4n-8$ controlled rotation gates and one optional 2-qubit swap gate. Thus, it takes time $O(n)$ to implement a single $U_n$. After taking the $M$ auxiliary qubits into account, $M$ control qubits are added to all gates in $U_n$. Therefore, it takes time $O(n\lceil\log_2 n\rceil^2)$ to implement the operation $|n\rangle\langle n| \otimes U_n$. Since there are $N$ similar $U_n$, with $n = 1, 2, \cdots, N$, the time complexity to finish all of $\sum_{n=1}^{N} |n\rangle\langle n| \otimes U_n$ is $O(N^2\lceil\log_2 N\rceil^2)$. In total, to derive the estimation $CF_N(x)$ within error $\epsilon$, the time complexity is of order $O(N^2\lceil\log_2 N\rceil^2/\epsilon)$, which is still polynomial. On the other hand, the time complexity to estimate $F_N(x)$ for a single $x$ based on Taylor expansion is also polynomial in $N$. Hence in this situation there is no speedup compared with the classical calculation.
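The gate counts quoted above can be tallied in a few lines. The sketch below only encodes the numbers stated in the text for $n > 2$ (4 controlled-controlled rotations, $4n-8$ controlled rotations, one optional swap), before the $M$ extra control qubits are attached; the function names are hypothetical, and the cases $n \le 2$ are deliberately left out because their counts are not given here.

```python
def gate_counts(n: int) -> dict:
    """Gate counts of a single U_n for n > 2, before adding the M control qubits."""
    if n <= 2:
        raise ValueError("the counts quoted in the text apply to n > 2 only")
    return {"cc_rotation": 4, "c_rotation": 4 * n - 8, "swap": 1}


def total_gates(n_terms: int) -> int:
    """Sum of the per-unit counts over U_3 ... U_N (U_1 and U_2 excluded)."""
    return sum(sum(gate_counts(n).values()) for n in range(3, n_terms + 1))


per_unit = gate_counts(7)  # counts for U_7
running_total = total_gates(7)
```

Each $U_n$ contributes $4n - 3$ gates, so the running total grows quadratically in $N$, consistent with the $O(n)$ cost per unit stated above.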
Situation II. Estimate $F_N(x)$ with the input quantum state $|\psi(x)\rangle^{\otimes N}$, where the value of $x$ is unknown: The only difference from the previous situation is that the mapping process can now be skipped. The time cost is still of order $O(N^2\lceil\log_2 N\rceil^2/\epsilon)$. By contrast, to calculate $F_N(x)$ from $|\psi(x)\rangle^{\otimes N}$ classically, the first step is to derive $x$ from the input states. It requires $O(\frac{1}{\epsilon})$ measurements to get $\cos x$ within error $\epsilon$, and then the time complexity to estimate $F_N(x)$ is polynomial in $N$. Again, in this situation there is no speedup compared with the classical calculation.
Situation III. Estimate $\sum_{l=0}^{L} |c_l|^2 F_N(x_l)$ with the input quantum state $|\Psi^s_{in}(\mathbf{x})\rangle$, as described in eq.(9): Here we denote by $N'$ the number of qubits that form the state $|\Phi\rangle$, and $L = 2^{N'} - 1$. Even though more variables are introduced into the input, we do not need to change anything in the quantum circuit. To get an estimate of $C\sum_{l=0}^{L} |c_l|^2 F_N(x_l)$ with the quantum method, the time cost is still of order $O(N^2\lceil\log_2 N\rceil^2/\epsilon)$, which does not depend on $N'$ (the number of qubits in $Q$). However, in the classical method, as the $c_l$ are initially unknown, one must estimate them before calculating $\sum_{l=0}^{L} |c_l|^2 F_N(x_l)$.
The time complexity of quantum tomography is exponential in $N'$, or at least polynomial in $N'$ with shadow quantum tomography. The time complexity of the quantum method is determined only by $N$, while the classical method is at least polynomial in $N'$ [45]. Thus, when $N' \gg N$, the quantum method leads to a polynomial speedup compared with the classical one.
Consider the task of estimating $|c_0|^2 F(x_0) + |c_1|^2 F(x_1)$, where $|c_0|^2 + |c_1|^2 = 1$. For simplicity, here we set $F(x) = \cos^3(x - 0.2384)$, which can be implemented by $U_3$ itself, excluding the $q'$ registers. (The structure of $U_3$ can be found in Fig.(2) in the main article and Fig.(S2) in the SM.) In Situation III, the input states $|\Psi^s_{in}(\mathbf{x})\rangle$ can be described by Eq.(9). In this task, we only need one qubit as $Q$ for preparing the initial states. A sketch of the quantum circuit calculating $|c_0|^2 F(x_0) + |c_1|^2 F(x_1)$ is shown in Fig.(4). There are 6 qubits in total: $q''_{1,2}$ and $q_{1,2,3}$ implementing $U_3$, and the qubit $Q$ corresponding to Eq.(9). Initially, all qubits are set to $|0\rangle$. The operations in the dashed blue square convert them into the state
$$|\Psi^s_{in}(x_0, x_1)\rangle = c_0\,|\Psi_{in}(x_0)\rangle_{q',q'',q} \otimes |0\rangle_Q + c_1\,|\Psi_{in}(x_1)\rangle_{q',q'',q} \otimes |1\rangle_Q$$
where $|\Psi_{in}(x)\rangle_{q',q'',q} = |0\rangle^{\otimes 2}_{q''} \otimes |\psi(x)\rangle^{\otimes 3}_{q}$, and $|\psi(x)\rangle = \cos x|0\rangle + \sin x|1\rangle$. The coefficients $c_{0,1}$ are determined by the gate $R_y(\theta_{sup})$ acting on $Q$, leading to […] Then, the operation $U_3$ is applied on $q''$ and $q$, implementing the calculation of $F(x)$. After the whole operation, the output state is given by […] In this example, $P_0$, which represents the probability of obtaining $|0\rangle$ when measuring $q_3$, can be calculated as […] Thus, we only need to measure $q_3$ itself to estimate $|c_0|^2 F(x_0) + |c_1|^2 F(x_1)$.
Figure 4: A sketch of the quantum circuit that calculates $|c_0|^2 F(x_0) + |c_1|^2 F(x_1)$. There are 6 qubits in total: $q''_{1,2}$ and $q_{1,2,3}$ implementing $U_3$, and the qubit $Q$ corresponding to Eq.(9). Initially, all qubits are set to $|0\rangle$. The operations in the dashed blue square convert them into the state $|\Psi^s_{in}(x_0, x_1)\rangle$, as shown in Eq.(14). Then, the operation $U_3$ is applied on $q''$ and $q$, implementing the calculation of $F(x)$. After the whole operation, we only need to measure $q_3$ itself to estimate $|c_0|^2 F(x_0) + |c_1|^2 F(x_1)$. $P_0$, the probability of obtaining $|0\rangle$ when measuring $q_3$, is presented in Eq.(15).
Further, we implemented the operation shown in Fig.(4) on the IBM QASM simulator. The simulation results are shown in Fig.(5), where $P_0$ indicates the probability of obtaining $|0\rangle$ when measuring $q_3$ after the whole operation. Blue dots or diamonds represent the simulation results from the IBM QASM simulator, while the red curve is the theoretical prediction based on Eq.(15). In Fig.(5a), we set $x_0 = 0$, $\theta_{sup} = \pi$, and collect $P_0$ for various values of $x_1$. $P_0$ shows the shape of $F(x_1)$, as we set $\theta_{sup} = \pi$ so that $x_0$ makes no contribution to $P_0$. In Fig.(5b), we set $x_0 = 3.4$, $x_1 = 0.2$, and collect $P_0$ for various values of $\theta_{sup}$. Theoretically, $P_0$ should have the shape of a single sinusoid, and the simulation results fit the prediction very well. For each dot we collect data from 8192 repeated measurements.
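The sinusoidal dependence on $\theta_{sup}$ can be made explicit with a short calculation. It rests on one assumption that is not spelled out in the text here: that $R_y(\theta_{sup})$ acting on $|0\rangle_Q$ follows the usual convention, giving $c_0 = \cos\frac{\theta_{sup}}{2}$ and $c_1 = \sin\frac{\theta_{sup}}{2}$.

```latex
\begin{aligned}
|c_0|^2 F(x_0) + |c_1|^2 F(x_1)
  &= \cos^2\!\frac{\theta_{sup}}{2}\, F(x_0) + \sin^2\!\frac{\theta_{sup}}{2}\, F(x_1) \\
  &= \frac{1 + \cos\theta_{sup}}{2}\, F(x_0) + \frac{1 - \cos\theta_{sup}}{2}\, F(x_1) \\
  &= \frac{F(x_0) + F(x_1)}{2} + \frac{F(x_0) - F(x_1)}{2}\,\cos\theta_{sup}.
\end{aligned}
```

Under this assumption the weighted sum is a constant plus a single cosine in $\theta_{sup}$, and at $\theta_{sup} = \pi$ it reduces to $F(x_1)$, which agrees with the statement that $x_0$ makes no contribution in Fig.(5a).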
In the simulation only one qubit, $q_3$, is measured, and $|c_0|^2 F(x_0) + |c_1|^2 F(x_1)$ can be estimated from $P_0$, the probability of obtaining $|0\rangle$. Therefore, with the proposed quantum algorithm, we do not need to estimate the exact values of $c_l$, so that the quantum method leads to a speedup compared with the classical version in Situation III, especially when $L$ is large.
Figure 5: Simulation results obtained from the IBM QASM simulator. In both figures $P_0$ indicates the probability of obtaining $|0\rangle$ when measuring $q_3$ after the whole operation shown in Fig.(4). Blue dots or diamonds represent the simulation results from the IBM QASM simulator, while the red curve is the theoretical prediction based on Eq.(15). In Fig.(5a), we set $x_0 = 0$, $\theta_{sup} = \pi$, and collect $P_0$ for various values of $x_1$. $P_0$ shows the shape of $F(x_1)$, as we set $\theta_{sup} = \pi$ so that $x_0$ makes no contribution to $P_0$. In Fig.(5b), we set $x_0 = 3.4$, $x_1 = 0.2$, and collect $P_0$ for various values of $\theta_{sup}$. Theoretically, $P_0$ should have the shape of a single sinusoid, and the simulation results fit the prediction very well. For each dot we collect data from 8192 repeated measurements.

Conclusion
Here, we propose a universal quantum circuit design for arbitrary one-dimensional periodic functions. The inputs are a sufficient number of qubits prepared in the same state, while the last qubit represents the output. One can either estimate the exact value from repeated measurements on the last qubit, or regard it as an intermediate state prepared for succeeding operations. Superposition in the input leads to a corresponding superposition in the output, which yields a speedup under certain circumstances, especially when dealing with unknown quantum inputs. As a simple example, we illustrate the quantum circuit design for the square wave function. Both the exact simulations and the implementation on IBM-QASM give very accurate results and illustrate the power of the proposed general design. This general approach might be used to construct an appropriate quantum circuit for the electronic wave function in periodic solids and materials, as well as in quantum machine learning, particularly for simulating the nonlinear functions used in the network.

Consider a 1-D continuous bounded function
As a continuous periodic function, $F(x)$ can be expanded in a Fourier series […] where $a_n$ and $b_n$ are defined as […] The first $N$ nontrivial components in the Fourier expansion are given by […] For simplicity, in the main article and the following discussion we assume that $T = \pi$. Hence, we have […] Given a certain $F_N$ as in eq.(S6), our goal is to design a quantum circuit that can estimate $F_N(x)$ with $N$ copies of the input state $|\psi(x)\rangle$, where
$$|\psi(x)\rangle = \cos x|0\rangle + \sin x|1\rangle \qquad \text{(S7)}$$
If more than $N$ nontrivial terms are obtained from eq.(S1), then $F_N(x)$ becomes an approximation of the original function $F(x)$. We focus on the construction of the quantum circuit estimating $F_N(x)$, while the remainder of the Fourier series (if any) is not covered. Therefore, the accuracy of the quantum algorithm is similar to that of the approximation $F_N(x)$, and in general more terms are required to reach a higher accuracy.
Figure S1: A sketch of a simple chain of qubits connected by control gates.

S2 Design of $U_n$
Consider the simple operation shown in fig.(S1), which is a chain of qubits connected by control gates. All qubits are initially set to $|\psi(x)\rangle$. Denote the probability of obtaining $|0\rangle$ when applying a single Z measurement on qubit $q_n$ as $P^n_0$, and the probability of obtaining $|1\rangle$ as $P^n_1$; we have $P^n_0 + P^n_1 = 1$. We denote by $U^n_{0,1}$ the rotation gates acting on qubit $q_n$ when the control qubit $q_{n-1}$ is $|0\rangle$ or $|1\rangle$, with the exception that $U^1$ is a single-qubit operation acting on $q_1$. Corresponding to the sketch in fig.(S1), we have $U^n_0 = R_y(\theta_n)$ and $U^n_1 = R_y(\theta'_n)$. As $F_N(x)$ only contains a single variable, here we set all control gates as controlled rotation-Y gates. For simplicity, in the operation $U_n$, we define $w^k_{0,1}$ and $v^k_{0,1}$ as […] Since there is no control qubit controlling the first qubit, we have $v^1_0 = w^1_0$, $v^1_1 = w^1_1$. Using eq.(S8), we can write down the iterative relation between $P^n_1$ and $P^{n+1}_1$ […] where $A_n(x)$ and $B_n(x)$ are defined as […] Noticing that $P^0_1 = A_0(x)$, we have […] $B_n(x)$ can be rewritten as […] $\cdots \cos(2x - \arctan 2(\sin 2v^{n+1} \cdots$ […] where we denote $\beta_n = \arctan 2(\sin 2v^{n+1} \ldots)$. Substituting eq.(S13) into eq.(S12), […] where for simplicity, we denote […] $\ldots)|\cos\beta_{j+1} + \cos 2w^{j+1}_1)$ (S17) Then we can rewrite $P^n_1$ as […]
Figure S2: Sketch of the quantum circuit $U_n(\Theta_n)$. $U_n(\Theta_n)$ is the basic module of the whole circuit that can estimate outputs for a given function $F_N$. $\Theta_n$ contains all parameters $\theta$, $\theta'$ or $v$, $w$ of the controlled rotation gates. $U_n(\Theta_n)$ can generate components of the Fourier series no greater than the $n$-th order. Different from the simple qubit chain shown in fig.(S1), there are two auxiliary qubits $q''_1$ and $q''_2$ in the quantum circuit $U_n(\Theta_n)$. $q''_1$ and $q''_2$ are initially prepared in the state $|0\rangle$, and act as control qubits after the Hadamard (H) gates. Mathematically, it generates the output described in eq.(S20). In the main article, there is additionally a swap gate in $U_n$, which ensures that all $U_n$ share the same qubit carrying the output.
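The iterative structure of the qubit chain in fig.(S1) can be checked with a small state-vector simulation. The sketch below is an independent illustration, not the parametrization used above: it assumes $R_y(\theta) = e^{-i\theta Y/2}$, writes the recursion directly in terms of $\theta_n$ and $\theta'_n$ as $P^{n+1}_1 = a + b\,P^n_1$ with $a = \sin^2(x + \theta_{n+1}/2)$ and $b = \sin^2(x + \theta'_{n+1}/2) - a$, and uses hypothetical function names. The coefficients `a` and `b` play the role of $A_n(x)$ and $B_n(x)$ but may differ from them in form.

```python
import numpy as np


def ry(theta):
    """Real matrix of R_y(theta) = exp(-i * theta * Y / 2) (assumed convention)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def apply_on_axis(state, gate, axis):
    """Apply a 2x2 gate to the qubit stored along the given tensor axis."""
    moved = np.moveaxis(state, axis, 0)
    out = np.tensordot(gate, moved, axes=(1, 0))
    return np.moveaxis(out, 0, axis)


def chain_p1_simulated(x, thetas, thetas_prime):
    """P_1 of the last qubit from a full state-vector simulation of the chain."""
    n = len(thetas)
    single = np.array([np.cos(x), np.sin(x)])
    state = single
    for _ in range(n - 1):
        state = np.tensordot(state, single, axes=0)  # product state |psi(x)>^n
    state = apply_on_axis(state, ry(thetas[0]), 0)   # uncontrolled gate on q_1
    for k in range(1, n):
        new = np.empty_like(state)
        for c, angle in ((0, thetas[k]), (1, thetas_prime[k])):
            branch = np.take(state, c, axis=k - 1)   # control qubit fixed to c
            branch = apply_on_axis(branch, ry(angle), k - 1)
            idx = [slice(None)] * n
            idx[k - 1] = c
            new[tuple(idx)] = branch
        state = new
    probs = state ** 2                               # amplitudes are real here
    return float(probs.reshape(-1, 2)[:, 1].sum())


def chain_p1_recursive(x, thetas, thetas_prime):
    """Same quantity from the linear recursion P_1 -> a + b * P_1."""
    p1 = np.sin(x + thetas[0] / 2) ** 2
    for k in range(1, len(thetas)):
        a = np.sin(x + thetas[k] / 2) ** 2           # control in |0>
        b = np.sin(x + thetas_prime[k] / 2) ** 2 - a  # extra weight when control is |1>
        p1 = a + b * p1
    return float(p1)


rng = np.random.default_rng(0)
angles, angles_prime = rng.uniform(-np.pi, np.pi, (2, 4))
p_sim = chain_p1_simulated(0.37, angles, angles_prime)
p_rec = chain_p1_recursive(0.37, angles, angles_prime)
```

The first entry of `thetas_prime` is unused because the first qubit has no control. The two results agree because each controlled gate only conditions on the Z-basis population of the preceding qubit, which is the mechanism behind the iterative relation between $P^n_1$ and $P^{n+1}_1$.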
Further, if we replace the initial state of $q_1$ with $|+\rangle$, discard $U^1$, and keep all the rest the same, then the probability of obtaining $|1\rangle$ when measuring $q_N$ will be […] Hence, for the circuit shown in fig.(S2), if we apply a Z measurement on $q_N$ after the whole operation, the probability of obtaining $|1\rangle$ will be […] In the main article we denote the circuit shown in fig.(S2) as $U_n(\Theta_n)$, where there are $n$ qubits in total. We use $\Theta_n$ to represent the parameters $\theta$ and $\theta'$ of the rotation gates. Thus, at most $N$ operations are required to rebuild $F_N(x)$; denote them as $U_n$, $n = 1, 2, \cdots, N$. $U_n|\psi(x)\rangle^{\otimes n}$ can generate Fourier series no greater than the $n$-th order. $U_n$ contains a control operation chain that acts on $q''_1, q''_2, q_{N-n+1}, q_{N-n+2}, \cdots, q_N$, where $q''_1$, $q''_2$ are still auxiliary qubits and the control operation chain now starts from qubit $q_{N-n+1}$. Generally, in each operation $U_n$ there are several free variables sharing similar notation, like $v$, $w$, $\beta$.
To avoid misleading notation, we introduce a superscript to indicate the operation $U_n$, and define, when $k \ge 2$, […] while we have $\alpha = 1/2$ and $\beta = 2w^1_1$ when $k = 1$. Thus, if we apply $U_n$ on $N$ input qubits all prepared in the state $|\psi(x)\rangle$, the probability of obtaining $|1\rangle$ when measuring the final qubit $q_N$ will be […]
The present scheme can be extended to higher dimensions. By replacing the input state $|\psi(x)\rangle^{\otimes n}$ with $|\psi(x)\rangle^{\otimes n_x} \otimes |\psi(y)\rangle^{\otimes n_y}$, the probability of obtaining $|1\rangle$ when measuring the final qubit $q_N$ after $U_n$ will be […] where $n_x, n_y \ge 0$ and $n_x + n_y = n$. With this approach multiple variables can be included in the scheme, leading to potential applications in higher dimensions.
To extend the method to complex variables, the real and imaginary terms must be calculated separately. In most classical fast Fourier transform (FFT) algorithms, both the real and the imaginary terms are included. Therefore, at least two parallel quantum circuits are required if one would like to calculate complex functions based on our algorithm.

S3 Design of $U_{pre}$
$U_{pre}$ is introduced to convert the auxiliary qubits $q'_1, q'_2, \cdots, q'_M$ into $|\psi_f\rangle$ […] where $\gamma = (\gamma_0, \gamma_1, \cdots, \gamma_{2^M-1})$, $\gamma_n \ge 0$ and $\sum_{n=0}^{2^M-1} \gamma_n = 1$. Here we provide a universal but not optimal design of $U_{pre}$. Denote the operation $U_{pre}(\gamma)$ that satisfies […]
Figure S3: A sketch of $U_{pre}$ that acts on three qubits.
To convert the three qubits $q'_1$, $q'_2$, $q'_3$ into $|\psi_f(\gamma)\rangle$, the iterative process is repeated three times. In the first iterative 'loop', we only need to apply a single rotation gate on $q'_1$. In the second loop, $q'_1$ acts as a control qubit. Two different rotation gates are applied on $q'_2$, corresponding to the two eigenstates of $q'_1$. Similarly, in the third loop, $q'_1$ and $q'_2$ together play the role of control qubits, and there are four rotation gates acting on $q'_3$. In total, there are one rotation-Y gate acting on $q'_1$, two controlled rotation gates and four controlled-controlled rotation gates.
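The three 'loops' can be turned into explicit rotation angles. The sketch below is one possible realization rather than the authors' construction: it assumes that the target amplitudes are $\sqrt{\gamma_n}$, that the first qubit is the most significant bit of $n$, and that $R_y(\theta)|0\rangle = \cos\frac{\theta}{2}|0\rangle + \sin\frac{\theta}{2}|1\rangle$. The function names and the example vector `gamma` are illustrative only.

```python
import numpy as np


def upre_angles(gamma):
    """Rotation angles per loop: entry k holds 2**k angles, one for each
    setting of the k control qubits that precede the target qubit."""
    gamma = np.asarray(gamma, dtype=float)
    m = gamma.size.bit_length() - 1
    if gamma.size != 2 ** m or np.any(gamma < 0) or not np.isclose(gamma.sum(), 1.0):
        raise ValueError("gamma must have 2**M nonnegative entries summing to 1")
    angles = []
    for level in range(m):
        # Probability of each control prefix, split by the value of the next bit.
        blocks = gamma.reshape(2 ** level, 2, -1).sum(axis=2)
        angles.append(2 * np.arctan2(np.sqrt(blocks[:, 1]), np.sqrt(blocks[:, 0])))
    return angles


def amplitudes_from_angles(angles):
    """Amplitudes produced by applying the loops in order, starting from |0...0>."""
    amps = np.array([1.0])
    for level_angles in angles:
        pair = np.stack([np.cos(level_angles / 2), np.sin(level_angles / 2)], axis=1)
        amps = (amps[:, None] * pair).reshape(-1)
    return amps


# Illustrative weights only (not the values used in the paper's simulation).
gamma = [0.0, 0.4, 0.0, 0.3, 0.0, 0.2, 0.0, 0.1]
angles = upre_angles(gamma)
amps = amplitudes_from_angles(angles)
```

For three qubits the loops contain 1, 2 and 4 angles respectively, matching the one rotation gate, two controlled rotation gates and four controlled-controlled rotation gates counted above.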
Then consider the constraints on $\gamma$. If we measure the last qubit $q_N$ after the whole operation, the probability of obtaining $|1\rangle$ is […] Substituting eq.(S22), we can continue the above deduction, […] Subsequent to the whole operation, the output state is […] where $|\phi_{0,1}\rangle$ describe the output state of all qubits except the last one, $q_N$, and $C$ is a constant ensuring $\frac{1}{2} \pm CF_N(x_j) \ge 0$. Hence, the information stored in $q_N$ is essential and sufficient. After measuring $q_N$, the frequency of obtaining $|0\rangle$ is an estimation of $F_N(x)$. It is also appropriate to apply succeeding operations on $q_N$, regarding it as an intermediate state.
To estimate $F_N(x)$, we need to ensure that […] where $\gamma$ gives $|\psi_f\rangle$. Theoretically, any design that satisfies eq.(S33) can be used to estimate $F_N(x)$. Here we only give the one solution set that we used in our simulation. Denote $a'_n$, $b'_n$ as $C\sum_{n=1}^{j} a'_n \cos(2nx + b'_n) = \sum_{n=1}^{N} a_n \cos(2nx + b_n) - \ldots$

S4 Implementation on IBM QASM simulator
In the main article, we implemented $U_{3,5,7}$ independently on the IBM QASM simulator. Fig.(S4) is a sketch of the implementation of $U_5$, where there are two auxiliary qubits $q_{0,1}$ and five qubits $q_2, \cdots, q_6$ carrying the input information $x$. The controlled-controlled-RY gates are decomposed into two CNOT gates and three controlled-RY gates. $U_5$ is designed to generate the component $\cos^5(2x - 0.0046)$. Noticing that the corresponding $\gamma_5$ is negative, we set
$$\begin{cases} \theta_n = 0, & n > 1 \\ \theta'_n = -2 \cdot [(\beta_n - \pi/2) \bmod \pi], & n > 1 \\ \theta_0 = \theta'_0 = -\beta_0 \end{cases} \qquad \text{(S36)}$$
where $\beta_n = 0.0046$. When the last qubit $q_6$ is measured after the whole operation, the probability of obtaining the state $|0\rangle$, namely $P_0$, will be proportional to $\cos^5(2x - 0.0046)$. In other words, with this design we generate the component $-\cos^5(2x - 0.0046)$ in the final output, as $P_1$ is used for the estimation of $F_N(x)$. To implement $U_n$, a total of $O(n)$ CNOT gates, controlled-RY gates and NOT gates are required. On the other hand, these settings can also be used for a positive term, together with a final NOT gate applied on the last qubit.
Generally, there are three approaches to optimizing the parameters under the constraint shown in Eq.(7). One approach is to optimize the parameters so that $|C|$ is as large as possible: the larger the value of $|C|$, the fewer measurements are required to obtain the estimation. Another approach is to optimize the parameters so that the circuit is as shallow as possible. The last approach is to strike a balance, so that the total time cost is as small as possible. Here we choose the second approach. In this example, as the square wave function is an odd function, our design with $\gamma_{2,4,6} = 0$ removes half of the $U_n$ operations. The special choice of $\theta$ and $\theta'$ ensures that half of them are zero, so that we can further reduce the number of gates by about half. We stress that our setting is not optimal, especially when one prefers a larger value of $|C|$ over a shallow circuit.
Figure S4: A sketch of $U_5$ implemented on the IBM QASM simulator. There are two auxiliary qubits $q_{0,1}$ and five qubits $q_2, \cdots, q_6$ carrying the input information $x$. For simplicity, other qubits are not included. The controlled-controlled-RY gates are decomposed into two CNOT gates and three controlled-RY gates.
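A decomposition of a controlled-controlled-RY gate into two CNOT gates and three controlled-RY gates can be verified numerically. The sketch below uses the textbook construction with half-angle rotations, $V = R_y(\theta/2)$ so that $V^2 = R_y(\theta)$; whether this matches the exact gate ordering drawn in fig.(S4) is an assumption, and all function names are illustrative.

```python
import numpy as np

I2 = np.eye(2)
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])  # projector on |0>
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])  # projector on |1>
X = np.array([[0.0, 1.0], [1.0, 0.0]])


def ry(theta):
    """Real matrix of R_y(theta) = exp(-i * theta * Y / 2) (assumed convention)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)


def controlled(gate, control, target):
    """Three-qubit operator applying `gate` on `target` when `control` is |1>.
    Qubit order is (a, b, t) with indices 0, 1, 2."""
    off = [I2, I2, I2]
    off[control] = P0
    on = [I2, I2, I2]
    on[control] = P1
    on[target] = gate
    return kron3(*off) + kron3(*on)


def ccry_direct(theta):
    """RY(theta) on t only when both a and b are |1>."""
    return kron3(P0, I2, I2) + kron3(P1, P0, I2) + kron3(P1, P1, ry(theta))


def ccry_decomposed(theta):
    """Two CNOTs and three controlled-RY gates with half angles."""
    v, v_dag = ry(theta / 2), ry(-theta / 2)
    steps = [
        controlled(v, 1, 2),      # C-RY(theta/2), control b
        controlled(X, 0, 1),      # CNOT a -> b
        controlled(v_dag, 1, 2),  # C-RY(-theta/2), control b
        controlled(X, 0, 1),      # CNOT a -> b
        controlled(v, 0, 2),      # C-RY(theta/2), control a
    ]
    total = np.eye(8)
    for step in steps:
        total = step @ total
    return total


theta = 0.7
matches = np.allclose(ccry_direct(theta), ccry_decomposed(theta))
```

Tracking the four settings of the two control qubits shows why this works: the half rotations cancel unless both controls are $|1\rangle$, in which case they combine into the full $R_y(\theta)$.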