Quantum algorithm and circuit design solving the Poisson equation

The Poisson equation occurs in many areas of science and engineering. Here we focus on its numerical solution for an equation in d dimensions. In particular we present a quantum algorithm and a scalable quantum circuit design which approximates the solution of the Poisson equation on a grid with error \varepsilon. We assume we are given a supersposition of function evaluations of the right hand side of the Poisson equation. The algorithm produces a quantum state encoding the solution. The number of quantum operations and the number of qubits used by the circuit is almost linear in d and polylog in \varepsilon^{-1}. We present quantum circuit modules together with performance guarantees which can be also used for other problems.

In this paper we present a quantum algorithm and circuit solving the Poisson equation. The Poisson equation plays a fundamental role in numerous areas of science and engineering, such as computational fluid dynamics [13,14], quantum mechanical continuum solvation [15], electrostatics [16], the theory of Markov chains [17,18,19] and is important for density functional theory and electronic structure calculations [20].
Any classical numerical algorithm solving the Poisson equation with error ε has cost bounded from below by a function that grows as ε −αd , where d denotes the dimension or the number of variables, and α > 0 is a smoothness constant [21,22]. Therefore the cost grows exponentially in d and the problem suffers from the curse of dimensionality.
We show that the Poisson equation can be solved with error ε using a quantum algorithm with a number of quantum operations which is almost linear in d and polylog in ε −1 . A number of repetitions proportional to ε −4α guarantees that this algorithm succeeds with probability arbitrarily close to 1. Hence the quantum algorithm breaks the curse of dimensionality and, with respect to the dimension of the problem d, enjoys exponential speedup relative to classical algorithms.
On the other hand, we point out that the output of the algorithm is a quantum state that encodes the solution on a regular grid rather than a bit string that represents the solution. It can be useful if one is interested in computing a function of the solution rather than the solution itself. In general, the quantum circuit implementing the algorithm can be used as a module in other quantum algorithms that need the solution of the Poisson equation to achieve their main task.
In terms of the input of the algorithm, we assume that a quantum state encoding a superposition of function evaluations of the right hand side of the Poisson equation is available to us, and we do not account for the cost for preparing this superposition. In general, preparing arbitrary quantum states is a very hard problem. Nevertheless, in certain cases one can prepare efficiently superpositions of function evaluations using techniques in [23,24]. We do not deal with the implementation of such superpositions in this paper.
There are many ways to solve the Poisson equation. We choose to discretize it on a regular grid in cartesian coordinates and then solve the resulting system of linear equations. For this we use the quantum algorithm of [25] for solving systems of linear equation. The solution of differential and partial differential equations is a natural candidate for applying that algorithm, as already stated in [25]. It has been applied to the solution of differential equations in [26,27]. In the case of the Poisson equation, however, that we consider in this paper there is no need to assume that the matrix is given by an oracle. Indeed, a significant part of our work deals with the Hamiltonian simulation of the matrix of the Poisson equation. Moreover, it is an open problem to determine when it is possible to simulate a Hamiltonian with cost polynomial in the logarithm of the matrix size and the logarithm of ε −1 [28]. Our results show that in the case of the Hamiltonian for the Poisson equation the answer is positive.
Our analysis of the implementation includes all the numerical details and will be helpful to researchers working on other problems. All calculations are carried out in fixed precision arithmetic and we provide accuracy and cost guarantees. We account for the qubits, including ancilla qubits, needed for the different operations. We provide quantum circuit modules for the approximation of trigonometric functions, which are needed in the Hamiltonian simulation of the matrix of the Poisson equation. We show how to obtain a quantum circuit computing the reciprocal of the eigenvalues using Newton iteration and modular addition and multiplication. We show how to implement quantum mechanically the inverse trigonometric function needed for controlled rotations. As we indicated, our results are not limited to the solution of the Poisson equation but can be used in other quantum algorithms. Our simulation module can be combined with splitting methods to simulate the Hamiltonian −∆ + V , where ∆ is the Laplacian and V is a potential function. The trigonometric approximations can be used by algorithms dealing with quantum walks. The reciprocal of a real number and a controlled rotation by an angle obtained by an inverse trigonometric approximation are needed for implementing the linear systems algorithm [25] regardless of the matrix involved.

Overview
We consider the d-dimensional Poisson equation with Dirichlet boundary conditions.
For simplicity we study this equation over the unit cube but a similar analysis applies to more general domains in R d . Often one solves this equation by discretizing it and solving the resulting linear system. A finite difference discretization of the Poisson equation on a grid with mesh size h, using a (2d + 1) stencil for the Laplacian, yields the linear system where f h is the vector obtained by sampling the function f on the interior grid points [31,30,32]. The resulting matrix is symmetric positive definite.
To solve the Poisson equation with error O(ε) both the discretization error and the error on the solution of the system should be O(ε). This implies that ∆ h is a matrix of size proportional to ε −αd × ε −αd , where α > 0 is a constant that depends on the smoothness of the solution which, in turn, depends on the smoothness of f [33,30,21]. For example, when f has uniformly bounded partial derivatives up to order four then α = 1/2.
There are different ways for solving this system using classical algorithms. Demmel [31, Table 6.1] lists a number of possibilities. The conjugate gradient algorithm [34] is an example. Its cost for solving this system with error ε is proportional to where κ denotes the condition number of ∆ h . We know κ = ε −2α , independently of d.
The resulting cost is proportional to ε −αd−α log ε −1 . For details about the solution of large linear systems see [35]. Observe that the factor ε −αd in the cost is the matrix size and its contribution cannot be overcome. Any direct or iterative classical algorithm solving this system has cost at least ε −αd , since the algorithm must determine all unknowns. So any algorithm solving the system has cost exponential in d. In fact a much stronger result holds, namely, the cost of any classical algorithm solving the Poisson equation in the worst case must be exponential in d [21].
We present a scalable quantum circuit for the solution of (2) and thereby for the solution of Poisson equation with error O(ε) that uses a number of qubits proportional to max{d, log 2 ε −1 }(log 2 d+log 2 ε −1 ) 2 and a number of quantum operations proportional to max{d, log 2 ε −1 }(log 2 d + log 2 ε −1 ) 3 . It can be shown that log 2 d = O(log 2 ε −1 ) and the above expressions are simplified to max{d, log 2 ε −1 }(log 2 ε −1 ) 2 qubits and max{d, log 2 ε −1 }(log 2 ε −1 ) 3 quantum operations. A measurement outcome at the final state determines whether the algorithm has succeeded or not. A number of repetitions proportional to the square of the condition number yields a success probability arbitrarily close to one.
In Section 3 we deal with the discretization of the Poisson equation showing the resulting matrix. We also describe how the matrix in the multidimensional case can be expressed in terms of the one dimensional matrix using Kronecker products. This, as we'll see, is important in the simulation of the Poisson matrix. In Section 4 we show the quantum circuit solving the Poisson equation. We perform the error analysis and show the quantum circuit modules computing the reciprocal of the eigenvalues and from those the controlled rotation needed at the end of the linear systems algorithm [25]. In Section 5 we deal with the Hamiltonian simulation of the matrix of the Poisson equation. The exponential of the multidimensional Hamiltonian is the d-fold tensor product of the exponential of one dimensional Hamiltonian. It is possible to diagonalize the one dimensional Hamiltonian using the quantum Fourier transform. Thus it it suffices to approximate the eigenvalues in a way leading to the desired accuracy in the result. We show the quantum circuit modules performing the eigenvalue approximation and derive the overall simulation cost. In Section 6 we derive the total cost for solving the the Poisson equation. Section 7 is the conclusion. In Appendix 1 we list a number of elementary quantum gates and in Appendix 2 we present a series of results concerning the accuracy and the cost of the approximations we use throughout the paper.

One dimension
We start with the one-dimensional case to introduce the matrix L h that we will use later in expressing the d-dimensional discretization of the Laplacian, using Kronecker products. We have where f is a given smooth function and u is the solution we want to compute. We discretize the problem with mesh size h = 1/M and we compute an approximate solution v at M +1 grid points Using finite differences at the grid points to approximate the second derivative (3) becomes where ξ i is the truncation error and can be shown to be O(h 2 || d 4 u dx 4 || ∞ ) if f has fourth derivative uniformly bounded by a constant [31].
Ignoring the truncation error, we solve With boundary condition v 0 = 0 and v M = 0, we have M − 1 equations and M − 1 unknowns v 1 , ..., v M −1 : where L h is the tridiagonal (M − 1) × (M − 1) matrix above; for the properties of this matrix, including its eigenvalues and eigenvectors see [31, Sec. 6.3].

Two dimensions
In two dimensions the Poisson equation is We discretize this equation using a grid with mesh size h = 1/M ; see Figure 1. Each node is indexed u j,k , j, k ∈ {1, 2, ..., M } (Figure 1(a) and (b)). We approximate the second derivatives using Omitting the truncation error, and denoting by −∆ h the discretized Laplacian we are led to solve where f j,k = f (jh, kh), j, k = 1, 2, . . . , M − 1 and v j,k = 0 if j or k ∈ {0, M } i.e., when we have a point that belongs to the boundary. Using the fact that the solution is zero at the boundary, we reindex (8) to obtain Equivalently, we denote this system by where ∆ h is the discretized Laplacian. For example, when M = 4, as in Figure 1, we have that v = [v 1 , ..., v 9 ] T . Furthermore (9) becomes where I is the 3 × 3 identity matrix, B is  A is a Hermitian matrix with a particular block structure that is independent of M . In particular, on a square grid with mesh size h = 1/M we have and A can be expressed in terms of L h as follows: and its size is (6) and I is the (M − 1) × (M − 1) identity matrix. Moreover, A can be expressed using Kronecker products as follows

d dimensions
We now consider the problem in d dimensions. Consider the Laplacian We discretize ∆ on a grid with mesh size h = 1/M using divided differences. As before, this leads to a system of linear equations Note that −∆ h = h −2 A is symmetric positive definite matrix and A is given by (6) and I is the (M − 1) × (M − 1) identity matrix. See [31] for the details.
Observe that the matrix exponential has the form for all γ ∈ R, where i = √ −1. We will use this fact later in deriving the quantum circuit solving the linear system.

Quantum circuit
We derive a quantum circuit solving the system −∆ h v = f h , where h = 1/M and without loss of generality we assume that M is a power of two. We obtain a solution of the system with error O(ε). The steps below are similar to those in [25]: (i) As in [25] assume the right hand side vector f h has been prepared quantum mechanically as a quantum state |f h and stored in the quantum register B. Note β j |u j where |u j denote the eigenstates of −∆ h and β j are the coefficients.
(ii) Perform phase estimation using the state |f h in the bottom register and the unitary matrix e −2πi∆ h /E , where log 2 E = log d + log(4M 2 ). The number of qubits in the top register of phase estimation is n = O(log(E/ε)). (iii) Compute an approximation of the inverse of the eigenvalues λ j . Store the result on a register L composed of b = 3 log ε −1 qubits ( Figure 2). The approximation error of the reciprocals is at most ε.
(iv) Introduce an ancilla qubit to the system. Apply a controlled rotation on the ancilla qubit. The rotation operation is controlled be the register L which stores the reciprocals of the eigenvalues of −∆ h ( Figure 2). The controlled rotation results to (v) Uncompute all other qubits on the system except the qubit introduced on the previous item.
(vi) Measure the ancilla qubit. If the outcome is 1, the bottom register of phase estimation collapses to the state where |u j denote the eigenstates of −∆ h . This is equal to the normalized solution of the system. If the outcome is 0, the algorithm has failed and we have to repeat it. An alternative would be to include amplitude amplification to boost the success probability. Amplitude amplification has been considered in the literature extensively and we do not deal with it here.

Error analysis
We carry out the error analysis to obtain the implementation details. For d = 1 the eigenvalues of the second derivative are 4M 2 sin 2 (jπ/(2M )) j = 1, . . . , M − 1.
For d > 1, the eigenvalues of −∆ h are given by sums of the one-dimensional eigenvalues, i.e., We consider them in non-decreasing order and denote them by Then the eigenvalues are bounded from above by E. Recall that we have already assumed that M is a power of two. Then E = 2 log 2 d 4M 2 ∈ N.
Note that the implementation accuracy of the eigenvalues determines the accuracy of the system solution.
Our algorithm uses approximationsλ j , such that |λ j −λ j | ≤ 17·E 2 ν ≤ ε; see Theorem 2 in Appendix 2. We use n = log 2 E + ν bits to represent each eigenvalue, of which the log 2 E most significant bits hold each integer part and the remaining bits hold each fractional part. Without loss of generality, we can assume that 2 ν E. More precisely, we consider an approximation∆ h of matrix ∆ h such that the two matrices have the same eigenvectors while their eigenvalues differ by at most ε.
We use phase estimation with the unitary matrix e −i∆ h t 0 /E whose eigenvalues are e 2πiλ j t 0 /(E2π) . Setting t 0 = 2π we obtain the phases φ j =λ j /E ∈ [0, 1). The initial state of phase estimation is ( Figure 2) where |u j is the jth eigenvector of −∆ h and β j = u j |f h , for j = 1, 2, . . . , (M − 1) d . Since we are using finite bit approximations of the eigenvalues, we have Then φ j 2 n is an integer and phase estimation succeeds with probability 1 (see [36,Sec. 5.2,pg. 221] for details).
The state prior to the application of the inverse Fourier transform in phase estimation is After the application of the inverse Fourier transform to the first n qubits we obtain Now we need to compute the reciprocals of the eigenvalues. Observe that where the last inequality holds trivially for M sufficiently large. This impliesλ j /C d ≥ Append b qubits initialized to |0 on the left (Reg.L in Figure 2), to obtain Note that from (18) k j ,λ j andλ j /C d have the same bit representation. The difference between the integer k j and the other two numbers is the location of the decimal point; it is located after the log 2 E most significant bit inλ j , and after the log 2 (E/C d ) most significant bit inλ j /C d . Therefore, we can use the labels |k j , λ j and λ j /C d interchangeably, and write the state above as Now we need to compute h j := h(λ j /C d ) = C d /λ j . We do this using Newton iteration. We explain the details in Section 4.2. We obtain an approximationĥ j such that where ε 0 = min{ε, E −1 }. We store this approximation in the register composed of the leftmost b = 3 log 2 ε −1 0 qubits. This leads to the state We append, on the left, a qubit initialized at |0 (Anc. in Figure 2). We get We need to perform the conditional rotation For this, we will approximate the first qubit by We discuss the cost of implementing this approximation in Section 4.3.
The result of approximating the conditional rotation is to obtain h j , whereh j is a q = Θ(log 2 ε −1 1 ) bit number less than 1 satisfying |h j −ĥ j | ≤ ε 2 1 and, therefore, Ignoring the ancilla qubits needed for implementing the approximation of the conditional rotation, we have the state Uncomputing all the qubits except the leftmost gives the state Let P 1 = |1 1| ⊗ I be the projection acting non-trivially on the first qubit. The We derive the error as follows where the second from last inequality is obtained using (20) and the last inequality is due to the fact that Setting ν = log 2 (17E/ε) gives error ε(1+o(1)) and the number of matrix exponentials used by the algorithm is O(log 2 (E/ε)). Therefore, if we measure the first qubit of the state |ψ and the outcome is 1 the state collapses to a normalized solution of the linear system.

Computation of λ −1
In this part we deal with the computation of the reciprocals of the eigenvalues, which is marked as the 'INV' module in Figure 2. For this we use Newton iteration to approximate v −1 , v > 1. We perform s iterative steps and obtain the approximationx s . The input and the output of each iterative step are b bit numbers. All the calculations in each step are performed in fixed precision arithmetic. The initial approximation iŝ x 0 = 2 −p , 2 p−1 < v ≤ 2 p . (We use the notationx i to emphasize that these values have been obtained by truncating a quantity x i to b bits of accuracy, i = 0, . . . , s). Theorem 1 of Appendix 2 gives the error of Newton iteration which is ) and the number of bits satisfies b ≥ 2 log 2 ε −1 0 + O(log 2 log 2 log 2 ε −1 0 ). Therefore, it suffices that the module of the quantum circuit that computes 1/λ j carries each iterative step with 3 log 2 ε −1 0 qubits of accuracy. The quantum circuit computing the initial approximationx 0 , of the Newton iteration is given in Figure 3. The second register holds |v and is n qubits long, of which the first log 2 (E/C d ) qubits represent the integer part of v and the remaining ones its fractional part. The first register is b qubits long. Recall thatλ j /C d ≥ 4. So input values below 4 do not correspond to meaningful eigenvalue estimates and we don't need to compute their reciprocals altogether; they can be ignored. Hence the circuit implements the unitary transformation |0 ⊗b |v → |0 ⊗b |v , if the first log 2 (E/C d ) − 2 bits of v are all zero. Otherwise, it implements the initial approximationx 0 through the transformation |0 ⊗b |v → |x 0 |v . Each iteration step x i+1 = −vx 2 i + 2x i is implemented using a quantum circuit of the form shown in Figure 4 that computes |x i |v → |x i+1 |v . This involves quantum circuits for addition and multiplication which have been studied in the literature [37].
The register holding |v is n qubits long and the register holding the |x i and |x i+1 is b qubits long. Note that internally the modules performing the iteration steps may use more than b qubits, say, double precision, so that the addition and multiplication operations required in the iteration are carried out exactly and then return the b most significant qubits of the result. The total number of qubits required for the implementation of each of these modules is O(log ε −1 0 ) and the total number of gates is a low degree polynomial in log ε −1 0 .

Controlled rotation
We now consider the implementation of the controlled rotation Assume for a moment that we have obtained |θ , a q qubit state, corresponding to an angle θ such that sin θ approximates ω. Then we can use controlled rotations R y about   the y axis to implement R. We consider the binary representation of θ and have Then where Y is the Pauli Y operator and θ ∈ [0, π/2]. The detailed circuit is shown in Figure 5.
In Appendix 2 we show the error in approximating the sine function using fixed precision arithmetic. In Section 5 we show the details of the resulting quantum algorithm computing the approximation to the sine function. These results, with a minor adjustment in the number of bits needed can be used here. We won't deal with the details of the quantum algorithm for the sine function in this section since we present them in Section 5 that deals with the simulation of Poisson's matrix. We will only describe the steps of the algorithm and its cost. Algorithm: (i) Take as an initial approximation of θ the value π/4.
(ii) Approximate the sin(θ) with error ε 2 1 /2 using our algorithm for the sine function (details in section 5 and Appendix 2). Let s θ denote this approximation.
An evaluation at the midpoint of an interval yields a value that satisfies either the condition of step 3, or that of step 4, or |s θ − ω| ≤ ε 2 1 /2. If at any time both the conditions of steps 3 and 4 are false then θ will not change its value until the end. Then, at the end, we have | sin(θ) − ω| ≤ | sin(θ) − s θ | + |s θ − ω| ≤ ε 2 1 , since the error in computing the sine is ε 2 1 /2. On the other hand, if θ is updated until the very end of the

|1
T algorithm the final value of theta also satisfies | sin(θ) − w| ≤ ε 2 1 , because in the final interval we have | sin(θ) − ω| ≤ |θ − sin −1 (ω)| ≤ ε 2 1 . In a way similar to that of Proposition 1 and Proposition 2 of Appendix 2 we carry out the steps of the algorithm in q bit fixed precision arithmetic, q = max{2ν + 9, 13 + ν + 2 log 2 M } and sufficiently large ν to satisfy the accuracy requirements. (The last expression for q is slightly different form that in Proposition 2 because it accounts for the fact that in the case we are dealing with here the angle is Ω(1/M 2 )). This gives us an approximation to the sine with error 2 −(ν−1) . We set ν = log 2 ε −2 1 + 1.
The algorithm for the sine function is based on an approximation of the exponential function using repeated squaring. Each square requires O(q 2 ) quantum operations and O(q) qubits. This is repeated ν times before the approximation to the sine is obtained. Thus the cost of one bisection step requires O(νq 2 ) quantum operations and O(νq) qubits. So, in terms of ε 1 , the total cost of bisection is proportional to (log 2 ε −1 1 ) 4 quantum operations and (log 2 ε −1 1 ) 3 qubits.

Hamiltonian simulation of the Poisson matrix
In this section we deal with the implementation of the 'HAM-SIM' module ( Figure 2) which effectively applies e −i∆ h t 0 onto register B. In our case the eigenvectors of the (a) Generic circuit for T M = Dπ, for details refer to [38].
· · · · · · · · · . . . = (b) Implementation of B and P m gates in (a). , P m denotes the map |x → |x + 1 mod 2 n on n qubits. Its implementation is described in [39]. See Appendix 1 for the definitions of basic gates.
discretized Laplacian are known and we use approximations of the eigenvalues. From (11) and (15) we have Thus it suffices to implement e ih −2 L h γ , for certain γ ∈ R, γ = 2π · 2 t /E, t = 0, 1, . . . , log 2 E − 1 that are required in phase estimation. This can be accomplished by considering the spectral decomposition SΛS of the matrix L h , where S is the matrix of the sine transform [40,31]. Then S can be implemented using the quantum Fourier transform. We will implement an approximation of Λ.
We remark that the quantum circuits presented here can be used in the simulation of the Hamiltonian −∆+V using splitting formulas. For results concerning Hamiltonian simulation using splitting formulas see [41,9,28].

One-dimensional case
We start with the implementation of e ih −2 L h γ , γ = 2π2 t /E, E = 4M 2 when d = 1 and t = 0, 1, . . . , n − 1, where n is the number of qubits in register C; see (17). The form of L h is shown in (6) and is positive definite. It is a Toeplitz matrix and it is known that this type of matrices can be diagonalized via the sine transform S [42]. We have The relationship between the sine and cosine transforms and the Fourier transform can be found in [40,Thm. 3.10].
In particular, using the notation in [40], we have where C M +1 , S M −1 denote the cosine and sine transforms, and the subscripts M − 1 and M + 1 emphasize the size of the respective matrix. F 2M is the 2M × 2M matrix of the Fourier transform. The matrix T M has size 2M × 2M and is given by (25).
The quantum circuits for implementing the unitary transformation T M is discussed in [38]. The action of T M can be described by [38] T M |0x = 1 where i 2 = −1, x is an n-bit number ranging 1 ≤ x < 2 n and x denotes its two's complement i.e. x = 2 n − x. The basic idea of implementing T M is to separate its operation into an operator D, which ignores the two's complement in T M , and a controlled permutation π, which transforms the state |bx to |bx only if b is 1. Therefore the action of D and π can be written as Clearly, T M = Dπ and the overall circuit for implementing operation T M is shown in Figure 8.
By (24) the sine transform S can be implemented by cascading the quantum circuits in Figure 8 with the circuit for Fourier transform [36]. An ancilla bit is added to register b. It is kept in the state |1 in order to select the lower-right block from the unitary operation T † M F 2M T M (24), a ∈ C. Considering the state |f h , that corresponds to the right hand side of (6), and for b i = i|f h we have then the element a in Equation 28 has no effect, and the circuit in Figure 6 is equivalent to applying (S M −1 e 2πiΛ2 t /E S M −1 ) onto the (M − 1) elements of |f h . This is also equivalent to simulating the Hamiltonian e 2πih −2 Λ2 t /E with the state |f h stored in register b.
We obtain eachλ j , j = 1, . . . , M − 1 by the following algorithm. The general idea is to approximate sinx = (e ix ) = ((e ix/r ) r ) with W r where W = 1 − ix/r + x 2 /r 2 is the Taylor expansion of e ix/r up to the second order term. W r is computed efficiently in fixed point arithmetic using repeated squaring. The detailed steps are the following: Eigenvalue Simulation Algorithm (ESA): (i) Let r = 2 ν+7 where ν is positive integer which is related to the accuracy of the result. The inputs and the outputs of the modules below are s = max{2ν + 9, 11 + ν + log 2 M } bit numbers. Internally the modules may carry out calculations in higher precision O(s), but the results are returned using s bits. This value of s follows from the error estimates in Proposition 2.
(ii) We perform the transformation |j |0 ⊗s → |j |ŷ j =x j /r s qubits , wherex j is the s bit truncation of x j = πj 2M . Note that y j = x j /r ∈ (0, 1) andŷ j is the s bit truncation of y j . Recall that r ≥ 2 and 2M are powers of 2. Calculations are to be performed in fixed precision arithmetic, so division does not need to be performed actually. All one needs to do is multiply j by π with O(s) bits of accuracy, keeping in track the position of the decimal point and then take the s most significant bits of the result.
(iii) We compute the real and imaginary parts of the complex numberŴ 1 by truncating, if necessary, the respective parts ofŴ 0 = 1 −ŷ 2 + iŷ to s bits of accuracy; see (42) in Proposition 1. This is expressed by the transformation Note that since |ŷ j is s qubits long,Ŵ 0 can be computed exactly using double precision and ancilla qubits and the final result can be returned in s qubits. Complex numbers are implemented using two registers, holding the real and imaginary parts. Complex arithmetic is performed by computing the real and imaginary parts of the result.
(iv) We computeŴ r approximatingŴ r 1 using repeated squaring. Each step of this procedure is accomplished by the transformation which describes the steps in (42). The registers holding real and imaginary parts of the numbers are s qubits long.
(v) (Ŵ r ) approximates sin(πj/(2M )) with error 2 −(ν−1) . Hence 2 (Ŵ r ) approximates the sin 2 (πj/(2M )). We compute the square of (Ŵ r ) exactly and multiply it by 4M 2 (this involves only shifting). We keep the ν + log 2 (4M 2 ) most significant bits of the result, which we denote by j . This means that the log 2 (4M 2 ) bits of the binary string representing j compose the integer part and the last ν bits compose the fractional parts of the approximation to λ j . Then For the error estimate details see Proposition 2. When d = 1, n (the number of qubits in register C) and ν are related by n = ν + log 2 (4M 2 ). Moreover, in the one dimensional caseλ j = j .
(vi) Let k j be the binary string representing j . For a fixed t, we implement the transformation |k j n qubits |0 ⊗n → |k j k j 2 t n qubits This is accomplished using CNOTs with the circuit shown in Figure 9, since t ≤ n the total number of quantum operations and qubits required to implement the circuit for all the values of t is O(n 2 ). (vii) Finally, we use phase kickback (see e.g. [43]) to obtain e 2πiφ j 2 t from the state |k j 2 t where φ j is the phase corresponding to the eigenvalue j that approximates λ j ; see (18).

Multidimensional case
To implement e −i∆ h γ , γ = 2π2 t /E, E defined in (16) and t = 0, . . . , n − 1 we use Therefore the quantum circuit implementing e −i∆ h γ in d dimensions is obtained by the replication and parallel application of the circuit simulating e ih −2 L h γ . For example, when d = 2 we have the circuit in Figure 6. The register B of Figure 2 contains dm qubits, m = log 2 M and its initial state is assumed to have the form (24) in each circuit for e ih −2 L h γ . Recall that |f h corresponds to the right hand side of (14).
The eigenvalues in the d dimensional case are given as sums of the one dimensional eigenvalues. We do not need to form the sums explicitly for the simulation of −∆ h ; they are computed by the tensor products. The difference between the d dimensional and the one dimensional case is that the register C in Figure 2 has log 2 d additional qubits; i.e n = log 2 d + log 2 4M 2 + ν. Accordingly, we generate the one dimensional approximations to the eigenvalues using the steps 1 − 5 of the eigenvalue estimation algorithm of the previous section. Then we append log 2 d qubits initialized to |0 ⊗ log 2 d to the left of the register holding the | j and carry out the remaining two steps 6 − 7 with n = log 2 d + log 2 4M 2 + ν. The error in the approximate eigenvalues is equal to 17M 2 d/2 ν ; see Theorem 2.

Simulation cost
Simulating the sine and cosine transforms (24) requires O(m 2 ), m = log 2 M quantum operations and O(m) qubits [38]. The diagonal eigenvalue matrix of the one dimensional case (23) is simulated by ESA. Its steps 1−3 and step 5 require O(s 2 ) quantum operations and O(s) qubits. In step 4 repeated squaring is performed ν + 7 times. Each repetition or step of the procedure requires O(s 2 ) quantum operations and O(s) qubits. The total cost of step 4 is proportional to ν · O(s 2 ) quantum operations and ν · O(s) qubits, accounting for any ancilla qubits used in repeated squaring.
Step 6 requires O(n + t) quantum operations and qubits for fixed t.
Step 7 requires O(n 2 ) quantum operations, due to the Fourier transform, and O(n) qubits.
Using Theorem 2, and requiring error ε in the approximation of the eigenvalues, we have i.e. ν = Θ(log 2 d + m + log 2 ε −1 ). We also have n = Θ(ν) and s = Θ(n). We derive the simulation cost taking the following facts into account: • Steps 1 − 5 deal with the approximation of the eigenvalues. These computations are not repeated for every t = 0, . . . , n − 1. The total cost of these steps is O(n 3 ) quantum operations and O(n 2 ) qubits.
• The total cost of step 6, resulting from all the values of t, is O(n 2 ) quantum operations and qubits.
• The total cost of step 7, that applies phase kickback for all values of t, does not exceed O(n 3 ) quantum operations and O(n 2 ) qubits.
Therefore the total cost to simulate e ih −2 L h γ , γ = 2π2 t /E, for all t = 0, . . . , n − 1, is O(n 3 ) quantum operations and O(n 2 ) qubits. From (22) we conclude that the cost to simulate Poisson's matrix for the d dimensional problem is d·O(n 3 ) quantum operations and d · O(n 2 ) qubits. Finally, we remark that the dominant component of the cost is the one resulting from the approximation of the eigenvalues (i.e., the cost of steps 1 − 5).

Total cost
We now consider the total cost for solving the Poisson equation (1). Discretizing the second derivative operator on a grid with mesh size h = 1/M results to a system of linear equations, where the coefficient matrix is (M − 1) d × (M − 1) d , i.e. exponential in the dimension d ≥ 1. Solving this system using classical algorithms has cost that grows at least as fast as the number of unknowns (M − 1) d . For the case d = 2, [31, Table 6.1] summarizes the cost of direct and iterative classical algorithms solving this system.
The relation between the matrix size and the accuracy is very important in assessing the performance of the quantum algorithm solving a linear system, since its cost depends on both of these quantities [25]. In particular, for the Poisson equation we have ignored, so far, the effect of the discretization error of the Laplacian ∆. If the grid is too coarse the discretization error will exceed the desired accuracy. If the grid is too fine, the matrix will be unnecessarily large. Thus the mesh size and, therefore, the matrix size should depend on ε, i.e. M = M (ε). This dependence is determined by the smoothness of the solution u, which, in turn, depends on the smoothness of the right hand side function f . For example, if f has uniformly bounded partial derivatives up to order four, then the discretization error is O(h 2 ) and we set M = ε −1/2 ; see [31,30] for details. In general, we have M = ε −α , where α > 0 is a parameter depending on the smoothness of the solution. This yields n = O(log 2 d + log 2 ε −1 ), since m = log 2 M = α log 2 ε −1 . The resulting number of the quantum operations for the circuit is proportional to max{d, log 2 ε −1 }(log 2 d + log 2 ε −1 ) 3 , and the number of qubits is proportional to It can be shown that log 2 d = O(log 2 ε −1 ) and the number of quantum operations and qubits become proportional to max{d, log 2 ε −1 }(log 2 ε −1 ) 3 , and max{d, log 2 ε −1 }(log 2 ε −1 ) 2 ,

respectively.
Observe that the condition number of the matrix is proportional to ε −2α and is independent of d. Therefore a number of repetitions proportional to ε −4α leads to a success probability arbitrarily close to one, regardless of the value of d. This follows because repeating an algorithms many times increases its probability to succeed at least according to the Chernoff bounds [36,Box 3.4,pg. 154]. In contrast to this, the cost of any deterministic classical algorithm solving the Poisson equation is exponential in d. Indeed, for error ε the cost is bounded from below by a quantity proportional to ε −d/r where r is a smoothness parameter [21].

Conclusion and future directions
We present a quantum algorithm and a circuit for approximating the solution of the Poisson equation in d dimensions. The algorithm breaks the curse of dimensionality and in terms of d yields an exponential speedup relative to classical algorithms. The quantum circuit is scalable and has been obtained by exploiting the structure of the Hamiltonian for the Poisson equation to diagonalize it efficiently. In addition, we provide quantum circuit modules for computing the reciprocal of eigenvalues and trigonometric approximations. These modules can be used in other problems as well.
The successful development of the quantum Poisson solver opens up entirely new horizons in solving structured systems on quantum computers, such as those involving Toeplitz matrices. Hamiltonian simulation techniques [41,9,28] can also be combined with our algorithm to extend its applicability to PDEs, signal processing, time series analysis and other areas.
Proposition 1. Let r = 2 ν+7 for ν ≥ 1 and consider the procedure computing W r , as defined in Lemma 1 using repeated squaring. Assume each step computing a square carries out the calculation using fixed precision arithmetic and that its inputs and outputs are s bit numbers. LetŴ r be the final result. Then the error is W r −Ŵ r ≤ 2 ν+9 2 s , for s ≥ 11 + ν + log 2 M , where 1/M is the mesh size in the discretization of the Poisson equation.
where r = 2 ν+7 and the error terms e 1 , e 2 , . . . , e r are complex numbers denoting that the real and imaginary parts of the results are truncated to s bits of accuracy. Observe that if |Ŵ 2 j−1 | < 1 then |Ŵ 2 j | < 1, since |Ŵ 2 j−1 | 2 < 1 and truncation of real and imaginary parts does not increase the magnitude of a complex number. Since |Ŵ 0 | < 1, all the numbers in the sequence (42) belong to the unit disk S in the complex plane.

Proof.
We have for s = max{2ν + 9, 11 + ν + log 2 M }, which completes the proof of the first part. The proof of the second and third part follows immediately.