Fourier Analytic Approach to Phase Estimation

For a unified analysis of phase estimation, we focus on the limiting distribution. It is shown that the limiting distribution is given by the absolute square of the Fourier transform of an $L^2$ function whose support belongs to $[-1,1]$. Using this relation, we study the relation between the variance of the limiting distribution and its tail probability. As a result, we prove that the protocol minimizing the asymptotic variance does not minimize the tail probability. Depending on the width of the interval, we derive the estimation protocol minimizing the tail probability outside a given interval. Such an optimal protocol is given by a prolate spheroidal wave function, which often appears in wavelet or time-limited Fourier analysis. Also, the minimum confidence interval is derived in the framework of interval estimation that assures a given confidence coefficient.

Estimating or identifying an unknown unitary operator is discussed in both quantum computation [1,2] and quantum statistical inference; it is therefore a fundamental topic in quantum information. In quantum computation, the unknown unitary operator is given as an oracle, and one asks how many applications are required to identify the given oracle with a given precision. In quantum statistical inference, in contrast, many researchers optimize the average fidelity or the mean square error between the true unitary and the obtained guess. In both fields, the input state fed to the unknown unitary is optimized together with the measurement, and a quadratic speedup has been reported in both [3,4,5,6]. However, the two fields discuss the same topic under different criteria, and there is an example in which the appearance of the quadratic speedup depends on the choice of the criterion. Since the relation between the two criteria is not clear, the problem needs to be treated within a common framework.
In the present paper, as the most typical example, we focus on phase estimation, in which the quadratic speedup was demonstrated experimentally [7]. Kitaev first treated the phase estimation problem from the viewpoint of quantum computation [8]. Since it appears in Shor's factorization, it is considered a fundamental topic in quantum computation as well as in physics. In order to treat the quadratic speedup more deeply, we focus on the limiting distribution, which describes the stochastic behaviour of the estimate in the neighborhood of the true parameter. That is, it gives the distribution of the random variable $n(\hat\theta - \theta)$ when the estimate, the true parameter, and the number of applications are $\hat\theta$, $\theta$, and $n$, respectively. While the limiting distribution is a common concept in statistics and has been studied in the estimation of quantum states [10,11,12], it has not been studied systematically in the estimation of an unknown unitary operator.
The concept of 'limiting distribution' is useful in the following four respects. First, the variance of the limiting distribution gives the asymptotic first-order coefficient of the mean square error. Second, the tail probability of the limiting distribution for a given interval gives the tail probability of the interval when the width of the interval is of order $\frac{1}{n}$. Third, using the limiting distribution, we can discuss phase estimation in the framework of interval estimation, whose meaning is explained later. Fourth, using this concept, we can treat the number of applications required for attaining a given accuracy, i.e., a given error probability and a given error bar, in the asymptotic framework. That is, the above four advantages correspond to respective criteria. Therefore, the limiting distribution provides a unified framework for these criteria. The first three criteria are familiar in statistics, and the fourth criterion is familiar in computer science.
In the present paper, we analyze the limiting distribution in phase estimation systematically, and show that the limiting distribution is expressed by the Fourier transform of a square integrable function on the closed interval $[-1,1]$, which approximately gives the input state in the asymptotic setting.
In a realistic setting, the optimization of the fourth criterion is more appropriate than that of the first criterion, i.e., the optimization of the mean square error or the average fidelity. However, only estimators with several special input states have been treated under the fourth formulation, and its optimization has not been discussed, whereas the optimization of the mean square error and of the average fidelity has been carried out in existing studies [4,13].
In statistics, this problem is treated by interval estimation, in which the inference is given as an interval. Indeed, there are two formulations in statistics: one is point estimation, in which the estimate is given as a single point, and the other is interval estimation. In point estimation, it is not easy to guarantee the quality of the estimate because the estimated value always has statistical fluctuation. To resolve this problem, in interval estimation, for given data and a given confidence coefficient, the estimate is given as an interval, which is called a confidence interval. In this formulation, a smaller width of the confidence interval is better. As is mentioned in Section II, when the number $n$ of data is sufficiently large, the confidence interval can be obtained from the limiting distribution [14].
Further, we analyze the variance and the tail probability of the limiting distribution. As a result, we prove that the limiting distribution minimizing the asymptotic variance does not minimize the tail probability. It is also shown that the limiting distribution minimizing the tail probability depends on the width of the interval, because the definition of the tail probability itself depends on this width. The optimal input state is given by a prolate spheroidal wave function [15], which often appears in wavelet or time-limited Fourier analysis. This function is a solution of a linear differential equation [16]. Originally, prolate spheroidal wave functions appeared in the analysis of the Helmholtz equation in electromagnetics [17] and in the determination of laser modes [18]. Employing this wave function, Slepian and Pollak [15] extended Shannon's sampling theorem to the case where the time interval is limited as well as the bandwidth, whereas the original sampling theorem treats only the bandwidth-limited case. Further, by using these facts, we study optimal interval estimation. We provide the estimation protocol minimizing the width of the confidence interval that assures a given confidence coefficient. In this case, the optimal estimation protocol depends on the given confidence coefficient. That is, we must prepare the input state appropriately depending on the confidence coefficient.
The paper is organized as follows. In Section II, the formulation of phase estimation is given and the limiting distribution is introduced with an explanation of its meaning. In Section III, we clarify the relation between the limiting distribution and the Fourier transform. In Section IV, we analyze the variance of the limiting distribution. This problem is reduced to finding the minimum eigenvalue of an operator in the Dirichlet problem. In Section V, the tail probability of the limiting distribution is discussed. It is shown that the limiting distribution minimizing the variance does not provide a small tail probability. In Section VI, we treat the interval estimation problem. This problem can be analyzed by a prolate spheroidal wave function and the eigenvalue of the defining differential equation. In Section VII, phase estimation with a single copy is discussed in the continuous system. The discussions in Sections IV and V can be applied to this formulation under the deterministic energy constraint. In Section VIII, we briefly remark on the asymptotic Cramér-Rao lower bound.
As is mentioned later, this formulation essentially contains the most general framework with $n$ applications of the unknown unitary. In the following discussion, the true parameter is denoted by $\theta$ and our estimate by $\hat\theta$.
As is shown in Appendix A, this problem is equivalent to estimating the parameter $\theta$ of the unitary operation $U_\theta$ given in (1). Our scheme for estimating $\theta$ is as follows. First, prepare an input state $|\phi_0\rangle = \sum_{k=0}^{n} a_k |k\rangle$, where the coefficients $\vec a = \{a_k\}_{k=0}^{n}$ satisfy the normalizing condition $\sum_k |a_k|^2 = 1$. Second, evolve the input state $|\phi_0\rangle$ by the unitary evolution $U_\theta$. Finally, perform a measurement described by a POVM $M = M(\hat\theta)d\hat\theta$. Then, the estimate $\hat\theta$ obeys the probability distribution $P^M_{\theta,\vec a}(\hat\theta)d\hat\theta$. When our error function is given by $R(\theta,\hat\theta)$, we optimize the mean error $D_\theta(M,\vec a) := \int_0^{2\pi} R(\theta,\hat\theta)P^M_{\theta,\vec a}(\hat\theta)d\hat\theta$. We only consider the covariant framework, i.e., the error function $R(\theta,\hat\theta)$ is assumed to be a function of the difference $(\theta - \hat\theta)$ mod $2\pi\mathbb{Z}$. For example, this is the case when we focus on the gate fidelity. Then, our measurement may be restricted to a group covariant measurement (2), which is determined by a vector $|t\rangle = \sum_{k=0}^{n} e^{i\xi_k}|k\rangle$ (3).
This is because the minimum of the Bayesian average value $\min_M \frac{1}{2\pi}\int_0^{2\pi} D_\theta(M,\vec a)d\theta$ under the invariant prior and the mini-max value $\min_M \max_\theta D_\theta(M,\vec a)$ can be attained by the same group covariant measurement [19]. Therefore, we restrict our measurement to covariant measurements in the following discussion. Then, our protocol is described by the pair of the coefficients $\vec a$ of the input state and the vector $|t\rangle$ given in (3).
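Further, without loss of generality, we can restrict our protocol to pairs of the form $(\vec a, |t_0\rangle)$, where $|t_0\rangle := \sum_{k=0}^{n}|k\rangle$. For any protocol $(\vec a, |t\rangle)$, we define $\vec a' = \{a'_k\}$ by $a'_k := e^{-i\xi_k}a_k$. Then the protocol $(\vec a', |t_0\rangle)$ has the same performance as the protocol $(\vec a, |t\rangle)$, because the phases $e^{i\xi_k}$ of $|t\rangle$ can be absorbed into the coefficients of the input state.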
Therefore, the choice of our protocol is essentially given by the choice of the input state. Dam et al. [24] proved this argument in a more general framework as follows. When the number of applications is $n$, any protocol can be simulated by the above formulation. That is, any adaptive application of the unknown unitary $V_\theta$ can be simulated by the $n$-fold unitary evolution $V_\theta^{\otimes n}$ under the above error function.
The main target of the present paper is to analyze the asymptotic behavior of the output distribution for a sequence of input states $\mathcal{M} := \{\vec a_n\}$. For this purpose, we treat the distribution of the parameter $z_n = \frac{n(\hat\theta_n - \theta)}{2}$, because the estimate $\hat\theta_n$ approaches the true parameter $\theta$ with the order $\frac{1}{n}$ when an appropriate measurement and an appropriate input state are used. When the random variable $z_n$ converges to a random variable $z$ in probability, the distribution $P(\mathcal{M})$ of $z$ is called the limiting distribution of the sequence of input states $\mathcal{M}$. In the case of state estimation, including the classical case, if we apply a suitable estimator, the limiting distribution is the Gaussian distribution under a suitable regularity condition. In the classical case, more precisely, the asymptotically sufficient statistic for the given parameter obeys this Gaussian distribution. That is, any estimator is asymptotically given as a function of the statistic obeying the Gaussian distribution. This Gaussian distribution is characterized only by its variance.
Even in the quantum case of state estimation, the estimation problem can be reduced to that of a family of quantum Gaussian states in the asymptotic sense [10,11,12]. In particular, if we treat the estimation of a one-parameter model, we obtain the same conclusion as in the classical case. Hence, it is sufficient to evaluate the variance in order to characterize the limiting distribution. That is, there is no variety in the limiting distribution of state estimation because the limiting distribution is uniquely determined as the Gaussian distribution. The main problem in the present paper is, on the other hand, to consider whether there exists a variety in the limiting distribution of phase estimation.
When the cost function $R(\theta,\hat\theta)$ has the form 1 4 . Hence, the analysis on the limiting distribution yields the asymptotic analysis on the average gate fidelity. Further, the analysis on the limiting distribution provides the asymptotic analysis on phase estimation from another aspect. For example, the tail probability of the sequence of input states $\mathcal{M}$ can be described through the limiting distribution. Using this relation, we can evaluate the required number of applications of the unknown unitary gate for a given allowable error width $B$ and a given allowable error probability $\epsilon$ as follows. First, we choose $A$ by $P(\mathcal{M})\{|z| > A\} = \epsilon$. Next, we choose $n$ by $\frac{A}{n} = B$, i.e., $n = \frac{A}{B}$. Thus, the required number of applications is equal to $\frac{A}{B}$ if we use the sequence of input states $\mathcal{M}$. The above discussions clarify that the analysis on the limiting distribution yields various types of asymptotic analysis on phase estimation.
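The two-step recipe above (choose $A$ from the tail probability, then set $n = A/B$) can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes the unitary Fourier convention $\mathcal F(f)(y) = (2\pi)^{-1/2}\int f(x)e^{-ixy}dx$, takes as limiting distribution the one generated by the hypothetical choice $f(x) = \cos(\pi x/2)$ on $[-1,1]$, and uses the relation $n = A/B$ exactly as stated in the text. The function names are illustrative.

```python
import math

def sinc(u):
    """sin(u)/u with the removable singularity at u = 0 filled in."""
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u

def density(y):
    """|F(f)(y)|^2 for f(x) = cos(pi x / 2) on [-1, 1] (assumed unitary convention)."""
    a = math.pi / 2.0
    return (sinc(a - y) + sinc(a + y)) ** 2 / (2.0 * math.pi)

def tail_probability(threshold, intervals=2000):
    """P{|z| > A} = 1 - 2 * integral_0^A density(y) dy, by Simpson's rule."""
    if threshold <= 0.0:
        return 1.0
    h = threshold / intervals
    total = density(0.0) + density(threshold)
    for j in range(1, intervals):
        total += (4 if j % 2 else 2) * density(j * h)
    return 1.0 - 2.0 * total * h / 3.0

def tail_quantile(eps, lo=0.0, hi=50.0, iterations=60):
    """Find A with P{|z| > A} = eps by bisection (the tail is decreasing in A)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tail_probability(mid) > eps:
            lo = mid  # tail still too heavy: move the threshold outward
        else:
            hi = mid
    return 0.5 * (lo + hi)

def required_applications(eps, width_b):
    """n = A / B, rounded up to an integer number of applications."""
    return math.ceil(tail_quantile(eps) / width_b)

A = tail_quantile(0.05)
n_required = required_applications(0.05, 0.01)
print(A, n_required)
```

A different sequence of input states changes only `density`; the bisection and the final division stay the same.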
Hence, in the present paper, for a deeper and unified asymptotic analysis on phase estimation, we analyze the limiting distribution of the sequence of input states M.

III. RELATION WITH SQUARE INTEGRABLE FUNCTIONS
In this section, we give a remarkable relation between limiting distributions and square integrable functions on [−1, 1].

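Theorem 1 For any sequence of input states $\mathcal{M}$ having a limiting distribution, there exists a function $f \in L^2([-1,1])$ such that the limiting distribution is the distribution $P_f$ given by the absolute square of the Fourier transform of $f$ (7).
Conversely, for a function $f \in L^2([-1,1])$ with the normalizing condition, there exists a sequence of input states $\mathcal{M}$ satisfying the condition (7).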
Due to this relation, we can reduce the analysis on limiting distributions to the analysis on wave functions on the interval $[-1,1]$. That is, our problem results in Fourier analysis on the interval $[-1,1]$.
Proof: As the first step of the proof, we construct a function $f \in L^2([-1,1])$ for a given sequence of input states $\mathcal{M} := \{\vec a_n\}$. For this purpose, we define a function $f_n \in L^2([-1,1])$ from the coefficients $\vec a_n$, using the points $x_k := \frac{2k-n}{n+1}$. In the following, the set of the above $L^2$ functions is denoted by $L^2_n$. The parameter $z_n = \frac{n(\hat\theta_n-\theta)}{2}$ can be replaced by the parameter $y_n = \frac{(n+1)(\hat\theta_n-\theta)}{2}$ because the ratio $\frac{y_n}{z_n} \to 1$. When $f_n$ goes to a function $f \in L^2([-1,1])$, since $\frac{y}{n+1} \to 0$ as $n$ goes to infinity, the distribution of the normalized outcome $y = \frac{(n+1)(\hat\theta_n-\theta)}{2}$ $(= y_n)$ converges to the distribution $P_f(dy)$. If $f_n$ does not converge, by replacing the sequence $\{f_n\}$ by a converging subsequence $\{f_{n_k}\}$, i.e., $f_{n_k} \to f$, we can show that the distribution of the normalized outcome converges to the distribution $P_f(dy)$.
Next, we prove the opposite argument. For a given function $f \in L^2([-1,1])$ satisfying the normalizing condition, we construct a sequence of input states satisfying (7). There exists a sequence of functions $f_n \in L^2_n$ such that $f_n \to f$. Then, we define the coefficients $a^{(n)}_k$ of the input states from $f_n$.
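The correspondence in Theorem 1 can be checked numerically on a simple case. The sketch below rests on stated assumptions rather than on the lost displayed equations: the unitary Fourier convention $\mathcal F(f)(y) = (2\pi)^{-1/2}\int_{-1}^{1} f(x)e^{-ixy}dx$, the normalization $\int |f|^2 dx = 1$, and a finite-$n$ outcome density of $y_n$ of the form $|\sum_k a_k e^{-ix_k y}|^2/(\pi(n+1))$, which is what a covariant measurement with the vector $|t_0\rangle$ would give. The flat input state is used because its limit, $\sin^2 y/(\pi y^2)$, is known in closed form.

```python
import cmath
import math

def fourier_transform(f, y, intervals=400):
    """(2*pi)^(-1/2) * integral over [-1, 1] of f(x) exp(-i x y) dx, by Simpson's rule."""
    h = 2.0 / intervals
    total = 0j
    for j in range(intervals + 1):
        x = -1.0 + j * h
        weight = 1 if j in (0, intervals) else (4 if j % 2 else 2)
        total += weight * f(x) * cmath.exp(-1j * x * y)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)

def limiting_density(f, y):
    """Density of P_f at y: the absolute square of the Fourier transform of f."""
    return abs(fourier_transform(f, y)) ** 2

def finite_n_density(coeffs, y):
    """Assumed density of y_n = (n+1)(theta_hat - theta)/2 for coefficients a_0..a_n."""
    n = len(coeffs) - 1
    amplitude = sum(a * cmath.exp(-1j * (2 * k - n) / (n + 1) * y)
                    for k, a in enumerate(coeffs))
    return abs(amplitude) ** 2 / (math.pi * (n + 1))

def flat(x):
    """Constant wave function on [-1, 1] with unit L2 norm."""
    return 1.0 / math.sqrt(2.0)

n = 2000
flat_coeffs = [1.0 / math.sqrt(n + 1)] * (n + 1)  # a_k = 1/sqrt(n+1)
y = 1.3
closed_form = math.sin(y) ** 2 / (math.pi * y ** 2)
print(limiting_density(flat, y), finite_n_density(flat_coeffs, y), closed_form)
```

The three printed numbers should agree closely, which is the finite-$n$ picture behind the statement that the limiting distribution is the absolute square of a Fourier transform.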

IV. VARIANCE OF LIMITING DISTRIBUTION
In the previous section, we have shown that the limiting distributions $P_f$ of outcomes are obtained through the Fourier transforms of wave functions $f \in L^2([-1,1])$, which correspond to the coefficients $a^{(n)}_k$ of input states. In this section, let us seek the input state minimizing the variance by utilizing this fact. As is mentioned in Section II, optimizing the first-order coefficient of the variance is equivalent to minimizing the variance $V(f)$ of $P_f$. Define the multiplication operator $Q$ and the momentum operator $P$. Then $V(f) = \langle f|P^2|f\rangle$; that is, the problem is reduced to finding the minimum eigenvalue of the operator $P^2$. Its eigenvalues are $\frac{\pi^2 m^2}{4}$ $(m = 1, 2, \ldots)$ and the corresponding eigenfunctions are $\phi_m(x) = 2\sqrt{2}\sin\left(\pi m \frac{x+1}{2}\right)/C_m$ [20], where $C_m$ is a normalizing constant.
Note that a careful treatment is required for a function $f \in L^2([-1,1])$ when the Dirichlet condition $f(-1) = f(1) = 0$ does not hold. In this case, the function $f$ has a discontinuity at $1$ or $-1$ as an element of $L^2(\mathbb{R})$. Hence, the variance $\langle f|P^2|f\rangle$ is infinite. For example, in the case of $f = \frac{1}{2}$, the variance $\int_{-\infty}^{\infty} \sin^2 y \,\frac{dy}{2\pi}$ diverges. In this case, the limiting distribution is obtained with the order $\frac{1}{n^2}$ [1], while the mean square error goes to zero only with the order $\frac{1}{n}$, i.e., we have a quadratic speedup concerning the limiting distribution, but no quadratic speedup concerning the mean square error. This fact is closely related to the divergence of the integral $\int_{-\infty}^{\infty} \sin^2 y \,\frac{dy}{2\pi}$.
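The variational character of this minimization can be illustrated by comparing a few trial functions. The sketch assumes $P = -i\,d/dx$, so that for a real function with $f(\pm 1) = 0$ the variance is the Rayleigh quotient $V(f) = \int |f'|^2 dx / \int |f|^2 dx$. The trial functions and their names are chosen here for illustration only.

```python
import math

def simpson(g, a, b, intervals=1000):
    """Composite Simpson rule on [a, b] with an even number of intervals."""
    h = (b - a) / intervals
    total = g(a) + g(b)
    for j in range(1, intervals):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3.0

def variance(f, df):
    """Rayleigh quotient <f|P^2|f>/<f|f> for real f vanishing at -1 and 1."""
    numerator = simpson(lambda x: df(x) ** 2, -1.0, 1.0)
    denominator = simpson(lambda x: f(x) ** 2, -1.0, 1.0)
    return numerator / denominator

# Trial functions satisfying the Dirichlet condition, with their derivatives.
v_phi1 = variance(lambda x: math.cos(math.pi * x / 2),
                  lambda x: -math.pi / 2 * math.sin(math.pi * x / 2))
v_parabola = variance(lambda x: 1 - x * x,
                      lambda x: -2 * x)
v_raised_cosine = variance(lambda x: math.cos(math.pi * x / 2) ** 2,
                           lambda x: -math.pi / 2 * math.sin(math.pi * x))

print(v_phi1, v_parabola, v_raised_cosine)
```

Under these assumptions the half-cosine $\cos(\pi x/2)$, which is proportional to $\phi_1$, gives the smallest value of the three; the parabola comes close, and the smoother raised cosine pays for its smoothness with a larger variance.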

V. TAIL PROBABILITY OF LIMITING DISTRIBUTION
In asymptotic statistics, the behavior of the tail probability of the limiting distribution is one of the most important properties [21] because it determines the performance of interval estimation and the power of one-sided (or two-sided) tests. Thus, we consider the tail probability of the limiting distribution $P_{\phi_m}$.
In the i.i.d. case, the minimum tail probability and the minimum variance among limiting distributions can be realized by the same Gaussian distribution. However, in our setting, the Gaussian distribution does not minimize the variance. Hence, it is not clear whether the minimum tail probability can be attained by the same distribution as the one minimizing the variance.
The corresponding limiting distributions are obtained through the Fourier transforms of $\phi_m$. The density of $P_{\phi_m}$ decreases only with the order $O(y^{-4})$ in the tail. In order to improve the tail probability, we focus on the well-known fact that the Fourier transform of a rapidly decreasing function is also a rapidly decreasing function. In our problem, the support of the original wave function $f$ is included in $[-1,1]$. Under this condition, $f$ is a rapidly decreasing function if and only if $f$ is a smooth function. Note that a rapidly decreasing function does not decrease 'suddenly'. That is, the smoothness is an essential requirement. For example, the function $\phi_m$ is not smooth at $-1$ and $1$. In the following, we construct a rapidly decreasing wave function $f$ whose support is included in $[-1,1]$. In this construction, the smoothing at $-1$ and $1$ is essential.
First, we define functions $g_0$, $g_1$, and $g_2$. Using these functions, we define a rapidly decreasing function $g_3$ whose support is included in $[-1,1]$, where $C$ is the normalizing constant. As is checked numerically (see Fig. 4), this function improves the tail probability. Now, we analyze the decreasing speed of the tail probability $P_{g_3}([-R,R]^c)$. The Fourier transforms of these functions involve $\mathcal{F}(g_1) * \mathcal{F}(g_2)$, the convolution of $\mathcal{F}(g_1)$ and $\mathcal{F}(g_2)$.
When $y$ is sufficiently large, the Fourier transform of $g_3$ can be bounded as is shown in Appendix B. Therefore, there exists a function $f$ such that the tail probability of $P_f$ is exponentially small and the support is included in $[-1,1]$. Note that the above wave function $g_3$ does not minimize the variance $V(f)$. This fact tells us that the input state minimizing the variance is not optimal concerning the tail probability of the limiting distribution. That is, the optimal input state depends on the choice of the criterion.
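The trade-off between variance and tail probability can be seen already with a much simpler function than $g_3$. The sketch below is only an illustration under assumptions: it uses the unitary Fourier convention, and it replaces $g_3$ by the hypothetical stand-in $\frac{2}{\sqrt{3}}\cos^2(\pi x/2)$, which is smoother at $\pm 1$ than $\cos(\pi x/2)$ (its first derivative also vanishes there) but is not infinitely smooth. It compares the tail probabilities $P_f([-R,R]^c)$ of the two functions at $R = 10$.

```python
import cmath
import math

def simpson(g, a, b, intervals):
    """Composite Simpson rule; works for real- or complex-valued g."""
    h = (b - a) / intervals
    total = g(a) + g(b)
    for j in range(1, intervals):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3.0

def density(f, y):
    """|F(f)(y)|^2 with F(f)(y) = (2*pi)^(-1/2) * integral_{-1}^{1} f(x) exp(-i x y) dx."""
    amplitude = simpson(lambda x: f(x) * cmath.exp(-1j * x * y), -1.0, 1.0, 200)
    return abs(amplitude) ** 2 / (2.0 * math.pi)

def tail_probability(f, radius):
    """P_f([-R, R]^c) for real f with unit L2 norm (the density is then even in y)."""
    inside = 2.0 * simpson(lambda y: density(f, y), 0.0, radius, 500)
    return 1.0 - inside

def phi1(x):
    """Half-cosine: continuous at -1 and 1, but with a kink there."""
    return math.cos(math.pi * x / 2.0)

def raised_cosine(x):
    """Hypothetical smoother stand-in: value and first derivative vanish at -1 and 1."""
    return 2.0 / math.sqrt(3.0) * math.cos(math.pi * x / 2.0) ** 2

tail_phi1 = tail_probability(phi1, 10.0)
tail_smooth = tail_probability(raised_cosine, 10.0)
print(tail_phi1, tail_smooth)
```

The smoother function has the larger variance of the two (under $P = -i\,d/dx$), yet its tail probability at $R = 10$ is markedly smaller, which is the qualitative point made in the text.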
Next, we consider the maximization of the probability $P_f([-R,R])$. For this purpose, we denote by $\Pi_R$ the natural projection from $L^2(\mathbb{R})$ to $L^2([-R,R])$. By using the operator $F_R := \mathcal{F}^\dagger \Pi_R \mathcal{F}$, this probability has the form $P_f([-R,R]) = \langle f|F_R|f\rangle$. That is, our aim is the maximization of $\langle f|F_R|f\rangle$ over normalized $f \in L^2([-1,1])$. This problem is equivalent to the calculation of the maximum eigenvalue of $\Pi_1 F_R \Pi_1$. Slepian and Pollak [15] showed that the eigenfunction $\psi_R$ of $\Pi_1 F_R \Pi_1$ associated with the maximum eigenvalue is given as a solution of a linear differential equation and is called a prolate spheroidal wave function, where the parameter $\xi(R)$ of this equation is chosen depending on the minimum eigenvalue. Slepian [16] derived the asymptotic behavior of the maximum eigenvalue $\lambda(R)$ of $\Pi_1 F_R \Pi_1$ when $R$ is sufficiently large. The numerical calculation of the minimum tail probability $\min_{f \in L^2([-1,1])} P_f([-R,R]^c)$ is given in Fig. 4. The minimum tail probability $\min_{f \in L^2([-1,1])} P_f([-R,R]^c) = 1 - \lambda(R)$ goes to zero with exponential rate 2. This optimal value is attained when the input state is given by the eigenfunction $\psi_R$ of $\Pi_1 F_R \Pi_1$ associated with the maximum eigenvalue $\lambda(R)$. Now, we numerically compare the functions $\phi_1$, $g_3$, $\psi_2$, and $\psi_{10}$.
The density functions of the distributions $P_{\phi_1}$, $P_{g_3}$, $P_{\psi_2}$, and $P_{\psi_{10}}$ are plotted in Fig. 3. Their tail probabilities are plotted in Fig. 4. The tail probabilities $P_{\psi_2}([-y,y]^c)$ (thick dashed) and $P_{\psi_{10}}([-y,y]^c)$ (thick solid) attain the minimum tail probability $\min_{f \in L^2([-1,1])} P_f([-y,y]^c)$ only at $y = 2$ and $y = 10$, respectively. The distributions $P_{\phi_1}$ and $P_{\psi_2}$ concentrate in the range $[-2,2]$; however, their tail probabilities do not decrease as rapidly as those of the distributions $P_{g_3}$ and $P_{\psi_{10}}$. This comparison indicates that the optimizations of the concentration and of the tail probability are not compatible. That is, the distributions given by the Fourier transforms of the functions $g_3$ and $\psi_{10}$ have a small tail probability (Fig. 3). These functions are smooth at $-1$ and $1$. That is, we have checked that the smoothness is closely related to the tail probability.
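The maximum eigenvalue $\lambda(R)$ can be approximated without special-function libraries. The following sketch makes two assumptions of its own: with the unitary Fourier convention, the operator $\Pi_1 F_R \Pi_1$ acts on $L^2([-1,1])$ through the kernel $\sin(R(x-x'))/(\pi(x-x'))$, and this kernel is discretized by a midpoint rule, after which the top eigenvalue is obtained by power iteration. The node count and iteration count are illustrative choices, not tuned values.

```python
import math

def sinc_kernel(x, xp, radius):
    """Kernel of Pi_1 F_R Pi_1: sin(R (x - x')) / (pi (x - x'))."""
    d = x - xp
    if abs(d) < 1e-12:
        return radius / math.pi
    return math.sin(radius * d) / (math.pi * d)

def discretize(radius, nodes=80):
    """Midpoint-rule discretization of the kernel on [-1, 1]."""
    h = 2.0 / nodes
    xs = [-1.0 + (j + 0.5) * h for j in range(nodes)]
    matrix = [[h * sinc_kernel(x, xp, radius) for xp in xs] for x in xs]
    return xs, matrix

def apply(matrix, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

def rayleigh_quotient(matrix, v):
    """<v|K|v>/<v|v>, which approximates P_f([-R, R]) for samples v of f."""
    mv = apply(matrix, v)
    return sum(a * b for a, b in zip(v, mv)) / sum(a * a for a in v)

def max_concentration(radius, nodes=80, iterations=150):
    """Approximate lambda(R), the largest eigenvalue, by power iteration."""
    _, matrix = discretize(radius, nodes)
    v = [1.0] * nodes
    for _ in range(iterations):
        w = apply(matrix, v)
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return rayleigh_quotient(matrix, v)

xs, matrix = discretize(2.0)
phi1_samples = [math.cos(math.pi * x / 2.0) for x in xs]
lambda_2 = max_concentration(2.0)
concentration_phi1 = rayleigh_quotient(matrix, phi1_samples)
print(lambda_2, concentration_phi1)
```

The second printed number is the concentration $P_{\phi_1}([-2,2])$ of the variance-minimizing function; by the variational principle it cannot exceed $\lambda(2)$.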
FIG. 3: Probability density function minimizing the variance $P_{\phi_1}$ (thin dashed), probability density function improving the tail probability $P_{g_3}$ (thin solid), probability density function $P_{\psi_2}$ (thick dashed), and probability density function $P_{\psi_{10}}$ (thick solid).
Since the function $R \mapsto P_f([-R,R])$ is strictly monotone increasing, its inverse function $\beta \mapsto R(\beta)$ is also strictly monotone increasing.
Further, the LHS coincides with for any real number a.

VI. INTERVAL ESTIMATION
Now, we treat the phase estimation problem in the framework of interval estimation. In interval estimation, given a confidence coefficient $\beta$, we estimate a confidence interval $[L, U]$ to which the unknown parameter $\theta$ is guaranteed to belong with probability $\beta$. Here, since our parameter space is the torus $\mathbb{R}/2\pi\mathbb{Z}$, a careful treatment is required. However, since it is quite difficult to treat this optimization with a finite $n$, we treat the asymptotic setting. The optimal value is attained when the input state is constructed from the wave function $\psi_{R(\beta)}$ and the measurement is given by the covariant measurement (2) with the vector $|t_0\rangle$. That is, there exists a pair of functions $U$ and $L$ such that $|[L(\omega), U(\omega)]| \le 2R(\beta)$ and $P^M_{\theta,\vec a}\{\omega \,|\, \theta \in [L(\omega), U(\omega)]\} \ge \beta$. The optimal input state depends on the choice of the confidence coefficient $\beta$.
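The dependence of the half-width $R(\beta)$ on the confidence coefficient can be sketched by inverting $R \mapsto \lambda(R)$ numerically. The code below is an illustration under assumptions: $R(\beta)$ is read as the smallest $R$ whose maximum concentration $\lambda(R)$ reaches $\beta$, and $\lambda(R)$ is approximated as the largest eigenvalue of the kernel $\sin(R(x-x'))/(\pi(x-x'))$ on $[-1,1]$ with a coarse midpoint rule and power iteration. Function names, grid size, and search interval are hypothetical choices.

```python
import math

def max_concentration(radius, nodes=40, iterations=60):
    """Approximate lambda(R): largest eigenvalue of the sinc kernel on [-1, 1]."""
    h = 2.0 / nodes
    xs = [-1.0 + (j + 0.5) * h for j in range(nodes)]

    def kernel(d):
        return radius / math.pi if abs(d) < 1e-12 else math.sin(radius * d) / (math.pi * d)

    matrix = [[h * kernel(x - xp) for xp in xs] for x in xs]
    v = [1.0] * nodes
    lam = 0.0
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        lam = sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)  # Rayleigh quotient
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return lam

def confidence_half_width(beta, lo=0.0, hi=8.0, steps=16):
    """R(beta): smallest R with lambda(R) >= beta, by bisection (lambda increases with R)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if max_concentration(mid) >= beta:
            hi = mid
        else:
            lo = mid
    return hi

r90 = confidence_half_width(0.90)
r95 = confidence_half_width(0.95)
print(r90, r95)
```

In the normalized variable the confidence interval then has width $2R(\beta)$, and a higher confidence coefficient requires both a wider interval and a different optimal wave function $\psi_{R(\beta)}$.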

VII. CONTINUOUS CASE WITH SINGLE COPY
Let us consider phase estimation in the continuous case with a single copy, in which, by inputting the wave function $f$, we estimate the parameter $\theta$ in a group-covariant model $\rho_\theta = e^{i\theta Q}|f\rangle\langle f|e^{-i\theta Q}$ on the space $L^2(\mathbb{R})$.
It is known that when the shift-covariance condition is assumed for estimators, our estimator is restricted to the measurement of the observable $P$ [23]. Then, the outcome $\hat\theta$ obeys the distribution $P_f$, and the variance of the outcome is given by $\langle f|\Delta P^2|f\rangle$, which is abbreviated as $\Delta P^2$.
If we can input any wave function $f$, the variance can be made arbitrarily small. Hence, it is natural to assume a constraint on the input wave function $f$. Here, we assume that the potential is given as a monotone function of the absolute value $|Q|$. While a constraint on the average potential is often assumed, we consider a deterministic condition on the potential. That is, the wave packet of $f$ is assumed to exist only in the region where the potential is less than a given constant. In the following, for simplicity of our analysis, we assume that the input wave function belongs to $L^2([-1,1])$. Hence, the discussion in Sections IV and V can be applied to this problem.
Here, it is meaningful to consider the relation with the Cramér-Rao bound. It is known in general that the Fisher information $J_\theta$ for a group-covariant model $\rho_\theta = e^{i\theta Q}|f\rangle\langle f|e^{-i\theta Q}$ is given by $J_\theta = \Delta Q^2$ because the symmetric logarithmic derivative (SLD) is given by $Q - \langle Q\rangle$ [23].
Since the operator $P$ satisfies the canonical commutation relation with $Q$, we have the Heisenberg uncertainty relation $\Delta P^2 \Delta Q^2 \ge \frac{1}{4}$, which is equivalent to the Cramér-Rao inequality $\Delta P^2 \ge \frac{1}{4}J_\theta^{-1}$. In particular, the above inequality is attained if and only if $f$ is a squeezed state, because its attainability is equivalent to that of $\Delta P^2 \Delta Q^2 \ge \frac{1}{4}$. Thus, if $f$ is not a squeezed state, the Cramér-Rao lower bound $\frac{1}{4}J_\theta^{-1}$ cannot be attained uniformly in the one-copy case. As is shown in the next section, our asymptotic case is essentially equivalent to the above group-covariant model under the restriction $\mathrm{supp} f \subset [-1,1]$.
The equality holds if and only if the wave function $f$ is a squeezed state. However, since the support of $f$ belongs to $[-1,1]$, the equality cannot be attained. This fact indicates that the Cramér-Rao approach does not yield the attainable bound in the estimation of a unitary action even in the asymptotic formulation, while this approach generally yields the attainable bound in the estimation of a quantum state. This point is the essential difference between state estimation and unitary estimation.
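As a concrete check of this gap, one can evaluate both sides for the variance-minimizing function. The computation below is a worked example under the assumptions $P = -i\,d/dx$ and $f(x) = \cos(\pi x/2)$ on $[-1,1]$ (unit norm, with $\langle Q\rangle = \langle P\rangle = 0$ by symmetry); it uses the expressions $J_\theta = \Delta Q^2$ and $\frac{1}{4}J_\theta^{-1}$ as they are written in the text.

```latex
\begin{align*}
\Delta Q^2 &= \int_{-1}^{1} x^2 \cos^2\frac{\pi x}{2}\,dx
            = \frac{1}{3} - \frac{2}{\pi^2} \approx 0.1307, \\
\Delta P^2 &= \int_{-1}^{1} \frac{\pi^2}{4}\sin^2\frac{\pi x}{2}\,dx
            = \frac{\pi^2}{4} \approx 2.467, \\
\Delta P^2\,\Delta Q^2 &= \frac{\pi^2}{12} - \frac{1}{2} \approx 0.3225 > \frac{1}{4}, \\
\frac{1}{4}J_\theta^{-1} &= \frac{1}{4\,\Delta Q^2} \approx 1.913 < \Delta P^2 .
\end{align*}
```

Even the function with the smallest variance on $[-1,1]$ therefore stays strictly above the Cramér-Rao value, in line with the statement that a compactly supported wave function cannot be a squeezed state.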

IX. CONCLUSION
As a unified approach to the asymptotic analysis of phase estimation, we have treated the limiting distribution of the sequence of estimators because we can recover various asymptotic performances of the estimation protocols from the limiting distribution.
As the first step, we have found a one-to-one correspondence between a limiting distribution and a wave function in $L^2([-1,1])$. That is, we have shown that any limiting distribution is given by the absolute square of the Fourier transform of a wave function $f \in L^2([-1,1])$. Due to this correspondence, it is sufficient to optimize the distribution given as the absolute square of the Fourier transform of a function in $L^2([-1,1])$.
As the next step, the minimization of the variance has been treated among the above distributions by treating the Dirichlet problem in a similar way to Buzek et al. [4]. We have also considered the tail probability. In order to guarantee a small error probability outside the given interval, the limiting distribution should be rapidly decreasing. However, it has been clarified that the limiting distribution minimizing the variance is not rapidly decreasing. In order to construct a rapidly decreasing limiting distribution, we have employed a smoothing method to construct a rapidly decreasing function whose support is included in $[-1,1]$. It has been numerically checked that this function improves the tail probability remarkably.
Further, the tail probability for a given interval has been minimized among these limiting distributions by employing Slepian and Pollak's analysis of signal processing [15]. The optimal limiting distribution depends on the width of this interval. Using this optimization, we have treated interval estimation in the asymptotic setting.
Next, we have treated the relation with phase estimation in the continuous system with the one-copy setting. In this case, Heisenberg's uncertainty relation is equivalent to the Cramér-Rao inequality. Using this relation, we have obtained the condition for the attainability of the Cramér-Rao inequality. Further, we have applied this relation to the asymptotic analysis of the variance of phase estimation. Then, we have clarified that the Cramér-Rao bound cannot be attained in our framework.
Throughout these discussions, it has been clarified that the optimization of asymptotic phase estimation cannot be characterized by a single parameter, while this problem can be characterized by a single parameter, i.e., the variance, in the state estimation of a one-parameter model with a regularity condition due to the asymptotic normality [10,11,12]. This property is the biggest difference from state estimation.
Indeed, a similar property can be expected in general unitary estimation. It is a future problem to investigate the limiting distribution in the estimation of a unitary operation in a more general case.
Then, the estimation problem of $U'_\theta$ can be reduced to that of $U_\theta$ given in (1).
In the case of $y < 0$, we can show (11) by replacing $y'$ by $-y'$.