Bayesian quantum frequency estimation in presence of collective dephasing

We advocate a Bayesian approach to optimal quantum frequency estimation - an important issue for future quantum enhanced atomic clock operation. The approach provides a clear insight into the interplay between decoherence and the extent of the prior knowledge in determining the optimal interrogation times and optimal estimation strategies. We propose a general framework capable of describing local oscillator noise as well as additional collective atomic dephasing effects. For a Gaussian noise the average Bayesian cost can be expressed using the quantum Fisher information and thus we establish a direct link between the two, often competing, approaches to quantum estimation theory


Introduction
Modern atomic clocks allow for time keeping with instability better than 10 −15 , and are reaching towards new applications in geodesy [1] and tests of fundamental physics [2]. As of 2013 the best atomic clock have instability 10 −18 after 7 hours of averaging [3]. Crystal oscillators or stabilized lasers have excellent short-time frequency stability but their frequency ω LO (t) tends to drift due to variations of temperature or stress and it needs to be locked to a narrow atomic transition ω 0 , between two atomic levels |0 , |1 , in order to guarantee long-time stability, see Fig. 1. The resulting stability is limited by local oscillator (LO) noise and the atomic signal to noise ratio. A single estimation strategy have to be used periodically to determine the frequency offset ω(t) = ω LO (t) − ω 0 . Knowledge of the LO noise and the value of ω from the previous feedback cycle provides a natural prior, making Bayesian analysis well suited for a study of the optimal estimation protocol.
In a typical Ramsey interferometric scheme N atoms interact with an external electromagnetic field evolving at frequency ω, which for now is a time independent parameter. First, the atoms experience a π/2-pulse which transforms the ground state |0 of each one into the (|0 + |1 )/ √ 2 superposition. After evolving freely for a time t, they experience another π/2-pulse, subsequently being subjected to a measurement determining the number n 1 of atoms which made a transition to the excited state |1 . In the case of perfect synchronization, ω LO = ω 0 , all atoms end up in the |1 state, while in the presence of detuning the average ratio reads n 1 /N = cos 2 [ωt/2] [4]. The scheme is identical regardless of whether the ω 0 transition is microwave (Cs-fountain clocks) or optical (ion traps, optical lattices). The procedure estimates the frequency difference ω on the basis of the number of excited atoms detected and performs a feedback on ω LO , locking it to the atomic frequency ω 0 . Mathematically, the scheme is equivalent to an N-photon Mach-Zehnder interferometric experiment with a relative optical phase delay ϕ = ωt between the arms, which play the role of the two atomic levels [5]. Just as in the optical interferometry, fluctuation in the number of detected atoms gives rise to the shot noise 1/ √ N scaling of the frequency estimation precision. Shot noise scaling is a direct consequence of lack of correlations among the atoms. If the atoms were prepared in a correlated quantum state, such as a spin-squeezed or a GHZ state, the 1/ √ N limit could be beaten and the 1/N Heisenberg bound approached, at least in idealized decoherence free scenarios [5][6][7][8][9][10]. While quantum enhanced sensing ideas have proved useful in a number of practical applications [11][12][13][14], it has been observed that in the presence of losses [15][16][17][18][19][20], dephasing [21][22][23] or other decoherence processes [24,25] the maximum achievable quantum enhacement is limited. The ultimate goal of this type of research is to study the performance of optimal strategies in which probe states, measurements and inferring strategies are optimized in a given sensing protocol [26,27]. The resulting optimal sensing precision may be then regarded as fundamental, i.e. imposed by the laws of nature. Powerful tools are available for the effective identification of these limits [28][29][30]. A number of papers have studied the effectiveness of quantum strategies in Figure 1. a) Basic scheme of atomic clock operation. LO coupled to an atomic reference generates an error signal on frequency difference ω = ω LO − ω 0 which is used to keep the LO locked to the atomic transition. b) Prior distribution of ω spreads while atoms are being interrogated due to LO instability. Estimation and a feedback correction procedure help to keep the ω distribution narrow. frequency estimation [21,23,31,32] or, more specifically, in atomic clock performance [33][34][35][36][37]. These approaches, however, lacked generality. Some ignored the role of prior frequency distribution or the presence of decoherence. Others studied a particular estimation scheme. The aim of this paper is to propose a simple, general and effective Bayesian approach to the study of general quantum enhancement schemes in frequency estimation. We show that finding the optimal estimation scheme in our setting has equivalent complexity as optimizing quantum Fisher Information (QFI) [38,39] and in the case of Gaussian priors we establish a direct link between these two approaches. We also argue that the Bayesian approach is more fundamental in the study of the limitations on the performance of quantum enhanced atomic clocks -a point of view shared by other authors [33,36,40,41].

Cramér-Rao bound approach
In the case of a probe state ρ ω with an encoded parameter ω to be estimated, the standard quantum Cramér-Rao bound (CRB) states that, regardless of the measurements and unbiased estimators used, the estimation variance ∆ 2 ω is lower bounded by where F is the QFI, { , } is the anticommutator and L ω , implicitly defined above, is the symmetric logarithmic derivative of ρ ω . When estimating frequency the probe state will typically be: where ρ is the interferometer input state, t is the interrogation time, H is the generator of the unitary evolution encoding ω, while Λ t represents decoherence processes. In this case, since dρω dω = it[ρ ω , H], F (ρ ω ) does not depend on ω and reads: where λ i , |i are the eigenvalues and the eigenvectors of Λ t (ρ) respectively. The optimal strategy is obtained by maximizing F leading to the optimal ρ, t and the optimal projective measurement given by the eigenbasis of L ω . Although applied fruitfully in many realistic metrological scenarios, including lossy interferometry [17,19,20,28] and noisy frequency estimation [23,28,31,32], CRB approach suffers from a number of drawbacks. CRB is not saturable in general. It is only guaranteed to be saturable in special cases including Gaussian models, or in a repeated independent experiment framework (for a large number k of experiments it is possible to achieve ∆ 2 ω ≈ 1 kF ) [42]. Moreover, since QFI is a local quantity-for a given ω it only depends on ρ ω and its first derivative-it completely ignores any possible ambiguities in a reconstruction the frequency value from a phase value that may arise for a sufficiently broad prior parameter distribution.
In frequency estimation the interrogation time t is a controllable parameter subject to optimization. If t is large, even a very narrow prior distribution in ω may result in a phase ϕ = ωt distribution outside a local regime, or even broad enough on [0, 2π] interval for reconstruction ambiguities to become relevant. Validity of the local regime, where CRB based conclusions hold, cannot be a priory assumed. It depends on all the details of estimation scheme, the LO noise, the interrogation time and the initial state. In contrast, the Bayesian approach allows for full control of all the issues raised above, and any optimal scheme derived in these framework can be used as a universal benchmark. In certain parameter regimes the Bayesian approach may produce results compatible with the QFI approach but this is not the case in general. The Bayesian approach has been present in quantum estimation literature from the beginning [38]. Nevertheless, strict Bayesian approach is often computationally challenging and rigorous solutions are scarce and limited to decoherence-free scenarios [9,43]. In this paper we show that the frequency estimation problem described above is efficiently solvable within the Bayesian framework even in the presence of arbitrarily time-correated collective dephasing processes.

Bayesian approach
Let p ω be the prior distribution and ρ ω the evolved probe state. Without loss of generality we assume dω p ω ω=0. The state ρ ω is subject to a POVM measurement [42] {Π x }, Π x ≥ 0, dxΠ x = 1 1 and the parameter ω is estimated on the basis of a measurement result x using an estimator function ω x . For the optimal performance the average estimation variance should be minimized over ρ, {Π x },ω x , as well as, the interrogation time t. It was shown [44] (see also [38,Chapter VIII]) that the optimal measurement may be restricted to the class of standard projection von-Neumann measurements Π x = |x x|, x|x ′ = δ x,x ′ and the full information on the measurement-estimation strategy is contained in a single observable L = ω x |x x|dx. Optimization of L yields the minimal variance where ∆ 2 ω is the variance of the prior distribution p ω and the optimal L is implicitly given by Eq. (5), whereρ = dω p ω ρ ω ,ρ ′ = dω p ω ρ ω ω. A simple derivation of the above formula, together with a new elementary proof of its optimality, is given in Appendix A. The optimal estimator ω(x) equals the mean of the updated frequency distribution ∝ p ω Tr(ρ ω Π x ), which is the prior for the next experiment. For a fixed initial state ρ the optimization amounts to solving the anti-commutator Eq. (5) for L and is equivalent in complexity to a computation of QFI for a mixed state. Full optimization requires further optimization of ∆ 2ω over ρ and t. Interestingly, we have found that an iterative algorithm analogous to the one proposed in [43] is very effective. We begin with a random input state, and iteratively find the optimal measurements and corresponding states. The procedure converges to optimal solutions and in the cases we have studied outperforms the brute force optimization of the QFI allowing to obtain a solution in the number of particles regime where the brute-force optimization ceases to be practical on a standard PC (n 50). The convergence of the procedure has been analyzed in [45], and its efficiency has been additionally corroborated in optimization of phase estimation schemes in presence of local dephasing and loss [46]. See Appendix B for details of the implementation of the algorithm.
The similarity between Eqs. (5) and (1) becomes even more evident when considering a gaussian prior distribution p ω ∝ exp(−ω 2 /2∆ 2 ω). In this caseρ ′ = with F (ρ, Ht) defined in Eq. (3). Looking for the optimal probe states in the Bayesian approach is thus equivalent to maximizing QFI for a spread out stateρ.

Decoherence-free frequency estimation
Let us first consider an idealized estimation model, in which N two-level atoms are subject to unitary evolution e −iHωt with the generator is the projector on the excited state of the i-th particle. Without loss of optimality we may assume that the input probe state is pure ρ = |ψ ψ| and is supported on the symmetric subspace [33]. Let |n denote a symmetric state with n atoms in |1 , then |ψ = N n=0 c n |n and H = n n|n n|. For a Gaussian prior the averaged stateρ in the |n basis reads: where ρ nm = n|ρ|m = c n c * m . The results of the numerical minimization of ∆ 2ω with respect to |ψ as a function of interrogation time for different atom numbers are presented in Fig. 2, where we introduced the optimal variance reduction factor as R(τ ) with τ being the dimensionless time parameter. Minimal estimation variances achievable with non-entangled states are presented for comparison. The variances which correspond to the optimal interrogation times are depicted in the inset as a function of N. For the optimal states the curve slowly approaches the 1/N Heisenberg scaling, while for product states it is limited by 1/ √ N . For small times, τ ≪ 1/N, the GHZ states |ψ = (|0 + |N )/ √ 2 minimize R(τ ), as the effective prior phase distribution is narrow enough not to suffer from the characteristic GHZ 2π/N ambiguity. In this regime the exact formula for the minimal variance reads R(τ ) = 1 − N 2 τ 2 exp(−N 2 τ 2 ). For times τ ≫ 1/N the optimal states closely approach the "Sine" states introduced in [9,47], while in the intermediate regime the optimal states have a structure which interpolates between the GHZ and "Sine" states. Note also that there is no point in increasing the interrogation time too much, τ 1, as inferring a frequency value is becomes ambiguous and thus the overall estimation variance increases. Thanks to Eq. (6) relating the Bayesian cost and the QFI, one can get a detailed insight into the structure of the optimal states by invoking the results presented in [48] where the C-R bound approach in presence of global dephasing has been pursued (see e.g. Fig. 2(i) of [48]). One just needs to identify the collective dephasing parameter Γ 0 from [48] with τ 2 in our formulas. We should also add, that instead of using the exact optimal states, which may be difficult to deal with in practice, we have checked numerically that the optimal performance may be closely approached using appropriately prepared one-or two-axis spin squeezed states even though they structure differs significantly from that of the optimal states.s

Frequency estimation under dephasing
We now consider a model with time dependent ω which reflects the relevant aspects of atomic clock operation as depicted in Fig. 1. The dominant decoherence effects is the LO noise [3,34] and the collective dephasing of atoms [11,23,49]. Local dephasing of atoms [21] or loss [50] might also be relevant in certain atomic interferometry setups, but we neglect them here for the clarity of presentation. Let ω LO (t) be a stochastic process describing LO noise and ω(t) = ω LO (t) − ω 0 the corresponding detuning. ω LO (0) represents the LO frequency which stochastically drifted and was corrected during preceding feedback cycles. For a given realization of ω(t) the output state of the atoms after the interrogation time t is ρ t = U t ρU † t , where U t = e −iH t 0 ds(ω(s)+Ω(s)) , and Ω(t) is the stochastic process representing the non-LO sources of the collective dephasing of the atoms. Measurement of ρ t yields a result x with probability equal to Tr(ρ t Π x ). The estimatorω(x) is used to correct the LO frequency. The outcome frequency ω(t) −ω(x) has a variance where · denotes averaging with respect to the stochastic processes ω(t) and Ω(t).
Averaging over ω(t) corresponds to the average over the prior in Eq. (4). The optimal estimation strategy yields the same formula for the minimal variance as in the decoherence-free case above, Eq. (5), but now with ∆ 2 ω(t) = ω(t) 2 ,ρ = ρ t and ρ ′ = ω(t)ρ t .
When Ω(t), ω(t) are independent Gaussian processes with zero means, we obtain (see Appendix C for the derivation): where K X 2 (t) = 1 t) and K X (t 1 , t 2 ) = X(t 1 )X(t 2 ) is the two point correlation function of the process X(t). Again, making a connection to QFI we have Calculation ofρ can be viewed as averaging ρ with respect to a gaussian "effective prior" with variance It is clear that the optimal input state for Bayesian frequency estimation in the presence of decoherence is that which maximizes QFI after being spread out with the "effective prior". This implies that the solution in the presence of decoherence is in our case already contained in the solution of the decoherence-free case and therefore requires no additional numerical optimization. Using the variance reduction factor R(τ ) introduced for the decoherence-free case and depicted in Fig. 2, the formula for the minimal ∆ 2ω (t) in the presence of decoherence reads: As an example consider ω(t) = (ω LO (0) − ω 0 )e −γt + f OU (t). This is the Ornstein-Uhlenbeck (OU) process with the initial value ω LO (0)−ω 0 and the correlation function of the zero-mean stochastic term f OU (t) given by K OU (t 1 , t 2 ) = α[e −γ|t 1 −t 2 | −e −γ(t 1 +t 2 ) ]. We assume that ω LO (0) is normally distributed with ω LO (0) = ω 0 and is independent from f OU (t), hence: ∆ 2 ω(t) = K ω (t, t) = ∆ 2 ω(0)e −2γt + α(1 − e −2γt ), which for times smaller than the OU correlation time, t ≪ γ −1 , yields the diffusive character of frequency distribution broadening ∆ 2 ω(t) ≈ ∆ 2 ω(0) + 2γt[α − ∆ 2 ω(0)]. The OU process is not perfect in representing noise in real LOs such as lasers, where 1/f noise for low frequencies and white noise for larger frequencies is typical [34,[51][52][53]. It may, however, well approximate the real noise in the low frequency regime, with simple analytic formulas for the relevant quantities: K ω In order to take into account the white noise contribution present in real systems we represent it in the atomic dephasing process Ω with K Ω (t 1 , t 2 ) = βδ(t 1 − t 2 ), K Ω 2 (t) = β/t. We exclude the white noise part from the LO noise, as this would yield infinite variances ∆ 2 ω(t), but take it into account as a factor in atomic decoherence. This is consistent with the fact that high frequency white noise contributes to the tails of atomic Lorenzian line shapes without significantly affecting their full-width at half maximum [54]. Using variance as a measure of the width of the line is justified provided only low frequency noise is taken into account.
Treating [53] as a guideline we choose α = 1 Hz 2 , γ = 0.2 Hz so that the OU power spectrum approximates reality for small frequencies 10Hz and set β = 10 −3 Hz to represent the effects of high frequency white noise. The reduction of frequency variance as a function of the interrogation time and the number of atoms used is presented in Fig. 3. We took initial frequency variance ∆ 2 ω(0) = 0.167 Hz 2 so that the blue solid curve representing the optimal quantum strategy for N = 5 touches the ∆ 2 ω(t)/∆ 2 ω(0) = 1 line. This indicates that a stationary operation, where the spread due to frequency noise is compensated by estimation feedback, is possible with optimally entangled states of N = 5 atoms. It is also clear that using product states with the same number of atoms (blue dashed) or a smaller number of entangled atoms does not permit keeping frequency uncertainty at this level. For a given N varying t to minimize ∆ 2 ω(0) while keeping the stationary condition fulfilled yields the optimal stationary strategy. The inset depicts the minimal stationary ∆ 2 ω(0) as a function of the number of atoms for optimal (solid) and product states (dashed). The clear advantage of quantum over classical strategies is evident. We should also mention that formula (11) allows one to quantify the significance of the prior knowledge in comparison to decoherence effects. Specifically, when ∆ 2 ω(0) ≪ ∆ 2 ω K (t), the prior knowledge is not important and the optimal states are determined solely by the effects of decoherence and will be the same as in the standard QFI approaches [23,31,32].

Summary
To summarize, we have presented a consistent Bayesian approach to studying the potential quantum enhancement in frequency calibration. The tools presented are general enough to be able to deal with more sophisticated and promising atomic clock setups including many simultaneously evolving atomic ensembles [52,55]. However, a rigorous connection between our results and the performance of the actual atomic clock, requires the study of the Allan variance, instead of instantaneous variance, as a figure of merit. The Allan variance takes into account the effects of the correlations between subsequent interrogation steps, i.e. ω LO (0) and ω LO (t)−ω LO (0), which is more adequate in the quantification of the performance of atomic clocks but at the same time makes the study of the optimal Bayesian strategies much more involved. We hope to address this problem in the near future. Actually, some work has already been done recently in this direction, using bounds on precision rather than considering optimal estimation strategies [40] or taking a numerical approach based on the semi-definite programming [41].
First we prove that, without loss of optimality, we can restrict the class of measurements to standard projective von-Neumann measurements, so that x ∈ {1, . . . , d} where d is the dimension of the relevant Hilbert space and Π x = |x x|, x|x ′ = δ x,x ′ .
Let Π x , ω x be a measurement (not necessarily projective) and the accompanying estimator. We write the eigendecomposition of the corresponding L = dx ω x Π x as: where the eigenprojectors |y y| and the eigenvalues υ y can now be interpreted as new measurement operators and estimators respectively; we will refer to this strategy as the projective strategy. Below, we will prove that replacing the original strategy with the projective one can only reduce the estimation variance. Inspecting Eq. (A.1) we notice that information on measurement and estimation strategy enters the formula only through L and L 2 operators. Had the projective strategy been used, we would have obtained: In order to show that we can only benefit by replacing the original strategy with the projective one, it is sufficient to prove that L 2 −L 2 ≥ 0 as this implies Tr(ρL 2 ) ≥ Tr(ρL 2 ) and, consequently, the resulting estimation variance of the projective strategy is not greater than the original one. More explicitly, we need to demonstrate that: The above inequality, however, is a special case of operator generalization of Jensen's inequality [56] when applied to the operator convex function f (t) = t 2 . The operators Π x now play the role of weights in the convex combination.
Having proved the optimality of the projective strategy we can substitute L 2 = L 2 in Eq. (A.1) and, as result, face a simple problem of the minimization of: with respect to a single hermitian matrix L and no additional constraints on optimization. Explicit differentiation with regard to matrix elements L ij leads to: kρ ki L jk +ρ jk L ki − 2ρ ′ ji = 0, (A.6) which in compact notation results in Eq. (5). We note that the optimal estimator L can have both continues or discrete spectra. The former is the case for example for Gaussian states, where mean position is the parameter to be estimated in which case L is proportional to the position operator. We should also remark that since we have put no restriction on the allowed measurements, they could be the most general quantum measurements (POVMs) [42], this in particular includes the adaptive measurement strategies where measurements on part of the atoms depend on the results obtained from previous measurements on the the other part of the atoms, see. e.g. discussion in [57]. This implies that such adaptive strategies are not advantageous for the present problem as the simple von-Neumann measurement has been proven optimal.
Appendix B. Efficient iterative procedure for finding the optimal strategy numerically Using Eq. (5) allows one to find the optimal estimation strategy for a given input probe state ρ and the corresponding evolved state ρ ω = Λ ω (ρ). Finding the optimal strategy, however, also requires the optimization of the input probe state. The brute force optimization of Eq. (5) over ρ proves to be highly inefficient. Here we propose a heuristic iterative procedure that has proved to be extremely efficient in searching for the optimal strategy. We know that for a particular input probe state the corresponding optimal measurement and estimation strategy is given by the explicit formula in Eq. (5). The opposite is also true. Given a particular measurement and estimator the corresponding optimal input probe state can easily be found as is demonstrated below.
The heuristic procedure we advocate is now as follows. We start with a random input pure state ρ (0) = |ψ (0) ψ (0) | (or a state that one expects to be close to the optimal one, but preferably with some small random noise). Using Eq. (5) we find the corresponding optimal projective measurement strategy L (0) . We now calculate operator dωp ω Λ * ω (L (0)2 − 2ωL (0) ) (where L (0) 2 was replaced by L (0)2 since the measurement is projective), and find the eigenvector corresponding to its most negative eigenvalue. This way we obtain |ψ (1) . We repeat the procedure until we arrive at a satisfactory convergence. The results presented in Fig. 2 were obtained exactly in this way for Λ ω (ρ) = e −iHωt ρe iHωt and a gaussian prior p ω . This method can be applied to any problem involving optimization of the Fisher information with respect to input probe states and is described in detail in [45], where also the analysis of the convergence of the algorithm is given.