On-Line Estimation of Local Oscillator Noise and Optimisation of Servo Parameters in Atomic Clocks

For atomic frequency standards in which fluctuations of the local oscillator (LO) frequency are the dominant noise source, we examine the role of the servo algorithm that predicts and corrects these frequency fluctuations. We derive the optimal linear prediction algorithm, showing how to measure the relevant spectral properties of the noise and optimise servo parameters while the standard is running, using only the atomic error signal. We find that, for realistic LO noise spectra, a conventional integrating servo with a properly chosen gain performs nearly as well as the optimal linear predictor. Using simple analytical models and numerical simulations, we establish optimum probe times as a function of clock atom number and of the dominant noise type in the local oscillator. We calculate the resulting LO-dependent scaling of achievable clock stability with atom number for product states as well as for maximally-correlated states.

(Dated: 16th July 2018)
The instability of frequency standards limits the total uncertainty achievable in a measurement of finite duration [1,2]. This limit can be practically relevant even when performing measurements of static frequency ratios, since many-month-long measurement campaigns place stringent demands on the reliability of all components in an experiment. Instability becomes a fundamental concern when attempting to measure time-varying frequency ratios. For instance, in the emerging field of chronometric leveling [3][4][5], direct observation of tidal fluctuations expected in the gravitational red shift [6] requires frequency ratio measurements with a fractional uncertainty at the level of $10^{-18}$ to be completed in a matter of hours. Physics beyond the Standard Model might be detectable in clock frequency ratio measurements as postulated transient shifts associated with dark-matter domain walls [7] or ultralight scalar dark-matter candidates [8,9]. Searches for such signals require the highest possible measurement resolution at timescales where the statistical uncertainty due to instability plays a far greater role than long-term systematic uncertainty.
* Ian.Leroux@nrc-cnrc.gc.ca; Current Address: National Research Council Canada, Ottawa, Ontario, Canada K1A 0R6

Of the noise processes contributing to the instability of atomic frequency standards, the most fundamental one is quantum projection noise [10], which arises from the discreteness in the measurement results obtainable from a finite number of atoms. For an ensemble of $N$ uncorrelated two-level atoms, this noise imposes a minimum statistical uncertainty
$$\Delta\phi = \frac{1}{\sqrt{N}} \qquad (1)$$
on any measurement of the phase accumulated in an atomic superposition state. For a standard operating at a frequency $\omega$ and in the ideal case of Ramsey interrogation without technical noise, this leads to a long-term fractional instability [11]
$$\sigma_C(\tau) = \frac{1}{\omega T \sqrt{N}} \sqrt{\frac{T_c}{\tau}} \qquad (2)$$
where $T$ is the duration of a single Ramsey interrogation and $T_c$ is the length of the frequency standard's operating cycle, such that $\tau/T_c$ measurements can be performed in an averaging time $\tau$. This quantum projection noise limit (QPN) for clocks using uncorrelated atoms depends on the experimenter's choice of probe time $T$, becoming arbitrarily small for sufficiently long probe times. Thus, Eqn. 2 sets no limit on achievable clock instability at long averaging times unless some additional scale in the problem restricts the choice of $T$. One such restriction is set by excited-state decay in the atoms, which sets a fundamental limit to interrogation times. The performance of optical frequency standards operating at this limit has been analysed in Refs. [12][13][14][15]. However, for many of the optical frequency standards now being investigated, frequency fluctuations of the local oscillator restrict $T$ to less than a second even when the atoms' excited-state lifetime is measured in minutes ($^{87}$Sr) or even years ($^{171}$Yb$^{+}$) [1].
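As a quick numerical illustration of the projection-noise limit, the following minimal sketch evaluates the standard expression $\sigma_C(\tau) = (\omega T \sqrt{N})^{-1}\sqrt{T_c/\tau}$. The function name and the numerical values (an optical transition near 429 THz, 1000 atoms, a 0.5 s probe time) are assumptions chosen for illustration and are not taken from the text.

```python
import math

def qpn_instability(omega, T, N, T_c, tau):
    """Fractional instability at the quantum projection noise limit.

    omega : angular transition frequency (rad/s)
    T     : Ramsey probe time (s)
    N     : number of uncorrelated atoms
    T_c   : clock cycle time (s)
    tau   : averaging time (s)
    """
    return math.sqrt(T_c / tau) / (omega * T * math.sqrt(N))

# Illustrative (assumed) parameters: optical transition, dead-time-free cycle.
omega = 2 * math.pi * 429e12
sigma_1s = qpn_instability(omega, T=0.5, N=1000, T_c=0.5, tau=1.0)
```

Quadrupling the atom number halves the result, whereas lengthening the probe time improves it without bound, which is why some other scale in the problem must limit $T$.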
Because the local oscillator's noise is common to all the atoms in the standard, and because it typically exhibits significant power-law temporal correlations, its effects are qualitatively different from those of excited-state decay. In fact, it might at first glance seem odd that local-oscillator noise limits clock stability at all: the local oscillator frequency is in some sense the measurand in an atomic frequency standard and its fluctuations are constantly monitored and corrected. Local-oscillator noise affects the stability of the standard only to the extent that it cannot be corrected by feedback from the atoms. This can happen, for instance, if the cyclic atomic interrogation protocol allows undetected aliased frequency components of the local-oscillator noise spectrum to contaminate the output signal of the standard, a phenomenon known as the Dick effect [16]. Even in the absence of the Dick effect, however, the quantised measurement signal from the atoms has a fundamentally limited dynamic range: one cannot extract more than $\log_2(N + 1)$ bits of frequency information from a single measurement of $N$ atoms [17]. The useful domain of the measurement, i.e. the frequency band in which it can be unambiguously interpreted, must be broad enough to cover the frequencies which the local oscillator is likely to emit in the interrogation. Frequency excursions beyond the domain for which the reference provides useful information lead to less informative measurement results, and hence to degraded instability. In the worst case the servo, working from ambiguous or uninformative measurement results, may be unable to keep the output frequency locked to the atomic reference. The output frequency then either hops between different zero-crossings of a frequency-periodic Ramsey error signal or drifts aimlessly far from the resonance of a Rabi error signal. This case is catastrophic and the operating parameters must be chosen to make it vanishingly unlikely.
Thus, even in the absence of the Dick effect (e.g. with dead-time-free Ramsey interrogation [18]), the achievable measurement resolution ultimately depends on the scale of local-oscillator frequency fluctuations seen by the atoms, and hence on the performance of the clock's feedback loop which corrects these fluctuations.
In this work, we study the limits to the stability of frequency standards dominated by local-oscillator noise with realistic temporal correlations. We focus on clocks using a single ensemble of atoms periodically interrogated using the same protocol for every interrogation cycle, whose instability we quantify using the Allan variance at long times. Our work is thus less general, but more directly relevant to current experiments, than analyses of multi-ensemble clocks or of interrogation protocols which are modified on-the-fly [19][20][21][22], and our approach is a more concrete complement to the derivation of universal performance bounds in mathematically idealised settings [22,23]. Using simple analytical arguments and numerical simulations of clocks with different local-oscillator noise spectra, we study the performance of the servo controller which predicts and corrects local-oscillator noise and then analyse its implications. After establishing notation and conventions in Sec. I, we begin by deriving the optimal linear prediction algorithm and evaluating its performance in Sec. II. In Sec. III we show that a feedback controller with near-optimal performance can be designed without prior knowledge of the noise spectrum, by monitoring the error signals in normal clock operation. We also show that the same techniques provide useful diagnostic information on the local oscillator's noise, allowing on-line monitoring of its performance. We turn to the effects of the noise in Sec. IV, in which we derive a modification to the QPN that takes into account the performance of the servo controller. This modified QPN formula predicts an overall limit to achievable clock instability, which is attained for an optimal choice of atomic interrogation time that we discuss in Sec. V. Section VI considers the merits of using entangled atomic states to modify the phase resolution of Eqn. 1, giving a simple dimensional argument for the disappointing performance of maximally-correlated states in atomic clocks and arguing for the superiority of states that enhance the dynamic range of atomic measurements [17,24,25,26]. This result is complementary to that of Ref. [12], which considered independent dephasing of the atoms rather than the collective dephasing associated with the LO, and takes into account temporal correlations in the LO noise rather than assuming a white spectrum as in Refs. [20,27]. Section VII considers the instability of the clock at short times, which may be limited by finite feedback gain rather than measurement noise, showing that the second integrator recommended in Ref. [15] to correct for linear drift of the LO is also necessary to saturate the QPN limit in the presence of random-walk noise. We conclude with some remarks on proposed frequency standards whose design does not follow the conventional pattern we consider in this work.

Fig. 1: A local oscillator (LO) emits a signal with a fractional deviation $x$ from the nominal clock frequency. The servo controller attempts to predict $x$ and corrects the frequency by its prediction $h$. The corrected signal (with fractional frequency deviation $x - h$) is then used to interrogate an atomic reference, which produces an estimate $e$ of the prediction error or, equivalently, an estimate $y$ of the LO's unknown frequency deviation $x$. These estimates can be used by the servo in future predictions.

Figure 1 sketches the structure of the frequency standards we consider and summarises the notation we will use for the various signals we must consider. The goal of the standard is to produce a continuous classical oscillatory signal whose frequency corresponds to a reference transition frequency $\omega$ in a particular atomic species. As the classical signal is generated by a macroscopic local oscillator (LO) subject to environmental perturbations, its frequency $\omega_L$ will differ from the target frequency $\omega$ by a fluctuating fractional discrepancy $x = (\omega_L/\omega) - 1$. The scale of these fluctuations is summarised in the Allan deviation $\sigma_L(\tau)$ of the LO. In order to suppress these fluctuations, a servo controller generates a prediction $h$ of the LO frequency error, which is used to frequency-shift the LO output signal back to atomic resonance.
The resulting signal, with net fractional frequency error $x - h$, is provided to the users of the standard and has an Allan deviation $\sigma_C(\tau)$. This corrected signal is supplied to a reference, where it interacts with $N$ atoms according to some fixed interrogation protocol, such as Ramsey or Rabi interrogation. Measurement of the atoms' state at the end of the interrogation protocol conveys some information on the residual frequency error $x - h$, which we express as an error estimate $e$. The error estimate might, for instance, correspond to the imbalance of atomic state populations at the end of a Ramsey sequence divided by the accumulated phase $\omega T$. For consistency, we express the error estimate in the same units as $x$ and $h$, so that $y = h + e$ is an estimate of the LO's uncorrected frequency error $x$, one that uses only the most recent atomic data and takes no account of previous measurements. Note that $y$ and $e$ differ from, and fluctuate more than, $x$ and $x - h$ respectively, because they are affected by the noise of the atomic reference.

I. SETUP AND NOTATION
We consider periodically-stabilised frequency standards with an operating cycle of period $T_c$, where the reference provides a series of error estimates $\{e_i\}$. At any given point in time, we label them as follows: $e_1$ is the most recent available error estimate, $e_2$ the preceding error estimate, and so forth. $e_0$ is then the error estimate which will be produced at the end of the current operating cycle. We label the other signals similarly: $h_0$ is the servo's prediction of the (average) LO frequency error $x_0$ during the current operating cycle, while $h_j$ and $x_j$ correspond to the $j$th most recent completed cycle. Causality requires that the servo compute the prediction $h_0$ without knowledge of $e_0$, using only $\{e_1, e_2, \ldots\}$ or, equivalently, $\{y_1, y_2, \ldots\}$.
The Allan deviation $\sigma_C(\tau)$ is that of the physical signal produced by the frequency standard as it is operating. With the exception of Sec. VII, most of the analysis presented in this work also applies to "paper clocks", i.e. virtual signals generated by post-processing measurement data. Although the post-processing need not respect causality and can use later measurements to correct estimates of the frequency at earlier times, the quality of the measurements themselves still depends on the ability of the (causality-respecting) servo to keep the corrected LO frequency near the atomic resonance frequency $\omega$ while the clock is running, and constraints on this ability affect the performance of the reference no matter how the resulting data is subsequently used.
Where it is necessary to assume a definite interrogation protocol in the atomic reference, we will focus on dead-time-free Ramsey interrogation, where the measured signal depends on the average of the corrected signal frequency during some interrogation time $T$. While we assume in our examples that $T$, which sets the frequency resolution of the interrogation, is equal to $T_c$, which sets the repetition rate of the interrogation cycle, the two times are conceptually distinct and we will use separate symbols for them throughout. References whose operating cycle includes dead time ($T < T_c$) or which use a different interrogation protocol (such as Rabi or hyper-Ramsey [28,29]) will suffer from the Dick effect, which can be modelled as additional measurement noise in the atomic reference.
In numerical examples we will consider LOs with simple power-law noise, such that $\sigma_L^2(\tau) \propto \tau^{\mu}$, with $\mu = -1, 0, 1$ for white frequency noise, flicker frequency noise and random walk of frequency noise, respectively. As argued in the introduction, the LO noise gives the problem a characteristic time scale which ultimately limits the useful resolution of measurements on the atoms. We define this time $Z$, without assuming a particular form of LO noise spectrum, by the implicit equation
$$\sigma_L(Z_c) = \frac{1}{\omega Z}, \qquad (3)$$
where $Z_c$ is the cycle time of the clock when operated with a probe time $Z$. In other words, $Z$ is the choice of probe time for which the LO Allan deviation at one clock cycle is as large as the quantum projection noise of a single atom (Eqn. 2 with $N = 1$). This definition lets us combine the LO noise and the choice of probe time into a single dimensionless parameter $T/Z$ which can be compared between clocks of different types using LOs with different performance. Note that $Z$ will be on the order of a few seconds for a typical current optical frequency standard with a fractional LO instability around $10^{-16}$.
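For a pure power-law LO the implicit definition of $Z$ can be solved in closed form. The sketch below assumes an Allan deviation $\sigma_L(\tau) = \sigma_{1\mathrm{s}}\,(\tau/1\,\mathrm{s})^{\mu/2}$ and dead-time-free operation ($Z_c = Z$), so that equating the LO Allan deviation at one cycle to the single-atom projection noise $1/(\omega Z)$ gives $Z = (\omega\sigma_{1\mathrm{s}})^{-2/(2+\mu)}$ in seconds. The function name, the parameterisation by $\sigma_{1\mathrm{s}}$ and the 429 THz transition frequency are illustrative assumptions.

```python
import math

def characteristic_time(omega, sigma_1s, mu):
    """Solve sigma_L(Z) = 1/(omega*Z) for sigma_L(tau) = sigma_1s * tau**(mu/2).

    Assumes dead-time-free operation (Z_c = Z) and tau measured in seconds.
    """
    return (omega * sigma_1s) ** (-2.0 / (2.0 + mu))

# Assumed example: flicker-limited LO at 1e-16 on an optical transition.
omega = 2 * math.pi * 429e12
Z_flicker = characteristic_time(omega, 1e-16, mu=0)
```

With these assumed numbers the result is a few seconds, in line with the order of magnitude quoted in the text.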
In the remainder of this paper, we will have frequent recourse to Monte-Carlo simulations of clocks. Because our model assumes a fixed interrogation protocol, it is possible to predetermine the start and end times of every radiation pulse in a simulated run of the clock, and thus to generate efficiently the (noisy) mean frequency of the free-running LO during each pulse. Given such a frequency history for the free-running LO, it is straightforward to simulate the response of the atomic reference at each clock cycle and the resulting servo correction for the next clock cycle. White noise is generated as a random variable whose variance scales inversely with the duration of each pulse. (Damped) random walks are obtained by first generating the frequency at the beginning and end of each pulse as a (damped) running sum of steps whose variance depends on the time step length, then computing the expectation value of the mean of the random walk in each pulse given fixed start and end points, and finally adding a white noise component corresponding to the dispersion of the mean about this expectation value. Flicker-frequency noise is generated as a sum of damped random walks with damping time constants ranging by factors of 2 from 1 % of the shortest pulse in the clock's operating cycle (the shortest time scale in the problem) up to 100 times the duration of the entire run (the longest time scale in the problem).
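The random-walk recipe described above can be sketched as follows for the simplest case of an undamped walk and back-to-back pulses of equal length. This is an illustration under stated assumptions and not the authors' simulation code. The diffusion coefficient `D` (frequency variance accumulated per unit time) and the function name are hypothetical, and the dispersion $DT/12$ of the pulse mean about the midpoint of its end values is the standard Brownian-bridge result, supplied here rather than taken from the text. Damping and flicker noise are omitted.

```python
import numpy as np

def random_walk_pulse_means(n_pulses, T, D, rng):
    """Mean frequency in each of n_pulses contiguous pulses of length T.

    Undamped random walk of frequency with diffusion coefficient D
    (variance per unit time). Illustrative sketch only.
    """
    # Frequency at the pulse boundaries: running sum of steps of variance D*T.
    steps = rng.normal(scale=np.sqrt(D * T), size=n_pulses)
    edges = np.concatenate(([0.0], np.cumsum(steps)))
    # Expected mean within each pulse, given its start and end values.
    expected = 0.5 * (edges[:-1] + edges[1:])
    # White component: dispersion of the mean about that expectation value.
    scatter = rng.normal(scale=np.sqrt(D * T / 12.0), size=n_pulses)
    return expected + scatter

rng = np.random.default_rng(0)
y = random_walk_pulse_means(200_000, T=1.0, D=1.0, rng=rng)
# Two-sample (Allan) variance at one pulse length; theory gives D*T/3.
allan_var = 0.5 * np.mean(np.diff(y) ** 2)
```

The Allan variance of the generated record can be checked against the known value $DT/3$ for random-walk frequency noise, which is a convenient sanity test for this kind of generator.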

II. SERVO CONTROLLER DESIGN
We now focus our attention on the servo. Given a history $\ldots, h_3, h_2, h_1$ of its own past predictions and of the corresponding error signals $\ldots, e_3, e_2, e_1$ obtained from the atomic reference, it must make a prediction $h_0$ of the LO frequency in the next operating cycle. The prediction should take into account the temporal correlations of the LO noise, which dictate the timescale over which past measurement results remain relevant to predicting future LO behaviour.
We begin by considering the simple integrator, the basic building block of the servo algorithm used in most contemporary optical frequency standards [15]. In our notation, the simple integrator makes the prediction
$$h_0 = h_1 + g\,e_1, \qquad (4)$$
where $g$ is a dimensionless gain specifying the fraction of the frequency error measured in the last cycle to apply as a correction to the last prediction. The prediction can also be expressed in terms of past estimates of the LO's fractional frequency deviation $\{y_k\}$ as follows:
$$h_0 = g \sum_{k=1}^{\infty} (1 - g)^{k-1}\, y_k. \qquad (5)$$
While Eqn. 4 is easier to implement, Eqn. 5 is easier to reason about because the statistical properties of the estimated LO frequency $y$ are mostly determined by the LO noise and by the measurement noise of the reference, depending only weakly on the design of the servo controller itself. To a good first approximation, then, we can take the fluctuations and correlations of the $\{y_k\}$ as given, and try to choose $g$ so as to minimise the error of the prediction $h_0$. It is instructive to study the broader class of linear predictors, whose predictions are weighted averages of past LO frequency estimates of the form
$$h_0 = \sum_k w_k\, y_k, \qquad (6)$$
where the weights $w_k$ are required to satisfy the normalisation condition
$$\sum_k w_k = 1. \qquad (7)$$
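The equivalence between the recursive and weighted-average forms of the integrator can be checked numerically. The sketch below assumes the usual integrator update $h \leftarrow h + g\,e$ with $e = y - h$, and compares it with a weighted sum using geometrically decaying weights $g(1-g)^{k-1}$. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def integrator_recursive(y, g, h_init=0.0):
    """Apply h <- h + g*(y - h) over the record y (oldest first)."""
    h = h_init
    for y_k in y:
        e_k = y_k - h      # error estimate for this cycle
        h = h + g * e_k    # integrator update
    return h

def integrator_weights(g, n):
    """Weights g*(1-g)**(k-1), k = 1..n, with k = 1 the most recent cycle."""
    k = np.arange(1, n + 1)
    return g * (1.0 - g) ** (k - 1)

rng = np.random.default_rng(0)
y = rng.normal(size=200)                  # synthetic estimates, oldest first
g = 0.3
w = integrator_weights(g, len(y))
h_weighted = float(np.dot(w, y[::-1]))    # reverse so most recent comes first
h_recursive = integrator_recursive(y, g)
```

With a zero initial prediction the two results agree to rounding error, and the weights sum to one up to the negligible truncation term $(1-g)^n$.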
The simple integrator of Eqn. 5 is a special case of a linear predictor, with $w_k = g(1 - g)^{k-1}$. The optimisation of such linear predictors has been studied extensively since the pioneering work of Wiener [30] and Kolmogorov [31] (see e.g. Ref. [32]). Here we derive the minimum-mean-squared-error predictor in a form similar to that used for ordinary kriging in geostatistics (see e.g. [33]). We begin by computing the mean squared difference between the prediction $h_0$ and the next frequency estimate $y_0$:
$$\langle (h_0 - y_0)^2 \rangle = \Big\langle \Big( \sum_k w_k\,(y_k - y_0) \Big)^2 \Big\rangle \qquad (8)$$
$$= \mathbf{w}^{T} C\, \mathbf{w}, \qquad (9)$$
where we have used the normalisation condition of Eqn. 7, collected the weights $\{w_k\}$ into a vector $\mathbf{w}$ and introduced the two-sample covariance matrix $C$ for the estimated LO frequency, whose entries are defined as
$$C_{jk} = \langle (y_j - y_0)(y_k - y_0) \rangle. \qquad (10)$$
Note that $\langle (h_0 - y_0)^2 \rangle$ is not the same as the mean squared prediction error $\langle (h_0 - x_0)^2 \rangle$, since it also includes the noise of the atomic reference which estimates that error.
Provided, however, that the atomic reference is unbiased, the same choice of weights will minimise either measure of noise, so we proceed to minimise Eqn. 9 and find that the optimum weights satisfy
$$C\,\mathbf{w} = \lambda\,\mathbf{1}, \qquad (11)$$
where $\mathbf{1}$ is the vector whose entries are all equal to one and $\lambda$ is a Lagrange multiplier that must be chosen to satisfy the normalisation constraint of Eqn. 7. Thus the optimal weights can be found by solving Eqn. 11 for $\mathbf{w}/\lambda$ and normalising the result, provided that one knows the covariance matrix $C$. If the noise properties of the components in the frequency standard are known, then $C$ can be computed simply as the sum of matrices for each independent noise process. Explicit expressions for the $C$ matrix associated with a known noise spectrum are provided in Appendix A.
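The constrained minimisation can be sketched in a few lines. The example assumes that the two-sample covariance matrix has entries $C_{jk} = \langle (y_j - y_0)(y_k - y_0) \rangle$, so that minimising $\mathbf{w}^T C \mathbf{w}$ subject to $\sum_k w_k = 1$ amounts to solving $C\mathbf{u} = \mathbf{1}$ and normalising $\mathbf{u}$. The two test matrices are idealised cases built under that assumption (independent noise in every cycle, and a random walk sampled once per cycle). They are not the Appendix A expressions, which account for averaging within each cycle.

```python
import numpy as np

def optimal_weights(C):
    """Minimise w^T C w subject to sum(w) = 1: solve C u = 1, then normalise."""
    ones = np.ones(C.shape[0])
    u = np.linalg.solve(C, ones)
    return u / u.sum()

n = 10
k = np.arange(1, n + 1)

# Independent unit-variance noise in each cycle: C = I + 1 1^T.
C_white = np.eye(n) + np.ones((n, n))
w_white = optimal_weights(C_white)       # uniform averaging of the history

# Point-sampled random walk with unit step variance: C_jk = min(j, k).
C_walk = np.minimum.outer(k, k).astype(float)
w_walk = optimal_weights(C_walk)         # all weight on the latest sample
```

The two limiting cases bracket realistic behaviour: white noise favours averaging as much history as is available, while strongly correlated noise concentrates the weight on the most recent measurements.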
As discussed in Sec. III, $C$ can also be estimated, and the servo controller optimised, without prior knowledge of the system noise properties, using only data generated during normal clock operation. Although linear predictors with arbitrary coefficients are not difficult to implement following Eqn. 6, one can also use the preceding formalism to optimise the gain of conventional integrators. Appendix B derives an explicit, albeit cumbersome, formula for the optimal integrator gain given known noise model parameters. Alternatively, one can use Eqn. 11 to choose a vector of weights for a hypothetical linear predictor and then simply set the integrator gain to the leading entry of this vector, $g = w_1$. Simulations show that for common power-law noise processes, the resulting integrating servo performs almost as well as the optimal linear predictor, with a penalty of less than 10 % in the prediction variance. The formalism we have developed can thus be used to optimise the parameters of a conventional integrating servo algorithm, without requiring any modifications to an already-running clock experiment. As we will see, this optimisation can be performed even without prior knowledge of the experiment's noise characteristics.
It may be helpful to visualise the spectral response of optimised servos. Fig. 2 shows, for a few simple cases, the RMS magnitude of the prediction error caused by a frequency modulation of the LO at some frequency $f$. In the absence of servo correction (black solid line), the response is flat at low frequencies but falls off as $1/f$ at high frequencies due to the averaging of the LO frequency within each interrogation cycle. Note that the servo prediction error vanishes at those frequencies to which the reference is insensitive: noise at these frequencies cannot be removed from the clock output, but it does not disturb the atomic reference and is therefore irrelevant as far as the servo is concerned. For a white-noise dominated system, the optimal controller has the lowest practical gain, or equivalently averages as much history as is available. The optimal spectral response function in this case looks essentially identical to the black line. For pure random-walk noise, which falls off rapidly at high frequencies, it is worth increasing the sensitivity to high-frequency noise in order to obtain better suppression at low frequencies: the optimal controller is then an integrator with a gain $g = 1.27$ (blue dotted line). The power spectral density of random-walk noise $\propto f^{-2}$ combines with the $\propto f^{2}$ (power) response of an integrator to yield a flat spectrum of contributions to the prediction error. In the intermediate case of flicker noise, the same flat spectrum could be achieved by a controller with a power response $\propto f$, i.e. an amplitude response $\propto \sqrt{f}$. The optimal 50-term linear controller (green chain-dotted line) approximates this behaviour in the range of frequencies it can observe, from roughly $1/(100\,T_c)$ up to $1/(2\,T_c)$. At very low frequencies, corresponding to fluctuations slower than the 50-cycle memory of the controller, the response falls back to that of an integrator. A simple integrator cannot have a $\sqrt{f}$ amplitude response, so the best integrator for pure flicker noise (red dotted line) is more sensitive to fluctuations with periods of a few cycles. As a result it performs about 5 % worse than the more general linear predictor.
To gauge their impact on clock performance, we quantify the scale of the servo's prediction errors by the dimensionless variance
$$v = (\omega T)^2\, \langle (x_0 - h_0)^2 \rangle \qquad (12)$$
of the phase accumulated in the Ramsey interrogation of duration $T$. The variance $v$ plays an important role in determining both the robustness of the lock to atomic resonance and the achievable long-term stability (see Sec. IV), so that it is worth studying its behaviour. The solid lines in Fig. 3 illustrate the performance of linear predictors, based on simulations of clock operation with between 1 and $10^4$ atoms in the reference. As a function of the choice of probe time $T/Z$, and thus of the ratio between LO noise and measurement noise, one can distinguish three qualitatively different regimes. In the limit of large atom numbers and long probe times, the simulations approach an $N$-independent limit
$$v \to \xi \left( \frac{T}{Z} \right)^{\mu + 2}. \qquad (13)$$
This is the scaling one expects for the case where the LO noise completely dominates the measurement noise of the reference, given the postulated power-law scaling of the LO noise with exponent $\mu$ (cf. Sec. I) and the definition of $Z$ in Eqn. 3. The proportionality constant $\xi$ can be estimated from the simulations or derived using the formulae in Appendix A, and varies between 1 (for a white-noise-dominated LO) and 2 (for a random-walk-noise-dominated LO). In the opposite limit of low atom number or short probe time, the (white) quantum projection noise dominates and the servo performance depends only on the number of measurements $n$ which it averages in making its prediction, and thus
$$v \approx \frac{1}{N n} \qquad (14)$$
for sufficiently short probe times. For $n \gtrsim 20$ this limit has no impact on correctly optimised clock operation (see Sec. V). Between the two limits considered above there is a trade-off between averaging many measurements to reduce the impact of measurement noise and considering only the recent measurements most relevant to the LO's current frequency.
We know of no simple, accurate expression for the achievable servo performance in this intermediate regime, but the rough scaling that one would expect from the afore-[…] does hold in simulations. So far, we have discussed the simple integrator and its generalisations. Practical frequency standards, however, must use a double-integrator to correct for steady drifts in the LO frequency [15]. With the addition of the second integrator, the servo predictions become
$$h_0 = h_1 + g\,e_1 + g_2 \sum_{k \ge 1} e_k.$$
Aside from its role in suppressing steady-state frequency errors with drifting LOs, the second integrator is necessary for the servo to have enough low-frequency gain to attain the projection noise limit in many-atom clocks, a point to which we will return in Sec. VII. However, as long as its gain $g_2$ is chosen low enough to avoid servo oscillations, the additional integrator has only a negligible impact on the variance of the prediction errors. The controller can thus be designed by optimising a simple integrator or linear predictor as discussed above, and then adding the drift-correction integrator with a gain $g_2 \ll w_1 = g$. Besides minimising prediction variance, another desirable feature in practical servo controllers is robustness, the quality of remaining locked to the (correct) atomic resonance for long periods. In principle the two qualities are distinct, but we find empirically that for well-optimised servos they are tightly coupled. In simulations of clocks with a wide range of atom numbers (i.e. reference signal-to-noise ratios), we find that the rate at which a clock hops to different Ramsey fringes depends, for a given LO and a fully optimised servo, only on the prediction variance $v$. Suboptimal servos (such as integrators with incorrectly chosen gain) have both greater prediction variance $v$ and a higher rate of fringe hops for a given $v$, so that they are less robust as well as noisier.
We conjecture that the best servos are simultaneously the most robust and the least noisy, so that there is no need to choose between the two qualities provided that one can, in fact, find this optimal servo design.
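The role of the second integrator in removing steady-state errors under drift can be illustrated with a noiseless toy model. The sketch assumes a double-integrator update of the form $h \leftarrow h + g\,e + g_2 \sum e$, with the sum running over all past error estimates. This is a common discrete form, adopted here as an assumption, and the drift rate and gains are arbitrary illustrative values.

```python
def run_servo(x, g, g2=0.0):
    """Return prediction errors x - h for an integrator with gain g and an
    optional second integrator with gain g2 (noiseless toy model)."""
    h, acc, errors = 0.0, 0.0, []
    for x_k in x:
        e = x_k - h              # ideal, noise-free error estimate
        errors.append(e)
        acc += e                 # running sum of all past errors
        h += g * e + g2 * acc    # integrator step plus drift-correction term
    return errors

drift = 1e-3                     # assumed LO drift per cycle (arbitrary units)
x = [drift * t for t in range(2000)]
single = run_servo(x, g=0.4)             # settles at a constant offset drift/g
double = run_servo(x, g=0.4, g2=0.02)    # offset decays to zero
```

With a single integrator the error settles at `drift / g`, whereas the small second-integrator gain drives it to zero without otherwise changing the response much, consistent with choosing $g_2 \ll g$.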

III. ON-LINE SERVO OPTIMISATION AND NOISE CHARACTERISATION
In practice, the noise spectrum of the LO may not be known accurately. A significant benefit of the formalism presented in the previous section is that it allows one to optimise the servo controller without prior knowledge of the LO noise properties. This is possible because the definition of $C$ in Eqn. 10 involves only the estimated LO frequency error in each clock cycle, which is routinely recorded in normal clock operation. As a demonstration of such optimisation, we have run clock simulations with integrating servos whose gains were chosen, without knowledge of the true LO noise, by the following empirical procedure:
1. Start by setting the gain to $g = 0.2$, an arbitrary but reasonable initial value chosen to allow reliable, if suboptimal, clock operation under a wide range of conditions.
2. Simulate the clock with this servo and record the estimated LO frequency error $y$ in each cycle.
3. Compute $C$ according to Eqn. 10 and thence the vector $\mathbf{w}$ of optimal weights. Set $g = w_1$.
4. Simulate the clock with the newly optimised servo and reoptimise, repeating as necessary.
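One pass of the gain-tuning step can be sketched as follows. The sketch assumes that $C$ is estimated as the sample average of $(y_j - y_0)(y_k - y_0)$ over a recorded series of frequency estimates, and that the integrator gain is then set to the leading optimal weight. The function names, the choice of a 10-cycle window and the synthetic records are illustrative assumptions. In a real clock the record would come from the running servo.

```python
import numpy as np

def estimate_covariance(y, n):
    """Sample estimate of C[j-1, k-1] = <(y_j - y_0)(y_k - y_0)>, j, k = 1..n,
    from a record y ordered oldest first."""
    y = np.asarray(y, dtype=float)
    m = len(y) - n
    # Column k-1 holds y_k - y_0 for every cycle that has n predecessors.
    d = np.stack([y[n - k : n - k + m] - y[n : n + m] for k in range(1, n + 1)],
                 axis=1)
    return d.T @ d / m

def tuned_gain(y, n=10):
    """Integrator gain g = w_1, with w the normalised solution of C w = lambda*1."""
    C = estimate_covariance(y, n)
    u = np.linalg.solve(C, np.ones(n))
    return (u / u.sum())[0]

rng = np.random.default_rng(1)
y_white = rng.normal(size=50_000)             # uncorrelated estimates
y_walk = np.cumsum(rng.normal(size=50_000))   # strongly correlated estimates
g_white = tuned_gain(y_white)
g_walk = tuned_gain(y_walk)
```

For uncorrelated data the tuned gain comes out close to `1/n`, the lowest value the finite window allows, while for the point-sampled random walk it approaches 1. Repeating the estimate after each change of gain mirrors the iteration described in the list above.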
Even when the servo gain is initially chosen blindly, we find that five rounds of optimisation suffice for the gain $g$ to converge to a value that yields performance indistinguishable from that of an integrator designed with full knowledge of the LO noise spectrum. Under more realistic conditions, where the initial choice of servo parameters reflects some prior knowledge of the LO performance, the optimisation could be performed much more quickly. The symbols in Fig. 3 show the prediction variance of such empirically optimised integrators, which can be compared to the performance of optimal linear predictors shown as solid lines.
For clocks operated near their optimal probe times (to be discussed in Sec. IV), the difference in prediction variance is less than 10 %. Thus, it is possible to develop controllers that take full advantage of the time correlations in the LO noise even without independent knowledge of those correlations. Unfortunately, it is not always possible to verify the servo performance directly, because the observed variance of the error signal e contains contributions both from the servo prediction error and from measurement noise of the reference. For a single-atom clock this problem is insurmountable: the observed fluctuations of a binary error signal must correspond to quantum projection noise, independent of servo performance. For a many-atom clock where the detection noise is well-characterised it is possible to measure the servo prediction variance as an increase in the fluctuations of the error signal, but the resulting estimates are generally optimistic. As discussed in Sec. IV, even in a correctly optimised clock there will be unavoidable ambiguities in interpreting the error signal (e.g. 2π phase slips in Ramsey interrogation) and the resulting measurement errors contribute to the servo prediction variance without being observable in the experimentally recorded measurement data.
One can, however, use the correlation matrix C estimated during clock operation to partially characterise the LO. Although white noise of the LO is indistinguishable from measurement noise in the atomic reference, flicker or random-walk noise can produce detectable temporal correlations even when their contributions to the total measurement variance are small. By fitting the estimated C to a linear combination of the correlation matrices expected for white, flicker and random-walk noise, one can obtain estimates of the Allan  variances associated with each class of noise process, which are simply the coefficients in the linear combination. Explicit expressions for the correlation matrices associated with arbitrary noise spectra are provided in Appendix. A. Fig. 4, for instance, shows the flicker and random-walk Allan deviations reconstructed in this fashion from a correlation matrix C estimated from 2 × 10 6 cycles of operation of a simulated single-atom clock, normalised to quantum projection noise, as a function of the true level of the respective noise processes. Random-walk noise, whose correlations differ more strongly from those of the white measurement noise, is easier to detect, but as seen in Fig. 4 both flicker and random walk noise can be reliably estimated from levels too low to affect the clock's instability (variance less than 1 % of projection noise) up to levels that would be unacceptably high in normal operation, when the clock servo is jumping between Ramsey fringes. Although this method provides much less detailed information on the LO noise spectrum than does the optical spectrum analyser of Ref. [34], it requires no measurements beyond those performed as part of the clock's normal operation. 
It can therefore be used to monitor the LO while the clock is running, even in a single-ion frequency standard, providing an early warning of performance degradations as well as information useful for the optimisation of interrogation parameters in the atomic reference (see Sec. V).
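The fitting step described above can be sketched as an ordinary linear least-squares problem. The following is a minimal illustration rather than the procedure actually used for Fig. 4: the function names are hypothetical, the white-noise shape (2 on the diagonal, 1 elsewhere) is taken from Appendix A, and the random-walk shape assumes a simple discrete random walk with C_jk = 〈(y_0 − y_j)(y_0 − y_k)〉 ∝ min(j, k), which may differ from the cycle-averaged expression of the appendix. A flicker-noise basis matrix would be appended to the list in the same way.

```python
import numpy as np

def white_basis(n):
    # White-noise shape: 2 on the diagonal, 1 elsewhere (cf. Appendix A),
    # i.e. unit single-cycle Allan variance.
    return np.eye(n) + np.ones((n, n))

def random_walk_basis(n):
    # ASSUMPTION: discrete random walk of the cycle-averaged frequencies,
    # giving C_jk proportional to min(j, k). Scaled so that C_11 = 2,
    # i.e. unit single-cycle Allan variance.
    idx = np.arange(1, n + 1)
    return 2.0 * np.minimum.outer(idx, idx)

def fit_noise_levels(c_est, bases):
    # Least-squares fit of c_est to a linear combination of basis matrices.
    # The returned coefficients play the role of Allan variances.
    design = np.stack([b.ravel() for b in bases], axis=1)
    coeffs, *_ = np.linalg.lstsq(design, c_est.ravel(), rcond=None)
    return coeffs

n = 8
bases = [white_basis(n), random_walk_basis(n)]
# Synthetic, noise-free "estimate": white noise plus a weak random walk.
c_est = 1.0 * bases[0] + 0.05 * bases[1]
coeffs = fit_noise_levels(c_est, bases)
print(coeffs)
```

With a correlation matrix estimated from real error-signal data, the fitted coefficients would carry statistical uncertainty that this noise-free example does not show.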

IV. IMPACT OF SERVO PERFORMANCE ON LONG-TERM STABILITY
Any measurement performed on a finite number of atoms can yield only a finite number of possible results. The optimisation of the measurement protocol thus involves a compromise between fine resolution over a narrow usable domain of LO frequencies and coarse frequency resolution over a broader domain. The best compromise depends on the range of frequencies which might plausibly have to be measured by the reference, i.e. on the variance of servo prediction errors. In this section we study this compromise, showing that it leads to a finite optimal probe time and an overall limit on the long-term stability of clocks with noisy LOs.
By way of illustration, consider Ramsey interrogation of a single atom, where the probability to find the atom in the excited state depends on the phase error φ = (x − h)ωT as

$$P_e(\varphi) = \tfrac{1}{2}\left(1 + \sin\varphi\right).$$

This excitation spectrum is shown as the solid line in Fig. 5. Let us assume that, before any measurement, our best knowledge of the (corrected) LO frequency is represented by the chain-dotted distribution, corresponding to the distribution of LO prediction errors. Our best knowledge after a measurement on a single atom, in the event that it is found in the excited state, is shown by the dashed probability distribution. In this case we can be certain that the phase was not near φ = −π/2 (since the excitation probability would then have been 0), and we can be reasonably confident that it lies in the region between 0 and π, but we cannot rule out that it lies near −π, where both the excitation probability and the prior probability distribution are non-negligible. Varying the probe time, and hence the spacing of the Ramsey fringes, involves a trade-off: increasing T narrows the main lobe of the posterior distribution, thanks to the steeper slope of the excitation signal, but also increases the weight of secondary lobes due to other fringes of the Ramsey spectrum.
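The growth of the secondary lobes can be made concrete with a short numerical sketch. It assumes a zero-mean Gaussian prior of variance v on the phase (v grows with the probe time), the excitation probability (1 + sin φ)/2 implied by the discussion above, and a single atom detected in the excited state; the function name and the choice of −π/2 < φ < 3π/2 as the "central fringe" are ours.

```python
import numpy as np

def secondary_lobe_weight(v, n_grid=200001, span=40.0):
    # Posterior weight lying outside the central Ramsey fringe after one atom
    # is detected in the excited state, for a zero-mean Gaussian prior of
    # variance v on the phase error phi.
    phi = np.linspace(-span, span, n_grid)
    prior = np.exp(-phi**2 / (2.0 * v))
    p_exc = 0.5 * (1.0 + np.sin(phi))      # Ramsey excitation probability
    post = prior * p_exc                    # unnormalised Bayes posterior
    # Central fringe: between the two zeros of p_exc adjacent to phi = pi/2.
    main = (phi > -np.pi / 2) & (phi < 3 * np.pi / 2)
    # The grid is uniform, so the grid spacing cancels in the ratio.
    return 1.0 - post[main].sum() / post.sum()

for v in (0.25, 1.0, 4.0):
    print(v, secondary_lobe_weight(v))
```

The printed weights increase with v, which is the cost side of the trade-off: a longer probe time sharpens the main lobe but leaves more posterior probability on the wrong fringe.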
To quantify this trade-off more formally, we consider Ramsey interrogation of N uncorrelated atoms, taking the prior distribution (chain-dotted curve in Fig. 5) to be a Gaussian of variance v:

$$P(\varphi) = \frac{1}{\sqrt{2\pi v}}\,\exp\!\left(-\frac{\varphi^2}{2v}\right). \qquad (20)$$

This distribution encodes, formally, all that is known about the LO frequency before the atomic measurement result becomes available. In a practical sense, it is the distribution of servo prediction errors: if the servo prediction were perfect (h = x) then the phase accumulated in the Ramsey interrogation would be zero. The ansatz of Eqn. 20 thus amounts to an assumption that the servo prediction errors are normally distributed. Although our simulations show some small deviations from the normal distribution, amounting to a negative excess kurtosis of a few percent with a single-atom reference that produces a binary error signal, the Gaussian ansatz is a surprisingly good approximation. As we will see, it leads to simple analytical results which agree well with more detailed simulations.

The reference does not, unfortunately, supply us with the expectation value P_e. Rather, the measurement yields a random fraction F of atoms detected in the excited state that fluctuates about the expectation value P_e due to measurement noise of the atomic reference. In the absence of technical noise on the reference signal, the variance of F is

$$\mathrm{var}(F) = \langle P_e^2\rangle - \langle P_e\rangle^2 + \frac{1}{N}\,\langle P_e\,(1 - P_e)\rangle,$$

where the averages 〈·〉 are taken over the prior distribution P(φ). The first two terms together express the fluctuations in the measured excitation fraction due to actual changes in the φ-dependent excitation probability, while the last corresponds to quantum projection noise of the binomially-distributed excitation signal. The usefulness of the excitation fraction in estimating the frequency depends on the covariance of the two quantities,

$$\mathrm{cov}(\varphi, F) = \langle \varphi\,P_e\rangle - \langle\varphi\rangle\langle P_e\rangle,$$

which determines how much weight should be given to F in constructing the posterior estimate of the LO frequency.
Choosing the weight to minimise the variance v′ of the error in this posterior estimate, we find (cf. Appendix C):

$$v' = v - \frac{\mathrm{cov}(\varphi, F)^2}{\mathrm{var}(F)}. \qquad (24)$$

This posterior variance combines information from the measurement with information that was known beforehand. In order to isolate the contribution of the former, we define an effective measurement variance v_m by

$$\frac{1}{v'} = \frac{1}{v} + \frac{1}{v_m}.$$

This is the usual relation for the variance of a (posterior) estimate obtained by an optimal linear combination of two independent pieces of information. v_m is thus the variance of a hypothetical measurement, one that could be interpreted without any prior knowledge, and which would reduce our uncertainty on the LO frequency as much as did the actual measurement. For the case we consider,

$$\begin{aligned} v_m &= \frac{\cosh v}{N} + \sinh v - v \\ &\approx \frac{1}{N} + \frac{v^2}{2N} + \frac{v^3}{6}, \end{aligned} \qquad (28)$$

where, in the second line, we have expanded the effective measurement variance in powers of the prior variance. The first term is the conventional QPN on the measurement of the phase, valid when v is small and the corrected LO frequency is known a priori to be well centred on the Ramsey fringe. The third term reflects the additional uncertainty arising when the corrected LO frequency can lie outside the range where the reference produces a meaningful result. This term is independent of atom number, and dominates the effective measurement variance as v approaches 1.
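As a consistency check on the chain of definitions above, the effective measurement variance can be evaluated numerically. The sketch below assumes a zero-mean Gaussian prior of variance v and the excitation probability (1 + sin φ)/2, computes the prior averages by Gauss-Hermite quadrature, and compares the result with the closed form cosh(v)/N + sinh(v) − v, which is what these assumptions yield and whose N-independent part is the sinh v − v term quoted in Sec. VI. Function names are illustrative only.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def effective_measurement_variance(v, n_atoms, order=80):
    # Quadrature nodes and weights for the weight function exp(-x**2 / 2);
    # normalising the weights turns sums into averages over a unit Gaussian.
    x, w = hermegauss(order)
    w = w / w.sum()
    phi = np.sqrt(v) * x                      # prior: Gaussian of variance v
    p_e = 0.5 * (1.0 + np.sin(phi))           # Ramsey excitation probability

    def mean(f):
        return np.sum(w * f)

    # Signal fluctuations plus binomial projection noise for n_atoms atoms.
    var_f = mean(p_e**2) - mean(p_e)**2 + mean(p_e * (1.0 - p_e)) / n_atoms
    cov = mean(phi * p_e) - mean(phi) * mean(p_e)
    v_post = v - cov**2 / var_f               # optimal linear estimate
    return 1.0 / (1.0 / v_post - 1.0 / v)     # from 1/v' = 1/v + 1/v_m

def closed_form(v, n_atoms):
    # Closed form following from the assumptions stated above.
    return np.cosh(v) / n_atoms + np.sinh(v) - v

for v, n_atoms in ((0.1, 1), (0.5, 10), (1.0, 100)):
    print(v, n_atoms, effective_measurement_variance(v, n_atoms),
          closed_form(v, n_atoms))
```

The two columns agree to numerical precision, and for large N the result is dominated by the N-independent part, as stated in the text.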
Replacing the standard phase variance of Eqn. 1 by the effective measurement variance in Eqn. 2 yields a new prediction for clock stability in the limit of large averaging time τ, one that accounts for the effects of limited prior information in each interrogation (Eqn. 29). To demonstrate the validity of the simplifying approximations made in our model, such as the Gaussian ansatz for the distribution of servo errors, we compare the zero-free-parameter prediction of Eqn. 29 (Fig. 6, solid lines) to the stability of clocks simulated without making those approximations (Fig. 6, symbols). Clocks with white-, flicker-, and random-walk-noise-limited LOs were simulated using Ramsey interrogation of uncorrelated atoms with no dead time (T_c = T). Each point in Fig. 6 is obtained from a simulation of 2 × 10^6 cycles of clock operation. The Allan deviation is computed for a time τ long enough that the instability has reached the asymptotic 1/√τ regime (corresponding to 2 × 10^4 cycles of clock operation), then rescaled to a fixed averaging time Z and normalised to a fixed noise level σ_L(Z) to obtain a dimensionless result that is comparable across systems. The graphs thus show the achievable long-term instability as a function of the choice of probe time. When the probe time is short, so that the Ramsey fringe is broad and v is small, the instability improves as 1/√T as conventionally expected. The improvement with increasing probe time stops either when the additional v-dependent terms in Eqn. 29 grow important or when the servo can no longer reliably lock the LO to the reference transition: the curves in Fig. 6 end when the fringe-hop rate reaches 1 per 2 million cycles. As N increases, quantum projection noise is reduced relative to LO noise and it becomes advantageous to reduce the probe time so as to be less sensitive to the latter.

[Fig. 6 (caption fragment): ... Table II. Dashed line marks the limit for perfect phase estimation with no projection noise. The three graphs are, from top to bottom, for a white-, flicker-, or random-walk-dominated LO. Simulations ran for 2 × 10^6 clock cycles.]
Thus, the optimal probe time gets shorter with increasing atom number, and the fully optimised clock instability does not scale as N^{−1/2}. The asymptotic scaling with N is given in Table I. For white noise, the most extreme case, the optimal probe time scales as N^{−1/3} in the large-N limit, leading to an N^{−1/3} scaling of the long-term Allan deviation. For flicker or random-walk noise, v falls off more steeply as the probe time is shortened (see Eqn. 13), so that the optimal probe time is less sensitive to atom number and a scaling closer to the conventional QPN limit is obtained. In the absence of projection noise, i.e. in the limit N → ∞, the servo performance limit of Eqn. 13 combines with Eqn. 29 to yield a general measurement-noise-independent limit on clock instability, which we plot as dashed lines in Fig. 6. This limit arises solely from the unpredictability of the LO noise and from the finite domain over which the Ramsey error signal can be unambiguously interpreted.

Strictly speaking, Eqn. 24 holds only if the estimated frequency error is a linear function of the measured excitation fraction. Since the excitation probability is a non-linear (e.g. sinusoidal) function of the LO frequency error, one might hope to do better than the estimated performance of Eqn. 29 by using a non-linear function to convert the excitation fraction to a frequency error estimate. Simulations show, however, that correcting for the curvature of the Ramsey fringe by estimating the accumulated phase as arcsin(2F − 1) rather than simply 2F − 1 has no significant effect on v or on the achievable long-term clock stability. One can understand this finding by noting that, when v is large enough that the curvature within a single Ramsey fringe is significant, the effect of unavoidable ambiguities such as the secondary lobe in Fig. 5 is much larger and dominates the posterior variance.

Table I. Column headings: LO noise type; asymptotic scaling of σ_C(τ) ω √(Zτ).

Table II. Recommendations for the choice of Ramsey interrogation time. The last column gives the optimal probe time in the limit of many atoms. There is nothing to be gained by probing longer than this time. It may be necessary to use shorter probe times to avoid fringe hops; a suggested safe upper bound on the probe time is given in the second column.
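The large-N scalings quoted in this section can be reproduced with a few lines of numerical optimisation. The sketch below is a toy model under explicit assumptions, not the simulation behind Fig. 6: the prior variance is taken as v = (T/Z)^(µ+2) with unit prefactor (the real prefactor depends on the servo), there is no dead time, the squared long-term instability is taken to be proportional to v_m/T in units where Z = 1, and v_m is the effective measurement variance for a Gaussian prior, cosh(v)/N + sinh(v) − v. All function names are ours.

```python
import numpy as np

T_GRID = np.logspace(-4, 0, 8001)   # probe times in units of Z

def optimal_instability(n_atoms, mu):
    # ASSUMPTIONS (see lead-in): v = T**(mu + 2) with unit prefactor, and
    # squared long-term instability proportional to v_m / T.
    v = T_GRID ** (mu + 2)
    v_m = np.cosh(v) / n_atoms + np.sinh(v) - v
    sigma2 = v_m / T_GRID
    k = np.argmin(sigma2)
    return T_GRID[k], np.sqrt(sigma2[k])

def scaling_exponent(mu, n_lo=1e6, n_hi=1e8):
    # Log-log slope of the optimised instability versus atom number.
    s_lo = optimal_instability(n_lo, mu)[1]
    s_hi = optimal_instability(n_hi, mu)[1]
    return np.log(s_hi / s_lo) / np.log(n_hi / n_lo)

# mu = -1, 0, +1 for white, flicker and random-walk LO noise respectively.
for name, mu in (("white", -1), ("flicker", 0), ("random walk", 1)):
    print(name, round(float(scaling_exponent(mu)), 3))
```

Under these assumptions the fitted exponents for white and flicker noise can be compared directly with the N^{−1/3} and N^{−5/12} scalings quoted in the text; the same toy model gives an exponent near −4/9 for random-walk noise.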

V. GUIDELINES FOR INTERROGATION PARAMETERS
To choose the operating parameters for a clock, one can in general use the formalism of Sec. II to predict the servo error variance v as a function of those operating parameters and Eqn. 29 to predict the resulting long-term stability, which can then be optimised. Table II, for example, provides recommended Ramsey interrogation times for clocks dominated by different types of power-law LO noise, expressed as multiples of Z. In the many-atom limit, the servo prediction errors become independent of the quantum projection noise and we can solve Eqn. 29 to obtain the asymptotically optimal probe time (last column of Table II). Increasing the probe time beyond this optimum always leads to an increase in effective measurement variance and long-term instability, and is of no practical interest. At small atom numbers, shorter probe times are required to keep the servo controller robust against fringe hops. The purely phenomenological bound in the second column of Table II is chosen to be slightly shorter than the time for which we observe fringe hops at a rate of 1 per million simulated clock cycles, with a 20 % safety margin. Our choice of maximum acceptable fringe-hop rate, corresponding to a requirement that the clock remain locked to the correct fringe for a few days, is arbitrary, but as the onset of fringe-hopping is extremely steep (the fringe-hop rate in simulations increases by two to three orders of magnitude when the probe time is doubled), the maximum safe probe time is only weakly dependent on this choice of threshold. A full optimisation of all common probe protocols in the presence of realistic experimental imperfections is beyond the scope of this work, but we expect qualitatively similar behaviour from Rabi or hyper-Ramsey probing, with somewhat longer optimal probe times and slightly degraded instability due to the increased width of the observed atomic resonance in these schemes.
Conversely, we expect that clocks with significant dead time in their operating cycle will need to use somewhat shorter probe times to compensate for the servo's inability to correct unobserved LO frequency fluctuations, which will lead to a v higher than in our dead-time-free simulations.

VI. CONSTRAINTS ON THE BENEFITS OF ENTANGLED ATOMIC REFERENCES
The arguments developed in the preceding sections also apply to certain Ramsey-like protocols using entangled states in the atomic reference, provided that LO noise is the limiting form of decoherence. For instance, the scheme proposed in Ref. [35] and demonstrated in Refs. [36][37][38], employing N atoms in a maximally correlated state [|ψ〉 = (|g〉^{⊗N} + |e〉^{⊗N})/√2, where |g〉 and |e〉 are the atomic eigenstates], is fully equivalent to Ramsey interrogation of a single atom with an N-fold enhanced transition frequency by the corresponding harmonic of the LO radiation. Now the long-term instability of a single-atom frequency standard can be expressed as

$$\sigma_C(\tau) = \frac{s}{\omega\sqrt{Z\tau}},$$

with s a dimensionless constant of order unity encoding the choice of probe time T/Z and the additional contribution of LO noise at this probe time. The long-term instability of the clock using maximally-correlated atoms thus becomes

$$\sigma_C(\tau) = \frac{s}{N\omega\sqrt{Z_N\tau}},$$

where Z_N is the noise timescale for the N-fold frequency-multiplied LO (Eqn. 32). In the dead-time-free limit, one can compare Eqn. 32 with Eqn. 3 and find

$$Z_N = N^{-2/(\mu+2)}\,Z,$$

with µ again describing the time-dependence of the LO Allan variance (see Sec. I). The entangled clock's long-term instability thus scales as

$$\sigma_C(\tau) \propto N^{-(\mu+1)/(\mu+2)}.$$

Thus, if the clock stability is limited by white LO noise (µ = −1), a reference using a maximally-correlated state of N atoms performs no better than a reference using a single atom, and is in fact worse than a reference using uncorrelated interrogation of the N atoms. This scaling has been observed experimentally for correlated magnetic field noise in a 14-ion GHZ state [38]. For flicker-floor LO noise, the Allan deviation improves as N^{−1/2} with N maximally-entangled atoms, very slightly better than the asymptotic N^{−5/12} scaling achievable without entanglement, but worse than the scaling achieved with unentangled atoms for N < 10^2. It is only for random-walk LO noise that maximally entangled states offer measurable benefits, with an N^{−2/3} scaling of the long-term Allan deviation.
We illustrate these scalings in Fig. 7, which plots the Allan deviation spectrum recorded in simulations of fully optimised clocks using either 100 uncorrelated atoms or a 100-atom maximally-correlated state for all three LO noise types. Maximally entangled states reduce the signal-to-noise ratio of a measurement on N atoms to that of a single qubit, providing less new information per measurement but accelerating the clock cycle so that more measurements can be averaged. That is why their use is advantageous with random-walk LO noise, when fast measurements can take advantage of the reduced LO noise at short time scales. Other approaches to the use of entanglement in atomic references, such as spin squeezing [24][25][26][39][40][41], focus instead on improving the signal-to-noise ratio of the measurement, and thus increasing the amount of new information obtained in each interrogation. The error signal produced in such schemes has the same periodic ambiguities as in Ramsey interrogation of uncorrelated atoms, and so they are subject to the same projection-noise-independent sinh v − v ≈ v^3/6 limit on their effective measurement variance (cf. Eqn. 28). The additional noise introduced when servo prediction errors allow the anti-squeezed quadrature to contaminate the measurement result [42] can in principle be eliminated by a suitable readout procedure [27], in which case we expect such interrogation protocols to offer benefits comparable to suppressing the projection noise by increasing atom number, even with white- or flicker-floor-limited LOs.
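The N-scalings quoted in this section for maximally-correlated states (N^0, N^{−1/2} and N^{−2/3} for white, flicker and random-walk LO noise) follow from a short piece of exponent arithmetic, sketched below with exact fractions. The sketch assumes that the noise timescale Z is defined by a fixed accumulated phase, ω Z σ_L(Z) = const, with σ_L(τ)² ∝ τ^µ, so that multiplying the frequency by N shortens the timescale to Z_N = N^{−2/(µ+2)} Z; the function name is hypothetical.

```python
from fractions import Fraction

def ghz_scaling_exponent(mu):
    # Exponent p in sigma_C ∝ N**p for an N-atom maximally-correlated state.
    # ASSUMPTION: Z is defined by omega * Z * sigma_L(Z) = const with
    # sigma_L(tau)**2 ∝ tau**mu, so that Z_N = N**z_exponent * Z.
    mu = Fraction(mu)
    z_exponent = Fraction(-2) / (mu + 2)
    # sigma_C ∝ 1 / (N * sqrt(Z_N)): the N-fold frequency gives the factor
    # 1/N, and the shortened timescale contributes N**(-z_exponent / 2).
    return -1 - z_exponent / 2

# mu = -1, 0, +1 for white, flicker and random-walk LO noise respectively.
for name, mu in (("white", -1), ("flicker", 0), ("random walk", 1)):
    print(name, ghz_scaling_exponent(mu))
```

The three exponents returned are 0, −1/2 and −2/3, matching the values stated in the text.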

VII. SOME REMARKS ON SHORT-TERM INSTABILITY
So far we have focused on the long-term instability of the clock once it reaches the asymptotic σ_C ∝ 1/√τ regime, without considering the averaging time required to reach this regime. In general, a clock reaches its asymptotic instability when the fluctuations in the frequency of the output signal are dominated by the measurement noise of the atomic reference. For single-ion clocks in which the signal-to-noise ratio of the atomic measurements is no better than 1, this condition is reached at the servo attack time τ_1, as soon as the output signal of the clock stops following the free-running LO and is locked to the noisy signal from the atomic reference. However, clocks using many atoms have a much higher signal-to-noise ratio, i.e. the resolution of the error signal from their atomic reference is much finer than the LO frequency fluctuations that they can reliably measure. In such clocks the optimal probe times are long enough that the quantum projection noise is well below the LO-limited short-term instability. In order to reach the asymptotic regime they must initially average down faster than 1/√τ. This is possible provided that the servo has enough gain to suppress the measured LO frequency fluctuations. A single integrator can suppress measured LO fluctuations by a factor ∼ τ/τ_1 in the standard deviation⁶, so that the clock instability initially averages down as τ^{−3/2} or τ^{−1} for white or flicker LO noise respectively. This is fast enough to reach the measurement-noise-limited regime in an averaging time of roughly √N τ_1 or N τ_1 for a white-noise-limited or flicker-floor-limited LO respectively, as seen in the first two graphs of Fig. 7. However, a single integrator can only suppress random-walk LO noise to a level that scales as τ^{−1/2}, which will not catch up with the measurement noise limit, since the latter averages down at the same rate.
A many-atom clock using only a single integrator would thus be forever limited by the finite gain of the servo rather than by the noise of the atomic measurements. It is only when a second integrator allows the servo to suppress noise by an additional factor of τ/τ_2 that the clock instability can average down as τ^{−3/2} until it reaches the measurement noise limit in a time √N τ_2. The third graph of Fig. 7 illustrates this behaviour, with the uncorrelated 100-atom clock initially averaging down at the servo-limited rate of τ^{−1/2} until around τ_2 ≈ 45Z. The second integrator then allows the instability to catch up with the lower-lying asymptotic noise limit, which it reaches around τ ≈ 400Z. It is interesting to note that clocks using maximally-correlated states, because they behave like single-atom clocks and are always measurement-noise limited, would have an advantage in short-term instability even when their long-term instability is little better than that of a clock with uncorrelated atoms (lower two graphs of Fig. 7). This observation mirrors, in a simpler setting, the finding of Ref. [20].
Thus the second integrator in a clock servo, beyond its role in correcting for linear drifts, is also needed to suppress random-walk noise of the LO in many-atom clocks. It is desirable to set the gain g_2 of this drift-correction integrator as high as possible, in order to reach the asymptotic instability in a reasonable time. However, it must not be so high that it induces oscillations in the lock. With a conventional two-stage integrating servo, the ratio of the two gains must be no more than a few percent (we use g_2 = g/50 in our simulations). Linear predictors optimised as in Sec. II are somewhat more robust against oscillations, and can be operated with higher gain g_2 = w_1/10 for the drift-correction integrator.
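The role of the second integrator can be illustrated with a deliberately simplified, noise-free toy loop. This is a generic textbook two-stage integrating servo, not necessarily the exact update rule used in our simulations: the function name, the update order and the gain g = 0.5 are assumptions made for illustration, and only the ratio g_2 = g/50 is taken from the text. The LO frequency x drifts linearly, h is the servo prediction, and e = x − h is the error signal.

```python
def track_drifting_lo(n_cycles, drift, g, g2=0.0):
    # Noise-free toy model of an integrating servo following a linearly
    # drifting LO. Returns the error signal seen in the final cycle.
    x, h, slope = 0.0, 0.0, 0.0
    e = 0.0
    for _ in range(n_cycles):
        e = x - h            # error signal reported by the reference
        slope += g2 * e      # second integrator: running drift estimate
        h += g * e + slope   # first integrator plus drift correction
        x += drift           # LO drifts between interrogations
    return e

g = 0.5
single = track_drifting_lo(5000, 1e-3, g)             # one integrator
double = track_drifting_lo(5000, 1e-3, g, g2=g / 50)  # two-stage servo
print(single, double)
```

With a single integrator the error settles at the constant offset drift/g, whereas the second integrator drives it to zero. The same additional factor of loop gain at long times is what allows the servo to suppress random-walk LO noise.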
When post-processing measurement results to generate a virtual "paper" clock signal, the causality requirements which limit the gain of the servo during physical clock operation no longer apply. Thus, while the long-term stability limits discussed in Sec. IV hold equally for physical and paper clocks because they arise from limits on the noise of the atomic reference, the short-term stability limits discussed in this section can be avoided entirely in paper clocks and frequency ratio measurements, where the LO frequency fluctuations can always be corrected as well as they can be measured.
More abstractly, this section can also be understood in terms of the difference between steering the clock's frequency and steering its accumulated phase (i.e. indicated time). The asymptotic limit of Eqn. 29 corresponds to an unavoidable random walk of phase due to the undetectable and uncorrelated frequency measurement errors of the atomic reference. To reach it, one must first correct the clock's output for all the detected LO frequency errors, which dominate the short-term instability in multi-atom clocks. Within our model this is done by the servo, the only component of the system with memory, and thus the only component capable of remembering and correcting past phase errors: an Allan deviation averaging down faster than 1/√τ indicates that the servo is steering phase rather than simply locking frequency. This can happen only slowly, however, as it must not interfere with the servo's primary task of keeping the LO frequency near atomic resonance so that the reference continues to yield informative measurement results. In clocks where separate corrections are applied to the output signal and to the signal used for atomic interrogation, the latter can be kept on resonance while the former's phase is corrected as fast as possible (even preemptively in the case of a paper clock), thus minimising short-term fluctuations in the timing error.

VIII. OUTLOOK
In this work we have studied the effects of LO noise on frequency standards that monitor the LO frequency using a single ensemble of atoms periodically interrogated according to a fixed protocol and that correct the measured frequency fluctuations using a linear prediction formula. Most current optical atomic clocks fit this description and can, without hardware modifications, use the framework presented here to identify and approach the stability limit imposed by their LO performance. The interrogation times we recommend are specific to dead-time-free Ramsey interrogation, but qualitatively similar results for other (Rabi, hyper-Ramsey) protocols can be found by the same arguments, since our treatment of the servo is protocol-independent and since all interrogation protocols face the same trade-off between measurement resolution and unambiguous measurement domain. Within this framework, the most promising approaches to improving long-term clock instability (besides improving LO performance) seem to be those that improve the dynamic range of atomic measurements (such as spin squeezing), whereas methods which attempt to make faster measurements with poor dynamic range (such as spectroscopy with maximally-correlated states) have been shown to offer modest or no benefits for realistic LO noise spectra.
There are, however, many architectures for frequency standards that do not fit the framework presented here, and it would be interesting to consider which of them can overcome the limits we have identified. The simplest extension to implement would be the use of non-linear prediction algorithms, which might improve the robustness of the servo, allowing longer probe times and better stability at small atom number. We expect that the performance of such algorithms would still be subject to the measurement-noise-independent limit of Eqn. 13, so that they are unlikely to offer more than a modest constant-factor stability improvement in the large-N limit.
Proposed multi-ensemble or cascaded clocks [19,20] circumvent the limits we have discussed here by monitoring the LO noise with several different atomic references with progressively finer resolution. References with a broad domain of useful frequencies provide coarse-resolution results sufficient to narrow the prior v for other, finer-resolution references. The analysis we have presented here applies directly to the first (coarsest) reference in the cascade, and the resulting stabilised signal can then be treated as an effective LO used by the next reference in the ensemble, thus proceeding step-by-step down the cascade. However, even if our analysis is locally valid for every reference treated individually, the overall behaviour of such a multi-ensemble system may be qualitatively different than would be naively expected from the single-ensemble analysis [19,20,22].
Finally, it would be interesting to make an analogous study for continuously-interrogated atomic references [43][44][45][46][47][48][49][50]. Such systems, whether based on continuous spectroscopic observation of an atomic sample or on direct lasing on the clock transition ("active optical clock") [46,47] face a conceptually similar trade-off between suppressing the noise of the atomic signal (driving the system weakly to minimise the disturbance to the atoms) and suppressing classical fluctuations in the probe laser, cavity mirrors, etc. (driving the system strongly to gain information quickly and maximise the useful feedback bandwidth). Thus, the stability of these superficially different systems may depend on the noise of their classical components and on the size of the atomic sample in ways qualitatively similar to those we have examined here.
We thank S. King and R. Demkowicz-Dobrzański for stimulating discussions. I.D.L. acknowledges a fellowship from the Alexander von Humboldt foundation. The work presented here was partly supported through the project EMPIR 15SIB03 OC18. This project has received funding from the EMPIR programme co-financed by the Participating States and from the European Union's Horizon 2020 research and innovation programme. We acknowledge support from the DFG through CRC 1128 (geo-Q), project A03 and CRC 1227 (DQ-mat), project B03.

Appendix A: Fluctuation correlation matrices C for common noise processes

In order to reconstruct LO noise properties from the experimentally observed correlation matrix C, it is helpful to have explicit expressions for the correlations induced by common power-law noise processes, which we summarise here.
Consider a continuous noisy process y(t) with one-sided power spectral density S(f), whose autocorrelation reads

$$\langle y(t)\,y(t+t')\rangle = \int_0^\infty S(f)\cos(2\pi f t')\,\mathrm{d}f.$$

If we associate the discrete estimates y_j with time averages over a clock cycle of duration T_c, such as would be measured by a perfect classical frequency counter, then the definition of Eqn. 10 reduces to Eqn. A3, where u is a dimensionless dummy integration variable. For a white-noise process [S(f) ∝ f^0] of Allan deviation σ_w, we find

$$C = \sigma_w^2 \begin{pmatrix} 2 & 1 & 1 & 1 & \cdots \\ 1 & 2 & 1 & 1 & \cdots \\ 1 & 1 & 2 & 1 & \cdots \\ 1 & 1 & 1 & 2 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
Note that C_11 is, by definition, twice the single-cycle Allan variance for the noise process under consideration. Also note that, while Eqn. A3 is valid only for perfect frequency-counting or dead-time-free Ramsey interrogation, Eqns. A4 through A14, expressed in terms of observed Allan variances, are valid for arbitrary noisy time series of the specified power-law noise type, including frequency estimates made with arbitrary measurement protocols that may include dead time.
As in Sec. IV, we define φ such that 〈φ〉 = 0. Writing the linear estimate of the phase as ψ = β(F − 〈F〉), the mean squared error v′ which we wish to minimise is then

$$v' = \langle(\psi - \varphi)^2\rangle = \beta^2\,\mathrm{var}(F) - 2\beta\,\mathrm{cov}(\varphi, F) + v. \qquad (C4)$$

The minimum is attained for

$$\beta = \frac{\mathrm{cov}(\varphi, F)}{\mathrm{var}(F)}.$$

Note that the weight β given to the latest measurement result in estimating the frequency of the LO decreases as the measurement becomes noisier (i.e. as var(F) grows) or as the measurement becomes less strongly correlated with the underlying phase φ that we wish to estimate (i.e. as cov(φ, F) shrinks). The minimum posterior variance given in Eqn. 24 is obtained directly upon substitution of the optimised weight β into Eqn. C4. For simplicity, we restrict ourselves to a linear combination of the measurement signal F with the prior estimate 〈φ〉, as this allows the results to be expressed entirely in terms of experimentally accessible (co)variances of noise distributions. The variance of the errors in non-linear estimators depends on higher-order moments of the noise distributions which are difficult to characterise experimentally. As argued at the end of Sec. IV, non-linear estimators are empirically unnecessary, at least for simple Ramsey-like protocols.