Optomechanical parameter estimation

We propose a statistical framework for the problem of parameter estimation from a noisy optomechanical system. The Cram\'er-Rao lower bound on the estimation errors in the long-time limit is derived and compared with the errors of radiometer and expectation-maximization (EM) algorithms in the estimation of the force noise power. When applied to experimental data, the EM estimator is found to have the lowest error and follow the Cram\'er-Rao bound most closely. Our analytic results are envisioned to be valuable to optomechanical experiment design, while the EM algorithm, with its ability to estimate most of the system parameters, is envisioned to be useful for optomechanical sensing, atomic magnetometry, and fundamental tests of quantum mechanics.


Introduction
There has been spectacular technological advances in the use of high-quality optomechanical oscillators for force sensing, enabling ultra-sensitive force measurements of single spin, charge, acceleration, magnetic field and mass [1][2][3][4][5]. Such advances have opened up the exciting possibility of experimentally studying quantum lightmatter interactions in macroscopic structures [6,7], hence paving the way towards new technologies for quantum information science and metrology [8][9][10][11].
Thermal and measurement noises impose major limitations to the accuracy of mechanical force sensors. Furthermore, while the development of higher quality and lower mass mechanical oscillators has played a central role in advancing the sensitivity of optomechanical force sensors [12]; such oscillators also have increased sensitivity to their environment. This introduces new sources of noise that can cause fluctuations in parameters, such as the effective oscillator temperature and mechanical resonance frequency. As optomechanical technology continue to advance, it can be expected that methods to characterize, monitor, and control these additional noise sources, in conjunction with thermal and measurement noise, will become increasingly important. In this context, statistical signal processing techniques that are provably optimal in a theoretical sense offer the potential to improve the actual sensing performance significantly, beyond the heuristic curve-fitting procedures commonly employed in the field.
In this paper, we introduce a statistical framework to study the problem of parameter estimation from a noisy optomechanical system. This problem is especially relevant to the recent optomechanics experiments reported in Refs. [12,13]. We derive analytic expressions for the Cramér-Rao lower bound on the estimation errors and apply various estimation techniques to experimental data to estimate the parameters of an optomechanical system, including the force power, mechanical resonance frequency, damping rate, and measurement noise power.
Our analytic results provide convenient expressions of the estimation errors as a function of system parameters and measurement time and should be valuable to optomechanical experiment design. Another highlight of our study is the use of the expectation-maximization (EM) algorithm [14][15][16], which is generalized here for a complex Gauss-Markov model and applied to a cavity optomechanical system, both for the first time to our knowledge. Among the estimators we have studied, including the one used in Refs. [12,13], we find that the root-mean-square errors of the EM algorithm in estimating the force noise power are the lowest, following the Cramér-Rao bound most closely and beating the estimator in Refs. [12,13] by more than a factor of 5 for longer measurement times.
Our framework is also naturally applicable to quantum systems that can be described by a homogeneous Gauss-Markov model [17], such as quantum optomechanical systems [18][19][20][21][22] and atomic spin ensembles [23,24]. This makes our study, and the EM algorithm in particular, relevant not just to future precision sensing and system identification applications, but also to fundamental tests of quantum mechanics [18,19,25].

Experiment
To motivate our theoretical model and numerical analysis, we first describe the optomechanical experiment presented in Ref. [13] that was used to produce the data. The transducer under consideration consists of a room temperature microtoroidal resonator that simultaneously supports mechanical modes sensitive to external forces and high quality optical modes that permit ultra-precise readout of the mechanical displacement. We couple shot-noise limited 1550nm laser light into a whispering gallery mode of the microtoroid via a tapered optical fiber which is nested inside an all fiber inteferometer. Excitation of the mechanical mode, which has fundamental frequency, damping rate and effective mass of Ω m = 40.33 MHz, γ = 23 kHz and m eff = 7 ng respectively, induces phase fluctuations on the transmitted light which is measured by shot-noise limited homodyne detection. To maintain constant coupling of optical power into the microtoroid we use an amplitude and phase modulation technique to actively lock the toroid-taper separation [26] and laser frequency respectively. The relative phase of the bright local oscillator to the signal is controlled via a piezoactuated fiber stretcher that precisely tunes the optical path length in one arm of the inteferometer. To specifically demonstrate power estimation, a small incoherent signal is applied to the mechanical oscillator in addition to the thermal fluctuations. This is achieved by the electrostatic gradient force applied by a nearby electrode driven with white noise from a signal generator [27].
The measurement record is acquired from the homodyne signal by electronic lock-in detection which involves demodulation of the photocurrent at the mechanical resonance frequency allowing real time measurement of the slowly evolving quadratures of motion, denoted I(t) and Q(t) where x(t) = I(t) cos(Ω m t)+Q(t) sin(Ω m t). The room temperature thermal fluctuations of the mechanical mode are observed with a signal-to-noise ratio of 37dB and calibrated via the optical response to a known reference modulation [28]. The resulting force sensitivity, which can be extracted from Fourier analsis of the measurment record, will depend on the specific protocol used. Here we evaluate the force sensitivity of 3 parameter estimation protocols relative to the Cramér-Rao lower bound.

Continuous-time model
A simple linear Gaussian model for the mechanical mode can be described by the following equation for the complex analytic signal z(t) of the mechanical-mode displacement: where Ω is the mechanical resonance frequency relative to Ω m , γ is the damping rate, and ξ(t) is the stochastic force as a sum of the thermal noise and the signal. ξ(t) is assumed to be a complex zero-mean white Gaussian noise [29] with power A and covariance function The measurements can be modeled in continuous time as where C is a real parameter and η(t) is the measurement noise, assumed to be a complex additive white Gaussian noise with power R: We assume that the parameters are constant in time, such that z(t), ξ(t), y(t), and η(t) are stationary stochastic processes given θ. y(t), in particular, has a power spectrum given by Although this simple model suffices to describe our experiment, it is not difficult to generalize our entire formalism to describe more complicated dynamics and colored noise [30]. This is done by generalizing z(t) to a vector of state variables for more mechanical and optical modes, Eq. (1) to a vectoral equation of motion, and the parameters (Ω, γ, A, C, R) to matrices that describe the coupled-mode dynamics and the noise statistics.

Binary hypothesis testing
Although hypothesis testing [16,29] is not the focus of our study, the theory is useful for the derivation of the Cramér-Rao bound, so we present the topic here briefly for completeness. Suppose that there are two hypotheses, denoted by H 0 and H 1 , with prior probabilities P 0 and P 1 = 1 − P 0 . From a measurement record Y , with a probability density P (Y |H 0 ) or P (Y |H 1 ) that depends on the hypothesis, one wishes to decide which hypothesis is true. Given the densities and a decision rule, one can compute P jk , the probability that H j is chosen when H k is true. The average error probability is P e ≡ P 10 P 0 + P 01 P 1 .
P e can be minimized using a Bayes likelihood-ratio test: which means that H 1 is chosen if Λ ≥ P 0 /P 1 and vice versa. The resulting P e is often difficult to compute analytically, but can be bounded by upper and lower bounds. For where the upper bound is the Chernoff bound, and F (0.5) is known as the Bhattacharyya distance between the two probability densities [31]. Let Y be a record of continuous measurements: If x(t) is a realization of a real zero-mean stationary process with spectrum S x (ω|H j ), the exponent of F (s) in the case of stationary processes and long observation time This expression means that F (s) has the form of where − ln β(T ) is asymptotically smaller than T : and therefore β(T ) decays more slowly than exp(−Γ F T ). For the model in Sec. 3.1, y(t) is a complex signal, and (15) needs to be modified. This can be done by assuming that y(t) is bandlimited in [−πb, πb] and considering a real signal x(t) given by where ω 0 is a carrier frequency assumed to be > πb. We then have Using this expression in (15) leads to another expression for the Chernoff exponent given by Note the absence of the 1/2 factor in (20) compared with (15).

Cramér-Rao Bound
We now consider the estimation of θ from Y with a probability density given by P (Y |θ).
Defining the estimate asθ(Y ), the error covariance matrix is Assuming thatθ satisfies the unbiased condition the Cramér-Rao bound on Σ is [16] Σ where J(θ) is known as the Fisher information matrix: It turns out that J(θ) can be related to the Bhattacharyya distance in a hypothesis testing problem with F (s) defined in (12) becomes a function of θ and θ ′ , and J(θ) can be expressed as [29] For the model in Section 3.1, y(t) is a realization of a stationary process given θ, so J(θ) for the estimation of θ from Y = {y(t); −T /2 ≤ t ≤ T /2} in the SPLOT case can be obtained by combining (20) and (29): where S y (ω|θ) is given by (6). This expression means that J(θ) for any stationaryprocess parameter estimation problem increases linearly with time as T → ∞, in the sense of where o(T ) is asymptotically smaller than T : In the asymptotic limit, maximum-likelihood (ML) estimation can attain the Cramér-Rao bound [15], so the bound is a meaningful indicator of estimation error. Despite the asymptotic assumption, the simpler analytic expressions are more convenient to use for experimental design purposes.
Although the preceding formalism is applicable to the estimation of any of the parameters, in the following we focus on A, the force noise power. The Cramér-Rao bound on the mean-square estimation error Σ A is This bound allows us to investigate the efficiency of the parameter estimation algorithms presented in the next section.

Averaging
We first consider the estimator used in Refs. [12,13]: The rationale for this simple averaging estimator is that, in the absence of measurement noise (R = 0), it is an unbiased estimate for T → ∞: The unbiased condition breaks down, however, in the presence of measurement noise, and we are therefore motivated to find a better estimator.

Radiometer
The "radiometer" estimator described in Ref. [29] can be easily generalized for complex variables. The result iŝ where h(t − t ′ ) filters y(t ′ ) before correlating the result with y * (t), and G and B are parameters chosen to enforce the unbiased condition. We see that the averaging estimatorÂ avg given by (36) also has the radiometer form. It can be shown that, for T → ∞, The mean-square error, on the other hand, has the asymptotic expression lim T →∞ This expression coincides with the Cramér-Rao bound given by (34) and (35) if we set and A happens to be equal to D. For any other value of A, the radiometer is suboptimal.

Expectation-maximization (EM) algorithm
A major shortcoming of the radiometer is its requirement of parameters other than A to be known exactly. Another issue is that it assumes continuous time and relies on asymptotic arguments, when the measurements are always discrete and finite in practice. We find that the EM algorithm [14][15][16], which performs maximum-likelihood (ML) estimation and is applicable to the linear Gaussian model we consider here, overcomes both of these problems. ML estimation aims to find the set of parameters θ that maximizes the log-likelihood function ln P (Y |θ). This task can be significantly simplified by the EM algorithm if there exist hidden data Z that results in simplified expressions for P (Z|Y, θ) and P (Y, Z|θ). Starting with a trial θ = θ 0 , the algorithm considers the estimated log-likelihood function where the superscript k is an index denoting the EM iteration, and finds the θ k+1 for the next iteration by maximizing Q: The iteration is halted when the difference betwen θ k+1 and θ k reaches a prescribed threshold, and the final θ k+1 is taken to be the EM estimateθ EM . To apply the EM algorithm to our model in Section 3.1, we consider a complex discrete-time Gauss-Markov model: y j = cz j + v j , j = 0, 1, . . . , J.
In general, z j and y j can be column vectors, and f and c are matrices. w j and v j are complex independent zero-mean Gaussian random variables with covariances given by where † denotes the conjugate transpose, ⊤ denotes the transpose, and q and r are covariance matrices. The parameters of interest θ are the components of f , c, q, and r. The EM algorithm for a real Gauss-Markov model described in Refs. [15,16] is generalized to account for complex variables in Appendix A. The problem may become ill-conditioned when too many parameters are taken to be unknown and multiple ML solutions exist [15,16,32], so we choose a parameterization with known q: where δt is the sampling period. With the EM estimatesf EM ,ĉ EM , andr EM and assuming that δt and C are known by independent calibrations, we can retrieve estimates of Ω, γ, A, and R: It can be shown that the ML parameter estimator for the Gauss-Markov model is asymptotically efficient [15], meaning that it attains the Cramér-Rao bound in the limit of T → ∞.

Procedure
There are two records of experimental data, one with thermal noise in ξ(t) and one with additional applied white noise in ξ(t), leading to a different A for each record, denoted by A (0) and A (1) . Each record contains J max + 1 = 3, 750, 001 points of y (n) j . With a sampling frequency b = 1/δt = 15 MHz, the total time for each record is T max = (J max + 1)δt ≈ 0.25 s. From independent calibrations, we also obtain C = 2.61 × 10 −2 (fN/ √ Hz) −1 . To investigate the errors with varying T , we divide each record into slices of records with various T , resulting in M(T ) = floor(T max /T ) number of trials for each T . Using a desktop computer (Intel Core i7-2600 CPU@3.4GHz with 16GB RAM) and MATLAB, we apply each of the three estimators in Section 4 to each trial to produce an estimateÂ (n) m,l (T ), where m denotes the trial and l denotes the estimator. The EM iteration is stopped when the fractional difference between the current estimate of A and the previous value is less than 10 −7 . For the averaging and radiometer estimators, true values for Ω, γ, and R are needed, and since we do not know them, we estimate them by applying the EM algorithm to the whole records. This is reasonable because T max ≫ 4 ms ≥ T , and we expectθ and compared with the SPLOT Cramér-Rao bound J −1 Note that the estimation error in general contains two components: whereĀ is the sample mean of the estimate, the first component is the sample variance, and the second component is the square of the estimate bias with respect to the true value A. Unlike Refs. [12,13], our error analysis is able to account for the bias component more accurately by referencing with the much more accurate long-time EM estimates.

Results
Applied to the two records, the EM algorithm produces the following estimates: The algorithm takes ≈ 3.3 hours to run for each record. These values are then used as references to analyze the estimators at shorter times.  (i) The averaging estimator is more accurate than the radiometer for short times but becomes much worse for longer times. We cannot explain the short-time errors because our analytic results rely on the long-time limit, although the errors there are so high relative to the estimate that they are irrelevant to real applications. The large long-time errors can be attributed to the bias and suboptimality of the estimator.
(ii) The radiometer beats the averaging estimator and approaches the Cramér-Rao bound for longer times. This is consistent with our SPLOT analysis, as we have chosen D =Â EM (T max ) and the radiometer should be near-optimal. (iii) The EM estimator beats the other estimators at all times and follow the Cramér-Rao bound more closely, even though we allow the averaging and radiometer estimators to have the unfair advantage of accessing more accurate values of Ω, γ, and R. This may be explained by the fact that the EM algorithm is formulated to perform ML estimation on discrete measurements for any finite T , unlike the other estimators that rely only on asymptotic arguments.
(iv) The EM estimator takes a much longer time to compute (computation time ≈ 200 s for one trial with J + 1 = 60, 000 points and T = 4 ms) than the other estimators (≈ 0.3 ms for the averaging estimator, ≈ 16 ms for the radiometer). If computation time is a concern, the radiometer estimator may be preferable, although its performance depends heavily on the accuracy of the other assumed parameters, and the EM method can still be useful for estimating such parameters in offline system identification.
To gain further insight into the finite gap between the errors and the Cramér-Rao bound, in Figure 2 we plot the raw spectrum of y (n) j , defined as The figure shows that our model does not exactly match the experiment in two ways: (i) The data show a second weaker resonance peak.
(ii) The noise floor of the data rolls off at higher frequencies due to the presence of an RF notch filter in the experiment prior to data acquisition.
Despite the mismatch, our results are in reasonable agreement with the theory. To improve the estimation accuracy further, the weaker resonance can be modeled by including another mode in our linear Gaussian model, while the noise-floor roll-off can be removed by a whitening filter before applying the estimators.

Outlook
In this paper we have followed the paradigm of orthodox statistics to investigate parameter estimation for an optomechanical system, focusing on unbiased and ML estimators and the Cramér-Rao bound. For detection applications [33] with uncertain parameters, the ML estimator can form the basis of more advanced hypothesis testing techniques, such as the generalized likelihood-ratio test [16]. The assumption of static parameters means that the presented techniques are most suited to system identification purposes. For sensing applications, the parameters are often time-varying, and Bayesian estimators, such as the extended and unscented Kalman filters for continuous variables [34], the generalized-pseudo-Bayesian and interacting-multiple-model algorithms for finite-state dynamical hypotheses [35], and particle filtering [36], may be more suitable.
Since the Gauss-Markov model often remains valid for quantum systems [17], a quantum extension of our study is straightforward. This means that the presented techniques are potentially useful for future quantum sensing and system identification applications, such as optomechanical force sensing [18,[20][21][22], atomic magnetometry [23,24], and fundamental tests of quantum mechanics [18,19,25]. We expect our parametric methods to lead to more accurate quantum sensing and control than robust quantum control methods [23,37], which may be too conservative for the highly controlled environment of typical quantum experiments. There also exist quantum versions of the Cramér-Rao bound that impose fundamental limits to the parameter estimation accuracy for a quantum system with any measurement [38][39][40], and it may be interesting to explore how close the classical bounds presented here can get to the quantum limits.
The continued improvement of optomechanical devices for applications and fundamental science requires precise engineering of the mechanical resonance frequency, dissipation rate and effective mass. This necessitates a deep understanding of how these mechanical properties depend on differing materials and fabrication techniques. The mechanical resonance frequency is easily predicted via a numerical eigenmode analysis using the geometry of the structure and the Youngs modulus of the material. It is much more challenging to predict the level of mechanical dissipation, where numerical models are not as well established and multiple decay channels usually exist. Effective experimental characterization of such dissipation channels requires high precision force estimation to accurately quantify the oscillators coupling to the environment. This is critical to advancing optomechanics in applications such as quantum memories and quantum information [41,42]. A more immediate application for high precision force estimation is that of temperature sensing and bolometry where small relative changes of the signal power are of interest, for example, in detecting submillimeter wavelengths in radio astronomy [43] or even to search for low energy events in particle physics [44]. Given the demonstrated success of our statistical techniques, we envision them to be similarly useful for all these applications.
for complex variables, we have where α does not depend on θ and is discarded. To compute the estimated log-likelihood function Q(θ, θ k ), we need which can be computed by the Rauch-Tung-Striebel (RTS) smoother [16,34]. Starting with stationary initial conditions forẑ +k −1 and Π +k −1 , the smoother consists of a forward Kalman filter:ẑ  16) until j = 0. We can then write Q(θ, θ k ) as Maximizing Q(θ, θ k ) with respect to θ, we find One can simply take the real part of Eq. (A.25) if c is known to be real. The complex EM algorithm turns out to be the same as the real version with all transpose operations ⊤ replaced by conjugate transpose † . The same algorithm is also applicable to the quantum Gauss-Markov model [17], as the RTS smoother is equivalent to the linear quantum smoother [24,45,46]. The possibility of using the EM algorithm for quantum systems is also mentioned in Ref. [47]. The complex model is more compact when the noises are phase-insensitive. With phasesensitive noises, there is no computational advantage with a complex model and one can just use a real model to describe the real and imaginary parts separately.