Quantum non demolition measurements: parameter estimation for mixtures of multinomials

In Quantum Non Demolition measurements, the sequence of observations is distributed as a mixture of multinomial random variables. Parameters of the dynamics are naturally encoded into this family of distributions. We show the local asymptotic mixed normality of the underlying statistical model and the consistency of the maximum likelihood estimator. Furthermore, we prove the asymptotic optimality of this estimator as it saturates the usual Cramér-Rao bound.


Introduction
Measuring a small, quantum-sized physical system directly is done by letting it interact with a macroscopic instrument. This procedure can result in the destruction of the measured system. For example, photons are absorbed to create an electronic signal. To avoid the destruction of the measured system, one then relies on indirect measurements. The system first interacts with an auxiliary system, or probe, that is then measured. The goal is to infer the system state from the information obtained through this indirect measurement. However, by the laws of quantum mechanics, this procedure induces a back action on the system that may change its state. Moreover, the measurement outcome being inherently random, the system state may become random itself. Hence, if one aims at measuring indirectly a physical quantity of the system, the indirect measurement must be tuned such that, if the system state corresponds to an almost sure value of the physical quantity of interest, then measuring it indirectly will not modify its state. Indirect measurements of this kind are called Quantum Non Demolition (QND) measurements. They were introduced in the eighties as a technique for precise measurements [6]. Perhaps the experiment that best illustrates QND measurements is that of Haroche's group. By sending atoms through a superconducting cavity containing a monochromatic electromagnetic field, and measuring the atoms, it is possible to measure the number of photons inside the cavity without destroying them [8].
To increase the amount of information on the system state obtained through a QND measurement, the procedure is repeated. The system evolution is then described by an unobserved Markov chain (φ_n) (see Section 5 for the complete description). The only available observation is the sequence of measurement results, denoted by (X_n). Bayes' law maps the information carried by (X_n) into the evolution of the Markov chain.
In general, the sequence of random variables (X_n) is not i.i.d. (not even Markovian). Therefore, statistical inference for QND measurements cannot fully rely on standard results for i.i.d. models. Efficient parameter estimation is of course crucial for these experiments, particularly if one hopes to obtain a faithful estimate of the system state Markov chain (φ_n).
In this paper, we show that the QND model fits perfectly into the framework of the usual asymptotic statistical theory. More precisely, we provide a complete study of the model in terms of local asymptotic mixed normality (LAMN) (we refer to [14] for the whole theory). It is worth noticing that a quantum analog of local asymptotic normality (LAN), called quantum local asymptotic normality (QLAN), has been developed in the context of quantum statistics [9,12]. Similarly, quantum extensions of classical notions such as the quantum Fisher information and the Cramér-Rao bound have been developed [11, Section 2.2.5]. Here, we do not follow this approach and concentrate on more classical statistical properties. Our results rely on the fact that our model, thanks to the QND condition, is actually a mixture of i.i.d. statistical models. More precisely, it has been shown in [2,3,4,5,1] that the probability space describing these experiments can be divided into asymptotic events (belonging to the tail σ-algebra) such that (X_n), conditioned on one such asymptotic event, is a sequence of i.i.d. random variables. Hence the law of (X_n) is a mixture of i.i.d. laws. The weights involved in the mixture depend on the initial state of the system.
The conditioning that makes (X_n) an i.i.d. sequence is exploited extensively in order to derive the LAMN property. To our knowledge, this is the first time that the LAMN property is shown in this context. After proving the LAMN property, we study maximum likelihood estimation and prove that it is optimal in the sense that the Cramér-Rao bound is achieved asymptotically. Note that parametric estimation for indirect measurements has previously been investigated in [10,7,13] under different assumptions. However, the theory of asymptotic likelihood has not been studied therein.
The paper is organised as follows. In Section 2, we discuss the model of multinomial mixtures studied throughout the paper. In Section 3, we show the local asymptotic mixed normality. Section 4 is devoted to the results for the maximum likelihood estimator (consistency and saturation of the Cramér-Rao bound). Finally, in Section 5, we work on the QND model, underlining the link with multinomial mixtures. Further, some numerical simulations illustrate our results on a QND toy model inspired by [8].
Let P be a probability distribution. We use the notation $\overset{\mathcal{L}-P}{=}$ to mean equality in distribution when the underlying probability space is endowed with P. This notation is also used for convergence in distribution, sometimes written $\overset{\mathcal{L}-P_n}{=}$ when a family of probability measures (P_n) is involved. Throughout the paper, N(m, σ²) denotes the Gaussian distribution with mean m and variance σ². For any x ∈ R^D and any subset A ⊂ R^D, the set A + x denotes A + x = {y + x, y ∈ A}.
Let us now describe the probability model that we will study. Let Ω = A^N and let F be the smallest σ-algebra containing the cylinder sets {ω ∈ Ω | ω_k = j_k, ∀k ≤ n}. All the measures introduced afterwards are defined on the measurable space (Ω, F) without further mention.
For any θ ∈ Θ and for each α ∈ P, let P_{θ|α} be the multinomial probability measure built on the weights (p_θ(j|α))_{j∈A}. Namely, for any n-tuple (j_1, ..., j_n) ∈ A^n,
P_{θ|α}({ω ∈ Ω | ω_k = j_k, ∀k ≤ n}) = ∏_{k=1}^{n} p_θ(j_k|α).
Let (q(α))_{α∈P} be a probability measure on P. We denote by P^q_θ the probability measure defined as the convex combination of the measures P_{θ|α} with weights (q(α))_{α∈P}:
P^q_θ = Σ_{α∈P} q(α) P_{θ|α}.
Without loss of generality, we shall always assume that q(α) > 0 for all α ∈ P. Indeed, one can reduce the set P if needed.
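The two-stage structure of the mixture (draw a hidden label α with weights q, then draw i.i.d. outcomes with weights p_θ(·|α)) can be illustrated with a minimal sampling sketch. The function name `sample_mixture`, the arrays `p_example` and `q_example`, and the zero-based integer encoding of A and P are assumptions made for illustration only; they do not come from the paper.

```python
import numpy as np

def sample_mixture(p, q, n, rng):
    """Draw one trajectory of length n from sum_alpha q[alpha] * P_alpha.

    p: array of shape (num_alpha, num_outcomes); row alpha holds (p(j|alpha))_j.
    q: array of shape (num_alpha,), the mixture weights.
    Returns (alpha, outcomes): the hidden label is drawn once, then the
    n outcomes are i.i.d. given that label.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    alpha = rng.choice(len(q), p=q)
    outcomes = rng.choice(p.shape[1], size=n, p=p[alpha])
    return alpha, outcomes

# Hypothetical example: two hidden labels, three outcomes.
p_example = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6]])
q_example = np.array([0.4, 0.6])
rng = np.random.default_rng(0)
alpha, xs = sample_mixture(p_example, q_example, n=1000, rng=rng)
```

The sequence `xs` is not i.i.d. under the mixture, but it is i.i.d. once the label `alpha` is fixed, which is the property the rest of the analysis relies on.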
From now on, we denote by θ* ∈ Θ the true value of the parameter θ. We assume that θ* is in the interior of Θ. The following definition will be useful.
When it does not lead to confusion, we may omit the index θ* for both the sets Ω_{θ*|γ} and the random variable Γ_{θ*}.
Remark: It is a direct consequence of Assumption ID that
At this stage we need to introduce some quantities quantifying the information and proximity in our models. In particular, we shall repeatedly use the Shannon entropy given a parameter θ and the Kullback-Leibler divergence given θ with respect to θ′. For α, β ∈ P and θ, θ′ ∈ Θ, the Shannon entropy and the Kullback-Leibler divergence (2) are closely related to the quantities used in [3].
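For concreteness, the two information quantities can be computed for finite weight vectors as in the sketch below. It uses the textbook definitions, −Σ_j p(j) log p(j) for the Shannon entropy and Σ_j p(j) log(p(j)/r(j)) for the Kullback-Leibler divergence; that the paper's quantities coincide with these conventions is an assumption, and the function names and example weights are hypothetical.

```python
import numpy as np

def shannon_entropy(p):
    """-sum_j p(j) log p(j), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def kl_divergence(p, r):
    """sum_j p(j) log(p(j) / r(j)); assumes r(j) > 0 wherever p(j) > 0."""
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / r[nz])))

# Hypothetical weights p(.|gamma) and p(.|alpha) for two distinct labels.
p_gamma = [0.7, 0.2, 0.1]
p_alpha = [0.1, 0.3, 0.6]
S_gamma = shannon_entropy(p_gamma)
S_gamma_alpha = kl_divergence(p_gamma, p_alpha)
```

The divergence vanishes only when the two weight vectors coincide, which is why an identifiability assumption translates into strictly positive divergences between distinct labels.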
Lemma 1. Assume that ID holds.
(1) Almost sure convergence. Let (X^γ_n) be a sequence of random variables depending on γ ∈ P. If for any γ ∈ P
where Γ is a random variable whose distribution is given by Pr(Γ = γ) = q(γ).
Proof. Assume that (X^γ_n) converges almost surely towards X^γ w.r.t. P_{θ*|γ}. Then,
This is true for any γ ∈ P. Since P_{θ*} is a convex combination of the measures P_{θ*|γ},
for any continuous and bounded function f. Since
That convergence yields (2).
In the sequel we shall use the process (N_n(j))_{j∈A} where, for all n ∈ N, all j ∈ A and all ω ∈ Ω,
N_n(j)(ω) = Σ_{k=1}^{n} 1_{ω_k = j},
which counts the number of times the result j appears up to time n.
Note that Σ_{j∈A} N_n(j) = n. The strong law of large numbers for i.i.d. L¹ random variables implies that
(4) lim_{n→∞} N_n(j)/n = p_θ(j|α), P_{θ|α}-a.s.,
for all α ∈ P, all j ∈ A and all θ ∈ Θ.
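The counting process and the law of large numbers (4) are easy to check numerically. The following sketch assumes outcomes encoded as integers 0, ..., |A| − 1 and uses a hypothetical weight vector `p_alpha`; it is an illustration, not part of the paper's simulations.

```python
import numpy as np

def counts(outcomes, num_outcomes):
    """Return the vector (N_n(j))_j of occurrence counts of each outcome j."""
    return np.bincount(outcomes, minlength=num_outcomes)

rng = np.random.default_rng(1)
p_alpha = np.array([0.5, 0.3, 0.2])   # hypothetical weights p(j|alpha)
n = 200_000
xs = rng.choice(3, size=n, p=p_alpha)  # i.i.d. outcomes under a fixed label
N = counts(xs, 3)
freq = N / n                           # empirical frequencies N_n(j) / n
```

The counts sum to n by construction, and for large n the vector `freq` is close to `p_alpha`, as (4) predicts.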

Local Asymptotic Mixed Normality
We first prove that the statistical model (P_θ, θ ∈ Θ) is asymptotically equivalent to a mixture of Gaussian models. Let us first recall the definition of local asymptotic mixed normality that we shall prove for our model.
Definition 2 (Local Asymptotic Mixed Normality (LAMN) [14]). A sequence of statistical models
where the law of
The LAMN property for our model follows from the next lemma. It reduces the problem to local asymptotic normality for i.i.d. multinomials.
Lemma 2. Assume that ID holds and that for any (j, α) ∈ A × P the function p_·(j|α) is continuous at θ*. Let (θ_n) ⊂ Θ be a sequence of random variables such that lim_n θ_n = θ* (P_{θ*|γ}-a.s.). Then,
Proof. We prove the stronger result that the convergence is exponential, although we will only need convergence at order o(n^{−1/2}).
From the definition of P_{θ_n},
The strong law of large numbers (4) and the continuity assumption on the functions p_·(j|α) imply
Recall that Assumption ID implies S_{θ*}(γ|α) > 0 for any α ≠ γ. Therefore, there exists s > 0 independent of γ, α and of the sequence (θ_n) such that,

Now,
so that the result follows.
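The origin of the exponential rate can be sketched in the simpler case of a fixed parameter θ*. The computation below assumes strictly positive weights and the usual convention S_θ(γ|α) = Σ_j p_θ(j|γ) log(p_θ(j|γ)/p_θ(j|α)) for the Kullback-Leibler divergence; both are assumptions, and P_{θ*|α}(ω_1, ..., ω_n) is shorthand for the measure of the corresponding cylinder set. By the law of large numbers (4) applied under P_{θ*|γ}:

```latex
\frac{1}{n}\log\frac{P_{\theta^*|\alpha}(\omega_1,\dots,\omega_n)}{P_{\theta^*|\gamma}(\omega_1,\dots,\omega_n)}
  = \sum_{j\in A}\frac{N_n(j)}{n}\,\log\frac{p_{\theta^*}(j|\alpha)}{p_{\theta^*}(j|\gamma)}
  \xrightarrow[n\to\infty]{}
  -\sum_{j\in A} p_{\theta^*}(j|\gamma)\,\log\frac{p_{\theta^*}(j|\gamma)}{p_{\theta^*}(j|\alpha)}
  = -S_{\theta^*}(\gamma|\alpha),
  \qquad P_{\theta^*|\gamma}\text{-a.s.}
```

Since S_{θ*}(γ|α) > 0 for α ≠ γ, the likelihood of every wrong label is exponentially small relative to that of the true label γ, which is consistent with the rate s > 0 used in the proof.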
Now we state our main result.

Maximum likelihood estimation
This section is devoted to the study of the maximum likelihood estimator. For ω ∈ Ω, at step n the log-likelihood is defined as
We study the maximum likelihood estimator defined as θ̂_n := argmax_{θ∈Θ} ℓ_n(θ).
Proof. Since Θ is compact and, for any (j, α) ∈ A × P, the function p_·(j|α) is continuous and positive, we get max
Furthermore, the strong law of large numbers (4) implies that for any α ∈ P,
The consistency of the maximum likelihood estimator now follows from standard arguments.
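A minimal numerical sketch of maximum likelihood estimation for a mixture of multinomials is given below. Everything specific in it is an assumption made for illustration: the binary toy family p_θ(1|α) = sin²(θ + απ/4), the parameter set [0.1, 0.7], the unnormalised log-likelihood log Σ_α q(α) ∏_j p_θ(j|α)^{N_n(j)}, and the grid search standing in for the argmax. It is not the paper's model or algorithm.

```python
import numpy as np

def weights(theta):
    """Hypothetical binary model: p_theta(1|alpha) = sin^2(theta + alpha*pi/4)."""
    alphas = np.arange(2)
    p1 = np.sin(theta + alphas * np.pi / 4) ** 2
    return np.stack([1.0 - p1, p1], axis=1)          # shape (num_alpha, 2)

def log_likelihood(theta, N, q):
    """log sum_alpha q(alpha) prod_j p_theta(j|alpha)^{N(j)}, computed stably."""
    terms = np.log(q) + np.log(weights(theta)) @ N   # one log-term per alpha
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())       # log-sum-exp trick

def mle(N, q, grid):
    """Grid-search approximation of argmax_theta log_likelihood(theta)."""
    values = np.array([log_likelihood(t, N, q) for t in grid])
    return grid[np.argmax(values)]

rng = np.random.default_rng(2)
theta_star, q = 0.4, np.array([0.5, 0.5])
alpha = rng.choice(2, p=q)                           # hidden label
xs = rng.choice(2, size=20_000, p=weights(theta_star)[alpha])
N = np.bincount(xs, minlength=2)                     # counts N_n(j)
theta_hat = mle(N, q, np.linspace(0.1, 0.7, 601))
```

On this toy family the two labels produce well-separated outcome frequencies over the whole parameter set, so the estimate `theta_hat` lands close to `theta_star` regardless of which label was drawn.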
We now prove that the maximum likelihood estimator saturates this asymptotic bound. We prove it by comparing θ̂_n with θ̂^γ_n defined by θ̂^γ_n := argmax_{θ∈Θ} ℓ^γ_n(θ), where
for all θ ∈ Θ and all ω ∈ Ω.
Lemma 4. Assume that ID holds and that for any (j, α) ∈ A × P the function p_·(j|α) is twice continuously differentiable in a neighborhood of θ*. Assume further that for each γ ∈ P, I_{θ*}(γ) is not singular. Then,
Proof. Standard results of parametric estimation for i.i.d. random variables and our Assumption ID imply lim_{n→∞} θ̂^γ_n = θ*, P_{θ*|γ}-a.s. From the definition of the maximum likelihood estimators,
(8) ∇ℓ^γ_n(θ̂^γ_n) − ∇ℓ^γ_n(θ̂_n) = ∇ℓ_n(θ̂_n) − ∇ℓ^γ_n(θ̂_n).
Let O be a sufficiently small neighborhood of θ* (on which p_· is regular). The consistency of θ̂_n and θ̂^γ_n ensures that, P_{θ*|γ}-a.s., for n large enough, the estimators θ̂_n and θ̂^γ_n belong to O. Hence, since by assumption ℓ^γ_n is twice differentiable in a neighborhood of θ*, there exists a sequence of random variables ξ_n lying in the segment with endpoints θ̂_n and θ̂^γ_n such that, P_{θ*|γ}-a.s., for n large enough,
The last equality comes from (8) and the Mean Value Theorem. Now note that, using explicit differentiation and the strong law of large numbers (4), we have lim
Since I_{θ*}(γ) is assumed to be non-singular and θ → I_θ(γ) is continuous in a neighborhood of θ*, the previous convergence implies that for n large enough
Let us now prove that ∇ℓ^γ_n(θ̂_n) − ∇ℓ_n(θ̂_n) = o(n^{−1/2}) P_{θ*|γ}-a.s.
Proposition 2. Assume that ID holds and that for any (j, α) ∈ A × P the function p_·(j|α) is three times continuously differentiable in a neighborhood of θ*. Assume further that for any γ ∈ P, I_{θ*}(γ) is not singular. Let Z ∼ N(0, I_d) and let Γ be a random variable independent of Z, taking values in P, such that Pr(Γ = γ) = q(γ). Then, for any
Proof. From standard results in parameter estimation [14], under Assumption ID the statistical model (P_{θ|γ}, θ ∈ Θ) is LAN. Moreover, for any h ∈ Θ − θ*, under the assumptions of the proposition,
(10)
Actually, the proof that (P_{θ|γ}, θ ∈ Θ) is LAN and the weak convergence (10) are based on the same Central Limit Theorem. It follows that,
It follows then from Lemmas 2 and 4 that,
so that Lemma 1 yields the proposition.
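For a one-parameter multinomial family, the Fisher information of a single observation is given by the standard formula Σ_j (∂_θ p_θ(j))² / p_θ(j); that I_θ(γ) in the statements above is this quantity (in matrix form for vector-valued θ) is an assumption here. The sketch below evaluates it by a central finite difference on a hypothetical binary family, p_θ = (cos²θ, sin²θ), for which the exact value is 4 at every θ with positive weights.

```python
import numpy as np

def fisher_information(p_fn, theta, eps=1e-5):
    """Fisher information sum_j (d/dtheta p_theta(j))^2 / p_theta(j) of a
    one-parameter multinomial family, using a central finite difference."""
    dp = (p_fn(theta + eps) - p_fn(theta - eps)) / (2 * eps)
    return float(np.sum(dp ** 2 / p_fn(theta)))

def p_fn(theta):
    """Hypothetical binary family p_theta = (cos^2 theta, sin^2 theta)."""
    return np.array([np.cos(theta) ** 2, np.sin(theta) ** 2])

info = fisher_information(p_fn, 0.4)
asymptotic_std = 1.0 / np.sqrt(info)   # i.i.d. scale of sqrt(n) * (theta_hat - theta*)
```

In the i.i.d. model with label γ fixed, the inverse of this quantity is the Cramér-Rao lower bound on the asymptotic variance; in the mixture, the label is random, so the limiting variance is random as well, which is the "mixed" part of mixed normality.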

Applications to Quantum non-Demolition Measurement
As mentioned in the Introduction, the above development is motivated by applications in quantum physics. In particular, as we will see, the above estimation results can be applied in the context of QND measurements. For the sake of completeness, we briefly recall the QND model. For a complete overview of this model we refer to [2,3].
Let {e_α | α ∈ P} and {ψ_j | j ∈ A} be orthonormal bases of C^d and C^l respectively. These spaces are endowed with their canonical Hilbert space structure. In the context of quantum physics, these bases are associated with some physical quantities. Each vector of these bases describes the physical state corresponding to an almost sure value of said physical quantities. The Hilbert space C^d describes the quantum system that one aims to measure indirectly. The Hilbert space C^l describes a probe that is used to perform the indirect measurement. We now detail the usual setup of indirect measurement. It consists in measuring something on the probe after some interaction between the system and the probe. More precisely, the interaction is described through a unitary operator U on C^d ⊗ C^l. For QND measurements, this operator may be written as
U = Σ_{α∈P} π_{e_α} ⊗ U_α.
Here, π_{e_α} is the projector on the line Ce_α and (U_α)_{α∈P} are unitary operators on C^l. The operators (U_α) depend on the unknown parameters of the experiment. The state of the system is represented by the unit vector¹ φ_0 ∈ C^d. This vector may be expanded on the first basis:
φ_0 = Σ_{α∈P} ⟨e_α, φ_0⟩ e_α.
The state of the probe is represented by a unit vector ψ ∈ C^l. After the interaction, the joint system-probe state is the unit vector U(φ_0 ⊗ ψ) ∈ C^d ⊗ C^l. This vector may be expanded as
U(φ_0 ⊗ ψ) = Σ_{α∈P} ⟨e_α, φ_0⟩ e_α ⊗ U_α ψ.
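The block structure of the QND interaction can be checked numerically. The sketch below builds U = Σ_α π_{e_α} ⊗ U_α in the canonical bases, with randomly generated probe unitaries; the dimensions, the helper `random_unitary` and the chosen states are hypothetical and serve only to illustrate the algebra.

```python
import numpy as np

def random_unitary(dim, rng):
    """Unitary matrix from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    qmat, _ = np.linalg.qr(z)
    return qmat

rng = np.random.default_rng(3)
d, l = 3, 2                                            # system and probe dimensions
U_alpha = [random_unitary(l, rng) for _ in range(d)]   # hypothetical probe unitaries

# U = sum_alpha |e_alpha><e_alpha| (x) U_alpha, written in the canonical bases.
U = np.zeros((d * l, d * l), dtype=complex)
for a in range(d):
    proj = np.zeros((d, d))
    proj[a, a] = 1.0
    U += np.kron(proj, U_alpha[a])

psi = np.array([1.0, 0.0], dtype=complex)              # probe state
e0 = np.eye(d, dtype=complex)[0]                       # system prepared in e_0
out = U @ np.kron(e0, psi)                             # state after the interaction
```

With the system prepared in a basis vector e_α, the output factorises as e_α ⊗ U_α ψ: the system component is left untouched and only the probe carries the imprint of α.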
We are now in a position to see how multinomial mixtures encompass the law of the sequence of measurement results in QND measurements.
To begin with, let us assume that φ_0 = e_α for some α ∈ P. Then,
Further, from the definition of U,
This property justifies the denomination non demolition explained in the Introduction.
If the system state is e_α before the interaction, it remains e_α after the interaction. Quantum mechanics tells us that measuring a physical state in {ψ_j | j ∈ A} has the following probability distribution
¹ Actually, the state corresponds to the line given by the direction φ_0. In particular, two states are equivalent if they differ only by a phase. We no longer mention it and always mean equality up to a phase when comparing two states.
In other words, this is the new system state conditioned on the outcome ψ_j (for the probe). This system state update leads to the update of q_0(α),
This procedure results in the definition of random variables φ_1 and (q_1(α))_{α∈P} whose laws are images of the law P[observing ψ_j] = π_0(j). Now, we repeat the previous steps. Let (X_n) be the resulting sequence of outcomes (identifying ψ_j and j). We have
Using Kolmogorov's consistency theorem, we have thus defined the law of the random sequence (X_n). For α ∈ P, let P_α be the probability measure such that
(15)
Then the law of (X_n) is the mixture of multinomials Σ_α q_0(α) P_α. Let us now turn to the statistical model. For any α ∈ P, the unitary operator U_α = U_α(θ) depends on the unknown parameter θ. Hence, we wish to study the statistical model (P_θ) with
where P_{θ,α} is defined in (15) with
Now, all the results developed in the previous sections hold whenever the regularity assumptions are imposed directly on (U_α(·))_{α∈P}. Let us now unravel what the Fisher information is for the QND measurement model.
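The repeated update of the weights (q_n(α)) along a sequence of outcomes can be sketched as a plain Bayesian filter. The update rule used below, q_{n+1}(α) = q_n(α) p(j|α) / Σ_β q_n(β) p(j|β), is the standard Bayes formula and is assumed here to match the update described above; the matrix `p`, the function name `bayes_update` and the chosen label are hypothetical.

```python
import numpy as np

def bayes_update(q, p, j):
    """One step q_n -> q_{n+1} after observing outcome j.

    q: current weights (q_n(alpha))_alpha; p[alpha, j] = p(j|alpha).
    """
    unnorm = q * p[:, j]
    return unnorm / unnorm.sum()       # the normaliser is the outcome probability

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])        # hypothetical outcome probabilities
q = np.array([0.5, 0.5])               # initial weights q_0
rng = np.random.default_rng(4)
alpha = 1                              # hidden label used to generate the data
for j in rng.choice(3, size=500, p=p[alpha]):
    q = bayes_update(q, p, j)
```

After a few hundred outcomes the weights concentrate on the label that generated the data, which is the numerical counterpart of the asymptotic events that make (X_n) conditionally i.i.d.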
Hence, dim Θ = 7. The ideal values of the parameters are θ_5 = θ_6 = 1, θ_4 = π/4 and θ_a = (2 − a)π/4. Experimentally, some imperfections imply that θ_6 is smaller than 1: it is close to 0.674 ± 0.004. Besides this limitation, the authors of [8] find parameters close to their target values using a best fit to the empirical distribution. Setting θ_6 = 0.674, it is easy to check that ID holds for the ideal parameters and in a small enough but sufficiently large neighborhood of the true parameters. Moreover, all the functions θ → p_θ(x, a|α) are entire analytic. We limit ourselves to the estimation of θ_4; the other parameters are fixed to their true values, except θ_6, which we set to 0.674. Further, we take Θ = [π/8, 3π/8] and,²
Fisher information is not singular at θ = π/4. Figure 1 depicts some simulations of (θ̂_n) for this toy model.
² Remark that in [8] α = 0, ..., 7, so Assumption ID is not verified for α = 0. More precisely, θ_4 may not be identifiable. However, for the ideal value θ = π/4, α = 0 is equivalent to α = 8, so we use this value. Hence, we identify the zero-photon state with the eight-photon one, contrary to what is done in [8].