Statistical Inference with Quantum Measurements: Methodologies for Nitrogen Vacancy Centers in Diamond

The analysis of photon count data from the standard nitrogen vacancy (NV) measurement process is treated as a statistical inference problem. This has applications toward gaining better and more rigorous error bars for tasks such as parameter estimation (eg. magnetometry), tomography, and randomized benchmarking. We start by providing a summary of the standard phenomenological model of the NV optical process in terms of Lindblad jump operators. This model is used to derive random variables describing emitted photons during measurement, to which finite visibility, dark counts, and imperfect state preparation are added. NV spin-state measurement is then stated as an abstract statistical inference problem consisting of an underlying biased coin obstructed by three Poisson rates. Relevant frequentist and Bayesian estimators are provided, discussed, and quantitatively compared. We show numerically that the risk of the maximum likelihood estimator is well approximated by the Cramer-Rao bound, for which we provide a simple formula. Of the estimators, we in particular promote the Bayes estimator, owing to its slightly better risk performance, and straight-forward error propagation into more complex experiments. This is illustrated on experimental data, where Quantum Hamiltonian Learning is performed and cross-validated in a fully Bayesian setting, and compared to a more traditional weighted least squares fit.


I. INTRODUCTION
Developing tools for the characterization of quantum systems is an increasingly important problem. As more performance is demanded out of quantum devices, more knowledge about these quantum devices is also required. This applies as much to large-scale multi-qubit quantum information processors as it does to small single-qubit quantum sensors. Importantly, this knowledge must include not only estimates of relevant quantities, but also careful estimates of the uncertainty of these quantities. Indeed, if the application is, for example, metrology, then the very nature of the problem demands that one should be as confident in one's ability to produce meaningful error bars as one's ability to produce the estimate itself. Or, if one is using knowledge about a quantum system to design control sequences (such as unitary gates), then it is important to know how much system parameters are expected to vary through space and time. If the estimate of this parameter distribution is too tight, the control sequence will not meet specifications, and if it is too broad, the control sequence will not have optimal efficiency [1][2][3][4].
These considerations imply that data from quantum experiments should be analyzed on firm statistical footing. This, in particular, requires a detailed model that computes the likelihoods of experimental outcomes given a specified set of system parameters or hyperparameters. This is not to say that we need perfect statistical models, but rather, that models and methods should be well enough defined so that rigorous questions can be asked and answered unambiguously. Most of the widespread characterization protocols in quantum information have been described by statistical models. Quantum mechanics is ultimately a statistical theory and so this is usually a natural thing to do. State and process tomography have been studied extensively as matrix-valued inference [5][6][7], randomized benchmarking (RB) and derivative protocols are inherently statistical [8][9][10][11], and Hamiltonian parameter learning is often considered from a machine learning perspective [12][13][14][15]. These statistical models of characterization, however, usually stop short of platform dependent considerations; for example, the RB protocol does not tell you how to interpret the noisy voltage measurement of a superconducting qubit. Such divisions are drawn to achieve crossquantum-platform generality.
Specific platforms, however, really do need to think carefully about how to append their own messy details onto these well-established models. Incorrect models may lead to estimates biased in unknown ways, or, just as bad, to error-bars which are inaccurate given the actual data. This leads us to the purpose of the present paper: a detailed and explicit model of the measurement process of the negatively charged Nitrogen Vacancy (NV − ) center in diamond, along with an analysis of relevant estimators and methodologies.
NV − centers have been studied extensively due to a number of remarkable physical properties [16]. These include long coherence times at room temperature [17], the ability to address and readout a single defect in isolation [18,19], the ability to initialize to a (nearly) pure state on demand [20], and the ability to selectively interact with nearby nuclei [21][22][23][24][25][26]. Moreover, the NV − center's sensitivity to external macroscopic properties like magnetic fields, electric fields, and temperature, in combination with its nanoscopic spacial profile, have shown it to be highly suitable as a quantum sensor [27][28][29].
As such, it is of value to carefully examine precisely what one learns from the NV − center when a measurement is made. If the application is quantum sensing, then it is crucial to be able to accurately report error bars of the quantity of interest. Or, if an application requires high-fidelity control, it is important to be able to quantify one's knowledge of the system parameters so that control sequences can be designed to be robust against uncertainty. If these same control sequences are being tested in a tomography or benchmarking protocol, then sensible estimates of figures of merit also presuppose a thorough understanding of the measurement data.
These demands all reduce to statistical inference, which is the process of inferring values of interest from a model, given a dataset of measurements. However, in order to trust an inference one must first be able to trust the sufficiency of the model. Therefore, we begin by taking the accepted physical description of the NV − measurement process and use it to derive a statistical model describing the expected distributions of experimental measurements, including annoyances such as limited visibility, dark counts, imperfect state preparation, and reference drift.
This paper is organized as follows. Section II introduces a phenomenological model of the optical dynamics of an NV − center in terms of Lindblad jump operators. Section III provides a model for the measurement process, including photon loss and dark counts. Section IV describes the state initialization process and derives the pseudo-pure state and its contrast-reduced references. Section V introduces a stochastic model of the reference drift process. Section VI uses the derived description of the measurement process to state it as a statistical inference problem -readers who are less interested in the physical derivation of the model may choose to begin with this section; though it is the culmination of the preceding sections, it has been written to be comprehensible without them. Section VII provides a couple of estimators for this inference problem, and compares them. Section VIII gives an example of the model being used with experimental data for Quantum Hamiltonian Learning [12]. All raw data and code necessary to reproduce the results of this paper are openly available online [30].

II. OPTICAL DYNAMICS OF THE NV − CENTER IN DIAMOND
Here we briefly summarize the optical dynamics of the NV − center in diamond, setting up our notation for future sections. More complete descriptions can be found elsewhere, see, for example, the review article by Doherty et al. [31] and references therein.
The NV − defect is an impurity in the diamond lattice consisting of a nitrogen atom adjacent to an empty lattice site, replacing two carbon atoms that would normally be in those positions. This defect, when negatively charged, has six bound electrons whose spacial wavefunction extends on the order of a dozen lattice sites before becoming negligible compared to nuclear dipolar interactions [26,32,33]. These six electrons form an effective spin-1 electron in the electronic ground state [16]. It is standard practice to pick two out of these three spin energy levels to define a qubit.
Importantly, the NV − center also has optical properties; it can absorb and emit photons. The energy difference between the optical ground state and the first excited state is 637 nm (red). It is experimentally convenient to optically excite the defect from the ground state to the first excited state using light with a wavelength outside of the emission spectrum, for example 532 nm (green). This is possible due to the presence of higher energy states above the first excited state which very quickly decay to the first excited state, while preserving spin populations. Separating red and green allows a confocal microscope to be set up so that optical cycling of a single isolated NV − center can be studied [18]: incident green light is delivered to a small region inside of a diamond, roughly a cubic micron in volume, and red light is extracted from the same region. Assuming the use of a diamond whose impurities are sparse enough, this region can be chosen to contain a single NV − center.
The dynamics of the NV − center are usually described for the spin system alone, with the optical degrees of freedom assumed to be in the ground state. However, since we are interested in the measurement process, we must include both degrees of freedom. We describe the dynamics using a seven level system: three levels for the optical ground state spin system, three levels for an optical excited state spin system, and one level for an optical inter-system-crossing (ISC). It is known that more levels exist [31], but adding them to this discussion will not change our model of the measurement process.
We decompose our Hilbert space as the direct sum where dim(H ground ) = dim(H excited ) = 3 and dim(H isc ) = 1. We define a basis for H as where {+1, 0, −1} are spin labels corresponding to the eigenvalues of S z = diag(1, 0, −1), and (g, e, s) refer to optical ground, excited and singlet, respectively. In this way we can write, for example, the spin-1 z operator in the optical ground state as S z ⊕ 0 ⊕ 0 = |g, +1 g, +1| − |g, −1 g, −1|.
Given an external magnetic field applied along the z-axis with strength ω e (in angular frequency units), the Hamiltonian of the system is given by The terms ∆ g ≈ 2.87 GHz and ∆ e ≈ 1.4 GHz denote the zero-field splittings. As the name implies, they are energies that enter into the Hamiltonian without the application of an external field; they result from the couplings between the electrons constituting the NV − center.
Absorption of a photon takes the system from H ground to H excited , and vice versa for spontaneous photon emission. These processes are known to be spin-conserving. Although coherent optical control is possible, we restrict our attention to the more commonly used dissipative regime. Therefore, we describe excitation and spontaneous emission using spin-conserving Lindblad operators, √ γ eg (|g, +1 e, +1| + |g, 0 e, 0| + |g, −1 e, −1|) L 2 = k · γ eg (|e, +1 g, +1| + |e, 0 g, 0| + |e, −1 g, −1|) where γ eg is the rate of spontaneous emission, about 77 MHz. Without the additional dynamics described below, this would imply an average excited state lifetime of about 1/γ ea = 13 ns. The dimensionless parameter k corresponds to the laser power and in our experiments is typically on the order of unity when the laser is on. It can be set to 0 in periods where the laser is off. Spin selective measurement is possible because of an additional decay path that is not spin-conserving and emits photons with a wavelength outside of the 600 nm-800 nm emission spectrum. This route preferentially allows the excited |e, −1 and |e, +1 states to decay to the ground states through the ISC to the state |s . It can be modeled using the Lindblad dissipaters L 6 = γ sg /3 |g, 0 s| (10) The first two move support on the excited |e, −1 and |e, +1 states to the ISC state. The last three spread ISC population approximately evenly (and incoherently) over the ground space. This may seem counterintuitive given  The quantum state is initialized to fully mixed state in the optical ground spin-1 manifold, which is approximately the Boltzmann distribution at room temperature and low magnetic field. Therefore at t = 0, the combined |g, +1 , |g, −1 manifold has twice as much population as |g, 0 . A laser pulse of duration 5µs is applied starting at 1µs. The populations of the subspaces spanned by the pure states in the legend are tracked through time. The values used are k = 0.3, γ eg = 77 MHz, γ es = 30 MHz, γ sg = 3 MHz, and γ 01 = 0.5 MHz. At 6µs when the laser pulse is turned off, the excited states quickly decay to the ground states, and the singlet state slowly leaks back to the ground state; normally we wait around 5 or 10 times the characteristic decay time before resuming activity. Note that full polarization will never be reached due to the nonzero mixing rate γ 01 .
that optical excitation of sufficient duration results in high spin state polarization; the resolution is that the selective decay from |e, +1 , |e, −1 to |s is what actually drives polarization, as measured by Robledo et al. [34]. The rate of the spin-selective decay is roughly γ es = 30 MHz, and comparing this to γ ea , shows that the excited ±1 states take the ISC path roughly 1/3 rd of the time. The lifetime of the singlet is quite long, with a decay rate of roughly γ sg = 3 MHz; this is the time scale that will end up dominating the optimal measurement time. Small non-spin conserving transitions are estimated to have a rate of about γ 01 = 1 MHz [31]. They can be modelled as the Lindblad operators L 8 = γ 01 /4 |g, +1 e, 0| L 9 = γ 01 /4 |g, −1 e, 0| L 10 = γ 01 /4 |g, 0 e, +1| Solving the Lindblad master equation, with some initial state ρ(0) allows us to track the populations and coherences of the quantum system through time.
If we start with spin-state coherences, they will quickly die off due to the mismatch in zero-field splittings between the optical ground and excited states. Since we have no coherence generating terms in our internal Hamiltonian, describing the optical dynamics in a fully quantum setting is overkill. If we were to simultaneously turn on a resonant microwave field and the green laser, the simplifications we will make in Section III B would not apply.

III. MEASUREMENT DYNAMICS
It is possible to gain information about the spin state of the NV system by simultaneously illuminating it and counting the photons it emits in the process. This works because the ISC is spin selective. States initially with support on the subspace span(|g, −1 , |g, +1 ), once excited, have a decay path which does not emit a detectable photon. Therefore, states initially with support in this space, on average, appear dimmer. We now formalize this.

A. Measurement Description
A single measurement of an NV − center consists of turning the laser on for some amount of time (on the order of 500 ns) and counting the spontaneously emitted photons in this duration. We assume in our model that no information about arrival time or spectral properties is recorded.
The probability of a quantum system spontaneously emitting a detectable photon within a short duration dt at time t is given by the product of the rate of spontaneous emission, the length of the time window, and probability of being in the excited triplet manifold [35]: where P e = |e, +1 e, +1| + |e, 0 e, 0| + |e, −1 e, −1| is the projector onto the excited subspace H excited and ρ(t) ∈ D(H) is the state of the system at time t. This defines an inhomogeneous Poisson process, where the rate of events is time-dependent, given by This is a generalized version of the more common homogeneous Poisson process, where the event rate λ is constant, and the probability of k events in the time duration t is given by Pr (k) = e −λt (λt) k /k!. An inhomogeneous Poisson process has a similar formula given by where n e is the number of emitted photons during the interval [0, ∆t]. The expected number of emitted photons is then With our typical parameters, the expected number of emitted photons is on the order of a dozen. Optimal measurement times ∆t will be discussed in Section VI F. Notice that we are conditioning on what we call the pre-measurement state, ρ(0). If the parameters in the dynamics are known and fixed, then so too is ρ(t) for 0 ≤ t ≤ ∆t given the pre-measurement state. It follows that the inhomogeneous rate function µ(t) is conditioned upon the pre-measurement state, µ(t) = µ(t|ρ(0)), but we often omit this for notational simplicity, writing simply µ(t).
Given the parameters of the Hamiltonian and Lindblad dissipaters, the pre-measurement state, and an integration time, we now have a concrete method of calculating the expected number of photons emitted.

B. The Rate Equation Simplification
Due to the absence of coherence, a full open quantum system simulation of ρ(t) is unnecessary to compute µ(t). Instead, we can reduce our dynamics to a rate equation picture without further approximation.
To this end, we define the probabilities as well as the vector p(t) = (p g0 (t), p g1 (t), p e0 (t), p e1 (t), p s (t)) T ∈ [0, 1] 5 . Here, ρ(t) is the solution to the Lindblad master equation (13). Notice that the components of this vector sum to unity, 5 i=1 p i (t) = 1 because the projectors used in the definitions of its components sum to the identity.
Combining the master equation with the definitions from Equation (18), we can compute the time evolution of each component of p(t). For example, we havė p g0 (t) = Tr[|g, 0 g, 0|ρ(t)] where we have skipped a few lines of algebra. In this way we end up with a set of coupled linear differential equations involving rates from the Lindblad operators which can be described by the matrix DE which we write in condensed notation as˙ Notice that the columns of R sum to 0 which ensures that p(t) remains a probability vector as it evolves. This condensation of the Lindblad master equation into a rate equation of probabilities is possible because, in our special case, the Hamiltonian commutes with the projectors and the Lindblad dissipaters have no dynamics within these subspaces. This assumption would break if the NV were placed in a magnetic field with off-axis field components comparable to the zero-field splittings, or if near-resonance microwave fields were turned on during the laser pulse. The solution of Equation (21) is p(t) = e tR p(0). Thus the inhomogeneous Poisson rate from Equation (15), µ(t) = γ ea Tr [P e · ρ(t)], can be simplified to µ(t) = γ ea (p e0 (t) + p e1 (t)), or in terms of the initial state, where m = (0, 0, 1, 1, 0) T is the projector onto the excited space, and where we have assumed that the pre-measurement state ρ(0) has support only on H ground . This new expression is much more tractable as it involves just a 5 × 5 matrix exponential, instead of a 49 × 49 matrix exponential in superoperator space. It also makes it simpler to derive the following relationship: µ(t|ρ(0)) = γ ea m · e tR p(0) = γ ea m · e tR (Tr[|g, 0 g, 0| ρ(0)], Tr [(|g, −1 g, −1| + |g, +1 g, +1|)ρ(0)] , 0, 0, 0) T = Tr[|g, 0 g, 0| ρ(0)]γ ea m · e tR (1, 0, 0, 0, 0) T + Tr [(|g, −1 g, −1| + |g, +1 g, +1|)ρ(0)] γ ea m · e tR (0, 1, 0, 0, 0) T = Tr[|g, 0 g, 0| ρ(0)]µ(t, |g, 0 g, 0|) + Tr [(|g, −1 g, −1| + |g, +1 g, +1|)ρ(0)] µ(t, |g, +1 g, +1|) = p · µ(t, |g, 0 g, 0|) + (1 − p) · µ(t, |g, +1 g, +1|) (23) where p = Tr[|g, 0 g, 0| ρ(0)], again assuming ρ(0) has support only on H ground . Note that the choice of using |g, +1 rather than |g, −1 (or any superposition of both) was arbitrary. Therefore the rate of photon emission during measurement given the pre-measurement state ρ(0) is the convex combination of the the photon emission rates of the states |g, 0 and |g, +1 , where the convex combination parameter is the overlap of ρ(0) with |g, 0 . It is this relationship that ultimately justifies the notion of taking reference measurements to calibrate the signal measurement. An immediate corollary of this is that Equation ( which says that the expected number of emitted photons during measurement for the pre-measurement state ρ(0) is the convex combination of the expected number of photons for the pre-measurement states |g, 0 and |g, +1 . We already know that n e is always a Poisson distribution for any pre-measurement state (Section III A), and that a Poisson distribution is characterized completely in terms of its expected value, therefore once µ 0 := E[n e | |g, 0 g, 0|], µ 1 := E[n e | |g, +1 g, +1|], and p = Tr[|g, 0 g, 0| ρ(0)] are known, everything about Pr (n e |ρ(0)) is also known: which is a more tractable version of Equation (16).

C. Measurement Visibility and Noise
There are two mechanisms that will affect our photon counting: photons can get lost along the way to the detector, or photons can be detected that did not originate from the NV. In Section III A we derived the probability of the NV emitting some number of photons during a measurement, Pr (n e ), and in Section III B we provided a simpler formula for the same quantity.
In this section we introduce two new variables, Γ and η. Let Γ be the rate of dark counts per unit time, due both to noise in the photon counter itself and to stray photons. Similarly, let η be the probability that a photon emitted by the NV center will be collected by the detector. This parameter is largely determined by the quality of the confocal microscope and the solid angle of emitted photons in view.
It is useful to define n dc as the random variable representing those detected photons which were dark counts, and n td as the random variable representing those detected photons which originated from the NV, dubbed 'true detections'. We have the relationship n d = n dc + n td (26) where n d is the random variable representing all detected photons during a single measurement window. Note that n dc and n td are independent random variables, and also, that they are not in practice distinguishable. Assuming the rate of dark counts is constantly equal to Γ over the a measurement duration ∆t we simply have The true detections are slightly more complicated. Each photon emitted by the NV has an (independent) probability η of arriving at the detector and being detected. We therefore have the conditional distribution n td |n e ∼ Binom (n e , η) where we are conditioning on a particular number of photons being emitted by the NV, n e . Recall that n e is Poisson distributed with a mean which we label µ = En e := E[n e |ρ(0)] for now. We can therefore use the law of total probability to compute Pr (n td = n|n e = m) Pr (n e = m) which shows that n td is also Poisson distributed, with a mean given by ηµ. Since the sum of two independent Poisson variables is also Poisson, we conclude that the random varibale of interest, n d = n td + n td , is Poisson distributed, This, in combination with (24), gives where µ 0 := E[n e | |g, 0 g, 0|], µ 1 := E[n e | |g, +1 g, +1|], and p = Tr[|g, 0 g, 0| ρ(0)]. Therefore, just as µ 0 and µ 1 served as the maximum and minimum reference counts for the expected emitted number of photons, E[n e ], we see that in the case of measurement visibility and noise, the numbers (Γ∆t + ηµ 0 ) and (Γ∆t + ηµ 1 ) serve as the new references for E[n d ].

IV. STATE INITIALIZATION
Simply illuminating the NV center for a period of time with a laser, and then waiting for the system to settle, results in a highly polarized spin state. This can be shown by analysing the steady state solution to the rate equation. Importantly, the equilibrium state is unique, so that no matter what the initial conditions are to the initialization procedure, the final state is the same.

A. The Steady State Solution to the Rate Equation
The rate equation derived in Section III B, while much simpler than the Lindblad master equation it was derived from, is still not analytically solvable unless certain terms like γ 01 are assumed to be zero. In order to estimate, for example, the time required to initialize the NV center, it will be helpful to consider the steady state solutions to the rate equation.
The rate matrix R from Equation (21) has only non-positive eigenvalues, and must have a non-trivial null space. To see this, examine the solution to the rate equation, p(t) = e tR p(0). If R had positive or complex eigenvectors, the probability vector p(t) would either blow up or gain complex entries, neither of which are allowed under a Lindblad master equation. Moreover, if all of the eigenvectors were strictly negative, p(t) would asymptote to 0, and would therefore violate conservation of probability, hence at least one of the eigenvectors must be 0.
The null space completely specifies the steady state solution. Indeed, suppose that we have an initial set of populations p(0) and decompose this into an eigenbasis v 1 , ..., v 5 for R, where it holds that v 1 , .., v k all have eigenvalue zero, and v k+1 , .., v 5 have strictly negative eigenvalues λ k+1 , ..., λ 5 . This gives p(0) = 5 i=1 a i v i for some real coefficients a 1 , ..., a 5 . The evolution is now described by which is to say that only the population that was originally within the null subspace can remain in the steady state. This analysis also shows that the non-zero eigenvalue with the smallest absolute value largely determines the rate of decay -populations in this subspace take the longest to die. As a technical aside, note that the rate matrix R is not normal, so that its eigenspaces are not orthogonal. This means that the population of the null space cannot be naively computed by projecting onto that subspace; a full linear inversion must be performed.
When the laser is off, we have k = 0 and the null space is two dimensional, spanned by (1, 0, 0, 0, 0) T and (0, 1, 0, 0, 0) T , which correspond to the populations of the optical ground state -this is not surprising. The three remaining eigenvalues are −(γ eg + γ 01 /2), −(γ eg + γ es + γ 01 /2), and −γ sg . The first two die off quickly -these correspond to population escaping from the optically excited states. The last rate, γ sg , is the most important, having an eigenvector (1/3, 2/3, 0, 0, 1) T , corresponds to population slowly seeping out of the singlet state and entering the three optical ground state levels. State initialization is complete once the singlet state is sufficiently depleted. The process is exponential, so one need only wait some small multiple of 1/γ sg before the population of |s is vanishingly small.
When the laser is on, so that k > 0, the eigenstructure of the rate matrix is not as tractable, but we can still make progress. Firstly, it can be shown that the null space is now only one dimensional. We denote the probability vector spanning the null subspace as p ss = (p ss g0 , p ss g1 , p ss e0 , p ss e1 , p ss s ) T . This is extremely important because it validates the NV − initialization procedure: no matter what the quantum state was before the laser is turned on, if you let it equilibrize under optical illumination, it will reach a unique steady-state.
Under the approximation that γ 01 = 0, we have p ss = (1/(1 + k), 0, k/(1 + k), 0, 0) T which has support only on the m S = 0 spin state. We see that if k = 1 so that the rate of spontaneous emission is equal to the rate of optical excitation, the ground and excited m S = 0 populations equilibrize to the same value of 1/2. The null eigenspace is much more complicated when γ 01 > 0. Though an analytic expression can be derived, it is a large unhelpful mess of divisions and multiplications of the various rates which we have been unable to simplify. To gain some insight, in Figure 2 we plot the components of p ss as a function of γ 01 . As γ 01 increases, two properties are apparent: the steady-state population of the m S = ±1 spin states increases, as expected, but also, the steady-state singlet state population increases dramatically. The latter effect occurs because the singlet state's relatively long lifetime makes it a good storage location, and now due to γ 01 , there is always some population to feed it. Indeed, expanding p ss in a γ 01 power series about 0, the steady state population of the singlet is p ss s = 3kγ01 2(k+1)γsg + O(γ 2 01 ). Due to small but important spin-non-conserving terms in the master equation during continuous optical excitation (Equation (12)), the steady state density matrix must have non-zero support on the states |g, −1 , |g, +1 , |e, −1 , and |e, +1 . This can be seen in Figure 2. This means that the initialization process does not asymptote to the pure state |g, 0 , but rather to a mixed state ρ 0 which is mostly pure. Since our optical model does not distinguish between m s = +1 and m s = −1, they must have equal population in the pumping steady-state, and therefore ρ 0 can be written as a pseudo-pure state [36], where the purity parameter q depends in a non-trivial way on all of the parameters of the rate equation, but generally decreases as the rates of spin-non-conserving processes increase. It is pseudo-pure in the sense that any unitary (and in general, unital) process only acts on the first term, so that its lack of complete purity, in practice, serves only to limit the contrast of measurement.
To see this, we begin by computing the expected number photons emitted during the measurement of our preparation procedure ρ 0 using Equation (24), Next, if prior to measuring ρ 0 we perform an operation |g, 0 → |g, +1 , which may be implemented as an adiabatic inversion with a microwave pulse, we have a pre-measurement state which when measured emits an expected number of photons Previously we had considered µ 0 and µ 1 as the quantities that define the reference measurements in the absence of noise. However, given that ρ 0 is the best achievable initial state using standard techniques, it is in practice more convenient to use µ 0 and µ 1 . Indeed, if we prepare the state ρ 0 and perform any unital operation resulting in a pre-measurement state then some simple algebra shows that where p = Tr[ρ ψ |g, 0 g, 0|]. Finally, if we take into consideration finite visibility η and dark count rate Γ as discussed in Section III C, we may define to arrive at which is analagous to Equation (31) but for our pseudo-pure state preparation. The quantity γ, which we will call the signal, is by definition conditioned on the pre-measurement state ρ, and will henceforward generically refer to the expected number of detected photons given the pre-measurement state of interest or, equivalently, the experiment of interest which when performed on the preparation state ρ 0 yields ρ. We will call the quantities α and β our references because they bound the expected values of detected photons for arbitrary states of the form in Equation (38). More specifically, α is the bright reference and β is the dark reference since α > β.

V. DRIFTING REFERENCES
One of the main complications of experimental NV measurement is drifting of the references α and β in time. Though there are many mechanisms which can cause this, for us, the most prevalent is due to relative movement between the NV center and the focal spot of the confocal microscope. This region of focus has a 3D Gaussian profile, and as temperatures or other properties of the lab change in time, movement of the center of this region off of the point-sized NV center causes both a drop-off of delivered laser power, k, and collection efficiency, η. Other mechanisms may include fluctuation of the laser power, and quality of the confocal microscope's alignment, which will affect η, k, and Γ. If left unchecked, the drift would eventually cause the NV under study to no longer be in the focal region at all. To avoid this, a tracking procedure is periodically run, whose purpose is to recenter the NV with the focal region by taking a series spatial of images and using feedback to realign.

A. Experiment Ordering
Experiments typically have a set of parameters that are varied. For the sake of concreteness, we choose the specific experiment and repetition ordering described in Figure 3; variations of this ordering may require a modified (though similar) analysis to that which follows. We denote the list of experiment parameter configurations as a 1 , ..., a S . For example, in the case of a Ramsey experiment for magnetometry [37], the distance between two π 2 pulses is varied, so that a s = (t s ) where t s is the pulse spacing. If additionally the phase of the second Ramsey pulse is varied in proportion to the spacing, we have instead a s = (t s , ωt s ) for some angular frequency ω. A parameter configuration a ∈ { a 1 , ..., a S } is fixed and a large number, N , of back-to-back repetitions with this parameter configuration is performed before moving on to the next. The choice of N is motivated, for instance, by the time required for experimental control hardware to switch between choices of configuration a. Each time all parameter configurations have been dealt with, the tracking procedure is run, and then the entire procedure is repeated R times, which we call averages.
A signal measurement γ depends only on the current value of α and β, the current parameter configuration a, and any noise operations acting on a. We denote the true value of α, β, and γ at the n th repetition of the s th Figure 3: The considered experimental ordering. Another popular ordering transposes the inner two levels. Given a particular parameter configuration of the experiment, a ∈ { a 1 , ..., a S } (in the above example, the parameter is the distance between the two last microwave pulses), N repetitions are performed of both the experiment, γ, and the references, α and β. The bright reference α is measured by initializing with a laser pulse, waiting for metastable optical states to decay, and taking a measurement by opening the APD counting gate while the laser is on. The dark reference β is similar, except an inversion pulse is applied prior to measurement. The pulse sequence prior to the reference measurement γ depends on the current parameter a. Each time N repetitions have been made of all S parameter configurations, the system decides whether to track or not, and this is all repeated R times. A sketch of the resulting data is shown, averaged over both N and R.
experiment a s in the r th average as α n,s,r , β n,s,r , and γ n,s,r , for 1 ≤ n ≤ N , 1 ≤ s ≤ S, and 1 ≤ r ≤ R. It holds that γ n,s,r = p s α n,s,r + (1 − p s )β n,s,r where we have used our assumption of the independence of p from the drift processes to label p by only s. This yields the random variables X n,s,r ∼ Poisson (α n,s,r ) Y n,s,r ∼ Poisson (α n,s,r ) Z n,s,r ∼ Poisson (α n,s,r ) with the corresponding variates (x n,s,r , y n,s,r , z n,s,r ) N,S,R n,s,r=1 .

B. Combining Experiments
Due to low visibility, the repetition number N in Figure 3 is often quite high. It is usually cumbersome to store the results of the experiment for each individual measurement. We therefore assume that the data is summed over the N repetitions. Because of the additive property of the Poisson distribution, if we define X s,r : where α s,r := N n=1 γ n,s,r , β s,r := N n=1 γ n,s,r , and γ s,r := N n=1 γ n,s,r . In a slight abuse of notation, in Section VI and on, when we talk about unscripted α, β, and γ, we are referring to α s,r , β s,r , and γ s,r for a particular index (s, r) where some N ≥ 1 is implicitly understood. Furthermore, in this context δ will refer to N n=1 Γ n,s,r ∆t, that is, the total contribution to the dark counts. Similarly, ν will refer to N n=1 η n,s,r (µ 0 ) n,s,r and κ to 1 ν N n=1 η n,s,r (µ 1 ) n,s,r . The fraction 0 < κ < 1 represents how much dimmer the reference state ρ 1 is as compared to ρ 0 . This results in the relationship

C. Correlations of Model Parameters
The three quantities µ 0 , µ 1 , and η all depend on the quality of the coupling between the confocal microscope and the NV defect; as the relative displacement between the defect and the center of focus drifts, η decreases. This could be due to, for example, temperature changes in the lab which expand or contract the components on the optical table. However, µ 0 and µ 1 also change with decreased optical coupling because the preparation state ρ 0 depends on the laser power which is coupling dependent. Therefore we expect correlations between µ 0 , µ 1 , and η.
Nominally Γ∆t should be independent of each of the quantities η, µ 0 , µ 1 , and p. However, this may break if the power of the laser varies in time and a significant portion of the dark counts are caused by unwanted reflections of, or excitations due to the laser; we may end up with correlations between Γ∆t and each of η, µ 0 , and µ 1 .
Finally, and perhaps most importantly, the quantity p will normally be independent of the quantities η, µ 0 , µ 1 , and Γ∆t. However, it is also possible for this to fail. For example, if the process that takes the preparation state ρ 0 to the pre-measurement state ρ 0 is non-unital, we will not end up with exactly the form qρ φ + (1 − q)I/3 (as seen in Section IV B) leading to errors when inferring p from pα + (1 − p)β. Non-unitality could occur due to T 1 relaxation, or leakage of laser light when it is supposed to be off.

D. A Stochastic Model of Drift
The references α and β are best viewed as stochastic processes with autocorrelations in time. As discussed above, they will also be correlated with each other. They will undergo a discontinuous jump every time a tracking operation is performed.
As derived in Section IV B, conditioned on set of parameters, including k, Γ, ∆t, η, γ es , γ sg , γ eg , and γ 01 the references are given by α = Γ∆t + ηµ 0 and β = Γ∆t + ηµ 1 at a particular instance in time. Given the number variables and unknowns that likely go into the drift process itself (which will in turn affect k, Γ, and η) on top of the already complex conditional model stated above, writing down an analytic model for the stochastic processes α and β would be difficult, if not impossible.
We therefore restrict our attention to simpler effective models which still well describe expected and observed behaviour. We assume that the stochastic process (α, β) is a Gaussian process. This is a weak assumption especially given that in later sections we will only make use of the first two moments.
We may relate this continuous stochastic process to the discrete variables defined in Section V A with a standard discretization as follows. Consider the r th average of the experiment and a particular realization of the stochastic process (α(t), β(t)) during this average. Then we have that α n,s,r = α(t n,s ) and β n,s,r = β(t n,s ) where t n,s is the time of the n th repetition of the s th experiment relative to the start of the r th average. This gives, for example, the discrete Cox process X n,s,r ∼ Poisson (α(t n,s )).
We take the time value t = 0 to mean the time directly after performing a tracking operation. The tracking operation has the effect of drawing the initial values α(0) and β(0) from a fixed normal distribution where the variances in Σ 0 are determined by the noise and error in the tracking procedure, and the mean is a property of the confocal microscope's quality and the NVs optical properties. Assuming a Gaussian stochastic process leads to the distribution at time t ≥ 0. As a concrete example, consider the stochastic model Here, Γ∆t are the expected dark counts in a single measurement, which we have assumed to be deterministic and constant for simplicity. However, the expected number of photons due to the bright state ρ 0 , denoted an ν, is an Ornstein-Uhlenbeck process with long-time mean 0, volatility σ ν , mean reversion speed θ ν , and initial value α 1 − Γ∆t, where the initial value is marginalized over Normal (α 0 , σ α ) representing imperfections in the tracking process. This implies that, on average, α(0) = α 0 and α(∞) = Γ∆t with average decay rate µ ν , where 'shakiness' in getting there is determined by σ ν . In order to correlate α with β, and also to help enforce the physical constraint β ≤ α, we relate the two components with another Ornstein-Uhlenbeck process κ. Note that this model does not guarantee, for example, that α(t) > 0 or that 0 ≤ κ(t) ≤ 1. We can only make these scenarios improbable by choosing low volatilities. This is the compromise of having such a simple model in terms of the well known Ornstein-Uhlenbeck Gaussian process.
Solving for the moments of this stochastic process we arrive at the expressions These calculations can be found in Section B. In Figure 4, these moments are plotted along with a single random trajectory of the stochastic process defined above. The purpose of this section has been to demonstrate that it is possible to meaningfully model the drift process as a stochastic process, which leads to reference count variances and covariances at each time step. These variances will correspond to the moments of the hyperparameter distribution for the references described in Section VI A.

VI. STATISTICAL MODELS OF MEASUREMENT
In this section we state the measurement of an NV center as a statistical inference problem. This statement is a direct result of the derivations of the previous sections, however, it is intentionally written so that previous sections may be ignored by those presently uninterested in the physical derivation of the model.

A. The Statistical Model of Measurements
Consider the inner-product space H = C 3 with the canonical basis |1 = (1, 0, 0) T , |0 = (0, 1, 0) T , and |−1 = (0, 0, 1) T . We call a state of the system prior to the measurement procedure the pre-measurement state. Suppose that the pre-measurement state of interest is given by the density matrix ρ ∈ D(C 3 ). Define p = Tr[ρP 0 ] where P 0 is the projector onto |0 . In the case of a strong quantum measurement we would have access to random variables drawn from the distribution Bernoulli (p). Instead, however, we have access to the random triplet (X, Y, Z)|α, β defined by where α is the expected number of photons collected in N independent measurements of the pre-measurement state |0 , β is the expected number of photons collected in N independent measurements of the pre-measurement state |1 [50], and γ is the expected number of photons collected in N independent measurements of the pre-measurement state ρ. It is always true that 0 ≤ p ≤ 1 and 0 ≤ β ≤ γ ≤ α.
The references α and β are in turn random variables drawn from the distribution with σ αβ > 0. The normality of this distribution is typically irrelevant, and is stated as such just to be concrete. We are usually only interested in its first two moments. The 'true' distribution is almost certainly quite complicated, and arises from the stochastic processes described in Section V. In later sections this multinormal distribution will be replaced with a product gamma distribution, or a mixture of product gamma distributions, as discussed in Section F. To be clear, note that when a variate (x, y, z) is sampled from this distribution, all three variates are conditional on the same values of α and β.
This creates a hierarchical model with nuisance hyperparametersᾱ,β, σ α , σ β , and σ α,β . We will see that this second layer is rarely useful, so that the conditional model (X, Y, Z)|α, β should usually be used in practice. However, it is important to remember that the second layer exists, because it makes it clear that multiple identical samples cannot be taken from the conditional model. Indeed, the values of α and β will change each time the set of N measurements are made.

B. Moment Calculations
We first work out the first two moments of the random variables X, Y , and Z, both conditional and unconditional on the hyperparameters α and β.
The conditional moments are trivially given by using basic properties of the Poisson distribution. The law of total expectation can be used to compute which shows that the variances have two parts, one due to the usual finite sampling error of a Poisson variable, and one due to the underlying fluctuation of the Poisson parameters.

C. Three Advantages of the Conditional Model
If the measurement model is not stated as concretely as it was in Section VI A, there can be a slight subtlety in the interpretation of random variates. Misunderstanding this point could result in reporting incorrect error bars or confidence intervals/credible regions. In an attempt to be as clear as possible, we illustrate with an example.
Suppose Yves and Zoey together collect R variates of the random variable (X, Y, Z), sampled identically and independently, giving the results (x 1 , y 1 , z 1 ), (x 2 , y 2 , z 2 ), ..., (x R , y R , z R ). They go back to their separate offices and try to analyse the data. Yves calculates the sample mean and variance of x 1 , x 2 , ..., x R as x samp = 1 R R r=1 x r and σ 2 x,samp = 1 He gets x samp = 200 and σ x,samp = 20. Looking at Equation (52), and using the standard error of the mean, this informs his rough belief thatᾱ ≈ x samp ± σ x,samp / √ R = 200 ± 20/ √ R. Zoey recalls that each variate x r was obtained by summing N independent measurements. She wonders why they bothered batching the results into R sets, and why they didn't just take N × R measurements to begin with. She therefore does a sum to get the new quantity x sum = R r=1 x r . She knows that for each 1 ≤ r ≤ R, x r was drawn from the distribution Poisson (α r ) for some specific but unknown value of α r . Therefore, knowing the summative property of Poisson distributions, she correctly deduces that x sum was sampled from Poisson R r=1 α r . The Poisson distribution's standard deviation is the square-root of its mean, she therefore adopts the rough belief that R r=1 α r ≈ x sum ± √ x sum , and therefore that 1 R. Zoey and Yves get back together and compare their results, and wonder why Zoey is more confident than Yves about the quantity she has estimated, when, naively, it seems that they have estimated the same thing with the same data.
The discrepancy comes down to the fact that they are estimating parameters using different models. Yves is estimating the hyperparameterᾱ from the hierarchical model in Section VI A, justified by the moment calculations in Section VI B. Zoey is foregoing the hyperparameter layer of the model and making a direct inference about the sum of the particular references α 1 , α 1 , ..., α R they happened to draw in their measurements. Neither is wrong, they are simply estimating different but related quantities.
We saw in Section VI B that Var[X|α] =ᾱ and Var[X] =ᾱ + σ 2 α . Yves was assuming he made R independent measurements of X and Zoey was assuming she made a single measurement of X|α with a combined α = R r=1 α r . Therefore the difference between their error bars is due to σ α .
The end goal, of course, is not to estimate the reference α or its mean, but to estimate quantities related to the quantum state like p = Tr[ρP 0 ]. However, the better one's accuracy in estimating the references, the better one's accuracy in p. Therefore, given the above discussion, it is apparent that there is no advantage to drawing multiple samples of (X, Y, Z) when it is possible to increase the number of measurements N instead, often by adding samples together as Zoey did. There are, however, special circumstances where drawing multiple samples is desirable. This occurs in cases where p is not fixed shot to shot, but is instead drawn from a distribution on each shot.
There is a second and less obvious advantage to Zoey's method. Yves' model assumes that the multinormal distribution on (α, β) is a good approximation to the moments of the stochastic process governing α and β discussed in Section V D. This will usually be good enough. But if, for example, there are daily temperature patterns in the lab which significantly affect optical alignment, this could cause the normal approximation to be inaccurate. Zoey's model, however, makes no assumptions about the nature of the drift.
Finally, the third advantage of the conditional model is that it is simpler and therefore more tractable. Just solving for the MLE of the hierarchical model analytically would be difficult or impossible; it is painful enough for the conditional model.

E. Generalized Inference Problems
In the previous subsection, the inference problem was stated such that the survival probability p = Tr[ρP 0 ] was the quantity of interest. We may of course modify this if some other quantity is preferred.
For example, suppose we are interested in state tomography. We define the ideal unitary operators {U 1 , ..., U 9 } ∈ U(H) in such a way that {U n |0 0| U † n } 9 n=1 is a basis for L(H). Our scheme is to prepare ρ, implement the gate U n for some n, and measure the resulting state, so that p n = Tr[U n ρU † n P 0 ] with corresponding random variables Then our likelihood function becomes L(ρ, α, β|x, y, z 1 , z 2 , ..., z 9 ) = pdf P ois (x; α) · pdf P ois (y; β) · 9 n=1 pdf P ois (z n ; p n α + (1 − p n )β) and we are interested in inferring ρ ∈ D(H). This will be similar for the hierarchical model. We have taken one set of reference measurements for all Z 1 , ..., Z 9 . We could also choose to take one for each, resulting in the data (x 1 , y 1 , z 1 ), ..., (x 9 , y 9 , z 9 ) with a similar likelihood function of the form L(ρ, α 1 , β 1 , ..., α 9 , β 9 |x, y, z 1 , z 2 , ..., z 9 ). These sorts of details come down to the particulars of the experimental implementation. It is clear that any measurement inference problem can be stated in a similar way, such as process tomography, and Hamiltonian parameter inference, as will be seen in Section VIII.

F. Fisher Information and the Cramér-Rao Bound
The average curvature of the likelihood function provides a measure of how informative data are. Generally, a highly curved (unimodal) likelihood function implies a tight region of support, so that data will tell you a lot about the parameters of interest. This is formalized by Fisher information and the Cramér-Rao bound [38]. Given a likelihood function L( θ| d) of parameters θ given data d, the Fisher information is the average curvature of L, more specifically, it is the negative expected Hessian matrix of the log-likelihood, The Cramér-Rao bound asserts that no unbiased estimator of θ can outperform this intrinsic curvature. Namely, ifθ is any unbiased estimator, which takes data d and outputs estimates of the true value of θ, then its covariance is lower-bounded by the inverse Fisher information matrix, As good estimators will often make this inequality nearly tight, the inverse Fisher information sets a benchmark for estimators to aim at. There is a generalized inequality for biased estimators that will be used in Section VII A. The Fisher information matrix of the conditional model (X, Y, Z)|α, β can be computed exactly as with an inverse matrix given by See Section C for the calculation. The Cramér-Rao bound for the top left entry states that for any unbiased estimatorp given the data triple of photon counts (x, y, z). This bound is plotted in Figure 5 for the slice β = α/2. In Section VII C we will see that this bound is very close to the average error incurred by a few important estimators, including the widely used maximum likelihood estimator, and in many disparate but relevant parameter regimes. This means that this inequality is perhaps better thought of as an approximation, except in exceptionally low contrast regimes. We can use this result to derive a formula that tells us roughly how much data we will need to collect in order to lower the error bars on p to a specified level. We approximate thatp ∼ Normal p, (I(p, α, β) −1 ) −1/2 1,1 with α and β known (whereas we will generally only have estimates of them). This gives the 100(1 − ζ)% confidence interval forp where c ζ/2 = √ 2 erf −1 (1 − ζ) with erf(x) = π −1/2 x −x e −t 2 dt the error function. Supposing a reference contrast of C = α−β α+β , we need to makep ≈ p ± ∆p a 100(1 − ζ)% confidence interval [51]. To derive this formula we have assumed the worst case, p = 1. These calculations show that, for example, if we desire a 95% confidence interval of ±0.01 for p, then we need to do at least enough experiments N so that α is on the order of ∼ 170, 000.
This heuristic holds for any optical transition rates and choice of measurement time ∆t in the measurement protocol detailed in Section III. However, certain choices of measurement time are better than others. We can use the Cramér-Rao bound to estimate the optimal such time, namely, we wish to choose the measurement time ∆t that maximizes the temporal information density of p. Begin by supposing that the total runtime of a fixed experiment, including taking bright and dark reference counts, is T = N (T e + 3∆t), where N is the number of repetitions, and T e is the amount of time per repetition not spent counting photons (initialization, wait periods, pulse sequences, etc.). We must multiply ∆t by 3 to account for all three of the signal, bright reference, and dark reference counting windows. Writing α = N α and β = N β, with α and β the average per-shot reference values, again at the worst case p = 1, gives ∆p 2 = 2α (α−β) 2 , or rearranging, gives We have written α = α(∆t) and β = β(∆t) to emphasize their implicit dependence on ∆t. These two functions are easily estimated experimentally by sweeping the length of the measurement window. Then for any given T e , the the quantity ∆p √ T can be be minimized, visualized in Figure 6. As the experiment time T e grows, it becomes increasingly worthwhile to lengthen the duty cycle of measurement. This formula and its units are analogous to widely used magnetometry sensitivity formulas; see Taylor et al. [37] or Hirose et al. [39] for two examples out of many.
Of note is the steep increase of ∆p √ T as ∆t → 0. While the slope is relatively gentle as ∆t gets larger, causing little harm even if ∆t is twice as big as the optimal value for a given T e , there is a large penalty for choosing a measurement of ∆t which is too short. Long refocusing sequences like CPMG are especially at risk of falling into this trap.

VII. SIMPLE ESTIMATORS
The previous section discussed inference models of NV measurement in some detail. In this section, we introduce two estimators for the basic inference problem of the conditional model defined in Section VI D. We also compare their relative strengths and weaknesses.

A. Maximum Likelihood Estimator
The most obvious estimator turns out to be quite a good one, and the one that has been used almost universally in practice. Suppose (x, y, z) is a variate of (X, Y, Z)|α, β. Equation (51) shows that x, y, and z are unbiased estimates of α, β, and pα + (1 − p)β, respectively. We invert the equation pα + (1 − p)β for p substituting in our estimates above to get the estimatorp for p. Although appearing quite simple, it is difficult to work with this estimator analytically; it is the ratio of two correlated Skellam distributions which does not have many nice properties. However, it can be shown with Lagrange multipliers that this is the maximum likelihood estimator (MLE) of the model, that is, z−y x−y , x, y is the (unique) maximum of the function L(p, α, β|x, y, z) on the domain 0 ≤ p ≤ 1, 0 < β ≤ α, for any x, y, z ≥ 0. See Section D 1 for details.
There is always a finite probability that x = y, in which case this estimator will divide by zero. With sufficient magnitude of and contrast between α and β this is highly unlikely. It still poses a problem if we wish to prove anything about it, for example, if we wish to find its expectation value. To avoid this situation we define the slightly modified estimatorp for some non-integer value of . One might consider using this estimator instead of the MLE if x = y has a significant probability. After quite a bit of work, shown in Section D 2, it is seen that the bias ofp MLE, is non-zero, a linear function of p, and exactly given by the integral  . It is seen that in this regime the payoff of using the optimal measurement time is rather slim.
Taking the limit as approaches zero gives an expression for the bias of the original estimator, This integral can be solved numerically to find the exact bias. If α+β 1, we can derive an asymptotic approximation to this integral, which through numerics can be shown to be valid for α + β 300. Note that the bias vanishes at p = β α+β , and that if the contrast C = α−β α+β is fixed, then the worst case bias scales as α+β (α−β) 2 = O (α + β) −1 . Although estimators with no bias are generally preferred, we see that this estimator has the more important property of being consistent, meaning that Bias[p MLE ] → 0 as α → ∞ with fixed contrast. We can make an even stronger statement by using the Cramér-Rao bound for biased estimators, which is a generalization of (58) known as the van Trees inequality or the Bayesian Cramér-Rao bound [40], stating that where Jθ(p, α, β) is the expectation value of the Jacobian matrix of the possibly biased estimatorθ. Using the approximation from (69), this gives us the inequality for the maximum likelihood estimator, again assuming that the contrast is fixed as α and β increase. This is approximately the same bound as the unbiased Cramér-Rao bound discussed earlier. Indeed, this is generic, as the van Trees inequality approaches the Cramér-Rao bound for large data sets, such that incorporating prior information can be thought of as an important correction for finite data sets [41].
This sequential break down is useful because a conjugate prior can be found for the likelihood Pr (x, y|p, α, β) in a couple of useful cases, meaning the posterior π * (α, β) can be computed exactly. Formulas for two different conjugate priors for the references are given in Section F. The full likelihood L, however, almost certainly does not have a conjugate prior. Given the posterior distribution π * , there are many choices for the estimator. The most common is the mean square error (MSE) Bayes estimator which simply takes the expectation value of π * . For the particular parameter p, we havê This is known to be the estimator which minimizes the expected MSE of the estimate over all possible estimators, (76)

C. Comparing Estimators: Risk
We wish to compare the quality of the above estimators. Since one estimator is frequentist and one is Bayesian, we choose to accept the idea of a 'true value' of p. Given an estimatep(x, y, z) of p which depends on the data (x, y, z) The estimators under study are the maximum likelihood estimator,p MLE , the bias corrected estimator (see Section E),p BCE , and the Bayes estimator,p Bayes , with two different priors. These priors are denoted by "Bayes" and "Bayes-10", with the latter being a more conservative prior corresponding to a ten-fold increase in the assumed covariance, as explained in the main body. Sharp peaks for the Bayes estimators are artefacts of the coarse sampling along the x-axis; risk was evaluated at p ranging from 0 to 1 in steps of 0.05. The risks ofp MLE andp BCE are much bigger than 1 for the low-contrast regime due to the common occurence of y > x, and are therefore not plotted.
drawn from (X, Y, Z)|α, β, there are many ways of quantifying the distance between the estimate and the true value. We consider the mean-squared-error (MSE) loss function It is reasonable to assume that the operators of a given experimental setup will have a rough idea of what to expect as their reference counts. Or, given the results of a set of experiments, all of the reference counts can be pooled to empirically construct a distribution of reference counts. As such, we assume the existence of a probability distribution P S (α, β) = Pr(α, β|experimental setup S) which characterizes a particular setup called S (assuming a fixed number of shots, N , per experiment).
Given an estimatorp, we can define its associated risk with respect to the true values (p, α, β) as the average value of the loss, R(p, p, α, β) = E x,y,z|p,α,β [L MSE (p, p)] = ∞ x,y,z=0 L MSE (p(x, y, z), p) Pr(x, y, z|α, β, p). (78) With a particular setup S we can then quantify its overall risk, by marginalizing over our knowledge about it. If we were to additionally marginalize over a distribution of p, this would be the Bayes risk, and an estimator minimizing this quantity would be a Bayes estimator. Therefore, the expression above can be seen as a hybrid between Bayes risk and frequentist risk, where we marginalize over α and β, but not p.
We will treat the prior of a Bayesian estimator as an implicit property of the estimator. We can then compare, for example,p Bayes with itself under different priors. The prior on α and β does not need to have any relationship with P S (α, β). A large part of choosing a prior has to do with assessing one's level of paranoia, and one may be more paranoid about generating a fair risk comparison than about giving the estimator an over-informed prior, or vice versa. However, setting them equal to each other will often be the most sensible thing to do.
We study a few regimes of experimental setups. We consider a high-data regime, S hd , where α = 100, 000, a middata regime, S md , where α = 10, 000, and a low-data regime, S ld where α = 1, 000. In these three cases the contrast is the same, C = α−β α+β = 0.6. We additionally consider mid-data regimes of varying contrast, where α = 10, 000 for C = 0.05, 0.33, 0.82. This defines the respective low, medium, and high contrast setups S lc , S mc , and S hc . The above setup descriptions only supply the mean values of their respective distributions P S , that is, E PS [(α, β)] = (α, β). To keep things simple, we take these distributions to all be binormal with super-Poisson standard deviations σ α = 2 √ α and σ β = 2 β, and covariances defined by σ α,β = 1.5β.
In Figure 7, R S (p, p) is plotted for each of the setups described above, and for each of the estimatorsp MLE ,p BCE , andp Bayes . The square root was taken so that both axes have the same units. Two different priors are used for the Bayes estimator on each setup, both are product gamma distributions, discussed further in Section F 1. The first, 'Bayes', uses the same mean value and diagonal covariance elements as P S . The second, 'Bayes-10', uses the same mean value as P S , but standard deviations which are ten times larger than P S , corresponding to a rather uninformative prior. The more sophisticated prior discussed in Section F 2, which allows for correlations between α and β, should in theory be strictly better than the ones used in these calculations, but were found to be too computationally expensive for naive implementations of risk computation. For all setups and estimators, risk is computed by Monte Carlo sampling; for each value of p, many pairs (α, β) are sampled from P S (α, β), for each pair many variates (x,y,z) are drawn from the likelihood distribution, and the loss L MSE is computed for each. The average of these loss values for this value of p forms an estimate of R S (p, p).
These plots show that the Cramér-Rao bound (61) is an excellent estimate of the risk of the MLE in most regimes. Further, under our loss function, the Bayes estimator never has more risk than the MLE, and has superior performance especially near the boundaries of [0, 1], even for the rather uninformative Bayes-10 prior.

VIII. EXAMPLE: HAMILTONIAN LEARNING WITH BAYESIAN INFERENCE
One of the primary advantages of using a Bayesian approach to NV measurement is that it can be used as an overlay model on other estimation problems. This results in seamless propagation of error bars to the final quantities of interest. In frequentist settings, it is usually a pain to justifiably propagate error bars near the boundaries of an interval like [0, 1] because the usual normality approximations are dubious. To illustrate this Bayesian approach, in this section we provide a thorough example of Quantum Hamiltonian Learning (QHL) [12] using experimental data. In a recent and related experimental work, QHL was shown to be a powerful method of characterizing quantum systems [15]. Raw data and reproducible code for this paper can be found in our online repository [30].
Other advantages of the Bayesian inference algorithm we employ, though not used by us here, include the ability to adaptively choose the next experiment to maximize information gain, and the ability to treat reference drift as a time dependent stochastic process [42].

A. Hamiltonian Model
We assume that our spin-1 Hamiltonian (in the optical ground state) is of the form where we are in a frame rotating near the ground state ZFS, 2.87GHz, and have invoked the secular and rotating wave approximations. Here, δ∆ is the mismatch between the applied microwave frequency and the ZFS, ω e is the projection of the static magnetic field onto the z-axis, A N is the hyperfine splitting due to 14-Nitrogen, m I is the spin number of the nitrogen atom, and Ω is the microwave nutation strength. All of the parameter units are angular frequency, 2π·MHz, except m I which is unitless. Our goal is to learn these parameters, as well as the T 2 decay time.
In particular, we would like good error bars on ω e since this is the quantity of interest in magnetometry, considering other parameters as nuisances. At room temperature, the nitrogen atom is equally likely to be in each of its three energy states |m I ∈ {−1, 0, +1} . Since the nitrogen T 1 is much longer than a single experiment, we assume that for each experiment, the state of the nitrogen is fixed. Hence in the Hamiltonian above, we simply treat the axial hyperfine coupling between the NV − and the nitrogen as a small m I dependent shift in the static magnetic field.
With an initial state ρ i = |g, 0 g, 0|, at time t the density matrix is described by is the evolution superoperator under the Hamiltonian H I , along with a single dephasing Lindblad term L = √ T 2 S z . The superoperators S m I can be computed by exponentiating the supergenerator derived from the Lindblad master equation. We will only consider constant or piecewise constant values of Ω(t), which simplifies simulation (finite rise-times are ignored). This convex combination approach is valid because we will be summing over many trials of the same experiment, and hence will see an equal mixture of all three nitrogen states on average.

B. Experiment Choices and Data
The time dependence of the nutation envelope Ω is controlled by the experimentalist. To learn the parameters of this system, we choose to do two types of experiments. The first is the Rabi experiment, where Ω is finite and constant for a period t r and the state is subsequently measured. Rabi experiments are primarily sensitive to nutation frequency. The second is the Ramsey experiment, where there is a wait period with Ω = 0 of length t w between two identical unitary gates which are created by turning Ω on at full power for a duration t p . Ramsey experiments are sensitive to fields along the z-axis. These two experiments are sensitive to roughly orthogonal regions of parameter space.
Experiments were performed on a microscope with relatively poor optical characteristics; for a single repetition (N = 1) we had an average number of detected bright reference photons α ≈ 0.006, and an average number of detected dark reference photos β ≈ 0.004, giving a contrast value of 0.2. The static magnetic field acting on the NV − center was just the ambient stray field in the laboratory; some combination of Earth's field, building characteristics, and nearby electronics. We took R = 400 averages of N = 30000 repetitions for both the Rabi and Ramsey experiments. Rabi flops were sampled at 100 linearly spaced points between t r = 8ns and t r = 800ns. The Ramsey wait times were sampled at 200 linear spaced points between t w = 0.01µs and t w = 2µs, with a pulse time t p = 44ns. Raw (summed) data is plotted in Section H.

C. QHL Likelihood Function
As we did for the tomography model sketched in Section VI E, we now write down a model for our QHL problem. We label a given experimental configuration as c = (t r , t w , t p , k), where the three timing parameters were defined in Section VIII A and k ∈ {RABI, RAMSEY} selects the experiment types explained in Section VIII B. Similarly, we denote a hypothetical parameter set as a vector x = (ω e , δ∆, Ω, A N , T −1 2 ). A specific pair, ( x, c), provides enough information to do a full quantum simulation of the spin-1 manifold, resulting in the probability of a projective |0 measurement given by p x, c = Tr (|0 0| S x, c (|0 0|)) .
Here, S x, c is the solution to the Lindblad master equation under the Hamiltonian model described in Section VIII A. This yields the conditional model X, Y, Z|α, β, x; c with X ∼ Poisson (α), Y ∼ Poisson (β), and Z ∼ Poisson (β + p x, c (α − β)). The goal of the inference problem is to deduce the true values of x, and in particular, ω e , given a dataset of photon counts.
, is is used in a simulation of the Hamiltonian model (Section VIII A), and shown on top of the normalized raw data. The raw data was normalized using the MLE in Equation (65), and the 95% error bars are computed with Equation (62) for comparison. The posterior distribution is tight enough that simulations from randomly sampled values are visually indistinguishable. In (c), the expectation value of the SMC posterior is shown as a function of the number of Bayes' update steps in SMC. It is seen that very little happens to the mean, at least at the zoom level shown, after about 50 updates. In (d), however, we continue to see in their standard deviations decrease, where the square root of the diagonal of the covariance matrix is plotted on the same x-axis as (c). In (e) and (f) posterior marginal distributions are shown for the parameters ω e and Ω, respectively. Each of the broad curves (coloured) correspond to results from the same data-processing algorithm run on completely disjoint subsets of experimental data. The narrow curve (black) is the result from the algorithm being run on the amalgamation of these datasets. For comparison, the dashed curves show a Gaussian distribution with mean and variance computed through a weighted least-squares fit (WLSF) of the entire dataset.

D. Bayesian Inference with Sequential Monte Carlo
We begin with a prior distribution π( x) describing our knowledge of the system before any measurements, given by the following product distribution: Additionally, we empirically choose a prior for the references α and β by computing the sample moments of the experimental reference count data, multiplying the standard deviations by 4 to be conservative, and choosing a product gamma distribution with these moments. This distribution is discussed in Section F 1. We label this distribution as π k (α, β) where k ∈ {RABI, RAMSEY}. The prior distribution is now sequentially updated through Bayes' law one triple (x c , y c , z c ) at a time.
Here, x c = N,R n=1,r=1 x n, c,r , y c = N,R n=1,r=1 y n, c,r , and z c = N,R n=1,r=1 z n, c,r are the photon counts for a particular experiment c = (t r , t w , t p , k) summed over all repetitions and averages (see Section V B). The distribution Pr(α, β, x) is stored as a so-called particle approximation consisting of a finite list of hypothetical values, called particles, labeled as Typically on the order of 10000 particles are used for numerical stability.
The distribution for the references α and β is reset to π k (α, β) before each triple of data is used. As discussed in Section VII B, we may process the reference pair (x c , y c ) first and subsequently process the signal count z c . The reference pair is processed by replacing the reference prior with the analytically derived posterior, discussed in Section F 1. This is done by replacing the (α, β) coordinates of each particle (α i , β i , x i ) with a random variate drawn from the posterior π * (α, β|x c , y c ). The signal is processed using the Sequential Monte Carlo (SMC) algorithm, as implemented by the software package QInfer [42].
While Bayes' update rule is agnostic to the order in which we enter data since each data triple is statistically independent, the order is relevant to the numerical implementation. An intuitive explanation is as follows. Suppose one is interested in determining the frequency of a cosine wave using amplitude data sampled at various time points, and that we assign a flat prior distribution over a wide range of frequencies. If we first update our prior with the data from a late time point, the posterior will have many peaks because every divisor of the measurement time will correspond to a period consistent with the observed data. Subsequent updates will eventually inform us about which peak contains the true frequency. However, if we first update our prior with data from a time point early enough that we know (according to our prior) that less than a full period has had time to take place, then the posterior will be a very broad but unimodal. Subsequent chronological Bayes updates will tend to shift and narrow this peak. Since the SMC algorithm -specifically, the Liu-West resampler [43] -implicitly assumes unimodality, the first approach will usually fail and the second approach will usually succeed, assuming the ansatz that there is only one true value in parameter space. Given this data processing constraint, we fed the data to the Bayes updater in strictly increasing times t w and t r , shuffling the Rabi and Ramsey data together randomly. Alternatively, an algorithm without a unimodality constraint could be considered [44].
One nice feature of the SMC algorithm is that it typically heralds its own failure through the effective sample size criterion [45]. Such failures can result from multi-modalities, as discussed above. Another common failure path is through overly-informative data, where a single Bayes update causes only a handful of particles to remain relevant. We mitigate against this partly by using a conjugate prior for the reference indices, as discussed in Section VII B, and also by a technique called bridging the transition, discussed in Section G.

E. Results and Validation
Bayesian inference with the SMC algorithm was run on the entire dataset to obtain a posterior distribution (twoparameter marginals are plotted in the appendix, Section H). Recall that our entire dataset consists of 400 averages of 30000 repetitions for each of the 300 different experimental configurations, corresponding to roughly 24 hours of experiment time given our particular optical efficiency. This number of averages was chosen to be large to allow for more convincing validation of our techniques. To this end, the 400 averages were divided into ten disjoint and chronological batches of 40 averages each, and the SMC algorithm was run independently on each batch. Since each batch has strictly less data than the entire dataset (effectively lower values of α and β), wider posteriors are expected for these than for the entire data set.
The main results are shown in Figure 8 and Table I, with supplementary figures found in Section H. The top two plots, Figure 8(a-b), show that the SMC posterior corresponds to a sensible traditional fit of the data; the posterior is used to obtain a point estimate of each of the parameters, and these parameters are then used in a simulation spanning the experimental configurations. Since the posterior distribution is tight enough that simulations from randomly sampled values are visually indistinguishable, these fits can be interpreted as a visual posterior predictive check, where data simulated according to the posterior is compared with actual data [46]. The middle two figures show convergence properties of the SMC algorithm. Finally, the bottom two figures show that the disjoint data sets result in posteriors that are consistent with each other, and consistent with the posterior of the amalgamated data set. Keep in mind that the parameters could be fluctuating slightly over long time scales.
For comparison, we also analyzed the data using a weighted least-squares fit (WLSF) of of the parameters x = (ω e , δ∆, Ω, A N , T −1 2 ). This was done by using the SciPy [47] function optimization. curve fit to minimize the quantity where the sum is taken over all experiment configurations c = (t r , t w , t p , k) that were performed, p x, c is the simulation of hypothesis x under conditions c defined in (81),p c is the MLE of p given the data (x c , y c , z c ), and the formula for the estimated variance σ 2 c is derived from the Cramér-Rao bound (61). For simplicity, the initial guess of the WLSF function was taken to be the SMC point estimate. The WLSQ fit is shown in Figure 8(e-f) on top of the SMC marginal posteriors. Table I provides a more comprehensive comparison, where WLSQ fits are also performed on each of the ten batches. For the smaller batch sizes, the WLSQ confidence intervals are more comparable in size to the SMC credible regions. We suspect this is because SMC did not have enough data to significantly reduce posterior correlations between parameters, especially between δ∆ and ω e .

IX. CONCLUSIONS
We started with a standard phenomenological model of NV − optical dynamics and derived random variables corresponding to photon emission during measurement. Noise, limited visibility, and imperfect state preparation were considered in detail and added to the model, which did not affect it very much; one still ends up with three conditionally independent Poisson random variables, just with lower count rates and contrast.
This material will have all been review for those familiar with this quantum system. However, we deemed it important to include it explicitly so that our subsequent statement of the NV − measurement process as a formal statistical inference problem would be entirely self-contained and justified. We were able to give a straight forward argument for why summing up all of the photon counts for repeated experiments of the same type is almost always the best thing to do, regardless of data-processing techniques, and despite our discussion about the headaches involved in drifting reference counts.
We derived both frequentist and Bayesian estimators for this inference problem. We found that the most commonly used frequentist estimator is exactly the maximum likelihood estimator, and its mean-squared-error risk is well approximated by a simple equation derived from the Cramér-Rao bound. This fact produced a useful formula, (63), for estimating the amount of data required to reduce the error bars of p = Tr(ρ |0 0|) to a specified value ∆p. Quantitatively comparing the estimators revealed that the Bayesian estimator for p is superior to the MLE with respect to mean-squared-error risk, especially near the boundaries of [0, 1].
Finally, in order to convince the reader that a Bayesian approach is tractable in practice, we perform quantum Hamiltonian learning on experimental data in a fully Bayesian setting using the sequential Monte Carlo inference algorithm. In addition to giving a good fit even in the case of a very wide prior, cross validation gives convincing evidence that the posterior distribution over Hamiltonian parameters accurately reports its confidence in their values. Credible regions reported by SMC are generally better than confidence intervals due to weighted least squares fitting, but in some cases, not by much.   We wish to compute where θ 1 = p, θ 2 = α, θ 3 = β, and from Equation (54) we have Listing 3. The results of this code for the Fisher information matrix are with inverse Further, the higher order cumulants turn out to be l a g r a n g e E q u a t i o n s = D[ Φ,#]==0 & /@ {α , β , γ , λ , p} // FullSimplify ; ( * Remove λ from t h e s e t o f e q u a t i o n s * ) l a g r a n g e E q u a t i o n s = Rest [ l a g r a n g e E q u a t i o n s ] / . F i r s t @ S o l v e [ F i r s t @ l a g r a n g e E q u a t i o n s , λ ] // FullSimplify ; 13 ( * Remove γ from t h e s e t o f e q u a t i o n s * ) l a g r a n g e E q u a t i o n s = l a g r a n g e E q u a t

Bias
Throughout this section, if an integral is performed or a series is summed without explanation, the reader can assume it was done using Mathematica. For a variate (x, y, z) of (X, Y, Z)|α, β consider the estimator for p = γ−β α−β where is any non-integer. We are ultimately interested in the limiting case as → 0 since this gives the maximum likelihood estimator. We wish to compute the bias ofp MLE, , which is given by This triple sum is not straight forward to compute. Our strategy is to first sum over z and x resulting in where Γ(x) = ∞ 0 t x−1 e −t dt is the gamma function, and Γ(s, x) = ∞ x t s−1 e −t dt is the upper incomplete gamma function. Note that the non-integer value of allows us to avoid poles of the gamma function. The first two terms of the sum are seen to be purely imaginary. Since the expectation value is known to be real, we may ignore them.
To proceed, we make use of the complex integral form of the incomplete gamma function, which holds for any integration path in C from −1 to R which does not cross the negative real axis. For the third term we get for the last term. Therefore Consider the integration path which is a semi-circle from −1 to 1, and then a straight line to R along the positive real axis, avoiding the singularity at t = 0. Note that along the positive real axis, the integrand is purely imaginary. We are therefore only interested in the real part of the integral along the unit semi-circle. Making the change of variables t = e iφ we are left with From this expression we can see that E x,y,z [p MLE ] is exactly linear with respect to γ, and therefore the bias will be exactly linear with respect to p. This integral is best done numerically. What follows is a closed form approximation which will usually be more practical. We see that the integrand falls off exponentially with rate α + β as φ decreases from π. Since it will usually hold that α + β 2, the integrand will have most of its support in a region close to π. This justifies a low order series expansion of trigonometric functions about φ = π, giving where F (x) = e −x 2 x 0 e t 2 dt is the Dawson function and f (x) = √ 2xF (x/ √ 2) − 1. Numerics show that these approximations are accurate to O (α + β) −2 for α+β 700. For large x we have the series f (x) = 1 We see that the estimator is unbiased at the single point p = β α+β and that worst bias happens at p = 0 or p = 1 and scales as O 1 α+β . Note that we have assumed that the contrast α−β α+β stays fixed to make the above asymptotic arguments.
However, recall from Section VI F that the Cramér-Rao bound gives us the lower bound on the variance of our estimator. If we take the average of this bound at p = 0 and p = 1 we get α+β (α−β) 2 which happens to be equal to the worst-case bias derived above. We conclude that when α + β 300, the bias of the MLE is negligible since it will be of O (α + β) −1 , well contained within a single standard deviation ofp MLE Moreover, the estimate of the bias p MLE −β α+β α+β (α−β) 2 used above is itself likely to have a variance and bias which outweigh that which it is trying to correct. Therefore, this estimator is not useful in practice.
To this end we consider a bivariate Poisson model inspired by Equation (44). In that equation it is clear that variations in δ will cause correlations between α and β. A bivariate Poisson random variable is defined as (A, B) ∼ BP (θ 0 , θ 1 , θ 2 ) where A = C 0 + C 1 and B = C 0 + C 2 with C i ∼ Poisson (θ 0 ), i = 0, 1, 2. This produces the probability density pdf BP (x, y; θ 0 , θ 1 , This distribution has marginal distributions A ∼ Poisson (θ 0 + θ 1 ) and B ∼ Poisson (θ 0 + θ 1 ), and covariance Cov(A, B) = θ 0 . As discovered by Karlis and Tsiamyrtzis [48], it has an exact family of conjugate priors given by the mixture distributions where G is the probability density of the gamma distribution, a i , b i > 0, r ∈ {0, 1, 2, ...}, 0 ≤ w j ≤ 1, and r j=0 w j = 1. For the prior that interests us, fixing r = 0 and b 0 = 1 will suffice; this leaves us with five hyperparameters (a 0 , a 1 , a 2 , b 0 , b 1 ) which we will bijectively map onto the more intuitive hyperparameters (µ α , σ 2 α , µ β , σ 2 β , σ α,β ). Indeed, make the change of variables α = θ 0 + θ 1 and β = θ 0 + θ 2 and consider the prior π(α, β, θ 0 ) = G(α − θ 0 ; a 1 , b 1 ) · G(β − θ 0 ; a 2 , b 2 ) · G(θ 0 ; a 0 , 1) (F8) where It then holds that, for example, Since this prior is conjugate to the likelihood function of (X, Y ) ∼ BP (θ 0 , α − θ 0 , β − θ 1 ), if we receive the iid data (x 1 , y 1 ), ..., (x n , y n ) sampled from this distribution, the posterior will have the form of Equation (F7) with updated parameters. Let x = n k=1 x k and y = n k=1 y k . First we have the simple updated posterior parameters a * 0 = a 0 , a * 1 = a 1 + x, a * 2 = a 2 + y, b * i = b i + n for i = 0, 1, 2, which is very similar to the uncorrelated case of the previous section. The new weights w * j of the posterior (recall we had a single weight w 0 = 1 in the prior) also has a closed form, but it is cumbersome to write down. Defining s i = min(x i , y i ) and S n = n i=1 s i , then for each 0 ≤ k ≤ S n we have w k =p k / Sn m=0p k wherē p k = c k , and s * m = min(s m , S m−1 ). For generality, we have provided these formulas given n random samples, when in practice, because of the drift discussed in the main body of this article, only one random sample will usually be taken. With n = 1, the recursive definition is unnecessary and the weights are given by w * k = x!y! sin(π(α 1 + x)) sin(π(α 2 + y))Γ(k + α 0 )Γ(x − k + α 1 )Γ(y − k + α 2 ) is the regularized generalized hypergeometric function and p F q ({c 1 , ..., c p }, {d 1 , ..., d q }, z) is the generalized hypergeometric function. However, the only factors in (F13) which are are relevant to numerical implementations are those which involve k. Everything else can be implicitly calculated by demanding the normalization Sn k=0 w * k = 1. An example of posterior mixture weights is shown in Figure 9.
Depending on the size of S n , the posterior may be expensive to compute the value of at a given coordinate. However, once the posterior weights ω * k have been computed, drawing a sample is essentially the same cost as drawing a sample from three Gamma distributions.

Appendix G: Bridged Updater for the Referenced Poisson Model in SMC
For us, there are two main mechanisms that can cause the SMC algorithm to become unreliable. Both have to do with the finite particle approximation and its reliance on having enough effective particles near the true model parameter values, known as importance sampling. The first mechanism is that periodicities in the model have the effect of creating temporary multimodalities in the posterior as we analyze the data. If certain early data happen to cause disproportionate support on an incorrect mode, we lose particles where we need them, and this might lead to a runaway effect where all particle weights become zero. We mitigate against this by processing the data in ascending order, as discussed in Section VIII D.
The second cause of instability is that some data points are overly informative. From a learning perspective, informative data are great. However, from SMC's perspective, very informative data have the tendency to drastically where the update data is depicted by red dots. In each of the four cases, the same data is given to both the correlated and uncorrelated priors described in Section F 1 and Section F 2, respectively. Black dashed ellipses represent 90% confidence regions; their centers are at the mean of the distribution, and their eccentricity matrix is equal to 4.6 times the covariance matrix of the distribution. Cases (a) and (b) represent a low data scenario, whereas cases (c) and (d) represent a high data scenario. Cases (a) and (c) represent correlated measurement data, whereas cases (b) and (d) represent anti-correlated measurement data.
reduce the weight of most particles, causing the effective particle count, to become a tiny fraction of the actual particle count, N . For example, it is not unusual (nor is it common) for data, with our QHL model, to reduce the effective particle count to below 100 while there are 16000 actual particles.  Figure 11: A particle distribution was initialized to the prior of (86) with 16000 particles, and separately updated with the data from a single point of a Rabi experiment in six different ways. We show a slice through the posterior for each case. On the left are bridged and un-bridged updates with no resamples allowed, The final effective particle count was about 1800 for all three of these updates. This demonstrates the bridging technique works in practice. On the right are bridged and un-bridged updates with resamples taken whenever the distribution was detected to have fewer than 8000 effective particles. These two bridge cases maintained at least 8000 effective particles at all times.
Since the posterior is far from normal, we can expect the resampler to introduce distortions.