Bayesian estimation of switching rates for blinking emitters

Single quantum emitters of light are valuable resources for engineered quantum systems. They can function as robust single-photon generators, allow optical control of single spins, provide readout capabilities for atomic-scale sensors, and provide interfaces between stationary and flying qubits. Environmental factors can lead to single emitters exhibiting "blinking", whereby the fluorescence level switches between "on" and "off" states. Detailed characterisation of this blinking behaviour, including determining the switching rates, is often a powerful way to gain understanding of the underlying physical mechanisms. Bright low-noise emitters lead to high-contrast blinking, and simple thresholds can be used to extract the "on" and "off" intervals from a time series of photon counts (and hence determine the switching rates). However, such approaches become difficult for emitters with low fluorescence levels, high noise, or fast switching rates. We develop a Bayesian approach capable of inferring switching rates directly from the time series, one that can handle high levels of noise and fast switching in fluorescence traces. Moreover, the Bayesian inference also yields a robust picture of the parameter uncertainties, providing a benefit even for bright emitters. Finally, our method is applicable to a broad range of systems that show behaviour analogous to a single blinking emitter.


Introduction
A wide variety of natural and artificial systems exhibit an unwanted stochastic switching between states, sometimes called telegraph noise [1]. To remove this unwanted behaviour it is necessary to understand its causes, and to gain an understanding of the causes it is necessary to be able to measure the characteristics of the switching. A prominent example of a system exhibiting this random switching of state is provided by quantum emitters, where the behaviour is called blinking due to large intermittent changes in the observed fluorescence. Such emitters are valuable resources for engineered quantum systems, and the presence of blinking limits their applicability. In fact, most quantum light emitters exhibit intermittency in their fluorescence under continuous excitation, including diamond colour centres [2][3][4][5][6], quantum dots [7][8][9][10][11], nanowires [8,12], nanorods [12], organic semiconductors [13], molecules [14][15][16], and other systems [17,18]. In these quantum emitters, the changes of state are typically caused by physical mechanisms that fluctuate randomly (as opposed to periodically) and often without regard for their history. The observed fluorescence provides a convenient connection to the underlying state, and it is important to understand the observed blinking statistics in order to analyse the cause of the state switches. Key insights can be distilled from the switching rates and how they depend on relevant parameters, and yet these rates are often the hardest quantities to extract from the raw time series of photon counts, particularly for low-intensity light signals or noisy data.
Blinking typically leads to step-like switches in the fluorescence time trace, as illustrated in figure 1(a). The most common method used to extract the on and off states from a blinking time series is threshold analysis [2,19,20], illustrated for simulated data in figure 1(a). However, the choice of threshold intensity is essentially arbitrary, and it can significantly influence the statistics of the on and off states [21][22][23][24]. The threshold technique becomes difficult to use for emitters fluorescing at low signal-to-noise ratios as depicted in figure 1(b), where the distributions of counts from the on and off states overlap. Thresholds are not at all helpful if the blinking switches at rates that are high relative to the detector interval, as shown in an extreme case in figure 1(c). Here we present a method utilising Bayesian inference to find the blinking rates directly from the photon-count data series, without requiring a threshold. The method works for blinking quantum light emitters independent of their brightness or speed of switching. This method is shown to be more accurate than threshold analysis, more broadly applicable, and more capable of assessing the measurement uncertainty. Hence our method allows for a better understanding of the mechanism behind the actual blinking phenomenon.
We model the blinking emitter as a hidden Markov chain, in both discrete and continuous time. The photon counts accumulated over each detector interval obscure, or hide, the underlying process, which is assumed to be memoryless. This naturally models a system whose dwell times in the on and off states are exponentially distributed [1,2,17]. Bayesian inference on this model then allows us to determine the underlying switching rates of these hidden states. Some more complex physical systems have been found to have on and off intervals instead distributed according to a power law [19,[25][26][27], indicating non-Markovian behaviour, or memory effects in the system. This precludes the independencies that lead to elegant factorisation in the derivations presented here, and so the computational processes would need to be significantly extended to tackle such data. However, the problems identified here for threshold analysis certainly also apply to non-exponential distributions. In fact, reliable discrimination between exponential and power law dependence is only possible by careful and reliable analysis of the blinking data.
We have modelled three behaviours for the state relative to the detector interval. In section 2 we analyse a simple blinking process using a discrete time Markov chain (DTMC), where the emitter is assumed to be either on or off during each detector interval and switching of state can occur only at the interval boundaries. The extreme case of continuous switching within the detector intervals is analysed in section 3 using a continuous time Markov chain (CTMC). In real experiments only a finite number of switching events are likely to occur within any given detector interval, rather than no switching or arbitrarily many switching events. Such a process is discussed in section 4 using a discrete time Markov chain with subintervals inside the detector interval. Finally, in section 5 we show how to use the Bayesian analysis to infer the state at each time step.

DTMC single-step model

Model description

The critical step in making an inference on the blinking rates from the observed fluorescence data is constructing a model of how the counts arise given the structure of the problem. We consider that the emitter has only two relevant states that affect the fluorescence level: an on state and an off state, and there is some mechanism that causes the emitter to switch between these states. We model the switching as a Markov process, where the current state is sufficient to determine the future dynamics. For simplicity, we begin by considering the state to be fixed for the entire detector interval; it can only switch at the boundaries, with switch-on and switch-off probabilities α_1 and β_1 respectively, where the subscript represents the allowed number of state changes per detector interval. This notation anticipates a DTMC which allows for multiple switching events per detector interval, discussed in later sections.
The Markov chain for this more general model is depicted on the left in figure 2, where d=1 corresponds to the single switching opportunity case.
The blinking model also contains the fluorescence and background rates λ and μ. If the emitter is in an off state then the detector can still register counts (known as dark counts) due to noise processes in the detector or electronics. These counts are well modelled by a Poissonian process with a rate μ. While fluorescing, it is assumed that the emitter is being driven by a strong classical pump so the statistics of the photon counts are also well modelled by a Poissonian process (i.e. shot noise) with an overall rate λ that includes detector inefficiencies. The observed counts are then Poissonian with a rate of either μ or λ + μ, since the combination of two independent Poissonian processes is again Poissonian. Given λ, μ, and the state s_{t−1} at the beginning of timestep t, the probability of seeing c_t counts is

P(c_t | s_{t−1}, λ, μ, I_1) = Poiss(c_t; μ + s_{t−1} λ),   (1)

where s_{t−1} = 1 represents the on state and s_{t−1} = 0 represents the off state for interval t, I_1 tags the single-interval model as the background information, and Poiss(c; ℓ) = ℓ^c e^{−ℓ}/c! is the probability of observing c counts given a Poisson distribution with rate ℓ.
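To make the count model concrete, the following sketch simulates a single-step DTMC blinking trace. This is our illustrative code rather than the paper's: the function name and the parameter values are assumptions, but the mechanics (state fixed within each interval, switching only at boundaries with probabilities α_1 and β_1, Poisson counts at rate μ or λ + μ) follow the model described above.

```python
# Minimal sketch of the single-step DTMC blinking model described in the text.
# The emitter holds one state per detector interval and may switch only at
# interval boundaries; counts are Poisson at rate mu (off) or lam + mu (on).
import numpy as np

def simulate_dtmc_trace(n_intervals, alpha_1, beta_1, lam, mu, rng=None):
    """Return (states, counts) for a single-step DTMC blinking emitter."""
    rng = np.random.default_rng(rng)
    states = np.empty(n_intervals, dtype=int)
    s = rng.integers(0, 2)                 # uniform prior over the initial state
    for t in range(n_intervals):
        states[t] = s
        # switching happens only at the interval boundary
        if s == 0:
            s = 1 if rng.random() < alpha_1 else 0
        else:
            s = 0 if rng.random() < beta_1 else 1
    counts = rng.poisson(mu + states * lam)  # rate mu when off, lam + mu when on
    return states, counts

states, counts = simulate_dtmc_trace(2000, alpha_1=0.05, beta_1=0.08,
                                     lam=20.0, mu=2.0, rng=1)
```

Traces like this, with the ground-truth states retained, are convenient for testing any inference procedure against known switching probabilities.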
Clearly, if we knew the state of the emitter at each data point it would be a simple matter to infer the switching rates. Unfortunately these states are not directly observable, and worse still, because the states are not observed the switching probabilities become dependent on the entire history of the count data. This makes the inference considerably more involved.
We can summarise the dependencies amongst the variables by the Bayesian network (BN) [28] shown in figure 3. The BN can be used to determine conditional independencies between the problem variables, using the property of d-separability. In the BN, s_{t−1} and s_t represent the state of the emitter at the boundaries of the t-th detector interval, but for the single-step model I_1 the initial state extends for the whole duration, so the dashed arrows in the figure should be omitted. That is, equation (1) depends only on s_{t−1} (this will change for multi-step models). The key variable independencies are

P(s_t | s_0, …, s_{t−1}, Ω_1) = P(s_t | s_{t−1}, α_1, β_1),   (3)
P(c_t | s_0, …, s_t, Ω_1) = P(c_t | s_{t−1}, λ, μ),   (4)

where we write Ω_1 = {α_1, β_1, λ, μ, I_1} for compactness. We will make use of these independencies in the following section.

Bayesian inference
Ultimately, the quantity we want to infer is P(α_1, β_1 | c, I_1), where c = {c_1, …, c_N} is the set of count data obtained over N detector intervals. Using Bayes' rule, the posterior probability distribution of α_1 and β_1 can be written as

P(α_1, β_1 | c, I_1) = P(α_1, β_1 | I_1) P(c | α_1, β_1, I_1) / P(c | I_1).   (5)

The rates of fluorescence and background counts, λ and μ, can be added and then removed by marginalisation,

P(α_1, β_1 | c, I_1) = ∫∫ P(α_1, β_1, λ, μ | I_1) P(c | α_1, β_1, λ, μ, I_1) / P(c | I_1) dλ dμ.   (6)

Note that if μ and λ are known independently then the integrals in (7) are unnecessary, and this will be true in all the models we develop. Given (2) we can factor P(α_1, β_1, λ, μ | I_1) = P(α_1, β_1 | I_1) P(λ | I_1) P(μ | I_1). We will go further and take them all as constant over some initial range, so that

P(α_1, β_1 | c, I_1) = 𝒩 ∫∫ P(c | Ω_1) dλ dμ,   (8)

with the normalisation factor 𝒩 to be determined at the end. As stated earlier, if all the states s = {s_0, s_1, …, s_N} were known together with Ω_1, the probability of all the counts would be easily determined. We can use the same trick of adding these parameters and then marginalising over them, so that

P(c | Ω_1) = Σ_s P(c, s | Ω_1) = Σ_s P(c | s, Ω_1) P(s | Ω_1),   (9)

where the sum runs over all possible state sequences s and each count term is determined by (1). The remaining P(s | Ω_1) term cannot simply be factorised over the states despite being a Markov chain, as observing Ω_1 introduces possible dependencies.

Figure 3. A partial Bayesian network representing the joint probability distribution of problem parameters. The pattern of nodes and connections inside the central greyed region representing detection interval t is repeated for each data value. We are estimating parameters r_α, r_β, λ, and μ given the observed counts c_t. For the discrete time models r_α and r_β represent the probabilities α_d and β_d respectively. Enough of the full network is drawn to be able to easily determine the variable independencies. In the case of a single step over the detector interval, s_{t−1} is the state over the entire interval and the dashed arrows are absent. In all other cases s_{t−1} and s_t are the states at the boundaries of the detector interval.
We can however expand P(s | Ω_1) using the product rule and simplify by making use of the independencies in (3):

P(s | Ω_1) = P(s_0 | Ω_1) Π_{t=1}^{N} P(s_t | s_{t−1}, Ω_1),   (10)

where we have chosen to expand in temporal order. Finally, the inference becomes

P(α_1, β_1 | c, I_1) = 𝒩 ∫∫ Σ_s P(s_0 | Ω_1) Π_{t=1}^{N} P(c_t, s_t | s_{t−1}, Ω_1) dλ dμ,   (12)

where we have used (4) to write the joint distribution between c_t and s_t. The problem with (12) is that the sum over s contains 2^N terms, each of which has N products. This will rapidly become intractable as the size of the data grows. Fortunately it is possible to rewrite (12) as a single term with N 2×2 matrix products. Consider the following matrix:

R_t = [ P(c_t, s_t = i | s_{t−1} = j, Ω_1) ]_{ij},   (13)

with rows indexed by s_t and columns by s_{t−1}, together with the initial-state vector

D_0 = [P(s_0 = 0 | Ω_1), P(s_0 = 1 | Ω_1)]^T.   (14)

Then

Σ_s P(s_0 | Ω_1) Π_{t=1}^{N} P(c_t, s_t | s_{t−1}, Ω_1) = [1 1] (Π_{t=N}^{1} R_t) D_0,   (15)

where the product on the right-hand side is read as decreasing in t to the right. The equivalence in (15) is readily verified by expanding a few terms. The matrix multiplication sums over the columns of R_t, which is a summation over the states s_{t−1}. Each successive multiplication sums over another state, and the final multiplication by the row vector [1 1] sums over s_N. With this equivalence, the inference reads

P(α_1, β_1 | c, I_1) = 𝒩 ∫∫ [1 1] (Π_{t=N}^{1} R_t) D_0 dλ dμ,   (16)

which is easily computable. Note that care must be taken to avoid underflow or overflow in calculating the matrix products. As a demonstration of the algorithm, figure 4(a) shows the credible regions of α_1 and β_1 for the data given in figure 1(b). The marginal distributions of all unknown parameters (α_1, β_1, μ and λ) are given in figure 4(b). It should be noted that the inference contains no approximations or arbitrary choices, and makes use of all the data. The key point of contention with observed data is whether the model is realistic for the system. If it is, then the inference faithfully converges on the underlying values regardless of what they are. For example, in figure 5 we generated count data assuming high probabilities of switching (α_1 = 0.8 and β_1 = 0.9) so the data looks very unlike the usual blinking trace. The inference quickly converges on the correct probabilities despite this.
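The matrix-product evaluation can be sketched as a short forward pass. This is a minimal implementation under our own naming conventions, assuming a uniform prior over the initial state; the per-step rescaling addresses the underflow warning in the text by accumulating the log of the normalisation at each step.

```python
# Sketch of the matrix-product likelihood evaluation for the single-step
# model: each step applies a 2x2 matrix combining the Poisson likelihood of
# the counts (which depends on s_{t-1}) with the boundary switching
# probabilities. Rescaling at every step avoids numerical underflow.
import math
import numpy as np

def _poisson_pmf(c, rate):
    return math.exp(c * math.log(rate) - rate - math.lgamma(c + 1))

def log_likelihood(counts, alpha_1, beta_1, lam, mu):
    """log P(c | alpha_1, beta_1, lam, mu) for the single-step DTMC model."""
    # transition matrix A[s_t, s_{t-1}], states ordered (off, on)
    A = np.array([[1 - alpha_1, beta_1],
                  [alpha_1, 1 - beta_1]])
    v = np.array([0.5, 0.5])               # uniform prior over the initial state
    ll = 0.0
    for c in counts:
        # likelihood of c given the state held during the interval, s_{t-1}
        like = np.array([_poisson_pmf(c, mu), _poisson_pmf(c, mu + lam)])
        v = A @ (like * v)                  # one R_t acting on the running vector
        norm = v.sum()
        ll += math.log(norm)                # accumulate log-scale normalisation
        v = v / norm
    return ll
```

Evaluating this log-likelihood on a grid of (α_1, β_1, λ, μ) values, adding a flat log-prior, and normalising at the end reproduces the gridded posterior approach the paper describes.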

Comparison with threshold analysis
In this subsection we compare our Bayesian inference technique with the conventional threshold analysis technique for extracting the on and off rates from a blinking time trace. To make threshold analysis possible, data is taken from the simulated count trace in figure 1(a), which has well separated on and off states. Given a threshold I_threshold, the emitter is taken to be in the on state if the intensity I_n in the n-th time bin satisfies I_n > I_threshold, and it is off if I_n ≤ I_threshold. A variety of methods have been employed for selecting the threshold. Once the on and off periods are identified they can be binned by duration into histograms [1,2,20,21]. Figure 6(b) shows the on-duration histogram for a given threshold, highlighting the way that the arbitrary choice of binwidth impacts the perceived distribution. A separate histogram can be produced for the off periods, and exponentially decaying fits to these histograms yield the characteristic on and off lifetimes from which the switching probabilities or rates are obtained [1,2]. This analysis was performed with a range of thresholds and duration binwidths to explore how the outcomes depend on these choices, and the results are displayed in figure 6(c). Results for the four marked thresholds are shown at the 10-duration-bins case because this was the default option in the package used for data processing (numpy).
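For comparison, the conventional threshold pipeline can be sketched in a few lines. This is our illustrative implementation (the threshold and bin count are exactly the arbitrary choices the text criticises): runs of intervals above threshold are collected, histogrammed by duration, and fitted with an exponential decay in log space.

```python
# Sketch of conventional threshold analysis: extract on-durations by a
# threshold, histogram them, and fit an exponential decay to estimate the
# characteristic on-lifetime. Both the threshold and the number of bins are
# arbitrary choices that influence the result.
import numpy as np

def on_durations(counts, threshold):
    """Lengths (in detector intervals) of runs with counts > threshold."""
    durations, run = [], 0
    for flag in (counts > threshold):
        if flag:
            run += 1
        elif run:
            durations.append(run)
            run = 0
    if run:
        durations.append(run)
    return np.array(durations)

def on_lifetime(counts, threshold, bins=10):
    """Least-squares exponential fit (in log space) to the duration histogram."""
    d = on_durations(counts, threshold)
    hist, edges = np.histogram(d, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = hist > 0                      # the log fit needs nonzero bins
    slope, _ = np.polyfit(centers[keep], np.log(hist[keep]), 1)
    return -1.0 / slope                  # characteristic on-lifetime
```

Sweeping `threshold` and `bins` over plausible values and collecting the spread of lifetimes reproduces the kind of 'processing uncertainty' discussed below.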
Threshold analysis does not provide any mechanism for estimating the uncertainty in the extracted switching rates. This is because the fundamental measurement uncertainty arises from the difficulty in distinguishing short switching events from noise spikes in the count time trace. It is possible to acknowledge a kind of 'processing uncertainty' by considering that various thresholds and bins are plausible, and accounting for the range of corresponding results. The grey error-bar mark in figure 6(c) indicates the mean and standard deviation of all results obtained for I_threshold = 17 ± 2 and histograms with 8-12 bins. The credible regions obtained from our Bayesian inference on the same blinking time trace are also shown in the figure. It is clear that our Bayesian inference has converged close to the true value (the red dot in figure 6(c)) and has provided a meaningful assessment of the uncertainty. None of the values obtained from threshold analysis are very close, and the Bayesian inference works better even for this time trace with very discernible blinking.

CTMC model
In the previous section the emitter could only switch states at the boundaries of the detection interval. Though this is a hidden Markov model, the inference was simple as the state is only obfuscated by the Poissonian distribution of counts. In this section we go to the other extreme and assume the emitter can change state at any point in time, possibly multiple times in a single detection interval. The previous single-interval model will be a good approximation to this continuous time model in the limit that the switching rates are very small. In that limit the occasional switch during a detector interval introduces negligible error and most intervals are either entirely on or entirely off.

Model description
The emitter can switch at any time, so that the state is a function of time, i.e. s(t). This situation is modelled by a CTMC as depicted on the right of figure 2. At any given time, the state can still only have one of two values but can switch arbitrarily often in any time interval. The detection events are as before with the detectors reporting the accumulated count from a time window of constant duration T. For clarity we shall take time to be expressed in units of the detector interval so that T=1.
In a CTMC the time spent in any state is exponentially distributed, and since there are only two states, the next state is deterministic. The characteristic lifetime of the off-intervals is inversely proportional to the rate of switching out of the off state, the switch-on rate r_α. Similarly, the characteristic lifetime of the on-intervals is inversely proportional to the switch-off rate r_β. These two parameters, r_α and r_β, are the parameters we want to infer in this model, replacing α_1 and β_1.
The CTMC can be solved via the forward Kolmogorov equations. Representing the off state by the vector [1 0]^T and the on state by [0 1]^T, the transition dynamics are encoded by the equation

dP(t)/dt = Q P(t),  with generator  Q = [[−r_α, r_β], [r_α, −r_β]].

This has the formal solution P(t) = exp(Qt). From this solution we can determine the transition probabilities P_ab(t) from state a to state b in time t:

P_00(t) = (r_β + r_α e^{−(r_α+r_β)t})/(r_α + r_β),  P_01(t) = r_α (1 − e^{−(r_α+r_β)t})/(r_α + r_β),
P_10(t) = r_β (1 − e^{−(r_α+r_β)t})/(r_α + r_β),  P_11(t) = (r_α + r_β e^{−(r_α+r_β)t})/(r_α + r_β).
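The solution of the forward Kolmogorov equations can be checked numerically. The sketch below (our own naming; `r_on` and `r_off` stand for r_α and r_β) builds exp(Qt) for the two-state generator from its closed form: the stationary projector plus a transient part decaying at the total rate r_α + r_β.

```python
# Sketch: transition probabilities of the two-state CTMC from the generator
# Q. For a 2x2 generator exp(Qt) has a closed form: the stationary projector
# plus a transient part that decays at rate r_on + r_off.
import numpy as np

def transition_matrix(r_on, r_off, t):
    """P[b, a] = probability of state b at time t given state a at time 0
    (states ordered off, on)."""
    r = r_on + r_off
    # stationary projector: every column is the stationary distribution
    stat = np.array([[r_off, r_off],
                     [r_on,  r_on]]) / r
    # transient part decays with the total switching rate
    decay = (np.eye(2) - stat) * np.exp(-r * t)
    return stat + decay
```

Each column of the returned matrix sums to one, and for large t both columns relax to the stationary distribution [r_off, r_on]/(r_on + r_off), matching the closed-form P_ab(t) above.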

Bayesian inference
The basic inference proceeds as before: we have a list of N detector counts c and we want to infer r_α and r_β, while optionally marginalising over the rates μ and λ. This leads to an expression equivalent to equation (8) but for r_α and r_β,

P(r_α, r_β | c, I_c) = 𝒩 ∫∫ P(c | Ω_c) dλ dμ,

where Ω_c is defined similarly to previously, the subscript c denotes the continuous-time model, and 𝒩 is suitably redefined.
To get traction on the problem, we add in the states at the boundaries of the detection interval s j =s(t j ), where the n th detection interval goes from t n−1 to t n and t n =n since T=1.

All that remains is to calculate the probability of the counts in an interval given the states at its boundaries. Within an interval, the switching events partition the time into alternating on and off durations l_j; for instance, with two switches we have l_2 = f and l_1 + l_3 = 1 − f, where f is the fraction of the interval spent in the on state. In general the interval is partitioned into a set of on states of total duration f and a set of off states of total duration 1 − f, so that the exponentials combine and the result does not depend on the individual durations l_j. For example, with six switches there are seven regions with three switch-on and three switch-off events. Alternatively, these integrals can be evaluated using a Laplace transform as shown by Jaynes [31]. Summing over all switch events gives the distribution of the on-fraction f.

The resulting integrals can be evaluated in closed form in terms of Bessel functions of the first kind, giving the probability density of the on-fraction f for one combination of boundary states. A similar argument yields the other possible fraction probabilities.

So in summary we have

P(c_n | s_{n−1}, s_n, Ω_c) = ∫_0^1 Poiss(c_n; μ + f λ) P(f | s_{n−1}, s_n, r_α, r_β) df,

which is an integral that needs to be done numerically. The extreme-case data in figure 1(c) was used as an example of the inference and is presented in figure 7. The prior for each parameter was taken as constant over an initial range, with 0 ≤ λ, μ ≤ 18, and a four-dimensional grid in which each range was divided into 70 values was used in evaluating the posterior probability in (29) for each data point, to allow for temporary normalisation and control of underflow.
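The structure of this numerical integral can be illustrated with a Monte Carlo stand-in: average the Poisson probability at rate μ + fλ over samples of the on-fraction f. In the sketch below (our own construction, not the paper's quadrature) f is obtained by simulating the CTMC switching directly, conditioning only on the initial state, which is a deliberate simplification of the boundary-state-conditioned density used in the paper.

```python
# Sketch: Monte Carlo estimate of the per-interval count probability for the
# CTMC model. The on-fraction f is sampled by simulating the exponential
# dwell times over one detector interval (unit length); the count probability
# is the Poisson pmf at rate mu + f*lam averaged over those samples.
import math
import numpy as np

def poisson_pmf(c, rate):
    return math.exp(c * math.log(rate) - rate - math.lgamma(c + 1))

def count_prob_mc(c, s0, r_on, r_off, lam, mu, n_samples=2000, rng=None):
    """P(c | initial state s0) estimated by simulating the on-fraction f."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_samples):
        t, s, f = 0.0, s0, 0.0
        while t < 1.0:                       # detector interval of unit length
            rate = r_off if s == 1 else r_on
            dwell = min(rng.exponential(1.0 / rate), 1.0 - t)
            if s == 1:
                f += dwell                   # accumulate time spent on
            t += dwell
            s = 1 - s
        total += poisson_pmf(c, mu + f * lam)
    return total / n_samples
```

Summing the estimate over all counts c recovers (approximately) unity, which is a useful sanity check on any implementation of the f-integral.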

Relation between single interval model and continuous-time model
The single interval model and the continuous time model make different assumptions about the underlying switching mechanism; nevertheless, if the switching rates r_α and r_β are low, we would expect the single interval model to yield very similar results. This is because the detector intervals that contain internal switching events will be relatively rare and will not contribute significantly to the inference. Additionally, the discrete time model assumes that the emitter remains in a constant state over each detector interval, which is a reasonable approximation for small switching rates. It might be supposed that, if the rates are small, the single interval model could be used as a more computationally efficient approximation to the full inference. We now explore the validity of this approximation.
The two models can be connected by the time spent in a given state. Consider a time interval Δt = T/d which is a fraction of the measurement time. In the continuous-time model the lifetime of a state is exponentially distributed, so the probability of a state persisting for the whole of Δt is, e.g., P_0(Δt) = e^{−r_α Δt} for the off state. So for a model where the state is fixed for duration Δt, we can relate the probability of switching state to the probability of the state not lasting for Δt, e.g. α_d = 1 − P_0(Δt). Consequently,

α_d = 1 − e^{−r_α Δt},  β_d = 1 − e^{−r_β Δt}.

Hence at a low rate of switching, we can expand to first order in r_α, r_β and compare the count probabilities term by term, where the relevant quantity is the probability of obtaining c_n counts averaged over all durations spent on in the interval. The first two probabilities are identical to those obtained in the single interval model. The last two differ, as in the single interval model we made the arbitrary decision to take the switch event as happening at the ends of the intervals, whereas in the correspondence between models we allow the switch to occur anywhere. However, detector intervals that contain a switch event become rare when the rates are small, so the continuous time model becomes indistinguishable from the single-step model in this limit, as expected. The single interval model will lose accuracy as the rate of switching increases, and this can be explored numerically. In figure 8 we summarise such an investigation by plotting the difference between the true switching rate (for data simulated from a continuous switching model) and the result of a single-interval model inference. The colour map represents the Euclidean distance from the true value to the maximum of the posterior probability distribution. The black region corresponds to a small distance, and is therefore the region where the single-interval inference is most applicable. As the (continuous) rates increase, the single-interval model starts failing.
An approximate threshold for the applicability of the single-step model is r_α r_β < 0.1, which corresponds roughly to a relative error ≲ 25%. The continuous time model performs well throughout this entire region (and beyond), as shown in figure 7, even though the traditional threshold technique cannot even be applied to the data, as illustrated in figure 1(c).
A grid of 150×150 simulations was run to produce the colour map in figure 8, and the posterior distributions for switching rates are shown for four illustrative cases. Note that the scales are different for each contour plot. The lower left plot shows a good convergence of the inference to the true value, where the rates are small.

Figure 8. Accuracy of the single-step inference applied to data simulated using a continuous model with various switching rates. The inference error is taken as the Euclidean distance between the true (r_α, r_β) and the mode of the posterior distribution. For small switching probabilities the inference gives good agreement with the known parameters, but for higher switching probabilities the inference is less accurate. The cyan line indicates an approximate threshold for the applicability of the single-step model of r_α r_β < 0.1, which corresponds approximately to a relative error ≲ 25%.
The count distribution over a full interval can be written as

f_{i,j}(t, c) = Σ_{k=0,1} Σ_{c′=0}^{c} f_{i,k}(t/2, c′) f_{k,j}(t/2, c − c′),   (52)

where i and j label the states at the boundaries of the interval and k is the mid-interval state. This is the sum of the two discrete convolutions of count probability distributions corresponding to the two possible mid-interval states. It offers a fundamental recursion relation between the probability distributions of counts over successive halvings of the interval. For a given interval t, f_{i,j}(t, ·) is a function of counts (a slice through a joint probability distribution). Since in any given experiment there will be a maximum count in an interval, the function f_{i,j}(t, ·) can be recorded as a one-dimensional array. In light of this, equation (52) shows that values of the functions f_{i,j}(t, ·) can be calculated numerically from knowledge of the function values for half the interval, f_{i,j}(t/2, ·).
This recursion will apply in any system where the count distribution over a single time interval is independent of adjacent intervals given knowledge of the boundary states; in cases where a closed form for the count distribution is intractable this can be exploited to estimate the fully continuous distribution at low computing time cost. As a demonstration of this process we now show that repeated application of equation (52) facilitates fast and accurate estimates of the functions f i j , arising from the CTMC model.

Model description
The CTMC photon count distribution has been shown to limit to Poissonian probabilities when the switching probabilities are small and the state of the emitter is constant for most measurement windows. Even when switching probabilities are high, however, the emitter state will be (nearly always) constant over sufficiently small time windows. We construct a model in which the detector interval is subdivided into multiple subintervals, and the emitter is allowed to switch state only at the boundaries of these subintervals (similar to the discrete model in section 2). To leverage the intrinsic 'halving' suggested by the recursion relation discussed above, suppose then that the number of subintervals d = 2^m is chosen such that m is an integer. If 1/d is a short enough duration that the functions f_{i,j}(1/d, ·) are well approximated by weighted Poissonians, then it is expected that this model will give close agreement with the CTMC inference. For example, the region of applicability identified in figure 8 suggests that m should be chosen at least large enough that r_α r_β < 0.1 × 2^{2m} for every r_α and r_β under consideration. Since this bound increases exponentially with m, large switching rates may be accommodated by modest increases in m.
In this model the total photon count observed in a detector interval is the sum of the counts arising in each subinterval within the detector interval. Such a system, which switches its state within the detector interval, can also be modelled by a DTMC. The switching probabilities over a subinterval are α_d (probability of switching on) and β_d (probability of switching off). The four cases of f_{i,j}(1/d, ·) may be stored as arrays indexed by the counts κ. Equation (52) may then be applied m times to numerically approximate the values of f_{i,j}(1, ·).
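The halving recursion of equation (52) can be sketched directly with array convolutions. In the base case below (our reading of the weighted-Poissonian approximation, with our own names) each subinterval is treated as holding its initial state, so its count distribution is a CTMC transition probability multiplying a Poisson pmf; m applications of the doubling step then build f_{i,j}(1, ·).

```python
# Sketch of the convolution-halving recursion (equation (52)). f[i, j, c] is
# the probability of c counts over an interval with boundary states i (start)
# and j (end). The base case approximates each short subinterval as holding
# its initial state (transition-weighted Poissonians).
import numpy as np

def base_case(dt, r_on, r_off, lam, mu, cmax):
    """f[i, j, c] for duration dt, counts 0..cmax."""
    c = np.arange(cmax + 1)
    log_fact = np.cumsum(np.log(np.maximum(c, 1)))   # log(c!)
    r = r_on + r_off
    e = np.exp(-r * dt)
    # two-state CTMC transition probabilities over dt: P[end, start]
    P = np.array([[(r_off + r_on * e) / r, r_off * (1 - e) / r],
                  [r_on * (1 - e) / r,     (r_on + r_off * e) / r]])
    f = np.empty((2, 2, cmax + 1))
    for i in range(2):
        rate = (mu + i * lam) * dt                   # state held fixed at i
        pois = np.exp(c * np.log(rate) - rate - log_fact)
        for j in range(2):
            f[i, j] = P[j, i] * pois
    return f

def double_interval(f):
    """Equation (52): convolve the two halves, summing the mid-interval state."""
    cmax = f.shape[2] - 1
    g = np.zeros_like(f)
    for i in range(2):
        for j in range(2):
            for k in range(2):                        # mid-interval state
                g[i, j] += np.convolve(f[i, k], f[k, j])[:cmax + 1]
    return g
```

Starting from `base_case(1 / d, ...)` with d = 2^m and applying `double_interval` m times yields the full-interval arrays; truncating the convolution at cmax is harmless provided cmax is well above the largest plausible count.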

Bayesian inference
As in the continuous time case, we are inferring the switch-on and switch-off rates r_α and r_β given the data counts c. The inference proceeds almost identically: adding the unknown fluorescence λ and background μ rates to the problem, using the independencies, and finally converting into a matrix form equivalent to the one given in (15). Simulated datasets were generated, and inferences were calculated for d = 1, 2, 4, 8. The Euclidean distance was again used to measure the error in the inference, and figure 9 shows the error increasing as r_α, r_β increase. Taking higher numbers d of subintervals into consideration reduces the error, and d = 4 is almost sufficient to keep the entire r_α, r_β ≤ 2 region from figure 8 enclosed inside the 25% relative error threshold. The inset illustrates the posterior distributions that correspond to the third case in figure 8, except the plot scale is different. As d increases the location of the inference improves, but the width of the distribution converges less tightly. This reflects the fact that for increasing d there are higher numbers of intermediate states considered by the model while the amount of information provided by the dataset is not increasing.

Figure 9. Increasing the number of subintervals considered in the inference reduces the error of the obtained switching rates. Simulated data with r_α = r_β were taken from the set generated for figure 8 up to r_α = r_β = 2, and additional datasets were generated for even faster rates. The multi-step inference was performed for various numbers of subintervals d, and the Euclidean error was determined by comparison of the known rates with the mode of the posterior distributions. The inset shows the 50%, 90%, and 99% credible contours for the posterior distributions corresponding to d = 2, 4, 8 for r_α = r_β = 1.9966 (the contours for d = 1 for this dataset are shown in figure 8). The shaded region represents a relative error of 25%.
In figure 9 each d eventually leads to the error increasing with about the same slope. For a given d the inference seems to remain effectively bounded, and for large r_α, r_β the inference differs from the true values by approximately the diagonal distance on the plane. This investigation suggests that d = 8 is roughly on the 25% relative error threshold for rates approaching 3. To tackle the count data from figure 1(c) as before, we therefore choose d = 16, and the results are presented in figure 10. The inference is quite accurate, but not as good as the CTMC model. Increasing d will improve the accuracy of this inference at the cost of increased time to recursively compute the discrete convolutions.

Inferring the state
An obvious extension of the work presented above is to determine the underlying state behind a data point given all the data observed. In the case where the switching happens many times per detector interval, this task becomes finding what fraction of the interval was spent in a given state. Here we illustrate how to solve this task for the case of the single interval model. In this model the system has a fixed state throughout the detector interval, and the task is to determine the state of interval k making use of all observed data, but without knowledge of λ, μ, α_1, or β_1. Specifically, the task is to determine P(s_k = a | c, I_1), where a is either 0 or 1. For a continuous switching model the corresponding posterior distribution would be over the proportion of time spent in a given state underlying each data point.
In order to do the inference we add λ, μ, α_1, β_1 and the rest of the states, and marginalise over them,
where Ω 1 =α 1 β 1 λμI 1 as before and the prior probability distributions where taken as constant with the normalisation  to be determined at the end. The summation is over all states s j where j k ¹ denoted in short as s s k ¹  . Expanding the last term in temporal order and making use of the independencies in (3) and (4) leads to, The probability can be efficiently computed using the matrices R t (13) and D 0  (14), and either R k 0 ( ) or R k To demonstrate the algorithm in action, data was generated and analysed assuming known μ and λ, by both a threshold and the full inference and is shown in figure 11. Determining the state by a threshold is shown in figure 11(b) with the result being either state '0' or state '1'. Using the inference based on the whole data yields the probability distribution over the state. Figure 11(d) plots the probability that the state is '1', P(s k =1), for each data point k. Note how small fluctuations past the threshold do not significantly affect the probability of the state and are consequently filtered out. Additionally, the probability of state can convey the confidence in the state assignment-a value of 0.5 means it is equally likely to be either state, this information is lost when a threshold is applied.

Conclusions
We have derived efficient methods for obtaining the posterior distribution of blinking and emission rates given observation of accumulated detector counts, for several blinking models. The methods do not require approximations to make the calculation tractable, and make use of all the observed data; the main fitting task lies in the selection of, and assumptions that go into, the models. Moreover, the advantage of obtaining the posterior distribution is that rigorous error bounds can be placed on the inferred parameters, for example by use of credible regions. We have also demonstrated how to use the data to determine the underlying state, and unlike a threshold technique the result automatically carries error bounds in the form of a probability distribution. It would be an interesting extension to compare this application with other proposals for directly extracting the state, e.g. [18,23,32], though it should be noted that since the rates and the states are dependent on each other, the analysis cannot be decoupled into first determining the states and then determining the transition rates from those.
In general we expect the methods presented to apply to a wide variety of hidden Markov models, far beyond the blinking of quantum emitters. For systems where there is a well-defined state time, the single- and multi-step models will be a close fit depending on how the data is accumulated. These models are then capable of determining the model parameters for high or low switching probabilities. In systems that can switch at any time the CTMC model is a better fit, and though it requires the numerical evaluation of an integral as part of the routine, we have found that this is not a hindrance. In these types of systems the stepped models can be thought of as approximations that are more efficient to evaluate when the switching rates are low. Similarly, we expect the methods to readily generalise to more complex Markov models with more states and other methods of indirect observation.