Bayesian inference of kinetic schemes for ion channels by Kalman filtering

Inferring adequate kinetic schemes for ion channel gating from ensemble currents is a daunting task due to limited information in the data. We address this problem by using a parallelized Bayesian filter to specify hidden Markov models for current and fluorescence data. We demonstrate the flexibility of this algorithm by including different noise distributions. Our generalized Kalman filter outperforms both a classical Kalman filter and a rate equation approach when applied to patch-clamp data exhibiting realistic open-channel noise. The derived generalization also enables inclusion of orthogonal fluorescence data, making unidentifiable parameters identifiable and increasing the accuracy of the parameter estimates by an order of magnitude. By using Bayesian highest credibility volumes, we found that our approach, in contrast to the rate equation approach, yields a realistic uncertainty quantification. Furthermore, the Bayesian filter delivers negligibly biased estimates for a wider range of data quality. For some data sets, it identifies more parameters than the rate equation approach. These results also demonstrate the power of assessing the validity of algorithms by Bayesian credibility volumes in general. Finally, we show that our Bayesian filter is more robust against errors induced by either analog filtering before analog-to-digital conversion or by limited time resolution of fluorescence data than a rate equation approach.

The rate matrix is parametrized by the absolute rates k_i,j, by the ratios between on and off rates (i.e. the equilibrium constants K_i) and by the ligand concentration in the solution. The units of the rates are s^-1 and M^-1 s^-1, respectively. The liganded states are 2, 3 and 4. The open state 4 conducts a mean single-channel current i = 1. Note that the absolute magnitude of the single-channel current is irrelevant for this study; what matters is its magnitude relative to the noise standard deviations σ_op and σ_ex. The 100 kHz data are used to investigate two related questions about the parameter inference: on the one hand, we use the 100 kHz data to mimic an analog current signal in order to investigate the effect of Bessel filtering before the analog-to-digital conversion; on the other hand, we explore the influence of the finite integration time with which the fluorescence signal is accumulated into one digital data point. Note that throughout this study the KF analysis frequency f_ana for cPCF data is in the range of 200-500 Hz, while pure current data are analyzed at 2-5 kHz. b-c, Normalized time traces of simulated relaxation experiments of ligand concentration jumps with N_ch = 10^3 channels, λ_b = 0.375 mean photons per bound ligand per frame and single-channel current i = 1. The current and fluorescence time courses are calculated from the same simulation run to mimic the experiment. Two major problems arise for inferring the dynamics of the ion-channel ensemble n(t): (I) currents are only low-dimensional observations (e.g. one dimension for patch clamp, two for cPCF) of a high-dimensional process (the dimension being the number of model states) blurred by noise, and (II) the fluctuations due to the stochastic gating and binding process cause autocorrelation in the signal. Traditional analyses of macroscopic PC data (and also of related fluorescence data) by the RE approach, e.g. Milescu et al. (2005), ignore the long-lasting autocorrelations of the deviations (Box 1 Fig. 1a, blue and orange curves) from the mean time trace (black) that occur in real data measured from a finite ensemble. To account for the autocorrelation in the signal, an optimal prediction (Box 1 Fig. 1b) of the future signal distribution ℙ(y(t_2)) should use the measurement y(t_1) from the current time step t_1 to update the belief about the underlying ensemble state n(t_1). Based on stochastic modelling of the time evolution of the channel ensemble, it then predicts ℙ(y(t_2)). To demonstrate how differently the two algorithms analyse the data, we compute the autocorrelation of the residuals, i.e. the deviations of the data from the mean signal predicted by the model after analysis with either the RE approach or the KF. When filtering (fitting) with the KF (given the true kinetic scheme), one expects the residuals to form a white-noise process. Plots of the autocorrelation function of both signal components (Box 2 Fig. 2a) confirm this expectation: the estimated autocorrelation vanishes after one multiple of the lag time (the interval between sampling points), which means that the residuals are indeed a white-noise process. In contrast, the residuals derived from the RE approach (Box 1 Fig. 2a) display long-lasting periodic autocorrelations.
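The residual check described above can be illustrated numerically. The sketch below uses synthetic residuals (not data from the paper): uncorrelated noise stands in for the residuals of a correct KF fit, while an AR(1) process with an assumed coefficient of 0.95 mimics the long-lived correlations left by an RE fit.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D signal, normalized so rho(0) = 1."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
n = 20_000

# Residuals after a correct KF fit: white noise.
white = rng.normal(size=n)

# Residuals after an RE fit: long-lived correlations, mimicked here by AR(1).
ar1 = np.empty(n)
ar1[0] = rng.normal()
for t in range(1, n):
    ar1[t] = 0.95 * ar1[t - 1] + rng.normal()

rho_white = autocorr(white, 5)
rho_ar1 = autocorr(ar1, 5)

# White-noise autocorrelation is ~0 for every lag > 0; the AR(1) one decays slowly.
print(rho_white[1], rho_ar1[1])
```

A fitted model that reproduces the data correctly should leave residuals whose autocorrelation estimate is indistinguishable from zero beyond lag zero, exactly as observed for the KF in Box 2 Fig. 2a.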

Simulation of ligand-gated ion-channel data

Here we treat an exemplary ligand-gated channel with two ligand binding steps and one open state, formulated as a hidden Markov model Ghahramani (1997) (Fig. 2a). We define the hidden ensemble state vector n(t), which counts the number of channels in each state (see Methods). To make use of the KF, we assume the following general form of the dynamic model: the evolution of n(t) is determined by a linear model that is parametrized by the state evolution matrix T, where ∼ means "sampled from" and 𝒩(⋅|μ, Σ) is a shorthand for the multivariate normal distribution with mean μ and variance-covariance matrix Σ. The state evolution matrix (transition matrix) T is related to the rate matrix K by the matrix exponential T = exp(K Δt).
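The relation T = exp(K Δt) can be evaluated directly. A minimal sketch, using a hypothetical two-state rate matrix with illustrative rates (not the paper's scheme):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 2-state rate matrix in 1/s; each row sums to zero, as required
# for the generator of a continuous-time Markov process.
K = np.array([[-100.0, 100.0],
              [  20.0, -20.0]])
dt = 1e-3  # sampling interval in seconds

T = expm(K * dt)  # state evolution (transition) matrix over one sample

# Each row of T is a probability distribution over successor states.
print(T.sum(axis=1))
```

For this generator the stationary distribution is π = (1/6, 5/6), and π T = π holds, which is a quick sanity check that the exponential was taken of K Δt and not of K alone.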
given what is known about n at time t-1. The KF as a special Bayesian filter assumes that the transition probability is multivariate normal according to Eq. 3. More general alternatives exist, such as particle filters, which avoid the normal approximation. For all data sets in the figure the analysis frequency f_ana ranges within 2-5 kHz. a, Posterior of the rate matrix from the classical deterministic RE approach (blue), the classical Kalman filter of Moffatt (2007) (red) and our Bayesian filter (green) for a data set with N_ch = 5⋅10^3 and σ_op/i = 0.05. The blue crosses indicate the true values. All samples are normalized by their true values, which is indicated by the tilde above the parameters. b, The Euclidean error is plotted vs. σ_op/i (lower axis) for the Bayesian filter, the classical Kalman filter algorithm and the RE approach. c, Scaling of the Euclidean error vs. N_ch follows ∼(N_ch)^-0.5 for N_ch > 2⋅10^3 for the RE and the Bayesian filter approach. The data indicate a constant error ratio (orange) for large N_ch. For N_ch < 2⋅10^3, samples of the posteriors for many data sets suggest an improper posterior. For the instrumental noise we used σ_ex/i = 1 and σ_op/i = 0.01.

Each prediction of Eq. 6 is followed by a correction step. The second noise term, the open-channel noise ε_op, is defined in terms of its first two moments, E(ε_op) = 0 and var(ε_op) = E(ε_op^2) = σ_op^2 n_4(t). To the best of our knowledge, such a state-dependent noise makes the integration of the denominator of Eq. 8 (which is also the incremental likelihood) intractable. This is because the state distribution, acting as the prior, also influences the variance parameter of the observation noise. Accounting for instrumental noise and leak currents is necessary for consistent kinetic scheme inference.
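The prediction/correction cycle referred to above is the classical Kalman recursion; the paper's generalization additionally handles the state-dependent open-channel noise. As a minimal sketch of the underlying cycle, here is one predict/correct step for a toy linear-Gaussian model with hypothetical numbers (all matrices below are illustrative assumptions, not the paper's model):

```python
import numpy as np

def kf_step(mean, cov, T, Q, H, R, y):
    """One Kalman-filter cycle: predict the hidden state forward in time,
    then correct the prediction with the new observation y."""
    # Prediction: propagate mean and covariance through the linear model.
    mean_p = T @ mean
    cov_p = T @ cov @ T.T + Q
    # Correction: Bayesian update of the predicted state with observation y.
    S = H @ cov_p @ H.T + R                 # innovation covariance
    K_gain = cov_p @ H.T @ np.linalg.inv(S)
    mean_c = mean_p + K_gain @ (y - H @ mean_p)
    cov_c = (np.eye(len(mean)) - K_gain @ H) @ cov_p
    return mean_c, cov_c

# Toy two-state ensemble; only the second ("open") state is observed.
T = np.array([[0.9, 0.1], [0.05, 0.95]])
Q = 0.1 * np.eye(2)
H = np.array([[0.0, 1.0]])
R = np.array([[0.5]])
mean, cov = np.array([50.0, 50.0]), 25.0 * np.eye(2)
mean, cov = kf_step(mean, cov, T, Q, H, R, np.array([60.0]))
print(mean)
```

The correction pulls the estimate of the observed component towards the measurement and shrinks its variance, which is exactly the belief update over n(t) described in the text.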

The Bayesian posterior distribution encodes all information from model assumptions and experimental data used during model training. Besides the covariance matrix of the parameters to express parameter uncertainty, the posterior allows one to calculate a credibility volume. The smallest volume that encloses a given probability mass is called the highest density credibility volume/interval (HDCV/HDCI).

Benchmark for PC data against the gold standard algorithms

We compare the posterior distributions of the algorithms. a, The Euclidean error scales as ∼1/√N_ch for larger N_ch, which is the expected scaling since the intrinsic noise divided by the mean signal scales in the same way. With lighter green we indicate the Euclidean error of the patch-clamp data set. For smaller N_ch < 500 the error is roughly the same, indicating that limitations of the normal approximation to the multinomial distribution dominate the overall error in this regime. b, HDCI and mode of k_3 and K_3 plotted vs. N_ch, revealing that the maximum is a consistent estimator (it converges in distribution to the true value with increasing data quality). While the 0.95-HDCI of the KF (green) usually includes the true value, the HDCI of the RE approach (blue) is too narrow and, thus, the true values are frequently not included. c, Bayesian estimation of the true success probability for the event that all 6 HDCIs include their respective true values at the same time, via a binomial likelihood. Since the data sets have different N_ch and the model approximations become better with increasing N_ch, we use a cut-off at N_ch = 200. The two posteriors of the true success probability were updated from a beta(1, 1) prior (red), which is a uniform prior on the open interval (0, 1). d, Comparison of 1-D and all combinations of 2-D marginal posteriors of both algorithms calculated from an N_ch = 10^3 simulation. For visual clarity we suppress all other inferred parameters. Blue lines indicate the true values.
In two dimensions, the mismatch between the deviation of the mode and the spread of the RE posterior (grey) is even worse, while the KF posterior (green) includes the true values with a more reasonable probability mass.
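The HDCI defined above, the shortest interval containing a given probability mass, can be computed directly from posterior samples. A minimal sketch on synthetic draws from an assumed normal posterior (mean 2.0, sd 0.5 are illustrative choices):

```python
import numpy as np

def hdci(samples, mass=0.95):
    """Highest-density credibility interval: the shortest interval that
    contains `mass` of the posterior samples."""
    x = np.sort(np.asarray(samples))
    n_in = int(np.ceil(mass * len(x)))          # samples the interval must hold
    widths = x[n_in - 1:] - x[:len(x) - n_in + 1]
    i = np.argmin(widths)                       # shortest window wins
    return x[i], x[i + n_in - 1]

rng = np.random.default_rng(1)
draws = rng.normal(loc=2.0, scale=0.5, size=100_000)
lo, hi = hdci(draws, 0.95)
# For a normal posterior the 0.95-HDCI is approximately mean +/- 1.96*sd.
print(lo, hi)
```

For a symmetric unimodal posterior the HDCI coincides with the central credible interval; for skewed posteriors, as often arise for rate parameters, it is strictly shorter.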
similar to a regular model (Fig. 4c), which, however, means that the Euclidean error of the two algorithms remains different even for large N_ch. For data with N_ch < 2⋅10^3 the samples from the posterior typically indicate that the posterior is improper (i.e. not normalizable). We consider this the case of unidentified parameters; then more data sets need to be collected or better priors applied. By "high signal-to-noise assumption" we refer to an experimental situation with a small noise standard deviation relative to the signal. A beta(1, 1) prior seems sufficient. The estimated true success rate of the RE approach (blue) is ≈ 0.15 and therefore far away from the true success probability. In contrast, the posterior (green) of the true success probability of the KF resides with a large probability mass between the lower and upper limit of the success probability of an optimal algorithm (given the correct kinetic scheme). a, The x axis is the Mahalanobis distance d_Mah; the y axis denotes the probability mass counted by moving away from the maximum until the ellipsoid at distance d_Mah is reached. The different colours represent the change of the cdf with an increasing number of rate parameters. The blue cdf at d_Mah = 1 represents the probability mass within one standard deviation, ∫_{-σ}^{σ} normal(x|0, σ) dx, see inset. In one dimension we expect to find the true value within ±σ around the mean with the usual probability p = 0.682 for univariate normally distributed random variables. The six parameters (brown) of the full rate matrix will almost certainly lie beyond d_Mah = 1.0. The higher the dimension of the space, the less important the maximum of the probability density becomes for the typical set, which is by definition the region where the probability mass resides. The mathematical reason is that the probability mass P = ∫ ℙ(θ) dθ is the integrated product of volume and probability density. b, The two methods to count volume in units of probability mass for the KF (green) and the RE (blue).
For the optimistic noise assumptions σ_ex = 0.25⋅i, σ_op = 0.025⋅i and a mean photon count per bound ligand per frame λ_b = 5, the RE approach (blue) distributes the probability mass such that the HDCV never includes the true rate matrix. From N_ch > 100 on, both HDCV estimates of the KF posterior (green curves) include the true value within a reasonable volume and show a similar behaviour. c, Binomial success statistics of the HDCV covering the true value vs. the expected probability, constructed from the data of b. Calculated for σ_ex = 0.25⋅i, σ_op = 0.025⋅i, λ_b = 5 and minimal background noise. The posterior distribution has its mode much closer to the true value for various parameter combinations, and the posterior appears approximately multivariate normal. Further, we recognize that the probability mass of the reasonably sized HDCV of the KF posterior includes the true values, whereas the HDCV from the RE does not. In 6 dimensions we lack visual representations of the posterior. Since we showed that both algorithms are consistent for a given identifiable model, we seek a way to ask whether the posterior is accurate (whether it has the right shape). We can answer that question by asking how much probability mass around the mode (or around multiple modes) needs to be counted to construct an HDCV (Eq. 14) that includes the true values. Then we can ask, for N_set data sets, how often we found the true values inside a volume V(P) of a specific probability mass P of the posterior distribution: n_success ∼ binomial(N_set, P(V)).
An algorithm that estimates the parameters of the true process should fulfill this property in addition to being consistent; otherwise credibility volumes or confidence volumes are meaningless. We explain in detail (Appendix 8) how to quantify the overall shape of the d-dimensional posterior and comment on its geometrical meaning. One way is to use an analytical approximation via the cumulative chi-squared distribution (Fig. 5a,b). The other way is to count the probability mass of d-dimensional histogram bins, starting with the highest value, until the first bin that includes the true values is reached (Fig. 5b).
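Both checks from this paragraph can be sketched in a few lines. For a d-dimensional multivariate-normal posterior, the mass inside the ellipsoid of Mahalanobis distance r is the chi-squared cdf at r^2 with d degrees of freedom; and the calibration of coverage across data sets is a binomial update of a beta(1, 1) prior. The data below are synthetic (a hypothetical perfectly calibrated algorithm), not results from the paper:

```python
import numpy as np
from scipy.stats import chi2, beta

# (1) Analytical route: probability mass inside the Mahalanobis-distance-1
# ellipsoid of a d-dimensional normal posterior.  In 1-D this recovers the
# familiar 68.2 %; in 6-D almost no mass lies that close to the mode.
mass_1d = chi2.cdf(1.0, df=1)
mass_6d = chi2.cdf(1.0, df=6)

# (2) Empirical route: across n_set synthetic data sets, count how often the
# true value fell inside the 0.95-HDCV and update a beta(1, 1) prior.
rng = np.random.default_rng(0)
n_set = 100
hits = rng.binomial(1, 0.95, size=n_set).sum()   # a calibrated algorithm
post = beta(hits + 1, n_set - hits + 1)          # posterior of the success rate

print(round(mass_1d, 3), round(mass_6d, 4), hits, round(post.mean(), 3))
```

The shrinking chi-squared mass with growing dimension is the quantitative content of Fig. 5a: for the six parameters of the full rate matrix, the true value is expected well beyond Mahalanobis distance 1 even for a perfect posterior.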

Knowing how much volume/probability mass is needed to include the true rate matrix allows a quantitative comparison of the two algorithms across models. a, 4 states, 1 conducting state; b, 5 states, 2 conducting states; c, 6 states, 1 conducting state. Plots in each row share the same x- and y-axes, respectively; all columns share the same abscissa. a-c, Error ratio for PC data (blue) and cPCF data (red). The dashed lines indicate the mean error ratio under the simplifying assumption that the error ratio does not depend on N_ch. The vertical bars are the standard deviations of the mean values. These values were calculated from the Euclidean errors shown in Fig. 3c and 4a for a, and in panels d-e for b-c, respectively. Results from the KF algorithm (green) and the RE algorithm (blue) are compared for PC (lighter shades) and cPCF (strong lines). The diagonal gray areas indicate a ∼(N_ch)^-0.5 proportionality. For simulating the underlying PC data, we used standard deviations of σ_op = 0.1 and σ_ex = 1, and for the cPCF data additionally a ligand brightness of λ_b = 5. To facilitate the inference for the two more complex models, we assumed that the experimental noise and the single-channel current are well characterized, meaning σ_ex ∼ 𝒩(σ_ex|1, 0.01), i ∼ 𝒩(i|1, 0.01) and σ_op ∼ gamma(σ_op|1, 100). In the models containing loops (last 2 columns), a prior was used to enforce microscopic reversibility: with k⋆_25 ∼ beta(100, 100), the rate k_25 was set to k_5 k_6 k_7 k_8 (k_2 k_3 k_4)^-1 ⋅ (0.995 + 0.01 ⋅ k⋆_25), i.e. the value dictated by microscopic reversibility up to a small multiplicative perturbation.
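Microscopic reversibility, as enforced by the prior above, requires that the product of rates going one way around a loop equals the product going the other way. A minimal deterministic sketch with a hypothetical 4-state loop and illustrative rates (not the paper's scheme):

```python
# Hypothetical loop 1 -> 2 -> 3 -> 4 -> 1 with forward rates f and backward
# rates b (illustrative numbers only).
f = {"12": 100.0, "23": 50.0, "34": 20.0, "41": 10.0}
b = {"21": 40.0, "32": 25.0, "43": 8.0}

# Fix the one remaining backward rate so that the loop satisfies detailed
# balance: product(forward) == product(backward) around the cycle.
b["14"] = (f["12"] * f["23"] * f["34"] * f["41"]) / (b["21"] * b["32"] * b["43"])

cw = f["12"] * f["23"] * f["34"] * f["41"]     # clockwise product
ccw = b["21"] * b["32"] * b["43"] * b["14"]    # counterclockwise product
print(cw, ccw)
```

Determining one rate of the loop from all the others in this way is the deterministic limit of the hierarchical prior: the beta(100, 100) factor in the paper merely allows tiny, controlled violations of the constraint instead of enforcing it exactly.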
Statistical properties of both algorithms for more complex models

We have seen in Fig. 3c and Fig. 4a that the RE and the KF algorithm are consistent estimators, while their error ratio (Fig. 7a) shows no trend to diminish with increasing N_ch. Adding a second observable increases parameter accuracy and adds identifiability for both algorithms, since fewer aspects of the dynamics need to be statistically inferred (Fig. 4a). Furthermore, the second observable removes much of the tendency of the RE approach to overinterpret (overfit) the data (compare Fig. 4b 1-6 with 8), which shrinks the error ratio from 5.6 ± 1.4 for PC data to smaller values for cPCF data (Fig. 7a, red); these are on average still bigger than one, while the Euclidean error is reduced (Fig. 4a). If we keep the amount and quality of the PC/cPCF data but increase the complexity of the model that produced the data (Fig. 7b,d) from a 4-state to a 5-state model (see kinetic schemes above Fig. 7a-c), we see that for cPCF data the error ratio stays roughly the same (difference between Fig. 7a and b). For PC data, instead, both algorithms face the risk of unidentified parameters. Only the interval N_ch > 2⋅10^3, in which all parameters are identified, is displayed. The data are taken from the KF vs. RE benchmark of Fig. 3c and 7a. The first row corresponds to the three rates, the second row to the equilibrium constants. All parameters are normalized by their true values. The insets show the error ratios of the respective single-parameter estimates. Note that the error ratios on the single-parameter level can be much larger than the error ratios calculated from the Euclidean error if the errors of the respective parameters are small compared with other error terms in the Euclidean error (Eq. 15).
To calculate the mean error ratio we exclude the values where some of the parameters are unidentified; in total this still amounts to 6.8 ± 2.7. Thus, the advantage of the KF (given all parameters are identified) might increase with model complexity for PC data.

The lowest Euclidean error for this kinetic scheme is achieved by cPCF data analysed with the KF (Fig. 7d). A 6-state-1-open-state model with cPCF data has again an error ratio of the usual scale (Fig. 7c).

As expected, the Euclidean error continuously increases with model complexity (Fig. 7d,e). For PC data of the 6-state-1-open-state model, even the likelihood of the KF is so weak (Fig. 7e) that it delivers unidentified parameters even for N_ch = 10^4, and we detect heavy-tailed distributions up until N_ch = 10^5. Using the RE approach on PC data alone does not lead to parameter identification at all, thus no error ratio can be calculated.

Consistent with our findings, fluorescence data by itself should lower the advantage of the KF compared with PC data, simply by signal-to-noise arguments, because the stochastic aspect of the ligand binding is directly exposed by the fluorescence signal. We also analyze what causes the changes in the Euclidean error (Fig. 7a,b) at the single-parameter level (Fig. 7b,d). The first row corresponds to the three rates, the second row to the equilibrium constants. All parameters are normalized by their true values. Because of the microscopic-reversibility prior, k̃_25 is a parameter that depends strongly on the other rates and ratios (refer to the caption of Fig. 7 for details about this prior); thus the deviations of k̃_25 are inherited from the other parameters. Individual parameter errors can be masked in the overall error ratio. Thus the advantage (error ratio) of the KF over the RE approach for a single parameter can be much larger or smaller than the error ratio derived from the Euclidean error if the respective parameter contributes less to the Euclidean error. The posterior of the KF (green) appears unbiased after the transition into the regime N_ch > 2⋅10^3 where all parameters are identified. Similarly, for the RE algorithm there is no obvious bias in the inference. If we use the RE algorithm and change from the 4-state to the 5-state model (PC data from Fig. 7b), bias occurs (Fig. 9) in many inferred parameters, even for the highest N_ch investigated. Milescu et al. (2005) showed that one of the reasons for the biased inference of the RE approach is its ignorance of the autocorrelation of the intrinsic noise. We add here that the bias problem clearly aggravates with increased model complexity. It is even present in unrealistically large patches, which in principle could be generated by summing up 10^2 time traces with N_ch = 10^3.
In contrast, the KF algorithm proves either unbiased or at least much less biased in the displayed regime. On the data side of the inference problem, adding cPCF data eliminates the bias, reduces the variance of the position of the HDCI and eliminates unidentified parameters (Appendix 9 Fig. 1 and 2) for both investigated algorithms. This advantage increases with model complexity.

For the 5-state and the 6-state model, we applied microscopic reversibility Colquhoun et al. (2004). We enforced it by a hierarchical prior distribution (Methods Eq. 60) whose parameters can be chosen such that they allow only arbitrarily small violations of microscopic reversibility. The prior distribution can, however, also be used to impose a softer regularization around microscopic reversibility.

Prior critique and model complexity

In the Bayesian framework the likelihood of the data and the prior generate the posterior. Thus, the performance of both algorithms can be influenced by appropriate prior distributions. We used a uniform prior over the rate matrix, which is not optimal. Note that uniform priors are widely used for several reasons: they appear to be unbiased and are assumed to be a "no prior" option (which they are not). This holds for location parameters like mean values. In contrast, for other parameters, such as scaling parameters like rates or variances, a uniform prior biases the inference towards faster rates Zwickl and Holder (2004). We suspect that, for PC data even in the simplest model discussed here, the lower data-quality limit below which we detected unidentified parameters (improper posteriors) is caused by the uniform prior. This lower limit for the KF also increases with the complexity of the model, from N_ch < 2⋅10^3 for the 4-state model to N_ch ≤ 2⋅10^4 for the 6-state-1-open-state model. Note that it is hardly possible to fit the 6-state-1-open-state model with the RE approach for the same amount of PC data. We observe that cPCF data eases this problem because the likelihood becomes more concentrated for all parameters; the likelihood then dominates the uniform prior.
Nevertheless, for most parts of the paper we used a uniform prior over the rates and equilibrium constants to remain comparable with the usual default method, plain maximum likelihood. This influences our results in data regimes in which the data are not strong enough to dominate the bias from the uniform prior. Thus both algorithms would perform better with smarter informative, or at least unbiased, prior choices for the rate matrix.
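The bias of a uniform prior on a scale parameter can be demonstrated with a toy example. Below, a decay rate k is inferred from a handful of exponential waiting times on a grid, once under a uniform prior and once under the scale-invariant 1/k prior (the Jeffreys prior for a scale); the numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
k_true = 2.0
t = rng.exponential(1 / k_true, size=5)        # few observations

k_grid = np.linspace(0.01, 30.0, 20_000)
# Exponential likelihood: k^n * exp(-k * sum(t)); work in log space for safety.
log_lik = len(t) * np.log(k_grid) - k_grid * t.sum()
lik = np.exp(log_lik - log_lik.max())

def post_mean(prior):
    """Posterior mean of k on the grid under the given prior weights."""
    w = lik * prior
    return np.sum(k_grid * w) / np.sum(w)

mean_uniform = post_mean(np.ones_like(k_grid))   # uniform prior on k
mean_jeffreys = post_mean(1 / k_grid)            # scale-invariant 1/k prior

# The uniform prior drags the posterior mean towards faster rates.
print(mean_uniform > mean_jeffreys)
```

Analytically the two posteriors are Gamma(n+1, T) and Gamma(n, T) with T the summed waiting time, so the uniform prior inflates the posterior mean by the factor (n+1)/n regardless of the data, which is the bias towards faster rates noted in the text.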

In principle, to rule out an influence of the prior, unbiased priors should be used for the rates.

The standard concept for unbiased, least informative priors is to construct a Jeffreys prior Jeffreys (1946) for the rate matrix, which is, however, beyond the scope of this paper (Fig. 10a). All median estimates of nine different cPCF data sets (Fig. 10b) differ by less than a factor of 1.1 from the true parameter, except k_3,2, which does not profit as much from the fluorescence data as k_2,1 (Fig. 10c). The 95th percentiles q_95 of ℙ(k_2,1) and ℙ(K_1) follow q_95 ∼ 1/√λ_b.

Thus, with increasing ligand brightness λ_b, the estimation of k_2,1 becomes increasingly better compared with that of k_3,2 (Fig. 10c).
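The q_95 ∼ 1/√λ_b scaling is the usual shot-noise scaling of photon counting. A minimal sketch with synthetic Poisson photon counts (frame counts and brightness values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
frames = 1_000          # frames contributing to one estimate
reps = 5_000            # repeated synthetic experiments

# Relative error of the estimated mean photon count per frame for several
# brightness levels lambda_b: shot noise gives rel_err = 1/sqrt(lambda_b * F),
# so quadrupling the brightness halves the relative error.
rel_err = {}
for lam in (1.0, 4.0, 16.0):
    est = rng.poisson(lam, size=(reps, frames)).mean(axis=1)
    rel_err[lam] = est.std() / lam

print(rel_err)
```

Any binding-related parameter whose information enters mainly through the fluorescence channel inherits this 1/√λ_b narrowing of its posterior percentiles.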
The coefficient of determination between the data and the mean predicted signal E[y(t)], for PC or cPCF data, is R^2(t) ≈ 1 (Fig. 10d). Analog filtering not only correlates the white noise but also distorts the dynamics (Fig. 11a). Figure 11. The influence of analog filtering on the accuracy and precision of the posterior maxima for cPCF data. In order to mimic an analog signal before the analog-to-digital conversion, we simulated 7 different 100 kHz signals, which were then filtered by a digital 4th-order (4-pole) Bessel filter. The activation curves were then analyzed with the Bayesian filter at 125 Hz and the deactivation curves at sampling rates between 166 and 500 Hz. The lower frequency at which the Bayesian filter analyses the data is necessary because the applied Bessel filter introduces additional time correlations into the originally white noise of the signal. Thus, an all-data-points fit would immediately violate the white-noise assumption of Eq. 4, which we restore by analyzing at a much lower frequency. We then let the time scales of the induced correlations grow by decreasing f_cut. For the analog signal we chose σ_ex/i = 10 and σ_op/i = 0.1, i.e. a stronger background noise, and we set the mean photon count per bound ligand to λ_b = 5. For the ensemble size we chose N_ch = 10^3. a, Time trace filtered with different f_cut. Except for 100 Hz (red), the signal distortion is visually undetectable. Nevertheless, the invisible signal distortions from analog filtering are problematic for both algorithms. b, Estimate of the mean Euclidean error of the median of the posterior vs. the cut-off frequency of the 4-pole Bessel filter (upper scale in units of kHz) or scaled to the channel time scale (lower scale, see text).
The fastest two eigenvalue ratios λ_1,2/λ_2 at the highest ligand concentration are indicated by the black vertical lines; those for the next ligand concentration are indicated by the red vertical lines. The slowest eigenvalue ratio λ_3/λ_2 at the highest ligand concentration is beyond the left limit of the x-axis. The solid line is the mean median of 5 data sets of the respective RE posterior (blue) and KF posterior (green). The green shaded area indicates the 0.6 quantile (ranging from the 20th to the 80th percentile), demonstrating the distribution of the error of the posterior median due to the randomness of the data. c 1-3, Accuracy (bias) and precision of the posterior maxima k̃_max of the rates vs. the cut-off frequency of the Bessel filter. The shaded areas indicate the 0.6 quantiles (20th to 80th percentile) due to the variability among data sets, while the error bars show the standard error of the mean. The deviation of the mean from the true value estimates the accuracy of the algorithm, while the quantile indicates the precision. For this reason we normalize in the following (Fig. 11) the cut-off frequencies by λ_2 at the highest ligand concentration. We analyze the arithmetic mean, over 7 different data sets, of the median of the posterior of the rate matrix. The mean Euclidean error of the median (Fig. 11b) and a series of quantiles demonstrate that the error of the mean median of the KF posterior (green) is overall smaller than that obtained by the RE. For unfiltered data, the accuracy of the mean median of the KF is better by a factor of ≈ 1.6. Based on the Euclidean error, both algorithms benefit slightly from careful analog filtering for f_cut/λ_2 ≥ 1, while the offset remains rather constant. A strong negative effect of analog filtering starts for both algorithms around f_cut ≈ 1 kHz. This is induced by f_cut → f_ana (see Appendix 7).
In contrast, on the level of each individual parameter of the rate matrix (Fig. 11c 1-6), the bias induced by analog filtering starts immediately at f_cut = 70 kHz (Fig. 11c 1-3). Note that visual inspection of the signal (Fig. 11a) does not reveal these distortions. The total bias in the estimate of k_21 is reduced by the additional bias from analog filtering, but that of k_32 is increased, which for the Euclidean error leads at first to a small increase in accuracy.

The KF is more robust towards analog filtering, as its results vary less with f_cut (given a reasonable f_cut), and it is less biased for unfiltered data in the estimates of these parameters. On the one hand, the Euclidean error shrinks for f_cut > 10 kHz (Fig. 11b). On the other hand, on the single-parameter level (Fig. 11c 1-6), the parameter estimates pick up bias due to the analog filtering even for high filter frequencies, in particular for the RE approach. Only for k_43 is the KF more biased than the RE. Hence, one should run the KF at a high f_ana with an even much higher f_cut.
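The effect that forces the lower analysis frequency, a Bessel filter turning white noise into correlated noise, can be reproduced directly. A minimal sketch using scipy's digital 4th-order Bessel filter on synthetic white noise (sampling rate and cut-off are illustrative choices matching the orders of magnitude in the text):

```python
import numpy as np
from scipy.signal import bessel, lfilter

rng = np.random.default_rng(3)
fs = 100_000.0     # "analog" sampling rate in Hz, as in the simulations
f_cut = 1_000.0    # Bessel cut-off frequency in Hz

# 4th-order (4-pole) low-pass Bessel filter; Wn is normalized to Nyquist.
b_coef, a_coef = bessel(4, f_cut / (fs / 2), btype="low")

white = rng.normal(size=200_000)
filtered = lfilter(b_coef, a_coef, white)

def lag1_autocorr(x):
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# Filtering the originally white noise creates strong short-lag correlation,
# which is why the filtered data must be analyzed at a much lower frequency.
print(lag1_autocorr(white), lag1_autocorr(filtered))
```

Analyzing only every fs/f_ana-th point, as done for the Bayesian filter above, spaces the retained samples far enough apart that this induced correlation has decayed and the white-noise assumption of Eq. 4 is approximately restored.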

On theoretical grounds, a further argument for doing less analog filtering is that this benchmark analyses data of a finite-state Markov process, which is a coarse proxy for the true process. We mimic the finite integration time by summing the 100 kHz signal, including the white noise, with a sliding window, to obtain data at an effectively lower sampling frequency (Fig. 12a). Additionally, we set the Bessel filter for the current data to f_cut/λ_2 = 4.59, i.e. f_cut = 90 kHz. The fastest analysis frequency used is f_ana = 500 Hz. We scale the mean photon brightness λ_b and the background noise down such that the signal-to-noise ratio per integrated data point of the lower-integration-frequency data equals that of the high-frequency data. We do that in order to separate the bias from the finite integration time from other effects, such as a better signal-to-noise ratio for each integrated point. Note that we only analyzed the plot up to f_int = f_ana.
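The window summation just described can be sketched in a few lines; here the window is applied as non-overlapping frames, one possible reading of the procedure, on hypothetical Poisson photon counts:

```python
import numpy as np

def integrate_frames(signal, fs, f_int):
    """Sum consecutive samples of a high-rate signal into frames, mimicking
    the finite integration time of a fluorescence detector."""
    n_per_frame = int(round(fs / f_int))
    n_frames = len(signal) // n_per_frame
    trimmed = signal[: n_frames * n_per_frame]
    return trimmed.reshape(n_frames, n_per_frame).sum(axis=1)

rng = np.random.default_rng(4)
fs = 100_000.0                                 # simulated "analog" rate in Hz
f_int = 1_000.0                                # integration frequency in Hz
photons = rng.poisson(0.05, size=1_000_000)    # sparse counts per raw sample

frames = integrate_frames(photons, fs, f_int)
# Total counts are preserved; only the time resolution is reduced.
print(len(frames), frames.sum() == photons.sum())
```

Because the summation preserves the total photon count while discarding temporal structure within each frame, any kinetics faster than 1/f_int is averaged away, which is exactly the bias mechanism studied in Fig. 12.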

Both algorithms incur very similar bias due to the finite integration time (Fig. 12b). The KF (green) is more precise for high integration frequencies down to f_int/λ_2 ≈ 0.08; below that, the RE approach becomes more robust. Similarly to Bessel-filtered current data (Fig. 11b), on the single-parameter level the systematic deviations start early, for example at f_int = 10 kHz for k_21 (Fig. 12c4), and possibly already at f_int = 50 kHz for k_32 (Fig. 12c2). The sudden increase of the Euclidean error of the mean median (Fig. 12b) at f_int/λ_2 ≈ 0.2 occurs in this case not because f_int approaches f_cut but because f_int ⪅ λ_1,2 for many ligand concentrations. To show this, we plot the results of fitting 5 different data sets without the highest 4 ligand concentrations (red), for which the largest eigenvalues are much smaller (Fig. 12b,c1-6). Additionally, we keep the signal-to-noise ratio per integrated point constant. Though the fluctuations of the posterior medians are higher, the KF becomes more robust.

Note that the fastest eigenvalues of these reduced data sets are indicated by the blue bars (Fig. 12b,c4). Based on the Euclidean error (Fig. 11a and Fig. 12a), the robustness of both algorithms against the cut-off frequency is found to be about an order of magnitude higher than their robustness against the integration frequency. That is related to a specific detail of the model used: the binding reaction corresponds to the fastest time scales of the overall dynamics (difference between Fig. 1b and c), which is exposed by the fluorescence signal. Thus, kinetic analysis of any data should take the finite time resolution of the recording into account. We simulated 5 different 100 kHz signals. All forms of noise were added, and then the fluorescence signal was summed up using a sliding window to account for the integration time needed to produce one digital data point. The activation curves were then analyzed with the Bayesian filter at 125 Hz and the deactivation curves at 166-500 Hz; see the caption of Fig. 8. We plot the 0.6 quantile (the interval between the 20th and the 80th percentile) to mimic ± one standard deviation from the mean, as well as the mean of the distribution of the posterior maxima for different data sets. (Note that this is not equivalent to the mean and quantiles of the posterior of a single data set.) The quantiles represent the randomness of the data, while the error bars indicate the standard error of the mean posterior maximum. Blue (RE) and green (KF) indicate the two algorithms with the standard data set, while red (KF) shows examples that use only the six smallest ligand concentrations for the analysis in order to limit the highest eigenvalues. a, Instantaneous probing of the ligand binding (blue) compared with a probing signal integrated at f_int = 1 kHz.
The integrated brightness of the bound ligand is λ_b = 5 photons per frame. Although the red curves look like decent measurements of the process, except for the two highest shown ligand concentrations, the mean error is roughly an order of magnitude worse than for f_int = 10 kHz. Note that for visualization we plot at a higher frequency than that at which the Kalman filter analyzed the data. b, Estimate of the distribution of the Euclidean error of the mean median of the posterior vs. the scaled integration frequency f_int/λ_2 = 1/(λ_2 ⋅ t_integration). We use the integration frequency instead of the integration time to make the plot comparable with the Bessel filter plot. The solid line is the mean median of 5 data sets of the respective KF posterior (green, red) and RE posterior (blue). The shaded areas indicate the 0.6 quantile, which visualizes the spread of the distribution of point estimates. The two fastest time scales (eigenvalues) at the highest ligand concentration are indicated by the vertical black lines, those of the next lower ligand concentrations by the red vertical lines. c 1-3, Accuracy (bias) and precision of the posterior maxima of the rates vs. the integration frequency. Tutorials for analyzing PC data can be found on the GitHub page https://github.com/JanMuench/Tutorial_Patch-clamp_data and for cPCF data at https://github.com/JanMuench/Tutorial_Bayesian_Filter_cPCF_data.

Compared with previous attempts Moffatt (2007), this mathematical generalization is necessary (Fig. 3b) in order to use Bayesian filters on macroscopic PC or cPCF data. With our algorithm we demonstrate (Fig. 3c and 7) that the common assumption that, for large ensembles of ion channels, simpler deterministic modeling by RE approaches is on par with stochastic modeling such as a KF is wrong in terms of both Euclidean error and uncertainty quantification (Fig. 5a-c and Fig. 6a-b). Enriching the data by fluorescence-based ligand binding reveals two regimes. In one regime, the two-dimensional data increase the accuracy of the parameter estimates up to ≈ 10-fold (Fig. 4a,c).

In the other regime of lower channel expression, enriching the data by the second observable turns non-identified parameters into identified parameters. The second observable in cPCF data decreases the overfitting tendency (Fig. 4a,b,d) of the RE approach on the true process. Thus, the HDCVs of the RE approach cannot be trusted in the range of signal-to-noise ratios that we tested. This even holds true for very pessimistic signal-to-noise assumptions (Fig. 6b). If HDCVs based on an RE approach cannot be trusted, the same applies to confidence volumes based on the curvature of the likelihood. This is not the case for the KF, which delivers properly shaped posteriors (Fig. 6a-c and Fig. 5a-c). Increasing the model complexity at unchanged PC data quality (Fig. 7) shows that the RE approach displays unidentified rates even for large ion channel ensembles, while our approach identified all parameters for the same data. We also investigated the robustness of both algorithms against the cut-off frequency of a Bessel filter (Fig. 11) and showed the overall superior robustness of the KF against errors of analog filtering compared with the RE approach. Analog filtering has its limitations due to distorting the higher frequencies of the Fourier spectrum of the signal. Thus, one should let the KF sample as fast as possible, with a cut-off frequency at least one order of magnitude higher than the sampling frequency of the KF.

Similar to the Bessel filter case, the KF is more robust than the RE approach against errors due to the finite integration time. Nevertheless, it is crucial for both algorithms (Fig. 12) that the intrinsic time […]

Assuming that the state transitions can be modeled by a first-order Markov process, the path probability can be decomposed as a product of conditional probabilities. Markov models (MMs) and rate models are widely used for modeling molecular kinetics (Appendix). The transition matrix over one sampling interval Δt is T = exp(K Δt), where exp is the matrix exponential. We aim to infer the elements of the rate matrix K constituting T. The probability vector obeys the discrete-time master equation p(t + 1) = T p(t). We model the experimentally observed system as a collection of non-interacting channels.
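A minimal sketch of this propagation, assuming a hypothetical two-state scheme and SciPy's matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical two-state rate matrix (s^-1); columns are source states,
# so each column of K sums to zero and T below is column-stochastic.
k_on, k_off = 100.0, 50.0
K = np.array([[-k_on,  k_off],
              [ k_on, -k_off]])

dt = 1e-3                      # sampling interval (s)
T = expm(K * dt)               # transition matrix over one interval

p = np.array([1.0, 0.0])       # all probability mass in state 1
for _ in range(10_000):        # propagate the master equation
    p = T @ p                  # p(t+1) = T p(t)

# After many steps p approaches the equilibrium k_off/(k_on + k_off)
print(p)
```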

Time evolution of an ensemble of identical non-interacting channels
In general, the initial ensemble will not have only one but multiple occupied channel states. […] Evidently, the total number of channels is conserved during propagation. The distribution of n(t + 1), defined by Eqs. 30 and 31, is called the generalized multinomial distribution. While no simple expression exists for the generalized multinomial distribution, closed-form expressions for its moments can be readily derived. For large N_ch, each ℙ(n^(j)(t + 1) | T) can be approximated by a multivariate normal distribution, such that generalized-multinomial(n(t), T) also has a multivariate normal approximation. In the next section we combine the kinetics of channel ensembles […], where T_{:,j} denotes column j of the transition matrix and diag(T_{:,j}) denotes the diagonal matrix constructed from it. One can now ask for the variance of a data point y(t) given the epistemic and aleatory uncertainty of n(t) encoded by ℙ(n(t)) in Eq. 38d. By the law of total variance, the signal variance follows as var(y(t)) = E[var(y(t) | n(t))] + var(E[y(t) | n(t)]). The total emission rate λ_Fl can be modeled as a weighted sum of the specific emission rates. The larger λ_Fl, the better the approximation. We assume that the confocal volume is equally illuminated. For our model of ligand fluorescence, we assume for a moment that there is no signal coming from ligands in the bulk; we will drop this assumption in the next section. With these assumptions, we arrive at the observation matrix. This matrix is diagonal. To derive the covariance matrix cov(y(t)) we additionally need to calculate var(y_fluo(t)) and cov(y_fluo(t), y_patch(t)). By the same arguments as above, we assume that the Poisson distribution is well captured by its normal approximation. In cPCF data, the ligand binding to only a sub-ensemble of the channels is monitored, which we assume to represent the conducting ensemble such that N_ch,Fl = N_ch,I.
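As a brief aside, the generalized multinomial propagation and the moments of its normal approximation can be sketched as follows; the three-state transition matrix is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Column-stochastic transition matrix of a hypothetical 3-state scheme
T = np.array([[0.90, 0.05, 0.00],
              [0.10, 0.90, 0.20],
              [0.00, 0.05, 0.80]])

def propagate(n):
    """One step of the generalized multinomial: each of the n[j] channels
    in state j independently jumps according to column j of T."""
    return sum(rng.multinomial(n[j], T[:, j]) for j in range(len(n)))

n = np.array([1000, 0, 0])
draws = np.array([propagate(n) for _ in range(20_000)])

# First two moments of the normal approximation for large N_ch:
mean = T @ n
cov = sum(n[j] * (np.diag(T[:, j]) - np.outer(T[:, j], T[:, j]))
          for j in range(3))
print(draws.mean(axis=0), mean)   # empirical vs. analytical mean
```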
For real data, further refinement might be necessary to model the randomness of the sub-ensemble in the summed voxels. We derived a conditional prior which allows at maximum a ±0.005 relative deviation from strict microscopic reversibility. The ±0.005 microscopic-reversibility constraint is applied in Fig. 7b-d. The ratio of the number of used samples to the number of likelihood evaluations can serve as a simple efficiency measure. However, this ratio is too optimistic a measure, as revealed by a look into the internal dynamics of the Hamiltonian Monte Carlo (HMC) sampler. HMC treats the current parameter sample as part of a phase-space point of a (dynamic) Hamiltonian system. The force acting on the system is the derivative of the negative logarithm of the posterior density Betancourt (2017). Each suggested sample is derived from a ballistic movement with random kinetic energy in the augmented parameter space. Calculating this movement is done by integrating the corresponding Hamiltonian equations. Therefore, one suggested sample requires many evaluations of the gradient of the posterior distribution. How many evaluations are needed depends in turn on the shape of the posterior and the length of the ballistic movement. Strong changes of the gradient of the posterior (high curvature of the posterior) require more integration steps. Setting all those sampler parameters requires either expert knowledge about the statistical model or a sampler which automatically learns optimal parameter settings (warm-up) before it starts the actual sampling. Nevertheless, HMC generates samples (Fig. 1a) from high-dimensional, correlated posteriors more efficiently than many other classical Markov chain Monte Carlo methods Betancourt (2017). By more efficient we mean the product of the speed of drawing one correlated sample and the number of iterations needed for the autocorrelation function of those samples to vanish sufficiently.
By construction, the samples from any Markov chain Monte Carlo method are correlated (Fig. 1b). The quality of the adaptive NUTS HMC sampler provided by Stan can be seen in Fig. 1b. The sample traces show a tiny anticorrelation whose absolute value is hardly larger than 0.05. This is achieved only by letting the sampler adapt to the geometry of the posterior in the warm-up phase. The samples collected during the warm-up phase are discarded.
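Such an autocorrelation check can be made directly on the sample traces; the `autocorr` helper and the AR(1) comparison chain below are illustrative, not part of the analysis pipeline:

```python
import numpy as np

def autocorr(x, max_lag=20):
    """Normalized autocorrelation function of a 1-D sample trace."""
    x = np.asarray(x, float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    return acf[:max_lag + 1] / acf[0]

rng = np.random.default_rng(2)
# Independent draws decorrelate immediately ...
iid = rng.normal(size=5000)
# ... while an AR(1) chain mimics a poorly mixing sampler
ar = np.empty(5000)
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

print(autocorr(iid)[1], autocorr(ar)[1])  # near 0 vs. near 0.9
```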
Hereinafter, we show that likelihood and sampler considerations are one aspect of computational efficiency. The other is the implementation of a parallelized algorithm. The blue curve (Fig. 2) represents the scaling of the KF with an implementation which, besides the sampled parameters, also saves the predicted mean and the covariance of the signal. This means that for each parameter sample 5 · N_traces · N_data/N_CPU additional memory operations are needed on the compute node. Note that, since we assigned each ligand concentration jump and the following data points to its own CPU, the relation N_data/N_CPU = N_data/N_trace holds. Skipping the saving of those derived quantities (orange curve) creates a speedup of roughly two orders of magnitude. The derived quantities are redundant; we suggest calculating them by feeding a small subsample of the posterior samples to the KF, which then reanalyzes the traces with the given draws but does not redraw from the posterior. See GitHub: https://github.com/JanMuench/Tutorial_Patch-clamp_data We compare the computation time of the KF (green) and the RE approach (blue) for the 4-state model (Fig. 3a) versus the data quality given by N_ch. The calculation time t_calc rises for both algorithms roughly like ∼ √N_ch. For each curve, the number of likelihood evaluations stays constant over the whole plot; thus this scaling relates to the integration time of the HMC sampler. Taking the average of the last 5 calculation times shows that the KF is about 2.7 ± 0.2 times slower than the RE approach. The computation time for cPCF data of the 5-state-2-open-states model (Fig. 3b) shows a similar ∼ √N_ch regime until it seems to become independent of N_ch. Note the different N_ch interval which is displayed. Taking the average of the last 4 computation times shows that the KF is about 2.24 ± 0.01 times slower than the RE approach. Surprisingly, with increasing model complexity the ratio of the computation times does not necessarily become larger.
The computation time for cPCF data of the 6-state-1-open-state model (Fig. 3c) shows a similar ∼ √N_ch regime until it becomes independent of N_ch. Taking the average of the last 4 calculation times reveals that the KF is about 3.3 ± 0.3 times slower than the RE approach. With increasing model complexity, the ratio of the computation times becomes larger. Note that there is little variation of the absolute calculation time of the algorithm across model complexity, which is a good prospect for more complex kinetic schemes. Nevertheless, to achieve this computational speed we used in total 72 CPUs on one node, 20 CPUs per chain for 20 time traces. Thus, nodes with more than 80 CPUs would enable a better performance. Overall, the more complex likelihood calculations of the KF require 2-3 times more computation for the tested models and cPCF data. The green curve belongs to the results from the KF, the blue curve to the results from the RE approach. a, 4-state model with PC data. In contrast to the cPCF data (Fig. 3), f_ana is increased by an order of magnitude. Thus, 10 times more data points per time trace and CPU are evaluated. These results were obtained by 4 independent chains which drew 9,000 samples per chain, including a large fraction of 7,000 warm-up samples per chain. b, 5-state model with PC data. Note that here f_ana equals the analyzing frequency of the cPCF data (Fig. 3). Surprisingly, for most of the displayed interval the KF is faster than the RE approach for PC data (Fig. 4a). Considering the more expensive likelihood evaluation (prediction and correction) of the KF, this result reminds us that this computational benchmark is a downstream test integrating all aspects of the sampling. One possible explanation could be the following: we describe in the main text (Fig. 4b1-6) for the RE approach that, due to the treatment of every data point as an individual draw from a multinomial distribution, HDCVs (Fig. 5-6) are too narrow.
This problem gets exacerbated (Fig. 3c) as we used 10 times more data points. The narrower the posterior, the more integration steps the sampler needs to calculate the proposed trajectory with sufficient precision. Another displayed example (Fig. 4b) has an increased kinetic scheme complexity. Note that we reduced the analyzing frequency 10 times to be comparable with the analyzing frequency of the cPCF data. Under this condition, the intuition from the complexity of the likelihood calculation is confirmed: the KF is slower. Taking the average of the last 4 calculation times shows that the KF is about 1.8 ± 0.2 times slower than the RE algorithm for PC data. A full treatment of convergence diagnostics Vehtari et al. (2021), ensuring that the sampled traces represent the typical set, is beyond the scope of this paper. In short, we usually worked with 4 independent sampling chains of the posterior Gelman and Rubin (1992). On the one hand, we visually checked that all 4 chains report the same typical set to verify that the sampler is fine. This convergence can be monitored by the R̂ statistic Gelman et al. (1992); Vehtari et al. (2021). R̂ evaluates differences between the within-chain and between-chain parameter estimates. Bad mixing is indicated by large values of R̂; perfectly converged chains would be reported with R̂ = 1. Additionally, the effective sample size N_eff of each object of interest needs to be monitored. Both are reported by the "summary" method of the "fit" object in PyStan. We show (Table 1) an edited version of the output of this method. The last two columns would indicate signs of missing convergence. The second column reports the Markov sampling error, telling whether we drew enough samples in total. Note that any quantity far out in the tails would have a much larger error. There were cases when the sampler was not fine for all chains, meaning that 3 chains showed the same posterior but one showed something different. In such cases we simply discarded that chain.
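A minimal sketch of the basic (non-split) R̂ statistic; Stan's reported value is a rank-normalized, split-chain refinement of this quantity:

```python
import numpy as np

def rhat(chains):
    """Basic Gelman-Rubin potential scale reduction factor.

    chains: array of shape (m, n) with m chains of n draws each.
    Values near 1 indicate that within- and between-chain variances
    agree, i.e. the chains have mixed.
    """
    chains = np.asarray(chains, float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * W + B / n
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(3)
good = rng.normal(size=(4, 2000))                # four well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [5.0]])  # one stray chain
print(rhat(good), rhat(bad))                     # near 1 vs. much larger
```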
Discarding a chain was necessary more often with the RE approach in particular. As we are benchmarking methods and know the true parameter values, it was rather obvious which chain did not work well. In particular, a successful binomial test of the HDCV is a downstream test that confirms, besides the assumptions of the algorithm, also whether the sampler worked sufficiently. For the Bayesian filter, sampler problems occurred at very low ion channel counts below N_ch = 200.

Table 1. Typical summary statistics of the posterior as printed by the "summary" method of the "fit" object in PyStan. We deleted the 25% and 75% percentiles to decrease the size of the table and changed some of the names used in the algorithm to the symbols used in this article. Note that we assumed that 2 ligand concentrations (which includes 4 jumps) are measured from one patch. Thus, we assumed for 10 ligand concentrations that they come from 5 patches, which leads to a 5-dimensional channels-per-patch vector N_ch. The bias of N_ch discussed in the main article is also observed here.

The three parameters, the discretization time, the metric and the number of steps taken, can be predefined and/or are adapted in the warm-up phase of the sampler. A typical number of iterations N_warm in the warm-up phase for each chain is N_warm = 3·10³ to 7·10³. Since a sufficient N_warm depends on the correlation structure of the posterior, which in turn depends on the quality and quantity of the data, we did not attempt to optimize the number of warm-up steps. We rather checked that for a set of different data (for example with different N_ch) all chains are fine. Stan is a probabilistic programming language that provides the research community with easy access to HMC sampling, variational Bayes, and optimization procedures such as maximum posterior/ML approaches.
Stan uses automatic analytical differentiation of the statistical model such that no analytical treatment by hand is required for any tested model. The only requirement on the user is to define a statistical model of the data. The Stan compiler calculates the analytical derivatives of the model, translates the statistical model into C++ code and after that into an executable program. Furthermore, Stan uses adaptive HMC such that the length of the ballistic trajectory, the variance of the random kinetic energy used to create the ballistic trajectory, and the number of integration steps are optimized automatically. The program can then be started from any high-level data processing language of the user's choice, such as Python, Matlab, Julia or R, or from the command line.

[…] and a reference dye was present. The laser intensities covered 1.6 orders of magnitude at constant detection settings. The data points were obtained from 1.4·10⁶ pixels. The red and blue lines indicate the theoretical prediction for a Poisson and a Gamma distribution, respectively, assuming λ = 1. a, Variance vs. average. The linear relation allows relating the measured a.u. (top, right axis) to photons (bottom, left axis). b, Skewness. c, Excess kurtosis. The higher moments are small, but the values are slightly larger than theoretically predicted. The insets provide corresponding log-log plots. Important for the KF algorithm is that skewness and excess kurtosis are small.

Background noise statistics

In cPCF measurements with fluorescence-labeled ligands, the signals of the ligands bound to the receptors overlap with the signals from freely diffusing fluorescence-labeled ligands in the bulk. This bulk signal is subtracted from the total signal Biskup et al. (2007). While the mean difference signal of the confocal voxel represents the bound ligands in that voxel, its noise originates from both bound and bulk ligands. The additional bulk signal, e.g.
the fraction of bulk solution inside that voxel, varies from voxel to voxel and can hardly be described theoretically. Nevertheless, it can be determined experimentally Biskup et al. (2007). At low expression levels or at ligand concentrations above low nanomolar levels, this background signal is not negligible. It scales linearly with the ligand concentration, while the signal from bound receptors depends on the affinity, as estimated by the concentration of half-maximum binding EC50, and on the number of ion channels in the membrane of the observed volume. The binding signal saturates at high concentrations (Appendix Fig. 2). Thus, both high affinity (low EC50) and high expression reduce the relative contribution of the background to the overall signal, improving the signal-to-noise ratio. Practically, the bulk signal is estimated by counter-staining the solution with a spectrally distinct reference dye Biskup et al. (2007). The spatial distribution of this dye mimics the spatial distribution of the freely diffusing ligands. The absolute bulk concentration as well as the molecular brightness differ between the reference dye and the labeled ligand. Hence, the binding signal is calculated as the average pixel intensity of the scaled difference image between the signal of the labeled ligand and that of the reference dye, assuming that the dark count signal has been correctly subtracted. Then we add the bulk signal to the bound ligand signal. In this way we produce a time trace with colored noise by the Gillespie algorithm and add white noise to the time traces, as observed in real experiments.
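The scaled-difference estimate of the binding signal can be sketched as follows; the function name, the calibration factor, and the synthetic images are illustrative:

```python
import numpy as np

def binding_signal(ligand_img, ref_img, scale):
    """Average pixel intensity of the scaled difference image.

    ligand_img: intensities from the labeled ligand (bound + bulk),
    ref_img:    intensities from the spectrally distinct reference dye,
                which mirrors the spatial distribution of the bulk,
    scale:      calibration factor converting reference-dye counts into
                equivalent free-ligand counts (it absorbs the different
                concentrations and molecular brightnesses).
    Dark counts are assumed to be already subtracted from both images.
    """
    diff = np.asarray(ligand_img, float) - scale * np.asarray(ref_img, float)
    return diff.mean()

# Synthetic check: the bulk contributes twice the reference-dye image,
# while the bound ligand adds 3 counts on half of the pixels
bound = np.array([[3.0, 0.0], [3.0, 0.0]])
ref = np.full((2, 2), 5.0)
ligand = bound + 2.0 * ref
print(binding_signal(ligand, ref, 2.0))   # recovers the mean bound signal 1.5
```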
Deriving the moments of the background noise for the difference signal

For the KF, the variance, skewness and kurtosis arising from the background noise have to be calculated. Skewness and excess kurtosis of the distribution have to be small compared with the total variance of the signal, including all noise sources, because only in this case can the KF algorithm be considered the optimal solution for the filtering and inference problem. The cross terms vanish because the measurement noise is independent of the ensemble state and has zero mean, E[η_m] = 0. Our derivation is equivalent to a marginalization over the predicted normal prior of the ensemble state ℙ(n(t) | y_{1:t−1}) at the time of the measurement, except that the prior distribution could be any probability distribution with some mean and variance. Eq. 84 is the classical KF variance prediction of a signal. The first term in Eq. 84 describes the variance arising from stochastic gating and from the ensemble state being hidden. Notably, by Methods Eq. 30 we see that var(n(t)) contains information about T and n(t − 1), which we can exploit within the KF framework.
The second noise term η_op is defined in terms of its first two moments, E(η_op) = 0 and therefore var(η_op) = E(η²_op) = σ²_op n₄(t). To the best of our knowledge, such a state-dependent noise makes the following integration intractable. When assuming that the relative fluctuations of n(t) are on average small, n₄ in the denominator is close to E(n₄) of the state. Thus, the incremental likelihood can be written as in the standard KF, with the only difference that the measurement noise is the sum of two components.
y(t) ∼ normal(H n(t), σ²_m + σ²_op n₄(t) + H P(t) Hᵀ)

To see that this approximation of the variance is correct, we apply the law of total variance decomposition Weiss (2005). The terms H P(t) Hᵀ + σ²_m constitute the standard output covariance. Again, P(t) contains information about T and n(t − 1), while the additional variance term includes information about the current n(t). The information contained in the noise influences the likelihood in two ways: through the variance or covariance of the current y(t), but also for y(t + 1) through the correction step via the Kalman gain matrix K_Kal defined in the next section.
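A single correction step with this two-component measurement noise can be sketched as follows; the matrices and numbers form a hypothetical two-state example, not the model of the main text:

```python
import numpy as np

def kf_step(n_pred, P_pred, y, H, sigma_m2, sigma_op2, open_idx):
    """One correction step of a Kalman filter whose measurement noise
    grows with the predicted occupancy of the open state, cf.
    y(t) ~ normal(H n, sigma_m^2 + sigma_op^2 n_open + H P H^T)."""
    # scalar innovation variance: state uncertainty + state-dependent noise
    S = H @ P_pred @ H.T + sigma_m2 + sigma_op2 * n_pred[open_idx]
    K_gain = P_pred @ H.T / S                  # Kalman gain (column vector)
    n_corr = n_pred + K_gain * (y - H @ n_pred)
    P_corr = P_pred - np.outer(K_gain, H @ P_pred)
    return n_corr, P_corr

# Hypothetical 2-state example: only state 2 (open) conducts, i = 1
H = np.array([0.0, 1.0])
n_pred = np.array([600.0, 400.0])
P_pred = np.diag([240.0, 240.0])
n_corr, P_corr = kf_step(n_pred, P_pred, y=430.0, H=H,
                         sigma_m2=25.0, sigma_op2=0.04, open_idx=1)
print(n_corr)   # the open-state estimate is pulled toward the observation
```

Note how a larger predicted open-state occupancy inflates the innovation variance S and therefore reduces the Kalman gain, exactly the effect of the state-dependent open-channel noise described above.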
In Fig. 4c we compare the true against the expected success probabilities of finding the complete true rate matrix within an n-dimensional HDCV for different expected success probabilities of a perfect model. By "expected" and "perfect" we refer to a fictitious ideal algorithm which exhaustively models all details of the true process, including all measurement details. The first way assumes a multivariate normal posterior distribution, ℙ(θ | y) ≈ normal(θ | E[θ], Σ). Then the ellipsoid of constant probability density is exactly the surface of a HDCV of given probability mass. For a two-dimensional representation see Fig. 4d below the diagonal. In one dimension, the ellipsoid consists of the two points with distance d_Mah = √((θ − E[θ])²/σ²) from the mean (inset Fig. 5a). In general, the n-dimensional ellipsoids around the mean of a multivariate normal distribution are described by the points having the same Mahalanobis distance d_Mah from the mean.
The form of the multivariate normal density reveals that all points with the same d_Mah have the same probability density. The random variable d²_Mah is χ² distributed. Thus the χ² distribution (Fig. 5a) is the probability density of drawing a θ in n dimensions and finding it on the ellipsoid's (n − 1)-dimensional surface at a distance d_Mah from the mean. This allows us to use the cumulative χ² distribution function to calculate the probability mass inside the ellipsoid, which is the desired HDCV. By evaluating d_Mah = √((θ_true − E[θ])ᵀ Σ⁻¹ (θ_true − E[θ])) and χ²_cdf(d²_Mah), we specify how much volume, in units of probability mass, has to be counted until the volume includes the true value. Note that with increasing dimensionality (Fig. 5a) the probability mass is shifted away from the mode of the probability density. For general probability densities this is also true in higher dimensions, where one only rarely finds the true value within a small distance of the maximum of the probability density. Note that for singular models Watanabe (2007) there is no mathematical theorem stating that the posterior asymptotically approaches a multivariate normal with increasing data quality or quantity. Consequently, the underlying assumption of a multivariate normal posterior is, depending on the situation, either valid or highly questionable. Thus the displayed method should additionally be validated in an independent way. To determine the probability mass needed to include the true value in the HDCV for such a posterior, we can also use a histogram-based method. One starts by constructing an n-dimensional histogram from the samples of the posterior and initializing a global variable with zero. Then we start counting from the bin with the most counts and check whether the true value falls inside this bin. If not, we add the probability mass inside that bin to the global variable. Repeating this for the next lower bins eventually leads to the detection of the bin containing the true rate matrix.
On detection, the global variable contains the sought-after value. While this procedure does not depend on a multivariate normal assumption, it is prone to errors due to the discrete bins and the finite number of samples from the posterior. Nevertheless, both procedures show good agreement (Fig. 5b) when plotting the volume (probability mass) needed to include the true parameter value vs. N_ch. Again, N_ch ≈ 200 seems to be a reasonable data quality to trust the statistics of the posterior using the KF. In contrast, the posterior of the RE approach never includes the true values within a reasonable HDCV. Note that a probability mass of ≈ 1 needed to reach the true value means that, qualitatively speaking, the estimate of the RE approach is infinitely far away from the true value in terms of the Mahalanobis distance. Further, due to the finite sampling, the histogram-counting method is not suited for the largest HDCVs approaching ≈ 1. The two developed methods contrast the Euclidean distance from the true values to the mode, median or even the mean of the distribution against all higher moments of the distribution. Thus they test the overall shape of the posterior. The effect of the second observable on the single-parameter level for the 5-state model can be seen in App. Fig. 1. The biased estimates and the unidentified parameters of the RE approach for PC data (Fig. 9 of the main text) are eliminated by the fluorescence data. Fig. 7e, Compared with the main text we switched rows and columns of the panels; the first column now represents the rates and the second column the equilibrium constants. The prior that enforces microscopic reversibility is identical to the prior mentioned in Fig. 7 of the main text. Even the over-confidence problem seems to be decreased (compare with Fig. 4b1-6). In contrast, the HDCIs of the 6-state-1-open-state model for cPCF data (Appendix 9 Fig. 2) display again a much increased over-confidence problem of the rate equation approach.
This indicates that not simply the number of states but the actual topology of the kinetic scheme and the structure of the observation matrix determine the scale of the over-confidence problem. Apparently, the more information comes from the fluorescence data relative to the information from the current data, the less grave is the over-confidence problem and the smaller are the differences in the Euclidean errors.
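The two HDCV procedures compared above (normal-theory Mahalanobis/χ² and histogram counting) can be sketched side by side; the function names and the synthetic two-dimensional posterior are illustrative, and SciPy is assumed for the χ² cumulative distribution function:

```python
import numpy as np
from scipy.stats import chi2

def hdcv_mass_normal(theta_true, mean, cov):
    """Normal-theory variant: probability mass of the smallest HDCV that
    just contains theta_true, via the Mahalanobis distance and the
    cumulative chi-square distribution function."""
    d = np.asarray(theta_true, float) - np.asarray(mean, float)
    d_mah2 = d @ np.linalg.solve(np.asarray(cov, float), d)
    return chi2.cdf(d_mah2, df=len(d))

def hdcv_mass_hist(samples, theta_true, bins=30):
    """Histogram variant: accumulate bins in order of decreasing count
    until the bin holding theta_true is reached."""
    counts, edges = np.histogramdd(np.asarray(samples, float), bins=bins)
    idx = tuple(min(np.searchsorted(e, t, side="right") - 1, bins - 1)
                for e, t in zip(edges, np.atleast_1d(theta_true)))
    mass = counts[counts > counts[idx]].sum() + counts[idx]
    return mass / counts.sum()

# Hypothetical 2-D Gaussian posterior: both estimates should agree roughly
rng = np.random.default_rng(5)
mean, sd = np.array([1.0, 2.0]), np.array([0.2, 0.3])
post = rng.normal(mean, sd, size=(100_000, 2))
theta_true = np.array([1.2, 2.3])          # one standard deviation away
m_normal = hdcv_mass_normal(theta_true, post.mean(axis=0), np.cov(post.T))
m_hist = hdcv_mass_hist(post, theta_true)
print(m_normal, m_hist)
```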