A novel, fast, HMM-with-Duration implementation – for application with a new, pattern recognition informed, nanopore detector

Background Hidden Markov Models (HMMs) provide an excellent means for structure identification and feature extraction on stochastic sequential data. An HMM-with-Duration (HMMwD) is an HMM that can also exactly model the hidden-label length (recurrence) distributions – while the regular HMM will impose a best-fit geometric distribution in its modeling/representation. Results A Novel, Fast, HMM-with-Duration (HMMwD) Implementation is presented, and experimental results are shown that demonstrate its performance on two-state synthetic data designed to model Nanopore Detector Data. The HMMwD experimental results are compared to (i) the ideal model and to (ii) the conventional HMM. Its accuracy is clearly an improvement over the standard HMM, and matches that of the ideal solution in many cases where the standard HMM does not. Computationally, the new HMMwD has all the speed advantages of the conventional (simpler) HMM implementation. In preliminary work shown here, HMM feature extraction is then used to establish the first pattern recognition-informed (PRI) sampling control of a Nanopore Detector Device (on a "live" data-stream). Conclusion The improved accuracy of the new HMMwD implementation, at the same order of computational cost as the standard HMM, is an important augmentation for applications in gene structure identification and channel current analysis, especially PRI sampling control, for example, where speed is essential. The PRI experiment was designed to inherit the high accuracy of the well characterized and distinctive blockades of the DNA hairpin molecules used as controls (or blockade "test-probes"). For this test set, the accuracy inherited is 99.9%.


Introduction
It appears possible to obtain kinetic features directly from the channel blockade signals obtained during the capture of certain molecules in a nanopore detector, shown in Fig.  1 (see further details on the Detector in the Background), where individual blockade levels appear to correlate with binding or conformational states of the molecule [1][2][3]. The extraction of kinetic features from nanopore detector measurements, e.g., obtaining the median dwell times of the most frequented channel blockade levels, requires that we faithfully preserve the dwell times of the various blockade states (or "levels") encountered during the channelcapture event, to the exclusion of short noise pulses that might normally be misinterpreted as short dwell times. During analysis using conventional Hidden Markov Models (HMMs), both the combination of first-order modeling and pulsed noise conspire to produce premature state transitions and hence incorrect assessment of kinetic features. From the preliminary work in [4], in particular, we know that bi-level synthetic data with Poisson-distributed dwell times provides an example of such a pulsed noise instability. Here we clarify how to solve the problem with (1) dwell-time dependence in the state-to-state transitions; and (2) emission variance amplification (EVA projection); and show new experimental results.
The conventional HMM is first order and fixed (scalar) in the transition probabilities for remaining in a given state, which leads to a geometric length distribution for remaining in that state [5] -i.e., conventional HMMs automatically impose geometric length distributions on their same-state regions, such as exon or intron lengths or blockade level durations. An HMM-with-Duration (HMMwD) is an HMM where true, or a much more complete, knowledge of the length distributions on same-state regions is incorporated into the model [6]. Here we describe a novel HMMwD where the non-geometric length distribution information is incorporated via dwelltime dependent transition probabilities (for transitions of state to self) [4]. New experimental results are shown, and compared to an exact HMMwD (described in [7]).
Part of the novelty of the new "cellular" HMMwD that is proposed is that it is defined at the cell-level in its dynamic programming table construction, much like the conventional HMM, with one column's computation only dependent on information held in the prior column (in an overall table computation involving a single pass through the table). Our HMMwD can be defined for either the Viterbi or the Forward/Backward algorithms (see Methods). This is convenient because we have a method for distributed HMM processing based on such table computations (paper in preparation), that takes advantage of the basic, underlying Markov assumption to do distributed processing with simplicity of the "chunk" distributed processing that we use for the SVM training [8]. That method is shown to work very effectively for the Viterbi algorithm (similar distribution methods for the Forward/Backward distribution algorithm are also discussed there). The end-result of all of this is that very sophisticated feature extraction tools can be brought to bear for real-time pattern recognition informed operation.
Real-time control of a nanopore detector, based on live, streaming measurements and sufficiently fast pattern recognition identification of any blockading ("captured") analyte, holds great promise for single molecule experiments. Real-time sampling control of a nanopore detector, alone, can boost nanopore detector sampling productivity by orders of magnitude, depending on the mix of desirable signal classes vs. undesirable in the data being analyzed. An example of such an experiment is the focus of the proof-of-principle experiment performed here. If there is a 1 to 100 ratio of desirable to undesirable, for example, then one obtains desirable signal sampling only about 1% of the time with a passive sampling system. With pattern recognition informed sampling this can potentially be changed to desirable signal sampling almost 99% of the time. In a real-time setting the challenge is to perform the HMM feature extraction sufficiently quickly (whereas the SVM is trained off-line, so operates very quickly on-line). In this work we show that this can be accomplished with pattern recognition used to identify DNA molecules within the first few hundred milliseconds of their blockade of the detector channel.
We establish a real-time experimental setup for Patter Recognition Informed (PRI) sampling control via integration of LabWindows automation capabilities with our "inhouse" Channel Current Cheminformatics (CCC) methods (see Methods). Data acquired with LabWindows is passed to a network of CCC software clients, on a streaming real time basis, for analysis and classification. The classification results are then quickly returned to the LabWindows automation software for experimental feedback control. Further details of a real-time test of PRI sampling is described in the Results. The classification inherits the 99.9% accuracy of the non real-time implementation (established in prior work [1]) as nothing has changed in regards to the features extracted and the classifier used.
To make these Channel Current Cheminformatics and Machine Learning tools available to fellow researchers, we are developing web-accessible machine-learning tools. Using hidden Markov model (HMM) processing, and finite state automata (FSAs), we are able to extract robust features and obtain very accurate support vector machine (SVM) classification results. The Machine Learning webinterfaces described here are for both machine learning experts and non-experts, particularly biologists and biophysicists/biochemists. For non-experts, default values are specified on the key parameters.
Web-accessible tools for HMM-based feature extraction and SVM classification are accessible at http:// Left Panel: A lipid bilayer supports the alpha-hemolysin heptamer that creates a pore, or channel used to collect the data, as shown left Figure 1 Left Panel: A lipid bilayer supports the alpha-hemolysin heptamer that creates a pore, or channel used to collect the data, as shown left. The channel is supported by an aperture, which allows the flow of ions between cis (here, left) and trans (here, right) wells. Right Panel: The assembled α-hemolysin pore shown to scale, with a captured dsDNA molecule. As shown, the double stranded form is too wide to pass through the pore, while a single strand may pass through.
logos.cs.uno.edu/~nano/. Examples of the Web interfaces are shown in the Results. The web tools can help in identification of blockade levels, the level transition and lifetime characteristics, and the fast blockade "spike" characteristics. The SVM classification is of general use for any kind of classification problem, and a number of novel kernels and novel implementations are employed. SVMbased clustering is also implemented in a novel way to yield a non-parametric clustering approach, which is used to cluster signals into multiple classes (particularly important for complex multi-orientation data-sampling situations such as with an antibody).

Cheminformatics
The pattern recognition informed (PRI) signal processing architecture builds and "closes the loop" on the prototype architecture presented in [1] (see Fig. 2). The signal processing architecture is used to perform a preliminary test of PRI sampling control (see Results). The processing is designed to rapidly extract useful information from noisy blockade signals using feature extraction protocols, wavelet analysis, Hidden Markov Models (HMMs) and Support Vector Machines (SVMs). For blockade signal acquisition and simple, time-domain feature-extraction, a time-domain Finite State Automaton (τFSA) approach is used [37] that is based on tuning a variety of threshold parameters (see Fig. 3 and [1] for full description of the model). A generic HMM is then used to characterize current blockades by identifying a sequence of sub-blockades as a sequence of state emissions [1,[9][10][11]. The parameters of the generic-HMM can then be estimated using a method called Expectation/Maximization, or 'EM" [5], to effect de-noising. The HMM method with EM is part of the standard implementation used in what follows. Classification of feature vectors obtained by the HMM for each individual blockade event is then performed using SVMs.

EVA projection
The HMM method is based on a stationary set of emission and transition probabilities. Emission broadening, via amplification of the emission state variances, is a filtering heuristic that leads to level-projection that strongly preserves transition times between major levels (see Fig. 4, and [4], and Methods for further details). This approach does not require the user to define the number of levels (classes), which is a major advantage compared to existing tools that require the user to determine the levels (classes) and perform a state projection. This allows kinetic features to be extracted with a "simple" FSA (Finite State Automaton) that requires minimal tuning. Figure 5 (reproduced from [4]) shows the benefits of EVA filtering.

Preliminary HMM with Duration and EVA for channel current signal analysis
One important application of the HMM-with-duration method used in [4] includes kinetic feature extraction from EVA projected channel current data (the HMM-with-Duration is shown to offer a critical stabilizing capability in an example in [4]). The EVA-projected/HMMwD processing offers a hands-off (minimal tuning) method for extracting the mean dwell times for various blockade states (the core kinetic information).
The HMM-with-Duration implementation, described in [4], has been tested in terms of its performance at parsing synthetic blockade signals (see Fig. 5). The synthetic data ranges over an exhaustive set of possibilities for thorough testing of the HMM-with-Duration. The synthetic data used in [4] was designed to have two levels with lifetime in each level determined by a governing distribution (Poisson and Gaussian distributions with a range of mean values were considered). The HMM here was performed with 0 EM iterations.

SVM classification used in PRI sampling selection
Support Vector Machines (SVMs) are variational-calculus based methods that are constrained to have structural risk minimization (SRM) such that they provide noise tolerant solutions for pattern recognition [38,39]. Simply put, an SVM determines a hyperplane that optimally separates one class from another (see Fig. 6). Once learned, the hyperplane allows data to be classified according to the region (separated by the hyperplane) in which it resides. Currently there are two approaches to implementing multiclass SVMs. One arranges several binary classifiers as a decision tree such that they perform a multi-class decision-making function (SVM-external classification -an example of this is the architecture used here, see Fig. 2). The second approach involves solving a single optimization problem corresponding to the entire data set (with multiple hyperplanes), with multi-class discriminator optimization performed internally. The SVM-internal approach, when it is stable and properly generalizable (an active area of research), is preferred (see Results for interface), since a tuning over Decision tree topologies and weightings is avoided [40]. The on-line discriminatory speed of a trained SVM is simply that of evaluating an inner product, so it's operational constraint on the PRI feedback endeavor is negligible compared to that of the HMM feature extraction stage. For this reason, there is little discussion of SVMs in this paper, even though SVMs Nanopore Cheminformatics and Control Architecture Figure 2 Nanopore Cheminformatics and Control Architecture. LabWindows Server now used. Data sent to cluster of Linux Clients via TCP/IP channel. Linux clients run expensive HMM analysis as distributed processes (similarly for off-line SVM training). The sample classification is used by the Server to provide feedback to the nanopore apparatus to increase the effective sampling time on the molecules of interest (this can boost nanopore detector productivity by magnitudes).
comprise much of the complexity of the HMM/SVM PRI feedback system.

Generalized HMMs
The work by Johnson in [41] is an excellent source of references on generalized HMMs (but entirely focused on speech applications). The approaches to HMM duration described in [41] are broken into three categories: 1) Hidden semi-Markov models (HSMMs), 2) variable transition HMMs (VTHMMs), and 3) standard HMMs with more states, etc. The current work thus would fall into the VTHMM category. In [41] they assert that VTHMM methods are essentially variations on Ferguson's [42] explicit duration HMM (EDHMM), placed in the HSMM category, yet acknowledge the differences in computational complexity. In [43], Levinson gives estimates of transitions and emissions for state durations modeled by gamma distributions, but no explicit method provided or used to decode actual signals. In [44], Ramesh et al estimate the duration dependent state transitions, A ij (d), in general rather than assume, as in the current work, that A ij (d) = C ij for i ≠ j, but do not offer any performance analysis per se of the Viterbi decoding algorithm. Also, [44] provides insight into the work by Ferguson [41] in that, the expansion state HMM (ESHMM) in [41] uses multiple expansion states in order to finitely approximate the duration of a given state of the original system. The current work uses only one model state for each known physical state, and uses time dependent self-transition probabilities to capture state duration information.
In [45], Mitchell et al model duration explicitly using expansion substates, rather than an explicitly time dependent self-transition probability as in the current work. It also manages computational complexity via parallel platform architecture. The paper listed as [46], is the only prior work that is found to model the self-transitions using the cumulative duration in the form A ii (d i ) = 1 -P(d i ), and this was only in a much more restricted sense than that described here. The authors there considered only a two-state system for which the "splitting" in probabilities upon exiting one state is trivial. In [46], Vaseghi computes the duration dependent state transitions directly from the Viterbi output, thus avoiding the costly forward-backward computation and this is similar the methods employed here for general multi-state HMMs. This is useful in situations where the exit transitions are not already known and must be estimated from the data. Otherwise, like most of the references included here, though it can apply specifically to the current work, it involves algorithms that are more generic and computationally complex and not the clear, extremely fast, and simple implementation described here. HMM/EM Viterbi-path level occupation feature extraction. Strong EVA projection (effects of λ = 2 shown) is employed to project the data onto dominant levels, a Viterbi path Histogram then shows the barcode "fingerprints" of the different molecular species. The labels are for the DNA hairpins examined in [1], and since then have been used as controls.

=1 =2
to use the framework in [47] to confirm the choice of splitting factor for multiple exit transitions, see Discussion.) Park et al [48] also performs analysis using only left-to-right models, where the splitting factor for multiple exit transitions is not an issue. Though emissions for a given state in cheminformatic data are not expected to be time-dependent -other than noise component, future work would be to use the framework in [48] to confirm stationary emissions. Finally, in Yoma et al. [49] the analysis is restricted again to the typical left-to-right models of speech recognition, where considerations of splitting factor for exit transitions are trivial.

HMM with Duration experimental tests
Results for our new, implicit HMMwD, are presented in Figure 7 and in the figures in Additional Files 1-3, along with the comparative results from an explicit HMMwD (for a two state system) that is used for comparative analysis (a detailed analysis with the explicit HMMwD is given in [7]). The explicit two-state HMMwD has 2n states, where n is the number of dwell bins in the quantization of the dwell-time distribution. The computations that are needed scale quadratically in n. So, for n = 1000, the explicit HMMwD can take 1,000,000-times longer to compute than the HMMwD described here (or the conventional HMM). Performance of the new HMMwD is given in comparison to the conventional HMM is also Results using synthetic ~Poisson data Figure 5 Results using synthetic ~Poisson data. Synthetic data with Poisson distributed length statistics is shown in the upper trace. Emission broadening is introduced with an emission variance amplification factor of 4.5. This effectively broadens the noise band (thickness) seen in the upper trace by a factor of 4.5, which leads to a blurring between the upper and lower levels of blockade since the noise bands now overlap (i.e., here we purposefully over-project to lead to the typical toggling cross-over instability shown in the bottom trace). The middle trace shows the clean, highly accurate Viterbi parsing into the appropriate levels that is obtained with use of the HMM-with-Duration implementation. The lower trace shows the Viterbi parse with a simple HMM, that is uninformed about the underlying length distributions, thus giving rise to a Viterbi traceback parse that fails to penalize unlikely, very short duration, blockade events (seen as the unstable, rapid level-projection toggles). In another test, we compare the performance of our twostate HMMwD to explicit HMMwD and conventional HMM (see Tables 1 and 2), with different size data generation and length distribution representation. The comparative scores shown are not optimised for use of the internal HMMwD, but are meant to explore the validity of the model and the behaviour of the accuracy of the different methods -which is consistently ordered as conventional HMM good, implicit HMMwD better, explicit HMMwD best (but the latter at much great computational cost). The results clearly demonstrate the superior performance of the HMM-with-duration over its simpler HMM without Duration formulation. With use of the EVA-projection method this affords a robust means to obtain kinetic feature extraction. The HMM with duration is, thus, critical for accurate kinetic feature extraction, via a pairing of the HMM-with-Duration stabilization with EVA-projection.

Pattern recognition informed feedback via LabWindows automation
A blockade signal's stationary statistics, or phases thereof, reveals information about the kinetics of the biopolymer resulting from interactions with surroundings, or from undergoing conformational changes. LabVIEW Automation software is used to help manage the feedback linkage between patch-clamp amplifier measurements and inhouse cheminformatics software. This has been used to demonstrate molecular identification in the first 100 ms of capture, with return of classification information to the control of the amplifier -for voltage-controlled sample ejection if desired. Screen-captures of the interfaces are shown in the figures in Additional Files 5-7. Additional File 5 shows the Acquisition Server interface and LabWindows C development environment. Additional File 6 shows a real time 9at vs 9gc classification (with 9at identification indicated by the led being on). Additional File 7 shows a real time 9gc identification.
The LabWindows Server initiates the distributed CCC computations by sending data to a cluster of Linux Clients via a TCP/IP channel. The Linux clients run the expensive HMM analysis as distributed processes (similarly for offline SVM training). The sample classification is then used by the Server to provide feedback to the nanopore appara-

Difference between upper & lower emission means # Correct out of 10k
Geometric 2n-state Poisson

2-state Poisson
Average Viterbi Decoding Accuracy over 10 different trials (instances) of 10 k-length synthetic 3-level signal data, where all lev-els have identical Poisson duration but the separation (gaussian emission means) between the levels varies

Level Separation # Correct out of 10k
Geometric 3n-state Poisson

3-state Poisson
tus to increase the effective sampling time on the molecules of interest (meant to boost nanopore detector productivity by magnitudes, as described in the Background).

Channel current cheminformatics and Machine-Learning web interfaces
The tFSA/HMM-based channel current cheminformatics web-interfaces are as follows: (1), As the tFSA runs, signal regions are identified for processing by the HMM. (2), A 50-state generic-HMM (typically) is used to obtain feature vectors, to be used by a SVM, consisting of three components: (i) Blockade Histogram: blockade level occupation probabilities (stat's on Viterbi path); (ii) The Emission Variances; and (iii) Compression of Transition probabilities to weighted sum of transition profiles from two dominant levels (the UL and LL states) (one of the interfaces for this process is shown in the figure in Additional File 8). And (3), Kinetic features are then extracted (often with EVA-projected data from step 2). An SVM interface and all other software described is available via the webpage: http://logos.cs.uno.edu/~nano/. As with the external and internal multiclass SVM discriminator implementations, the strong performance of the binary SVM enables SVM-External as well as SVM-Internal approaches to clustering [40].

Conventional HMM
An HMM consists of 2 main computations (Durbin et al [5]): 1. Baum-Welch Iteration of recursively defined forward/ backward probabilities (symbols x, states π) In the HMMwD described here, each of the stationary transition probabilities a kl are replaced by a dwell-time dependent update factor. The forward/backward probabilities used in the standard HMM-EM algorithm occur when evaluating p(Z 0...L-1 ) by breaking the sequence probability p(Z 0...L-1 ) into two pieces via use of a single hidden variable treated as a Bayesian parameter: p(Z 0...L-1 ) = Σ k p(Z 0...i , s i = k)p(Z i+1...L-1 , s i = k) = Σ k f ki b ki , where f ki = Thus the resolution of the duration distribution for both the signal-generating and the signal-decoding HMM's are identical for all row entries. Upper and lower mean durations are .2 and .6 times that of the bin count, respectively. Upper and lower mean emissions are 50.0 and 45.0, respectively, whereas the emission variance for both upper and lower is 20.0. In other tests (see Fig. 7 and figures in the Additional Files), the decoding duration distribution is an increasingly refined step function approximation of the generating duration distribution. For the data shown in this table, both upper and lower dwell-time distributions are 2-component beta-mixtures that are monotonically decreasing. The numbers shown represent the accuracy of Viterbi decoding out of an input signal 1 M samples in length.  p(Z 0...i , s i = k) and b ki = p(Z i+1...L-1 , s i = k). Given stationarity, the state transition probabilities and the state probabilities at the ith observation satisfy the trivial relation p qi = Σ k a kq p k(i-1) , where p qi = p(S i = q), and p q0 = p(S = q), and the latter probabilities are the state priors. The trivial recursion relation that is implied can be thought of as an operator equation, with operation the product by a kq followed by summation (contraction) on the k index. The operator equation can be rewritten using an implied summation convention on repeated Greek-font indices (Einstein summation convention): p q = a βq p β . Transitionprobabilities in a similar operator role, but now taking into consideration local sequence information via the emission probabilities, are found in recursively defined expressions for the forward variables, f ki = e ki (a βk f β(i-1) ), and backward variables, b ki = a kβ e β(i+1) b β(i+1) . The recursive definitions on forward and backward variables permit efficient computation of observed sequence probabilities using dynamic programming tables. It is at this critical juncture that side information must mesh well with the states (column components in the table), i.e., in a manner like the emission or transition probabilities. Length information, for example, can be incorporated via length-distribution-biased transition probabilities (introduced in [4]), and that is what is experimentally validated done here.

EVA projection
Using a standard implementation of a HMM with emissions probabilities parameterized by Gaussian distributions: emission_probabilities[i][k] = exp(-(k -i)*(k -i)/ (2*variance)), where "i" and "k" are each a state where 0 <= i, k <= 49 in a 50 state system. To perform EVA, the variance is simply multiplied by a factor that essentially widens the gaussian distribution parameterized to best fit the emissions, and the equation simply becomes exp(-(ki)*(k-i)/(2*variance*eva_factor)). The choice of this amplification factor is important. If too large of a factor is used, then the power signal will be altered to the point where the state transition information will be invalid. But for a sizable range of this parameter, HMM with EVA will remove the noise from the power signal while strictly maintaining the state transitions. In practice, any multiplicative factor between 2 and 10 works well.
After EVA-projection, a simple FSA can easily extract level duration information. Each level is identified by a simple threshold of blockade readings, typically one or two percent of baseline. Then, the HMM with EVA processed data is swept through with a small window to eliminate any remaining noise or spike artifacts that may distort actual level duration statistics. It is important to note that there must be a sufficient amount of data to support the level duration statistics. A small sample will simply not be representative of the true kinetic information. Further, it is important that the FSA is tuned properly in order to properly identify levels.

HMM with Duration via cumulant transition probability
The transition probabilities for state 'e' to remain in state 'e', a "ee" transition can be computed as: Prob(ee | e length = L) = Prob(e length ≥ L + 1)/Prob(e length ≥ L). The transition probabilities out of state 'e' can have some subtleties, as shown in the following where the states are exon (e), intron (i), and junk (j). In this case, the transition probabilities governing the following transitions, (jj) -> (je), , where the total number of (ej) transitions is 60 and the total number of (ei) transitions is 40. The pseudocode to track the critical length information, on a cellular basis in the dynamic programming table, goes as follows: 1. Maintain separate counters for the junk, exon and intron regions.
2. The counters are updated as follows: a. The exon counter is set to 2 for a (je) -(ee) transition b. The exon counter gets incremented by 1 for every (ee) -(ee) transition Prob(e length ≥ L + 1) is computed as: Prob(e length ≥ L + 1) = 1 -∑ i=1..L Prob(e length = i). Hence we generate a list such that for each index k > 0, the value 1 -∑ i=1..k Prob(e length = i) is stored.
Simplifying from three state model, {e, i, j}, to a two-state model, {e, i}, for a moment: after n occurrences of state 'e', the 2 cases of update factor to handle are: P(e n+1 |ee..e n ) = P(length e ≥ n+1)/P(length e ≥ n) P(i|ee..e n ) = P(length e = n)/P(length e ≥ n) Similarly for n occurrences of state 'i', and there are no probability splitting ambiguities upon exiting state 'e' as there is only one state to exit to in the two-state system (and there are no differences in the Viterbi and Forward/ Backward transition probability alterations).
Consider, as an example, a simple extension of our twostate notation to cover N + 1 states: {e, i 1 ,...,i N }. Suppose we are interested in the probability of an 'i' after seeing a length 4 segment of e-states: P(i|eeee) = 1 -P(e|eeee) = 1 -P(L e = 5)/P(L e = 4) There are two types of transition rule that appear to result, one for Viterbi, with its maximum operation on paths, and one for Forward/Backward, with its sum operation. For the two-state case, N = 1, and these update rules involving "splitting" factors all become the same (see Results): Viterbi update (1) Difference splitting: p(i k |eeee) = [1 -P(L e = 5)/P(L e = 4)] * [1 + P(i k |e) -Avg z P(Z|e)] (2) Ratio splitting: p(i k |eeee) = [1 -P(L e = 5)/P(L e = 4)] * P(i k |e)/[Avg z P(Z|e)] In this situation we are not maintaining a sum rule on probabilities, here we are viewing each path through the table in a manner consistent with the maxprob evaluation. The equation above with the factor [1 + P(i k |e) -Avg z P(Z|e)]/N provides a suitable "splitting factor", as i k and 'e' probabilities sum to one, remain positive, and have other nice properties. The splitting factor is not unique, however, as case (2) makes clear.
It is important to note that we appear to have some freedom on splittings of probabilities upon exiting a state (when we are using the length distribution cumulants to describe transition probabilities, etc.). This is merely an associated effect of that length distribution incorporation -now upon exiting that length distribution our main factor is P(i k |some prior length of e), that factor is blind to the appropriate splittings amongst the "not e" states, and we must incorporate another factor to deal with the probability splitting -in the case of forward/backward, this is chosen to obey a prob sum to 1 on all cases, on Viterbi this must maintain each path's probability with proper weighting with respect to the others (consistent with the max-path operation).

Real-time processing hardware/software setup
The server was able to concurrently perform the following using a single 1.5 GHz processor with no hyper-threading: 1. Acquire data from the DAQ buffer at 50 KHz sample rate.

2.
Update the server GUI screen with the acquired datathough a 10× data decimation was required in order to avoid irrevocable delays in reading from the DAQ buffer 3. Perform tFSA logic to screen for signals resulting from molecular capture events at the nanopore channel 4. Send capture-signals as long as 100 ms in duration at a rate 10 per second to a waiting 1.