Deterministic inference for stochastic systems using multiple shooting and a linear noise approximation for the transition probabilities

Estimating model parameters from experimental data is a crucial technique for working with computational models in systems biology. Since stochastic models are increasingly important, parameter estimation methods for stochastic modelling are also of increasing interest. This study presents an extension to the ‘multiple shooting for stochastic systems (MSS)’ method for parameter estimation. The transition probabilities of the likelihood function are approximated with normal distributions. Means and variances are calculated with a linear noise approximation on the interval between succeeding measurements. Because the system is only approximated on intervals which are short in comparison with the total observation horizon, the method can deal with effects of the intrinsic stochasticity. The study presents scenarios in which the extension is essential for successfully estimating the parameters and scenarios in which the extension is of modest benefit. Furthermore, it compares the estimation results with reversible jump techniques, showing that the approximation does not lead to a loss of accuracy. Since the method is not based on stochastic simulations or approximative sampling of distributions, its computational speed is comparable with conventional least-squares parameter estimation methods.


Introduction
Parameter estimation is very important for the analysis of models in systems biology. Computational modelling is a central approach in systems biology for studying increasingly complex biochemical systems. Progress in experimental techniques, for example, the possibility to measure small numbers of molecules in single cells [1], highlights the need for stochastic modelling approaches. Simulation methods for stochastic processes have been developed for decades, starting with [2], and nowadays exist in many variants [3]. The development of parameter estimation methods for stochastic models, however, has started only recently. This paper extends a method of [4] with a more accurate approximation for the variances of the transition probabilities.
Parameter estimation approaches for time series data exist using stochastic simulations. Owing to the Markov property of the time series, the likelihood function factorises into the product of transition probabilities. These transition probabilities are generally unknown in stochastic modelling. They can be estimated using stochastic simulations. This can be done with density estimation methods [5,6]. Another approach is the use of a reversible jump algorithm [7,8]. As results of these two papers will be used for comparison, their ideas will be explained in more detail. Reversible jump algorithms start with a path connecting the previous and the succeeding state that contains the minimum number of reactions necessary to move from state to state. For this specific path it is possible to calculate the probability. Then reactions are added and subtracted from the path to obtain new paths and their probabilities. The transition probability is the sum over all possible paths. The space of paths is explored in a Monte Carlo way. The parameter estimation can then be performed with Bayesian methods as in [7]. The same idea of exploring the space of possible paths can also be used to calculate derivatives of the transition probabilities with respect to the parameter. This can be combined with a gradient descent for the optimisation as in [8].
Using a surrogate probabilistic model as an approximation is faster from a computational point of view [9]. Komorowski et al. [10] approximate the system with a multivariate normal distribution, in which the mean is described by the rate equations and the covariance matrix contains all inter-temporal covariances. Another approximation is suggested in the form of an approximate maximum-likelihood method [11], where a singular value decomposition likelihood method is also described. The use of approximate Bayesian methods is suggested in [12].
A second class of methods focuses on a numerical solution of the chemical master equation (CME), which describes the probability for each state as a function of time. These systems are generally high dimensional. To address this problem, a state space truncation can be used [13] or moment-closure methods, which are an approximation focusing on a finite number of moments of the probability distribution [14,15]. Parameter estimation with the CME and an approximation for the likelihood function for small systems or a hybrid method for large systems is suggested in [16]. Deuflhard et al. [17] and Engblom [18] use an adaptive Galerkin method for the solution of the CME. If distribution information is available from measurement, a finite state projection [19] can be used to solve the CME. The common challenge is the fact that the solution of the CME, as well as simulation-based methods, become very time-consuming as the number of states in the state space increases.
Moment matching methods use repeated measurements to estimate moments of the distribution for the parameter estimation [20,21]. Further approaches can be taken from stochastic epidemic models and are based on expectation maximisation algorithms or Martingale-based approaches [22]. A different approach is suggested by Chattopadhyaya et al. [23], called 'inverse Gillespie', which uses certain geometrical properties of the population updates to infer the reaction structure and the propensities.
The problem of parameter estimation for stochastic models can be written in the following form: we assume that the experimental or simulated data record at each time point t_i, i = 0, 1, …, n, the number of molecules of species k, n_i^(k), hence n_i^(k) ∈ ℕ_0. Each data set n = (n_0, n_1, …, n_n) is stochastic. This means that even two simulated realisations of the system without measurement noise can be different, and even that the mean of stochastic simulations does not have to be close to the ordinary differential equation (ODE) solution. Furthermore, it is assumed that the system's behaviour depends on some unknown parameters θ, which are to be estimated. To this goal, the data n are compared with some sort of stochastic or ODE simulations with respect to some objective function F which measures the quality of the fit. Parameter estimation means finding the optimal value θ̂ for the parameter.
A common choice for measuring the quality of a fit is the likelihood function. The likelihood function describes the probability of obtaining a data set n given a parameter θ. A maximum-likelihood estimator is the choice of θ that maximises the probability of obtaining the data n. For processes satisfying the Markov property, the likelihood function factorises into the product of the transition probabilities from a point n_{i−1} at time t_{i−1} to a point n_i at time t_i, i = 1, …, n.
The MSS approach that we recently developed [4] approximates this transition probability with a normal distribution of constant variance. It has been shown to work well on oscillatory systems even when the dynamics is qualitatively different from an ODE solution. However, certainly the approximation with constant variance is rough and can be improved. The new approach therefore approximates the transition probabilities with a normal distribution with a covariance calculated by a linear noise approximation (LNA) on the intervals between two succeeding time points of measurements. The conditions for the approximation on the shorter time intervals are much less strict than for the LNA applied over the complete time course.
Therefore the approach is comparable with [7-9] in that it uses a factorisation of the likelihood function, but is different from those in that it does not use simulations to calculate the transition probabilities.
The extended MSS method can tackle models with fully observed and partially observed data sets. The fact that it works without stochastic simulations and without solving a high-dimensional CME means that it is possible to approach systems of a size as large as realistic models approached with conventional deterministic methods. Since the evaluation of the objective function does not involve stochastic simulations, all methods from derivative-based methods, to global optimisation techniques, to Bayesian methods can be applied for its numerical optimisation.

Fig. 1 Outline of the simulation study
Time courses are simulated (five small graphics) and used as pseudo data for a parameter estimation (represented by the arrows). Each time series results in one estimator, plotted in the coordinate system in the middle. As each time series is intrinsically stochastic, the estimators are random variables clustering around the true parameter value. To judge how close they are, two statistical quantities are displayed: their average (av.), to see if the method is unbiased, and their average relative error (ARE), to see how far they spread around the true value. Note that for a parameter estimation one time series is necessary; only for the simulation study is more than one time series needed.
As the objective function of the MSS method is deterministic, the estimator for a single stochastic time course is unique. However, using the same model with the same parameters and initial conditions with a different stochastic time course realisation, the estimation will result in a different value because of the intrinsic stochasticity of the time courses. To evaluate the performance of the objective function, it is therefore necessary to estimate more than one stochastic realisation of the same model with the same parameters and initial conditions. This paper uses 100 different realisations for each model. A statistic over the 100 estimations can then be used to assess the quality of the objective function ( Fig. 1). It has to be underlined that the 100 realisations are only used to test the quality of the objective function. For an estimation one realisation is sufficient.
This paper is structured as follows: The method section will introduce stochastic modelling, suggest and discuss the approximation, and present the resulting objective function.
The results section will show the performance of the extended MSS method on stochastic models of an immigration-death model, a prokaryotic auto-regulatory network and a Lotka-Volterra model. For the different scenarios, we will discuss the performance of the extended MSS method with respect to a state-of-the-art alternative, namely reversible jump techniques, and the old MSS approach.

Stochastic modelling of biochemical reactions
This section will introduce stochastic modelling and explain in which situations it is important. Let X = (X_1, …, X_D) denote the D reactants in a system with r reactions, in which q_ij denotes the number of educt or reactant molecules of species X_i for reaction j and u_ij is the number of product molecules of species X_i for reaction j. Hence, the system reads as

q_1j X_1 + q_2j X_2 + ··· + q_Dj X_D → u_1j X_1 + u_2j X_2 + ··· + u_Dj X_D, for j = 1, …, r

The stoichiometric matrix S is a D × r dimensional matrix. Its entries s_ij = u_ij − q_ij describe the net effect of reaction j on species X_i. In terms of ODEs, the system would read as

dx/dt = S υ(x, θ), x(0) = x_0

with a rate law υ = (υ_1, …, υ_r)^T describing the speed of the reactions and initial concentrations x_0. Stochastic modelling is important in systems with small numbers of molecules, in which stochastic fluctuations can influence the system's behaviour [24]. It focuses on single species and considers each reaction explicitly. The temporal order of the reactions and the waiting times are stochastic quantities depending on the state of the system and the rate laws. For simulating a stochastic time course, the Gillespie algorithm [2] is the method of choice. It is an iterative algorithm, simulating reaction event after reaction event, using functions of random numbers to determine the time step and the reaction. There exist a number of implementations [3] of the Gillespie algorithm. The resulting time course is a discrete state continuous time Markov jump process; see also [25] for details.
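As an illustration of this simulation scheme, the following minimal sketch implements the Gillespie algorithm for an arbitrary network given by its stoichiometric matrix and a propensity function. The NumPy-based interface and all function names are our own illustrative choices, not part of the cited implementations:

```python
import numpy as np

def gillespie(S, propensities, x0, t_max, rng=None):
    """Minimal sketch of the Gillespie algorithm (exact SSA).

    S            : (D, r) stoichiometric matrix (net effect of each reaction)
    propensities : function mapping a state x to a length-r array of propensities
    x0           : initial molecule numbers (length D)
    t_max        : simulation horizon
    Returns the lists of jump times and states.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, x = 0.0, np.asarray(x0, dtype=float)
    times, states = [t], [x.copy()]
    while t < t_max:
        a = propensities(x)
        a0 = a.sum()
        if a0 <= 0.0:                        # no reaction can fire any more
            break
        t += rng.exponential(1.0 / a0)       # waiting time ~ Exp(a0)
        j = rng.choice(len(a), p=a / a0)     # reaction index drawn with prob a_j / a0
        x = x + S[:, j]
        times.append(t)
        states.append(x.copy())
    return times, states

# Immigration-death example (parameter values as in the study): influx with
# rate th1, decay with rate th2 * x, started in the deterministic steady state.
th1, th2 = 0.6, 0.03
S_id = np.array([[1.0, -1.0]])
times, states = gillespie(S_id, lambda x: np.array([th1, th2 * x[0]]),
                          x0=[th1 / th2], t_max=50.0)
```

The propensity function is evaluated afresh in every iteration, so any mass-action or more general rate law can be plugged in.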
Stochastic modelling can show dynamic behaviour which cannot be seen with ODE modelling: stochasticity can, for example, introduce bistability in genetic toggle switches which have a steady state in ODE modelling [26]. The structure of calcium oscillations may change qualitatively from ODE to stochastic modelling [27]. Furthermore, intrinsic stochasticity may provide information, for example, in the form of reactivity, which makes it possible to resolve identifiability problems [4]. This shows the importance of stochastic modelling and the need for suited parameter estimation methods for stochastic models.

Approximation
As mentioned in Section 1, the parameter estimation is based on maximising a likelihood function that is a product of transition probabilities. An accurate calculation of these transition probabilities is very time-consuming. To overcome this problem, we need to find an approximate way to estimate the transition probabilities that is fast enough to be applicable in realistic size systems and still accurate enough to capture intrinsic stochastic features. The original MSS method suggests a rather simple approximation with a normal distribution and constant variance [4]. Although this approach has been shown to work well in several cases, we expect that a more accurate approximation will lead to more accurate estimation results or allow parameter estimation for a larger set of models. Therefore we extend the approximation by taking into account the variability of the variances of the transition probabilities.
Let the system be in a state n_{i−1} at time t_{i−1}. We approximate the distribution of n at time t_{i−1} + Δt by

n(t_{i−1} + Δt) ∼ N(x, S)     (1)

On the time interval Δt, the mean x is the ODE solution of the system initialised with n_{i−1}, and the covariance S is calculated by an LNA, which depends on the stoichiometry, the rate laws and the volume Ω. Details and derivation can be found in [28]. See also [29] ((6) and Appendix) and [30-32], which discuss software tools for calculating LNAs. The approximation differs from the approach used in the old MSS method [4] in that the covariance is not kept constant over the trajectory, but is calculated by an LNA on each time interval.
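The interval mean and covariance can be obtained by integrating the macroscopic ODE together with the LNA covariance equation. The sketch below assumes the common textbook form of the LNA in molecule numbers, dΣ/dt = AΣ + ΣAᵀ + S diag(υ) Sᵀ with A = S ∂υ/∂x (see [28] for the derivation); the function names are hypothetical:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lna_interval(S, v, dv_dx, n_prev, dt):
    """Mean and covariance of the LNA transition density on one interval.

    Integrates the macroscopic ODE  dx/dt = S v(x)  together with the
    covariance equation  dSig/dt = A Sig + Sig A^T + S diag(v) S^T,
    where A = S dv/dx, from x(0) = n_prev, Sig(0) = 0.
    (Assumed standard LNA form in molecule numbers.)
    """
    D = len(n_prev)

    def rhs(t, y):
        x, Sig = y[:D], y[D:].reshape(D, D)
        A = S @ dv_dx(x)
        dx = S @ v(x)
        dSig = A @ Sig + Sig @ A.T + S @ np.diag(v(x)) @ S.T
        return np.concatenate([dx, dSig.ravel()])

    y0 = np.concatenate([np.asarray(n_prev, float), np.zeros(D * D)])
    sol = solve_ivp(rhs, (0.0, dt), y0, rtol=1e-8, atol=1e-10)
    return sol.y[:D, -1], sol.y[D:, -1].reshape(D, D)

# Immigration-death model: v(x) = (th1, th2*x), S = (1, -1), dv/dx = ((0), (th2)).
th1, th2 = 0.6, 0.03
S_id = np.array([[1.0, -1.0]])
x, Sig = lna_interval(S_id,
                      lambda x: np.array([th1, th2 * x[0]]),
                      lambda x: np.array([[0.0], [th2]]),
                      n_prev=[20.0], dt=2.0)
```

Starting in the deterministic steady state x_0 = θ_1/θ_2 = 20, the mean stays at 20 and the variance relaxes as 20(1 − e^{−2θ_2 Δt}), which the numerical integration reproduces.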
Although the validity of the LNA depends on the model and its parametrisation, and thus cannot be guaranteed a priori, the LNA is much more likely to hold on the intervals between measurements than for the whole time series. In many cases LNAs are applied to approximate stochastic systems from the first until the last observation. This means that the stochastic dynamics must fulfil the condition from the beginning until the end. The point where the LNA typically fails is when the probability distribution of the state variables deviates significantly from a normal distribution, that is, when the system cannot be considered to behave like the deterministically simulated model plus some noise. Unfortunately, this includes more or less exactly those cases where we have to choose a stochastic modelling approach because an observed effect is not explained by a deterministic model. Since the error of the LNA grows over time, its validity conditions can be fulfilled much more easily if we only apply it on the much shorter time intervals between measurements. Note that in this use case, we only require the LNA to hold locally on each of the intervals since we only use it to calculate the transition probabilities for each interval, from which we determine a global likelihood function. We do not combine the local LNAs into one global approximation of the system's dynamics, and therefore we do not have to test how the approximation errors of each of the LNAs combine into a global error.
Additionally, this approximation can not only be used for parameter estimation, but also for simulating trajectories. Choose a set of time points t_0, …, t_n and an initial state n_0. Then, draw random numbers according to (1) to iteratively calculate the values for the system states n_1, …, n_n. Rounding the n_i to integers and defining the process's state to be n_{i−1} on [t_{i−1}, t_i) leads to a trajectory that is a continuous time discrete state Markov jump process.
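The approximate simulation scheme just described can be sketched as follows. The routine takes a user-supplied function returning the interval mean and covariance; for brevity, the example plugs in the closed-form mean and stationary-start variance of the immigration-death model rather than a numerical LNA, and all names are our own:

```python
import numpy as np

def approximate_trajectory(mean_cov, n0, timepoints, rng=None):
    """Simulate an approximate trajectory by iteratively drawing from the
    normal transition density of (1) on each interval.

    mean_cov(n_prev, dt) must return the ODE mean x and the LNA covariance S
    for one interval. States are rounded to integers (and clipped at zero)
    so that the result is a discrete-state trajectory.
    """
    rng = np.random.default_rng() if rng is None else rng
    states = [np.asarray(n0, float)]
    for t_prev, t_next in zip(timepoints[:-1], timepoints[1:]):
        x, S = mean_cov(states[-1], t_next - t_prev)
        n_next = rng.multivariate_normal(x, S)
        states.append(np.maximum(np.round(n_next), 0.0))
    return states

# Immigration-death example: exact transition mean and the variance for a
# start in the deterministic steady state (assumed, for illustration only).
th1, th2 = 0.6, 0.03
def mc(n, dt):
    x = th1 / th2 + (n[0] - th1 / th2) * np.exp(-th2 * dt)
    var = (th1 / th2) * (1.0 - np.exp(-2.0 * th2 * dt))
    return np.array([x]), np.array([[var]])

traj = approximate_trajectory(mc, [20], np.arange(0.0, 42.0, 2.0))
```

Such a trajectory can be compared visually with an exact Gillespie simulation, which is the first of the checks suggested below.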
Naturally, the question arises how to check whether the approximation holds for a specific system under consideration. We suggest a fast and intuitive check, discuss the formal conditions and propose a calculation that can be carried out a posteriori, showing how well the approximation worked.

† Since the approximation can also be used for an approximative simulation, we can perform a first check by visually comparing the approximate trajectory with one obtained from an exact simulation method (such as the Gillespie algorithm). Although this will not tell us about the quantitative accuracy of the approximation, it can tell us whether the qualitative features of the trajectory, and especially the stochastic effects that lead us to use a stochastic modelling approach, are conserved.

† The theoretical condition for the validity of the approximation on the interval [t_{i−1}, t_i] is the condition of an LNA on this interval. The derivation first uses a variable transformation to write the CME in terms of concentrations and volumes. Next, the resulting equation is expanded in inverse powers of the volume (similar to a Taylor expansion). If higher-order terms are small and can be neglected, the LNA holds and one can calculate the remaining terms (after cutting off the higher-order terms) as described above. Details for the expansion and its derivation can be found in [28,30].

† Given time series data and a parameter, it is possible to reconstruct the random number for each interval that is necessary to obtain the data. Using the notation from (1), this is r_i = S^{−1/2}(n_i − x), with n_i being the data point at t_i. Owing to the multiplication with the square root of the inverse of its corresponding covariance matrix, r_i should be an N(0_D, 1_{D×D})-distributed random number if the approximation holds, where 1_{D×D} stands for a D × D matrix with diagonal entries 1.
Checking whether the random numbers r 1 , …, r n indeed follow this distribution, can give information on the accuracy of the approximation (details in Section 2.6).

Extended objective function
Given experimental data as input, parameter estimation uses numerical techniques to calculate a parameter which calibrates the model to the data.
Denote the data as n = (n_0, …, n_n), measured at time points t_0, …, t_n. The likelihood function L(n, θ) gives the probability to observe the data n for a given parameter θ. With the help of the approximation in the previous section, it is possible to derive a formula for the likelihood function. Owing to the Markov property of biochemical reaction networks [25], the likelihood function factorises into the product of the transition probabilities

L(n, θ) = ∏_{i=1}^{n} p(n_i; n_{i−1}, Δ_i, θ)

where Δ_i = t_i − t_{i−1} and p(n_i; n_{i−1}, Δ_i, θ) is the probability for a transition from state n_{i−1} at time t_{i−1} to state n_i at time t_i given the parameter θ. This transition probability will be approximated with a normal distribution as in (1),

p(n_i; n_{i−1}, Δ_i, θ) ≈ PDF(N(x, S), n_i)

with x and S as in equation (1), where PDF(dist, y) stands for the probability density function (PDF) of the distribution 'dist' at y. Using the negative logarithm of the likelihood function, this leads to the extended MSS objective function

F_MSS(n, θ) = −∑_{i=1}^{n} log PDF(N(x, S), n_i)     (4)

As mentioned in Section 1, from a numerical point of view the method is motivated by the multiple shooting method of [33], which performs the integration on so-called multiple shooting subintervals. Equality constraints between the end point of a previous interval and initial value variables on the subintervals lead to a continuous trajectory. This method shows beneficial behaviour for parameter estimation in ODE systems [34,35]. The MSS method is similar to the cited method as it also uses a multiple shooting approach, but it differs by not using the equality constraints. In a statistical setting the method is a Gaussian, non-linear, first-order autoregressive time series model. An ODE integration initialised with the previous observation is used as the mean of the next interval; the conditional variance is obtained from an LNA on the intervals between succeeding time points of measurement.
A maximum-likelihood parameter estimate can be calculated by maximising the likelihood function or, equivalently, minimising the negative log-likelihood function. Using the approximation, a parameter estimate θ̂ can be determined by

θ̂ = argmin_θ F_MSS(n, θ)
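To make the estimation procedure concrete, the following sketch evaluates the extended MSS objective and minimises it for the immigration-death model. For brevity, the exact transition mean and variance of this simple model stand in for a numerical LNA, pseudo data are drawn from the approximation itself, and all function names are our own:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def f_mss(theta, data, timepoints, mean_cov):
    """Extended MSS objective: negative log-likelihood built from the
    normal transition approximations on the intervals between measurements."""
    nll = 0.0
    for i in range(1, len(timepoints)):
        x, S = mean_cov(theta, data[i - 1], timepoints[i] - timepoints[i - 1])
        nll -= multivariate_normal.logpdf(data[i], mean=x, cov=S)
    return nll

# Immigration-death model: closed-form transition mean/variance used in place
# of a numerical LNA (illustrative assumption for this simple model).
def mc(theta, n, dt):
    th1, th2 = theta
    x = th1 / th2 + (n[0] - th1 / th2) * np.exp(-th2 * dt)
    var = (th1 / th2) * (1.0 - np.exp(-2.0 * th2 * dt))
    return np.array([x]), np.array([[var]])

# Generate pseudo data from the approximation, then re-estimate the parameter.
t = np.arange(0.0, 42.0, 2.0)
rng = np.random.default_rng(1)
data = [np.array([20.0])]
for dt in np.diff(t):
    x, S = mc((0.6, 0.03), data[-1], dt)
    data.append(np.maximum(np.round(rng.multivariate_normal(x, S)), 0.0))

# Optimise over log-parameters so that theta stays positive.
res = minimize(lambda lt: f_mss(np.exp(lt), data, t, mc),
               x0=np.log([0.5, 0.05]), method="Nelder-Mead")
theta_hat = np.exp(res.x)
```

Since the objective is deterministic, any standard optimiser can replace Nelder-Mead, including derivative-based routines.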

Partially observed data
In typical experimental setups, it is not possible to obtain measurements for every species taking part in the reactions. Assume that the first d species of the system are observable, denote their measurements with n^obs and the unobserved or hidden components with n^hid, so that n = (n^obs, n^hid), and the data n^obs = (n^obs_0, …, n^obs_n) are measured at time points t_0, …, t_n. Include the unobserved species at time t_0, n^hid_0, into the optimisation vector. Owing to the approximation, the distribution at time point t_1 is a normal distribution. As a very simple state estimate for the unobserved species n^hid_1, we choose the most likely value from this distribution, which is the mean. It is in this case calculated by the ODE solution on the time interval. More generally, estimate the unobserved state n̂^hid_i by the mean of the normal distribution on [t_{i−1}, t_i], that is, by the ODE solution on this interval. In this case the objective function reads as

F_MSS(n^obs, θ, n^hid_0) = −∑_{i=1}^{n} log PDF(N(x^obs, S^obs), n^obs_i)

where x^obs and S^obs denote the components of the mean and covariance from (1) that correspond to the observed species. The estimator in the partially observed scenario is determined by an optimisation over the parameter θ and the unobserved initial states n^hid_0:

(θ̂, n̂^hid_0) = argmin_{θ, n^hid_0} F_MSS(n^obs, θ, n^hid_0)

It might be noted that the fully observed case is a subcase of this scenario with n = n^obs and n^hid = ∅. Therefore in the following the notation of the partially observed case will be used. Our choice to estimate the unobserved state variables solely from deterministic simulation over the preceding time interval is guided by the desire to provide a very simple objective function. A more accurate estimate would take into account the correlation between unobserved and observed species, as predicted by the LNA. In principle, the unknown state variables could even be estimated including information from future measurements, for example, by using Kalman smoothing techniques [36]. This would, however, run contrary to the underlying principle of our approach that the intervals between measurement points are treated independently.
Our goal for this paper is to show that even a simple state estimate can serve well for parameter estimation.

Measurement noise
Measurement noise is assumed to be normally distributed with zero mean. Furthermore, it is assumed to be independent of the intrinsic stochasticity of the process and inter-temporally independent, which means that the measurement errors of two time points are independent. Apart from that, the d × d covariance matrix S_meas of the measurement error at a certain time point may account for correlated errors between the species. More formally, the measurement error at time point t_i is distributed as N(0_d, S_meas). In this case, the first line of (4) extends to

F_MSS(n^obs, θ, n^hid_0) = −∑_{i=1}^{n} log PDF(N(x^obs, S^obs + S_meas), n^obs_i)

that is, the covariance of the approximating normal distribution is inflated by the measurement noise covariance. Assuming that the noise model is given (e.g. from knowledge about the experimental setup) and has zero mean, the observed data point can serve as a simple state estimate. In principle, a better estimate could be calculated by also taking into account the prediction of the model and measurements of other variables as well as measurements at other time points. As mentioned in the previous section, the focus of this paper is to show that even this very simple approach allows for good parameter estimation results.
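Because the intrinsic transition density and the independent measurement error are both normal, their convolution is again normal with summed covariances, so the only change to the objective is the inflated covariance. A minimal sketch (function name our own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def transition_logpdf_with_noise(n_i, x, S, S_meas):
    """Transition log-density of one interval when independent, zero-mean
    normal measurement noise with covariance S_meas is present: the
    covariance of the approximating normal inflates from S to S + S_meas."""
    return multivariate_normal.logpdf(n_i, mean=x, cov=S + S_meas)

# One-dimensional illustration: intrinsic variance 2.3, noise variance 1.0.
lp = transition_logpdf_with_noise(np.array([21.0]), np.array([20.0]),
                                  np.array([[2.3]]), np.array([[1.0]]))
```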

Accuracy of the approximation
For given time series measurements and parameter values, it is possible to test how well the approximation works: instead of calculating the sum in the objective function F_MSS, calculate the following values for i = 1, …, n

r_i = S^{−1/2}(n_i − x)

with the solution of the ODE system for x and S as in (4). Taking into account (1), it follows that

r_i ∼ N(0_D, 1_{D×D})

if the approximation holds. This can be used to check how well the approximation works. Note that there are n + 1 measurements, but only n transition probabilities, and therefore only n numbers r_i as well.
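The residual check amounts to a few lines of code. In this self-contained sketch the data are drawn from the approximating normals themselves, so the reconstructed r_i should pass a standard-normality check; the function names and the choice of a Kolmogorov-Smirnov test are our own:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import kstest, norm

def standardised_residuals(data, means, covs):
    """Reconstruct the random numbers r_i = S_i^{-1/2} (n_i - x_i).

    If the normal approximation holds, every component of every r_i should
    behave like an independent standard normal draw."""
    res = []
    for n_i, x, S in zip(data, means, covs):
        r = np.linalg.solve(sqrtm(S), n_i - x)   # S^{-1/2} (n_i - x)
        res.append(np.real(r))
    return np.concatenate(res)

# Synthetic check: data generated exactly from the approximating normals.
rng = np.random.default_rng(0)
means = [np.array([20.0])] * 200
covs = [np.array([[2.3]])] * 200
data = [rng.multivariate_normal(m, c) for m, c in zip(means, covs)]
r = standardised_residuals(data, means, covs)
pvalue = kstest(r, norm.cdf).pvalue
```

With real data, a small p-value (or a visibly non-normal histogram of the r_i) indicates that the approximation is questionable for the system at hand.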

Optimisation
The objective function is completely deterministic, which means that for its evaluation no stochastic simulations have to be performed. The optimisation problem can therefore be solved using gradient-based methods. Of course, it is as well possible to use global optimisation techniques or Bayesian approaches.

Immigration-death model
The first example is an immigration-death model

∅ → X, X → ∅

where X is the substance and θ_1, θ_2 are parameters, with rate law υ = (θ_1, θ_2 x). The stoichiometric matrix is S = (1, −1) and the system in ODE representation reads as

dx/dt = θ_1 − θ_2 x

Considered scenarios: Several parametrisations and choices of time points for measurements are considered. The scenarios have been selected according to the choice of [8] to be able to compare the accuracy of the estimation results. The initial value is chosen in a way that the system is in the deterministic steady state, x_0 = θ_1/θ_2.

Design of simulation study:
One difficulty when evaluating the quality of a parameter estimate for a stochastic process is that the observed data (in our case the test data generated from the stochastic process) is itself random. This means that even when a method is deterministic, and so gives the same estimate when run repeatedly on the same time series, it will result in different estimates if applied to different realisations of the same process. Therefore it is not possible to estimate the accuracy of a method by just looking at the results based on one set of observations. Any deviation between true and estimated parameter values could either be the result of a random deviation of the data or of a bias in the method. To resolve this problem, the estimation process was performed 100 times with 100 different data sets for each scenario. Note that this is only needed to 'test' the method; it is still possible to 'apply' it with a single time series. The data sets were created by running 100 simulations with the Gillespie algorithm [2] in the software COPASI [37]. The results are shown in Table 1.
For comparison, Table 1 includes the estimate of [8]. In [8] no simulation study is performed and hence only one estimate is provided. Thus we can only state how this one result ranks in comparison with our 100 results. The relative error of this estimate is calculated, as well as the 100 relative errors of our estimates. The 101 relative errors are sorted by size (from low to high) and the rank of [8]'s estimate is recorded. A rank of 1 for most of the scenarios would indicate that the method of [8] is clearly better, a rank of 101 that it is clearly worse, and ranks in between that the accuracy is comparable.
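The ranking procedure just described amounts to pooling the 101 relative errors and locating the reference value; a sketch with hypothetical names and synthetic relative errors:

```python
import numpy as np

def rank_of_reference(ref_rel_error, rel_errors):
    """Rank of a reference estimate's relative error among our estimates.

    The relative errors of this paper and the one reference value are
    pooled, sorted from low to high, and the 1-based position of the
    reference value is returned (rank 1: reference clearly better,
    rank len(rel_errors) + 1: reference clearly worse)."""
    pooled = np.sort(np.append(rel_errors, ref_rel_error))
    return int(np.searchsorted(pooled, ref_rel_error)) + 1

# Toy illustration with 100 synthetic relative errors (not the study's data).
rng = np.random.default_rng(3)
our_errors = rng.uniform(0.05, 0.40, size=100)
rank = rank_of_reference(0.20, our_errors)
```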
The optimisation has been performed with the FindMinimum routine implemented in the software Mathematica [38] Version 8.0.4.0.

Observations:
The estimations of the simulation study show that the extended MSS method is able to estimate both parameters for all considered scenarios. As a reference for the accuracy of the estimate, we compare our 100 results to the one result for each scenario that is given in [8]. In each but one case, the result of [8] falls well into the range of our results. From this, we conclude that our method shows comparable accuracy while being computationally less demanding. Fig. 2 shows a plot of the estimation of the first row and first column of Table 1 in a coordinate system, illustrating how the estimates spread around the true parameter. Both parameters are identifiable (there is a limited range of values for them), but the stretched form of the cloud indicates that the ratio of the parameters can be identified more accurately. This is expected since in the case of a deterministic model the ratio would be the only identifiable value while the two separate parameters would be completely unidentifiable. Fig. 3 shows the result of testing the validity of the approximation using the concept described in Section 2.6. The basic idea is that for each time step the deviation of the data value from the result of a deterministic simulation should be normally distributed with variance as calculated from the LNA. The result is only shown for one time course, but would hold for most other time courses as well.
Only the case of parameter values (0.1, 0.03) leads to situations in which for a few time series no convergence could be achieved, namely 5 of 100 for the first measurement scenario, 1 of 100 for the second, 2 of 100 for the third and none for the fourth. Table 1 additionally lists the rank of w, the estimate of [8], among the 100 estimates of this paper with respect to relative error (details on how this rank is calculated can be found in Section 3.1.2). The table shows that the extended MSS method estimates the parameters with acceptably small relative error for all considered scenarios. In comparison with the reversible jump method by Wang et al. [8], the relative error is not significantly larger, which shows that the high computational speed of the method does not lead to a loss of accuracy.

Fig. 2 Estimates for an immigration-death scenario
For the scenario of 21 measurements with Δt = 2 (Table 1, first row, first column) and θ^(0) = (0.6, 0.03), the graphic shows the 100 estimates (black dots) and the true parameter value θ^(0) (big grey dot). Using a deterministic model, only the ratio but not the absolute value of the parameters is identifiable. Using the MSS method, one can see that not only the ratio but also the absolute value is identifiable, however, with a slightly worse accuracy.

Examining the time series for which no convergence could be achieved shows that they are untypical in that they just decay from the starting value to zero, indicating that no or very few influx events actually occurred during the observation time. It is understandable that the rate of the influx reaction (i.e. the propensity of an influx event) cannot be estimated if the reaction does not occur during the observation time. Checking the validity of the approximation, using the concept described above, shows that the approximation does not hold in the cases where convergence could not be achieved. An increase of the number of measurement points with the same inter-sample distance (row 1 to row 2 of Table 1) shows that the accuracy increases. Varying the inter-sample distance while keeping the number of observations fixed (row 1 and row 3 of Table 1) does not show a clear effect, as the accuracy is improved for columns 1, 2 and 4, but not for columns 3 and 5. The reason is that the optimal inter-sample distance depends on the parameter. Methods to choose an inter-sample distance before doing any experiments are called optimal design methods and are a topic of ongoing research. Note that for none of the scenarios would the parameters be identifiable using a conventional deterministic least-squares approach.
Comparing the results of the extended MSS method to the original MSS approach with constant variances (results not shown) shows that the old approach is only able to estimate the parameters correctly in the scenarios with 100 observations and parameters θ = (0.6, 0.03) or θ = (0.6, 0.06), with an ARE of (31%, 32%) and (21%, 24%), which is still higher than with the extended MSS. The other scenarios are time courses in which only very few reactions happen between two time points of measurement. These are problematic for the old MSS, as indicated in [4]. With the improved method, all scenarios are now tractable.
The immigration-death model is one of the few stochastic models in which it is possible to solve the CME analytically, which allows for a calculation of exact estimates without approximation. This means that the remaining error of the estimation is because of the stochasticity of the time series data, rather than because of approximations in the estimation process. Comparing these results to the extended MSS approach for the scenario of 101 measurements and parameter (0.6, 0.06), shows that the ARE of the extended MSS approach (16%, 16%) is only slightly higher than the ARE of the exact method (15%, 16%). This shows that the extended MSS method is able to extract almost all information out of the data.

Prokaryotic auto-regulatory network
The second example has also been investigated by Wang et al. [8] and is an example of gene regulation, where DNA represents promoter sequences, P proteins, P 2 protein dimers, mRNA messenger RNA and θ 1 , …, θ 8 parameters. All reactions are modelled with mass action as kinetic law. A representation in ODEs reads as
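As a hedged illustration, assuming the standard auto-regulatory reaction list used in [8] (repressor binding/unbinding, transcription, translation, dimerisation/dissociation, and degradation of mRNA and P), the mass-action right-hand side can be sketched as follows; the exact reaction set and the deterministic dimerisation rate θ5·P² are our assumptions, not reproduced from the equation above.

```python
def autoreg_rhs(state, theta, DNA_t):
    """Mass-action ODE right-hand side for the (assumed) standard
    auto-regulatory network:
      R1: DNA + P2 -> DNA.P2  (theta1)   R2: DNA.P2 -> DNA + P2  (theta2)
      R3: DNA -> DNA + mRNA   (theta3)   R4: mRNA -> mRNA + P    (theta4)
      R5: 2P -> P2            (theta5)   R6: P2 -> 2P            (theta6)
      R7: mRNA -> 0           (theta7)   R8: P -> 0              (theta8)
    DNA.P2 is eliminated via the conservation DNA + DNA.P2 = DNA_t."""
    DNA, mRNA, P, P2 = state
    th1, th2, th3, th4, th5, th6, th7, th8 = theta
    bound = DNA_t - DNA                 # DNA.P2 by conservation
    r1 = th1 * DNA * P2                 # repressor binding
    r2 = th2 * bound                    # repressor unbinding
    r5 = th5 * P * P                    # dimerisation (deterministic rate)
    r6 = th6 * P2                       # dimer dissociation
    dDNA = r2 - r1
    dmRNA = th3 * DNA - th7 * mRNA
    dP = th4 * mRNA - 2.0 * r5 + 2.0 * r6 - th8 * P
    dP2 = r5 - r6 - r1 + r2
    return (dDNA, dmRNA, dP, dP2)
```

Under this convention DNA·P2 follows from the conservation relation, so only four ODEs need to be integrated per transition interval.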

Considered scenarios:
Different choices of measurement time points are considered. All scenarios are considered with full observation and with partial observation. In the partially observable case, measurements are taken for mRNA, P and P 2 , and the total amount DNA t is assumed to be known. Furthermore, two different values for DNA t are considered. The results are shown in Tables 2 and 3. For comparison, Tables 2 and 3 include the estimates given by Wang et al. [8]. As [8] does not perform a simulation study and hence provides only one estimate per setting, we calculate which rank this estimate would obtain among the estimates in this paper with respect to the relative error. The comparison is described in detail in the previous section.

Design of simulation study
The optimisation has been performed as above. If the optimisation did not converge to an estimate within [0, 100] 8 , the result was counted as non-converging.

Observations:
In both scenarios with 500 observations (rows 3 and 6), the extended MSS method estimates all parameters with high accuracy. With only 100 observations (rows 2 and 5), the ARE for θ 5 and θ 6 increases, and it increases even further in the scenarios with 50 observations (rows 1 and 4). This can be interpreted as a practical non-identifiability. The same holds, although less strongly, for the parameters θ 1 and θ 2 when comparing rows 2 and 5 with rows 1 and 4. The increasing number of non-converging time series with decreasing number of observation points further supports this interpretation. Changing the value of DNA t seems to have only a small effect on the quality of the estimation; only in the case with 500 observations does it have a strong influence on the number of non-converging time series. Apart from that, the extended MSS method copes equally well with different amounts of DNA t . Testing the accuracy of the approximation in the fully observed case indicates that for mRNA the mean is well approximated and, in most cases, the variance as well; for two other species, DNA and P 2 , the mean fits well but the approximated variance is too high; whereas the fourth species, P, shows a bias. It is worth noting that although the approximation is therefore not strictly valid, most of the estimates still have an acceptably small relative error. This means that the approach for testing the approximation may be overly strict.
(Footnotes to Tables 2 and 3: the tables report the ARE and the median of the 100 relative errors; w1 and w2 are the estimates of two time series shown by Wang et al. [8]; the rank of w1 and w2 among the 100 estimates of this paper with respect to relative error is calculated as described in Section 3.2.2; 'div' is the number of time series for which no convergence could be achieved. Table 3 gives the statistics of the estimation results for the auto-regulatory gene network, fully observed, with measurement noise.)
Estimation may still work when the test fails. The reason for this is that the variance, intuitively speaking, acts as a weighting factor for the estimation. Estimation is possible with inaccurate variances, although with some loss in accuracy. The fact that estimation is possible even with a very rough approximation of the variances, as in [4], supports this point.
The estimates provided by Wang et al. [8] generally fall within the range of the 100 results of the MSS method. This shows that the MSS method, in general, has comparable accuracy while being computationally less demanding. It would be interesting for further research to investigate the behaviour of the reversible jump algorithm on the time series for which the MSS method did not converge. The fact that the ARE is high for some parameters while most of the estimates are still of a similar precision to the estimates of [8] can be explained by a few estimates with extremely high relative error inflating the ARE. This observation is supported by the small values of the median of the relative errors. Table 3 shows how additive normally distributed measurement noise decreases the accuracy of the estimates with increasing standard deviation. Depending on the size of the measurement noise, parameters may become non-identifiable (especially θ 7 , θ 8 in the second row of Table 3).
Similar to the immigration-death model, in this model only a few reactions happen between two measurement time points; it therefore falls within the class of models that are hardly tractable without the extension, and no results with the old MSS method are shown.
The results for the partially observed case (Table 4) underline that the extended MSS method can cope with partially observable scenarios. The effects of the practical non-identifiability appear similar, except that θ 7 and θ 8 are also affected in the DNA t = 10 case.

Lotka-Volterra
The third example is a Lotka-Volterra model, which shows oscillatory behaviour. This model has been used as a test model for parameter estimation in [7] and consists of three reactions, where Y (1) is the prey, Y (2) is the predator and θ 1 , θ 2 and θ 3 are parameters. The first reaction of (6) is prey reproduction, the second predator reproduction and the third predator death. In terms of ODEs, this system reads as dY (1) /dt = θ 1 Y (1) − θ 2 Y (1) Y (2) and dY (2) /dt = θ 2 Y (1) Y (2) − θ 3 Y (2) . The true parameter used in this paper is θ (0) = (0.5, 0.0025, 0.3).
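Under the mass-action reading of the three reactions just described, the deterministic drift and its coexistence fixed point can be sketched as follows (an illustrative sketch; the function and variable names are ours):

```python
def lv_rhs(y, theta):
    """Deterministic (mass-action) drift of the stochastic Lotka-Volterra
    model: prey reproduction, predator reproduction, predator death."""
    y1, y2 = y
    th1, th2, th3 = theta
    return (th1 * y1 - th2 * y1 * y2,   # prey: reproduction minus predation
            th2 * y1 * y2 - th3 * y2)   # predator: reproduction minus death

# Coexistence fixed point: y1* = th3 / th2, y2* = th1 / th2.
theta0 = (0.5, 0.0025, 0.3)
fixed_point = (theta0[2] / theta0[1], theta0[0] / theta0[1])  # (120, 200)
```

For the true parameter θ (0) = (0.5, 0.0025, 0.3) the deterministic trajectories orbit around the fixed point (120, 200), whereas the stochastic system eventually leaves this cycle, as discussed below.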

Considered scenarios:
Five scenarios of measurement time point choices are considered: three with fully observable variables and two partially observable ones, LVp1: recordings for Y (1) and Y (2) (0), and LVp2: recordings for Y (1) only. The fully observed scenario with total observation horizon 200 is split into three cases depending on the specific behaviour of the time series: both species dying out, the population of one species exploding, or neither of the two, see also Fig. 4. It should be noted that in the long run one of the species will asymptotically die out with certainty, but the time horizon 200 is short enough to observe all three scenarios. The results are shown in Tables 5 and 6. For comparison, Tables 5 and 6 include the estimates by Boys et al. [7]. Since [7] provides estimates from only one time series, we report which rank these estimates would obtain in comparison with the estimates in this paper, as described above.

Design
The optimisation has been performed with the FindMinimum routine implemented in the software Mathematica [38], Version 8.0.4.0. Tables 5 and 6 show that the extended MSS method estimates all parameters in all considered scenarios, including the partially observable case, with very high accuracy.

Observations:
For the fully observable case with 200 observations, the study distinguishes between trajectories in which both species die out within the observation horizon (die-out), trajectories in which one of the populations explodes (explode) and trajectories in which neither happens (normal). One reason for the slightly higher ARE for the exploding trajectories might be that the 'explosion' happens before time 200, which means that these trajectories are usually shorter than the others.
Testing the approximation for the fully observed case indicates that it works very well for all 'normal' scenarios. For species which die out, a large part of the distribution of the r (k) is of course close to zero, and for exploding populations the left tail of the distribution is heavier than that of an N(0, 1) distribution. Nevertheless, all estimations had a small relative error, again supporting the point from the previous section that the test might be overly strict; a positive test should therefore be used in support of the approximation, but a negative result not necessarily as a rejection.
(Fig. 4: the stochastic Lotka-Volterra model can show behaviour that is qualitatively different from the deterministic Lotka-Volterra model. Panel A shows oscillations in which the intrinsic stochasticity influences amplitude and frequency; Panel B shows a case in which the prey Y (1) dies out, followed by the predator Y (2) ; Panel C shows a scenario in which the predator dies out and the prey population then explodes.)
(Footnotes to Tables 5 and 6: rj, bu, a and d denote estimates by Boys et al. [7] obtained with a reversible jump method, a block updating method, an approximation of the block updating method and a diffusion approximation, respectively; the rank of rj, bu, a and d among the 100 estimates of this paper with respect to relative error is calculated as described in the "Design of the simulation study" section.)
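The validity check can be sketched generically: given any routine returning the LNA mean and variance of a transition started at the previous observation, form the standardised residuals r (k) and compare their distribution with N(0, 1). The helper names and the crude mean/variance comparison below are ours; the paper's actual test statistic is not reproduced here.

```python
import math

def standardised_residuals(obs, dt, moments):
    """r(k) = (x_{k+1} - m_k) / sqrt(S_k), where (m_k, S_k) is the LNA
    mean/variance of the transition density started at the previous
    observation.  `moments(x_prev, dt)` is any routine returning that
    pair.  Under a valid approximation the r(k) are ~ i.i.d. N(0, 1)."""
    r = []
    for x_prev, x_next in zip(obs, obs[1:]):
        m, S = moments(x_prev, dt)
        r.append((x_next - m) / math.sqrt(S))
    return r

def crude_normality_check(r, tol=0.3):
    """Crude stand-in check: empirical mean near 0, variance near 1."""
    n = len(r)
    mean = sum(r) / n
    var = sum((x - mean) ** 2 for x in r) / (n - 1)
    return abs(mean) < tol and abs(var - 1.0) < tol
```

A quantile-quantile plot or a formal goodness-of-fit test of the r (k) against N(0, 1) could replace the crude check without changing the structure.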
For the partially observable case, in which the values of Y (2) are not observable, the accuracy of the estimates decreases; however, all parameters are still well identifiable.
Compared with the approaches mentioned in [7], the extended MSS method does not estimate the parameters with the worst accuracy.
Compared with the old MSS approach with constant variances, the ARE is very similar, with the exception of the 'die-out' scenario and the partially observed scenario, in which the extension performs better. The reason why the difference in performance is so small here is that the trajectories are highly dynamic because of the oscillations. This makes the influence of the variance, which acts as a weighting factor, less strong.

Discussion and conclusion
This paper suggests an extension to the MSS method of [4]. The extension also uses a Gaussian approximation for the transition probabilities but, in contrast to the 'old' version, does not assume constant variances. Mean and variance are calculated by an LNA between the time points of measurement. The fact that the system is approximated only on time intervals that are short relative to the total observation horizon makes it possible to deal with intrinsic stochastic effects.
On the oscillatory Lotka-Volterra model, the extension performed slightly better than the old approach. The real benefit could be seen in the non-oscillatory immigration-death example. Here, the old version failed to estimate the parameters correctly in most of the scenarios. The reason, already discussed in [4], is that only very few reactions happen between two measurements. The extension outperforms the old version considerably, as it is able to estimate the parameters with small ARE in all scenarios. As the prokaryotic auto-regulatory network falls within the scenarios where the original MSS is not expected to perform well, only the extended MSS method is tested, and it is shown to work well. Depending on the choice of sample points, some parameters become non-identifiable, especially in the partially observed cases.
The suggested approach for testing the accuracy of the approximation turned out to be a candidate for a sufficient criterion: all time courses with an acceptably small relative error also had a distribution of the r (k) that was close to a standard normal distribution. However, it does not work as a necessary criterion, as there were time courses whose r (k) distribution was not close to a standard normal distribution but which nevertheless had a small relative error. The reason is that the variance, intuitively speaking, acts as a weighting factor for the estimation, so an estimation can be possible even with a rough approximation of the variance, as shown in [4]. Further research on a criterion distinguishing between high relative errors due to approximation error and those due to identifiability problems would be desirable.
To investigate whether the approximation leads to a loss of accuracy, the results of the method are compared with those of other methods. Wang et al. [8] use a reversible jump technique combined with a stochastic gradient descent, and Boys et al. [7] provide four estimates in a Bayesian framework using reversible jump techniques and approximations. Neither paper provides a simulation study estimating the parameters for multiple time series of the same setting. Hence, it is not possible to compare the ARE; one can only assign the rank that the provided estimate would obtain in comparison with the estimates of this paper. This comparison shows that the approximation does not lead to a significant loss in accuracy. In the case of the Lotka-Volterra oscillator, even the special cases of a dying-out or exploding population can be treated successfully.
Concerning the identifiability issues, techniques of experimental design would be desirable to obtain information on the identifiability of parameters as well as on confidence intervals. This is ongoing research.
Neither the old nor the extended MSS method needs stochastic simulations or the solution of a CME system. The old MSS method needs the solution of D ODEs (where D is the number of different substances in the system) for the calculation of a transition probability. The extended MSS method requires the solution of D + ((D + 1)D)/2 equations (for example, 14 instead of 4 for D = 4), which is more, but still not prohibitive for tackling systems of the size that could realistically be treated, for example, with estimation methods for deterministic models.
For the same reasons, we expect the MSS methods to perform well from a computational point of view. For each evaluation of the likelihood function, one numerical integration of the system of differential equations is required; no sampling as an approximation of distributions is performed. Although how the speed of one integration compares with that of one stochastic simulation is highly model dependent, it is definitely advantageous that the integration is performed only once per evaluation of the likelihood function and that no trade-off between sample numbers and accuracy needs to be considered. Furthermore, the method is comparatively easy to implement, the most complex part being a numerical integrator, for which reliable packages are readily available.

Acknowledgments
The authors thank Jürgen Pahle for inspiring discussions and the anonymous reviewers for helpful clarifications. This work is supported by BIOMS and the Klaus Tschira Stiftung.

References