Generalized method of moments for estimating parameters of stochastic reaction networks

Background: Discrete-state stochastic models have become a well-established approach to describe biochemical reaction networks that are influenced by the inherent randomness of cellular events. In recent years, several methods for accurately approximating the statistical moments of such models have become very popular since they allow an efficient analysis of complex networks.

Results: We propose a generalized method of moments approach for inferring the parameters of reaction networks based on a sophisticated matching of the statistical moments of the corresponding stochastic model and the sample moments of population snapshot data. The proposed parameter estimation method exploits recently developed moment-based approximations and provides estimators with desirable statistical properties when a large number of samples is available. We demonstrate the usefulness and efficiency of the inference method on two case studies.

Conclusions: The generalized method of moments provides accurate and fast estimations of unknown parameters of reaction networks. The accuracy increases when moments of order higher than two are also considered. In addition, the variance of the estimator decreases when more samples are given or when higher order moments are included.

Electronic supplementary material: The online version of this article (doi:10.1186/s12918-016-0342-8) contains supplementary material, which is available to authorized users.


Background
A widely-used approach in systems biology research is to design quantitative models of biological processes and refine them based on both computer simulations and wet-lab experiments. While a large number of sophisticated parameter inference methods has been proposed for deterministic models, only few approaches allow the efficient calibration of parameters for large discrete-state stochastic models that describe stochastic interactions between molecules within a single cell. Given the progress in experimental measurement techniques that deliver single-cell and single-molecule data, the ability to calibrate such models is of key importance. For instance, the widely-used flow cytometric analysis delivers data from thousands of cells, which yields sample means and sample variances of molecular populations. Here, we focus on the most common scenario: a discrete stochastic model of a cellular reaction network with unknown reaction rate constants, and population snapshot data such as sample moments of a large number of observed samples. The state of the model corresponds to the vector of current molecular counts, i.e., the number of molecules of each chemical species, and chemical reactions trigger state transitions by changing the molecular populations. A system of ordinary differential equations, the chemical master equation [1], describes the evolution of the state probabilities over time.
A classical maximum likelihood (ML) approach is possible if all populations are small [2] or if the model shows simple dynamics (e.g. a multi-dimensional normal distribution with time-dependent mean and covariance matrix) such that the likelihood can be approximated by a normal distribution [3]. In this case, the likelihood (and its derivatives) can usually be approximated efficiently, and global optimization techniques are employed to find parameters that maximize the likelihood. However, if large populations are present in the system, then direct approximations of the likelihood are infeasible since the underlying system of differential equations contains one equation for each state and the main part of the probability mass of the model distributes over an intractably large number of states. Similarly, if the system shows complex dynamics such as multimodality, approximations of the likelihood based on Gaussian distributions become inaccurate.
In the last years, several methods have been developed to accurately simulate the moments of the underlying probability distribution up to a certain order k over time [4,5,6]. The complexity of these simulation methods is therefore independent of the population sizes but, for large k, the corresponding differential equations may become stiff and lead to poor approximations. However, reconstructions of complex distributions from their moments show that for many systems already a small k (e.g. k ∈ {4, ..., 8}) yields moments that contain sufficient information about the distribution, such as the strength and location of regions of attraction (i.e. regions of the state space containing a large proportion of the probability mass) [7].
For models with complex distributions exhibiting, e.g., multiple modes or oscillations, the accuracy and the running time of the moment approximation can be markedly improved when conditional moments are considered in combination with the probabilities of appropriately chosen system modes, such as the activity state of the genes in a gene regulatory network [8,9,10,11]. Recently, a full derivation of the conditional moment equations was presented, and numerical results show that when the maximum order of the considered moments is high, the number of equations that have to be integrated is usually much smaller for the conditional moments approach and the resulting equations are less stiff [12]. In addition, the approximated (unconditional) moments are more accurate when the same maximal order is considered.
An obvious parameter inference approach is the matching of the observed sample moments with those of the moment-based simulation of the model. Defining the differences between sample and (approximated) population moments as cost functions that depend on the parameters, an approach that minimizes the sum of the squared cost functions seems reasonable. However, in a simple least-squares approach, low moments such as means and (co-)variances contribute as much to the sum of squared differences as higher moments, whose absolute magnitudes are much larger (even if they are centralized). Moreover, correlations between the different cost functions may exist and thus necessitate an approach where also products of two different cost functions are considered. The generalized method of moments (GMM), which is widely used in econometrics, provides an estimator that is computed after assigning appropriate weights to the different cost function products [13]. The GMM estimator has, similar to the ML estimator, desirable statistical properties such as being consistent and asymptotically normally distributed. Moreover, for optimally chosen weights it is an asymptotically efficient estimator, which implies that (asymptotically) it has minimum variance among all estimators for the unknown parameters. In this paper we explore the usefulness of the GMM for moment-based simulations of stochastic reaction networks. We focus on two particular estimators that are commonly used in econometrics: the two-step estimator of Hansen [13] and the demean estimator [14]. We study the accuracy and variance of the estimator for different maximal moment orders and different sample sizes by applying the GMM to two case studies. In addition, we show that poor approximations of some higher order moments have a strong influence on the quality of the estimation. Interestingly, we see that the additional information about the covariances of the cost functions can lead to identification of all parameters. In addition, the variance of the estimator becomes smaller when higher order moments are included. Compared to the simple least-squares approach, the GMM approach yields very accurate estimates.
Table 1: Simple gene expression model [15]: The evolution of the molecular populations DNA_ON, DNA_OFF, and mRNA is described by the random vector X(t) = (X_1(t), X_2(t), X_3(t)), respectively.

Reactions | Propensities | Intervals

Stochastic chemical kinetics
Our inference approach relies on a Markov modeling approach that follows Gillespie's theory of stochastic chemical kinetics. We consider a well-stirred mixture of n molecular species in a volume with fixed size and fixed temperature and represent it as a discrete-state continuous-time Markov process {X(t), t ≥ 0} [16]. The random vector X(t) = (X_1(t), ..., X_n(t)) describes the chemical populations at time t, i.e., X_i(t) is the number of molecules of type i ∈ {1, ..., n} at time t. Thus, the state space of X is Z_+^n = {0, 1, ...}^n. The state changes of X are triggered by the occurrences of chemical reactions. Each of the m different reaction types has an associated nonzero change vector v_j ∈ Z^n (j ∈ {1, ..., m}), where v_j = v_j^- + v_j^+ such that v_j^- (v_j^+) contains only non-positive (non-negative) entries and specifies how many molecules of each species are consumed (produced) if an instance of the reaction occurs, respectively. Thus, if X(t) = x for some x ∈ Z_+^n with x + v_j^- being non-negative, then X(t + dt) = x + v_j is the state of the system after the occurrence of the j-th reaction within the infinitesimal time interval [t, t + dt). W.l.o.g. we assume here that all vectors v_j are distinct. An example is given in Table 1, where the first reaction gives, for instance, v_1^- = (−1, 0, 0), v_1^+ = (0, 1, 0), and v_1 = (−1, 1, 0). Note that, given the initial state x = (1, 0, 0), at any time either the DNA is active or not, i.e. x_1 = 0 and x_2 = 1, or x_1 = 1 and x_2 = 0. Moreover, the state space of the model is infinite in the third dimension.
We use α_1, ..., α_m to denote the propensity functions of the reactions, where α_j(x) · dt is the probability that, given X(t) = x, one instance of the j-th reaction occurs within [t, t + dt). Assuming law of mass action kinetics, α_j(x) is chosen proportional to the number of distinct reactant combinations in state x. Although our inference approach can be used for any model parameter, in the sequel we simply assume that the proportionality constants c_j are unknown and have to be estimated based on experimental data.
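To make this representation concrete, the change vectors and mass-action propensities of a network like the one in Table 1 can be written down directly. The following Python sketch uses the three-species gene expression structure (DNA_ON, DNA_OFF, mRNA); the rate constants c_j are placeholder values, not the ones used in the paper:

```python
import numpy as np

# Change vectors v_j for a simple gene expression model
# (species order: DNA_ON, DNA_OFF, mRNA).
V = np.array([
    [-1,  1,  0],   # DNA_ON  -> DNA_OFF        (deactivation)
    [ 1, -1,  0],   # DNA_OFF -> DNA_ON         (activation)
    [ 0,  0,  1],   # DNA_ON  -> DNA_ON + mRNA  (transcription)
    [ 0,  0, -1],   # mRNA    -> 0              (degradation)
])
c = np.array([0.05, 0.05, 1.0, 0.01])  # hypothetical rate constants c_j

def propensities(x, c):
    """Mass-action propensities alpha_j(x) for the four reactions."""
    dna_on, dna_off, mrna = x
    return np.array([
        c[0] * dna_on,    # deactivation needs an active gene
        c[1] * dna_off,   # activation needs an inactive gene
        c[2] * dna_on,    # transcription from the active gene
        c[3] * mrna,      # each mRNA molecule degrades independently
    ])
```

Note that row j of V is exactly v_j = v_j^- + v_j^+, and α_j(x) · dt gives the reaction probability in an infinitesimal interval, as defined above.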
For x ∈ Z_+^n and t ≥ 0, let p_t(x) denote the probability P(X(t) = x). Assuming fixed initial conditions p_0, the evolution of p_t(x) is given by the chemical master equation (CME) [1], a system of first-order ordinary differential equations that has a unique solution under certain mild regularity conditions. Since for realistic systems the number of states is very large or even infinite, applying standard numerical solution techniques to the CME is infeasible. If the populations of all species remain small (at most a few hundreds), then the CME can be efficiently approximated using projection methods [17,18] or fast uniformization methods [19,20]. Otherwise, i.e., if the system contains large populations, then analysis methods with running times independent of the population sizes have to be used, such as moment closure approaches [6,5,4] or methods based on van Kampen's system size expansion [21,22]. For both approaches, accurate reconstructions of the underlying probability distribution, i.e., the solution of the CME, are possible [7,22].

Moment-based Analysis
From the CME it is straightforward to derive the following equation for the derivative of the mean of a polynomial function T:

d/dt E[T(X(t))] = Σ_{j=1}^{m} E[ α_j(X(t)) · ( T(X(t) + v_j) − T(X(t)) ) ].  (1)

Omitting the argument t of X and choosing T(X) = X_i, X_i^2, ... yields the following equations for the (exact) time evolution of the k-th moment E[X_i^k] of the distribution for the i-th species:

d/dt E[X_i^k] = Σ_{j=1}^{m} E[ α_j(X) · ( (X_i + v_ji)^k − X_i^k ) ],  (2)
where v_ji refers to the i-th component of the change vector v_j. In a similar way, equations for mixed moments are derived. If all reactions are at most monomolecular (Σ_i |v_ji^-| ≤ 1 for all j), then no moments of order higher than k appear on the right side (also in the mixed case) and we can directly integrate all equations for moments of at most order k. However, most systems do contain bimolecular reactions (in particular those with complex behavior such as multistability). In this case we consider a Taylor expansion of the propensity functions about the mean µ := E[X]. It is easy to verify that, when applying the expectation to the Taylor sum, the right side only contains derivatives of the propensities at X = µ, which are multiplied by central moments of increasing order. For instance, for k = 1 and a single-species system with n = 1, Equation (2) becomes

d/dt E[X] = Σ_{j=1}^{m} v_j · ( α_j(µ) + (C_2 / 2) · ∂²/∂x² α_j(µ) + ... ),

where C_2 = E[(X − µ)²] is the second central moment. In the expansion, central moments of higher order may occur. For instance, in the case of bimolecular reactions, the equations for order k moments involve central moments of order k + 1 since second order derivatives are non-zero. By converting the non-central moments to central ones and truncating the expansion at some fixed maximal order k, we can close the system of equations when we assume that higher order central moments are zero. A full derivation of the moment equations using multi-index notation (as required for n > 1) can be found in [6].
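As a small worked illustration of the monomolecular case, consider a birth-death process (0 → X with rate c1, X → 0 with rate c2·x). Both reactions are at most monomolecular, so applying Equation (2) for k = 1, 2 gives a closed pair of ODEs, dE[X]/dt = c1 − c2·E[X] and dE[X²]/dt = c1 + (2c1 + c2)·E[X] − 2c2·E[X²], with no truncation needed. The sketch below (hypothetical rate constants, simple Euler integration) checks this against the known Poisson stationary distribution, whose mean and variance both equal c1/c2:

```python
# Exact (closed) moment equations for the birth-death process
#   0 -> X with rate c1,   X -> 0 with rate c2 * x.
c1, c2 = 1.0, 0.1       # hypothetical rate constants
m1, m2 = 0.0, 0.0       # E[X], E[X^2]; start with an empty system
dt, T = 0.01, 200.0     # crude explicit Euler integration

for _ in range(int(T / dt)):
    dm1 = c1 - c2 * m1
    dm2 = c1 + (2 * c1 + c2) * m1 - 2 * c2 * m2
    m1, m2 = m1 + dt * dm1, m2 + dt * dm2

mean = m1
var = m2 - m1**2
# Stationary distribution is Poisson(c1/c2 = 10): mean and variance -> 10.
```

For bimolecular propensities the same derivation would pull in higher central moments, which is exactly where the closure assumption of the text becomes necessary.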
The accuracy of the inference approach that we propose in the sequel depends not only on the information given by the experimental data but also on the accuracy of the approximated moments.Different closure strategies have been suggested and compared in the last years showing that the accuracy can be improved by making assumptions about the underlying distribution (e.g.approximate log-normality) [23,24].In addition, the accuracy of moment-closure approximations has been theoretically investigated [25].

Hybrid Approaches
Compared to deterministic models that describe only average behaviors, stochastic models provide interesting additional information about the behavior of a system. Although this comes with additional computational costs, it is of great importance in particular for systems with complex behavior, such as multimodality or oscillations. Often the underlying source of multiple modes are discrete changes of gene activation states that are described by chemical species whose maximal count is very small (e.g. 1 for the case that the gene is either active, state 1, or inactive, state 0). Then the moment-based approaches described above can be improved (both in terms of accuracy and computation time) by considering conditional moments instead [12,8,10,9]. The idea is to split the set of species into species with small and large populations and consider the moments of the large populations conditioned on the current count of the small populations. For the small populations, a small master equation has to be solved in addition to the moment equations to determine the corresponding discrete distribution. More specifically, if y is the subvector of x that describes the small populations and z is the subvector of the large populations (i.e. x = (y, z)), then for the distribution of y we have

d/dt p_t(y) = Σ_{j=1}^{m} ( E[α_j(X) | y − u_j] · p_t(y − u_j) − E[α_j(X) | y] · p_t(y) ),

where u_j is the corresponding subvector of v_j. Using Taylor expansion, the conditional expectations of the propensities can, as above, be expressed in terms of conditional moments of the large populations. In addition, equations for the conditional moments of the large populations can be derived in a similar way as above. For instance, the partial mean E[Z_i | y] · p_t(y) follows the time evolution

d/dt ( E[Z_i | y] · p_t(y) ) = Σ_{j=1}^{m} ( E[α_j(X) · (Z_i + w_ji) | y − u_j] · p_t(y − u_j) − E[α_j(X) · Z_i | y] · p_t(y) ),

where w_ji is the change of the i-th large population due to reaction j, and on the right side again Taylor expansion can be used to replace unknown conditional expectations by conditional moments. As above, a dependence on higher conditional moments may arise and a closure approach has to be applied to arrive at a finite system of equations. Unconditional moments can then be derived by summing up the weighted conditional moments. It is important to note that if p_t(y) = 0, then algebraic equations arise, turning the equation system into a system of differential-algebraic equations, which renders its solution more difficult (see [12,26] for details).
In Fig. 1 we give an example comparing the accuracy of the hybrid approach and the standard moment closure (assuming that all central moments above a fixed maximal order are zero) for one of our case studies. As "exact" moment values we chose the average of 500,000 samples generated by the stochastic simulation algorithm (SSA) [27] and considered the absolute difference to the approximated moments of one chemical population up to a maximal order of four. Since for our case studies we assumed 10,000 samples, we additionally plot the (approximated) standard deviation of the 50 sample means taken from batches of 10,000 samples. The moments computed based on the hybrid approach show a smaller error than those computed using the standard moment closure and lie within the deviations given by the sample moments. As investigated in [12], the number of ODEs that need to be integrated when the hybrid approach is applied grows more slowly in the maximum moment order than for the standard approach (e.g. for the example in Fig. 1 we have 126 (standard) against 15 (hybrid) equations for a maximal order of 4).

Figure 1: Absolute error of the first four moments of P_1 for the exclusive switch model, where the moments are either computed based on a standard moment closure approach or a hybrid approach. The maximal order of the considered moments is 5.

However, certain reductions are possible in the standard approach [28]. In addition, for all case studies that we considered, the unconditional moments resulting from the hybrid approach are not only more accurate, but the corresponding system of ODEs is also less stiff. This results in a dramatically reduced execution time for the proposed inference procedure, which has to solve the system for many different parameter instances. In addition, the more accurate results substantially improve the quality of the estimated parameters, as demonstrated in the Results section.
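For readers unfamiliar with the SSA, the following minimal Gillespie sketch generates end-point samples for a simple birth-death process (0 → X at rate c1, X → 0 at rate c2·x; the rates and sample count are hypothetical) and computes sample moments of the kind used as reference values above:

```python
import numpy as np

rng = np.random.default_rng(0)
c1, c2 = 1.0, 0.1  # hypothetical rate constants

def ssa_endpoint(t_end):
    """One SSA trajectory of the birth-death process; returns X(t_end)."""
    t, x = 0.0, 0
    while True:
        a = (c1, c2 * x)                 # propensities of the two reactions
        a0 = a[0] + a[1]
        t += rng.exponential(1.0 / a0)   # waiting time to the next reaction
        if t > t_end:
            return x
        # Pick the reaction proportionally to its propensity.
        x += 1 if rng.random() * a0 < a[0] else -1

samples = np.array([ssa_endpoint(100.0) for _ in range(5000)], dtype=float)
sample_mean = samples.mean()          # first sample moment
sample_second = (samples**2).mean()   # second (raw) sample moment
# Stationary distribution is Poisson(c1/c2 = 10), so both the sample mean
# and the sample variance should be close to 10.
```

Batching such samples (e.g. 50 batches of 10,000) and taking the standard deviation of the batch means gives exactly the kind of error bars plotted in Fig. 1.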

Generalized Method of Moments
We assume that observations of a biochemical network were made using single-cell analysis that gives population snapshot data (e.g. from flow cytometry measurements). Typically, large numbers (about 5,000-10,000) of independent samples can be obtained, where each sample corresponds to one cell. It is possible to simultaneously observe one or several chemical populations at a time in each single cell. In the sequel, we first describe the inference procedure for a single observation time point and a single chemical species that is observed. Later, we extend this to several time points and species. For a fixed measurement time t and a fixed index i of the observed population, we can define the r-th order sample moment as

m̂_r = (1/N) · Σ_{ℓ=1}^{N} (Y_ℓ)^r,

where Y_ℓ is the ℓ-th sample of the observed molecular count of the i-th species at time t and there are N samples in total. For large N, the sample moments are asymptotically unbiased estimators of the population moments. Let θ be a vector of, say, q ≤ m unknown reaction rate constants, for which some biologically relevant range is known. Moreover, let m_r be the r-th theoretical moment, i.e., m_r(θ) is the value of E[Y^r] under the parameters θ. In the sequel we also simply write Y instead of Y_ℓ whenever Y appears inside the expectation operator or when the specific index of the sample is not relevant. An obvious inference approach would be to consider the ordinary least squares estimator

θ̂ = argmin_θ Σ_{r=1}^{k} g_r(θ)²,  where g_r(θ) = m̂_r − m_r(θ)  (3)

and k is the number of moment constraints. Under certain conditions related to the identification of the parameters, as discussed below, this estimator is consistent (converges in probability to the true value of θ) and asymptotically normal. However, its variance may be very high. This is due to the fact that for increasing order the variance of the sample moments increases, and so does the variance of the estimator. This problem can be mitigated by choosing appropriate weights for the summands in (3). Moreover, since correlations between the cost functions exist, a more general approach that considers mixed terms is needed. This leads to a class of estimators, called generalized method of moments (GMM) estimators, that was introduced by Hansen [13].
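The least-squares estimator of (3), and the way high-order moments dominate its objective when W = I, can be sketched on a toy model whose theoretical moments are known in closed form: Y ~ Poisson(lam), with raw moments m1 = lam, m2 = lam + lam², m3 = lam³ + 3·lam² + lam. All names and values below are illustrative, not from the paper:

```python
import numpy as np

# Ordinary least-squares moment matching (W = I) for Y ~ Poisson(lam).
def theoretical_moments(lam):
    return np.array([lam, lam + lam**2, lam**3 + 3 * lam**2 + lam])

def ols_objective(lam, samples, k=3):
    sample_moments = np.array([(samples**r).mean() for r in (1, 2, 3)])
    g = sample_moments[:k] - theoretical_moments(lam)[:k]  # cost functions g_r
    return g @ g  # equal weights: the third-moment term dominates the sum

rng = np.random.default_rng(1)
samples = rng.poisson(10.0, size=10_000).astype(float)

grid = np.linspace(5.0, 15.0, 2001)  # crude grid search over the parameter
lam_hat = grid[np.argmin([ols_objective(lam, samples) for lam in grid])]
```

Here g_3 is of the order of hundreds while g_1 is of order one, so with equal weights the fit is effectively driven by the noisiest moment condition, which is exactly the problem the weighting matrix W is meant to fix.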
The idea is to define the estimator as

θ̂ = argmin_θ g(θ)^T · W · g(θ),  (4)

where g(θ) is the column vector with entries g_r(θ), r = 1, ..., k, and W is a positive semi-definite weighting matrix. Note that by defining f_r(Y, θ) = Y^r − m_r(θ) we see that

g_r(θ) = (1/N) · Σ_{ℓ=1}^{N} f_r(Y_ℓ, θ)

is the sample counterpart of the expectation E[f_r(Y, θ)]. The latter satisfies

E[f(Y, θ_0)] = 0,

where f(Y, θ) is the column vector with entries f_r(Y, θ) and θ_0 is the true value of θ. Note that the choice W = I gives the least-squares estimator with k terms, while for general W there are k·(k+1)/2 terms in the objective function (with k being the dimension of g(θ)).
Here we assume that identification of θ is possible, i.e., we require that q ≤ k (the number of moment constraints used is at least as large as the number of unknown parameters) and that

E[f(Y, θ)] = 0  if and only if  θ = θ_0.

In addition, the theoretical moments m_r(θ) should not be functionally dependent (see Chapter 3.3 in [29]) to ensure that the information contained in the moment conditions is sufficient for successfully identifying the parameters.
By applying the central limit theorem to the sample moments, it is possible to show that the GMM estimator is consistent and asymptotically normally distributed, and that its variance becomes asymptotically minimal if the matrix W is chosen proportional to the inverse of the covariance matrix of the Y^r [13]. This result is intuitive: higher moments are typically more volatile than lower ones, so it makes sense to normalize the errors in the moments by the corresponding covariances.
Formally, we define Ỹ as the random vector with entries (Y_ℓ)^r for r = 1, ..., k and, as before, omit the subindex ℓ if it is not relevant. Then

F := E[ f(Y, θ_0) · f(Y, θ_0)^T ] = Cov(Ỹ),

and choosing W ∝ F^{-1} will give an estimator with smallest possible variance, i.e., it is asymptotically efficient in this class of estimators [13,29].
Since F depends on the true value θ_0, a two-step procedure has been suggested, during which W is chosen as the identity matrix I in the first step such that an initial estimate θ̂ is computed. In a second step, F is estimated by the sample counterpart

F̂_1 = (1/N) · Σ_{ℓ=1}^{N} f(Y_ℓ, θ̂) · f(Y_ℓ, θ̂)^T.  (5)

If, however, the model is "misspecified", i.e.,

E[f(Y, θ_0)] ≠ 0,

then the above estimator is no longer consistent. An estimator for F that is consistent is then given by [29]

F̂_2 = (1/N) · Σ_{ℓ=1}^{N} (Ỹ_ℓ − Ȳ) · (Ỹ_ℓ − Ȳ)^T,  (6)

where Ỹ_ℓ is the vector with entries (Y_ℓ)^r and Ȳ is the vector with entries (1/N) Σ_{ℓ=1}^{N} (Y_ℓ)^r. In the sequel we refer to the estimator based on (6) as the demean estimator. This estimator removes the inconsistencies in the covariance matrices estimated from the sample moments by "demeaning". Since moment-based analysis methods usually give approximations of the moments and not the exact values, we consider both the demean estimator defined by (6) and the estimator of the 2-step procedure in (5) for our numerical results.
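Both weighting schemes fit in a few lines for the same Poisson toy model (Y ~ Poisson(lam), k = 2 moment conditions m1 = lam and m2 = lam + lam², q = 1 unknown parameter); a crude grid search stands in for the global optimizer, and all constants are illustrative:

```python
import numpy as np

def m(lam):
    return np.array([lam, lam + lam**2])  # theoretical moments of Poisson(lam)

def gmm_estimate(Ypow, W, grid=np.linspace(5.0, 15.0, 2001)):
    def objective(lam):
        g = Ypow.mean(axis=0) - m(lam)    # vector of cost functions g(theta)
        return g @ W @ g
    return grid[np.argmin([objective(lam) for lam in grid])]

rng = np.random.default_rng(2)
Y = rng.poisson(10.0, size=10_000).astype(float)
Ypow = np.column_stack([Y, Y**2])         # (Y_l, Y_l^2) for every sample l

# Step 1: identity weights give an initial consistent estimate.
lam1 = gmm_estimate(Ypow, np.eye(2))
# Step 2: re-weight with the inverse of the estimated covariance of
# f(Y, theta) = (Y - m1(theta), Y^2 - m2(theta)) at the step-1 estimate.
f = Ypow - m(lam1)
F1 = f.T @ f / len(Y)                     # two-step covariance estimate
lam2 = gmm_estimate(Ypow, np.linalg.inv(F1))
# Demean variant: covariance of (Y, Y^2) around their sample means, which
# stays consistent even when the model moments are only approximate.
F2 = np.cov(Ypow, rowvar=False, bias=True)
lam_demean = gmm_estimate(Ypow, np.linalg.inv(F2))
```

In the inference procedure of the paper, the closed-form Poisson moments are replaced by the (hybrid) moment-closure approximations of the reaction network, but the weighting logic is the same.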
The estimation procedure described above can be generalized to several dimensions by also using mixed sample moments instead of only m̂_r and mixed theoretical moments instead of only m_r(θ). For instance, for moments up to order two and two simultaneously observed species X and Y, we use the cost functions

X* − E[X],  Y* − E[Y],  (X²)* − E[X²],  (X·Y)* − E[X·Y],  (Y²)* − E[Y²],

where we use * to denote the sample mean operator. In the same way, we can extend the estimators F̂_1 and F̂_2 to several dimensions. For instance, the covariance between X·Y and X² can be estimated as

(X³·Y)* − (X·Y)* · (X²)*,

where again we use * to denote the sample mean operator.
If, instead of snapshot data for a single observation time, independent samples for different times are available, then the GMM estimator can also be easily generalized to

θ̂ = argmin_θ Σ_{t ∈ {t_0, ..., t_f}} g^(t)(θ)^T · W^(t) · g^(t)(θ).

Here, for each time point t ∈ {t_0, ..., t_f} the vector of cost functions g^(t) is calculated as before and the minimum is taken over the sum of these uncorrelated cost functions. Note that for each observation time point a weight matrix W^(t) has to be computed. In the two-step approach, the initial weight matrices are all equal to the identity matrix; in the second step, different weight matrices may arise, since the estimator of F depends on the observed samples, which in turn depend on the distribution of the model at the specific time t.

Results
To analyze the performance of the GMM we consider two case studies, the simple gene expression model in Table 1 and a network of two genes with mutual repression, called exclusive switch [30]. The reactions of the exclusive switch are listed in Table 2. All propensities follow the law of mass action. For the parameters that we chose, the corresponding probability distribution is bimodal.
For fixed reaction rate constants and initial conditions, we used the SSA to generate trajectories of the systems and recorded samples of the size of the corresponding protein/mRNA populations. In addition, we used the software tool SHAVE [31] to generate moment equations both for the standard moment closure and for the hybrid approach. In SHAVE the differential-algebraic equations are transformed into a system of (ordinary) differential equations after truncating modes with insignificant probabilities. Then an accurate approximation of the solution using standard numerical integration methods can be obtained. The system of moment equations is always closed by setting all central moments of order > k to zero. For the inference approach we used only the moments up to order k − 1, since the precision of the moments of highest order k is often poor. SHAVE allows exporting the (hybrid) moment equations as a MATLAB-compatible m-file. We then used MATLAB's Global Search routine to minimize the objective function in Eq. (4). Global Search is a method for finding the global minimum by starting a local solver from multiple starting points that are chosen according to a heuristic [32]. Therefore the total running time of our method depends on the tightness of the intervals that we use as constraints for the unknown parameters as well as on the starting points of the Global Search procedure. The running times for one local solver call (using the hybrid approach for computing moments) were about 2 s (demean estimator) and 40 s (2-Step estimator) for the gene expression model. For the exclusive switch the average running time for a local solver call was about 2 min (demean) and 10 min (2-Step). Note that the total running time depends on the number of local solver calls carried out by Global Search, which varied between 2 and 50. For all experiments we chose a single initial point that is located far away from the true values and allowed Global Search to choose 500 (potential) further starting points. Different initial points yielded similar results, except if the initial point is chosen close to the true values (then the results are significantly better, in particular in the case of only few moment constraints).
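The multistart idea behind Global Search is easy to sketch outside MATLAB: run a bounded local solver from several starting points and keep the best result. In the sketch below, the objective is a stand-in quadratic (in the actual procedure it would be the GMM objective g(θ)^T W g(θ)), and the bounds and number of starts are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def objective(theta):
    """Stand-in for the GMM objective; minimum at theta = (0.3, 0.05)."""
    return (theta[0] - 0.3) ** 2 + 10.0 * (theta[1] - 0.05) ** 2

bounds = [(0.0, 1.0), (0.0, 0.1)]  # intervals constraining the parameters
rng = np.random.default_rng(3)
lows, highs = zip(*bounds)
starts = [rng.uniform(lows, highs) for _ in range(20)]  # random start points

# Run a bounded local solver from every start and keep the best minimum.
results = [minimize(objective, x0, bounds=bounds) for x0 in starts]
theta_hat = min(results, key=lambda r: r.fun).x
```

For a multimodal GMM objective, the number and placement of starting points is what determines whether the global minimum is actually found, which matches the observation above that the result can depend on the initial point.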
The intervals that we used as constraints for the parameters are all listed in Tables 1 and 2.

Standard vs. hybrid moment-based analysis
In Fig. 2 we plot the results of a comparison between the standard and the hybrid moment closure when performed during the optimization procedure of the GMM inference approach. We chose the exclusive switch model for this comparison since for this model the accuracy of the standard approach is poor. As an estimator for F we used (6), which is based on demeaning (demean). Results for the 2-step procedure show similar differences when standard and hybrid moment closure are compared. We fixed the degradation rates to ensure that identification of p_1 and p_2 is possible when the two protein populations are measured at only a single observation time point. To simultaneously identify all parameters (including p_1 and p_2), several observation time points are necessary (see Appendix A).
The true values of the six unknown parameters are plotted against the means and standard deviations of the estimated values for a maximal moment order of 2 and 3, where for each of the six unknown parameters 50 estimations based on 10,000 samples each were used.
We see that the inaccurately approximated moments of the standard approach lead to severe problems in the inference approach. Nearly all parameters are estimated more accurately when the hybrid moment closure is used. For parameter b_1, most of the optimization runs converged to the upper limit of the given interval (0.1) when the standard approach was used. In addition, the running times for the standard moment closure were very high compared to the hybrid approach. For the results in the sequel, we only used the hybrid moment closure.

Two-step vs. demean approach
In Figs. 3 and 4 we plot results of the GMM approach applied to the two example networks, where we compare the performance of the two-step estimator in Eq. (5) with the demean estimator in Eq. (6). We plot the true values of the parameters against the estimated values, where 2-Step I is the result of the first step of the two-step procedure (with W = I) and 2-Step II that of the second step (with W = F̂_1^{-1} and F̂_1 as defined in Eq. (5)).
For the results in Fig. 3 only one population (mRNA) was observed at t = 100 where the initial conditions were such that DNA OFF = 1, DNA ON = 0 and 10 mRNA molecules were present in the system.For three parameters the means and standard deviations of the estimated values are plotted, again based on 50 repetitions of the inference procedure.
In the first row of Fig. 3 the accuracy of the estimation is compared with respect to the number/order of moments considered, where again for each of the 50 estimated values 10,000 samples were used. We see that if only one moment is considered, or if equal weights are used for the first two moments, only a rough estimate is possible since identification is not possible. The accuracy is markedly improved when the weights are chosen according to the demean approach. Here, it is important to note that for a maximal order of k = 2, in W we also consider, besides the squared cost functions g_1(θ)² and g_2(θ)², the mixed term g_1(θ)·g_2(θ). This additional term significantly improves the quality of the estimation, such that it is possible to achieve a good estimation of the parameters with only the sample mean and the sample second moment. However, the variance of the estimator is relatively high but decreases significantly when also the third (and fourth) moment is considered. Here, demean and the second step of the two-step procedure perform equally well. In contrast, W = I (first step of the two-step procedure) gives poor results and a high variance even if higher moments are considered. In Table 3 we give an example of the (normalized) matrix W as used for demean and 2-Step II. The two methods choose nearly identical weights, and the mean has the highest weight. The mixed cost function for mean and second moment has a (negative) weight of about 2 · (−4.95) % since these moments are negatively correlated (and so are the second and third moment). All terms that involve the third moment have a very small weight as their covariances are high.
It is important to note that even if the number of moment constraints, k, is equal to the number of parameters, q, 2-Step I performs poorly (see results for maximal order k = 3 in the first row of Fig. 3). The reason is that in this example identification is not possible if only three terms are used, due to functional dependencies between the parameters of the first two reactions and due to the fact that measurements were made at only a single time point. If identification were possible and the computed population moments were exact, the results would be independent of the choice of W in the case that q equals k. Thus, the weights given by the estimators for F in (5) and (6) substantially increase the accuracy of the results and allow identification, because additional information about the covariances between the Y_r is exploited by the terms of the objective function that correspond to the off-diagonal entries of W. In the second row of Fig. 3, we compare the accuracy for different sample sizes where the first three moments were considered. While 2-Step I does not show a systematic improvement when the number of samples increases, for 2-Step II and demean we see not only significantly improved estimates but also smaller variances. However, in the case of few samples, demean gives a high variance, in particular for parameter a. This is because the corresponding estimator uses the sample mean instead of the theoretical mean, and therefore the weight matrix is far from optimal if N is small.

In Fig. 4, (a)-(h), we plot results for the exclusive switch model, where all eight parameters were estimated based on observations of the two protein populations P_1 and P_2 at two time points. The maximal order of the moments used is plotted on the x-axis. For the maximal orders 1, 2, 3 and 4 there are in total 2, 5, 9 and 14 moment conditions, respectively. The accuracy of the estimation improves with increasing maximal order, and in addition the variance of the estimator decreases. However, the values for 2-Step II become slightly worse and have a higher variance for a maximal order of four, since these moments are not approximated very accurately. Also, the accuracy of the demean estimator does not improve when the maximal order is increased from three to four. Thus, the cost functions of the order-four moments do not lead to any significant improvement in this example and should be excluded.
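The two-step procedure (2-Step I followed by 2-Step II) can be illustrated on a toy model with a single parameter. Purely for illustration, we assume the copy numbers are Poisson distributed, so the theoretical moments are available in closed form; none of the names below come from the paper:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy model: copy numbers ~ Poisson(theta), so the theoretical raw
# moments are mu1 = theta and mu2 = theta + theta^2.
def model_moments(theta):
    return np.array([theta, theta + theta**2])

def gmm_objective(theta, sample_moments, W):
    g = sample_moments - model_moments(theta)  # moment conditions
    return g @ W @ g

rng = np.random.default_rng(1)
x = rng.poisson(4.0, size=10_000).astype(float)
m_hat = np.array([x.mean(), (x**2).mean()])

# Step I: trivial (identity) weights.
fit1 = minimize_scalar(gmm_objective, bounds=(0.1, 20.0), method="bounded",
                       args=(m_hat, np.eye(2)))

# Step II: re-weight with the inverse sample covariance of the moment
# terms and minimize again.
terms = np.column_stack([x, x**2])
S = np.cov(terms, rowvar=False)
fit2 = minimize_scalar(gmm_objective, bounds=(0.1, 20.0), method="bounded",
                       args=(m_hat, np.linalg.inv(S)))
print(round(fit1.x, 2), round(fit2.x, 2))  # both near the true value 4.0
```

In this exactly solvable toy example both steps land near the true value; the benefit of the second step shows up, as in the case studies, when the moment conditions have strongly different scales and correlations.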

Discussion
In the context of stochastic chemical kinetics, parameter inference methods are either based on Markov chain Monte Carlo schemes [33-36], on approximate Bayesian computation techniques [37-39], or on maximum likelihood estimation using a direct approximation of the likelihood [15, 2] or a simulation-based estimate [40, 41]. Maximum likelihood estimators are, in a sense, the most informative estimates of unknown parameters [42] and have desirable mathematical properties such as unbiasedness, efficiency, and normality. On the other hand, the computational complexity of maximum likelihood estimation is high, as it requires a simulation-based or numerical solution of the CME for many different parameter instances. Since the applicability of these methods is limited, approaches based on moment closure [24, 43, 44, 3, 45] or linear noise approximations [46, 47] have been developed. A promising approach based on moment closure relies on expressions for the approximate distributions of sample means and variances that involve the moments of the model [24, 44, 43]. However, these approaches are currently limited to a single species and do not take into account information about sample moments of an order higher than two. A similar approach is used in [3], where the moment equations are closed by a Gaussian approximation and the parameter estimation is based on a maximum likelihood estimator and a Markov chain Monte Carlo approach. In [45] the importance of higher moment orders when using least squares estimators is shown; however, the weights for terms that correspond to different moments are chosen ad hoc and not based on any statistical framework. Here, we presented results for the generalized method of moments, which assigns optimal weights to the different moment conditions. We showed that the accuracy obtained with trivial weights (e.g. the identity matrix) can be strongly increased when optimal weights are chosen instead. In the very common case that functional dependencies between parameters exist (e.g. degradation and
production of the same species) and identification is difficult, the GMM estimator makes it possible to accurately identify the parameters. Moreover, our results indicate that the accuracy of the estimation increases when moments of order higher than two are included. A general strategy could be to start with k = q cost functions (equal to the number of unknown parameters) and increase the maximal order until tests for over-identifying restrictions (e.g. the Hansen test [13]) suggest that higher orders do not lead to an improvement. In this way, cost functions that do not improve the quality of the estimation, such as the fourth-order cost functions for the results in Fig. 4, can be identified. We also found that an accurate approximation of the moments is crucial for the performance of the GMM estimator. Thus, hybrid approaches such as the method of conditional moments [12] or sophisticated closure schemes (e.g. [24]) should be preferred.
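The over-identification test mentioned above (Hansen's J-test) can be sketched as follows: with k moment conditions and q parameters, the statistic J = N · g(θ̂)ᵀ Ŵ g(θ̂) is asymptotically χ²-distributed with k − q degrees of freedom under the null hypothesis that all moment conditions are valid. The numerical values below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2

def hansen_j_test(g_hat, W_hat, n_samples, n_params):
    """Hansen's J-test for over-identifying restrictions.

    g_hat : moment conditions evaluated at the GMM estimate theta_hat.
    W_hat : weight matrix (for the efficient GMM, the inverse covariance
            of the moment conditions).
    Returns the J statistic and its p-value; under H0 (all k moment
    conditions valid), J ~ chi2(k - n_params) asymptotically.
    """
    g_hat = np.asarray(g_hat, dtype=float)
    J = n_samples * g_hat @ W_hat @ g_hat
    dof = len(g_hat) - n_params
    return J, chi2.sf(J, dof)

# Hypothetical values: k = 3 moment conditions, q = 1 parameter.
J, p = hansen_j_test([0.01, -0.02, 0.005], np.eye(3), 10_000, 1)
print(round(J, 2), round(p, 3))
```

A small p-value indicates that at least one of the added moment conditions is inconsistent with the model, which is the signal for stopping the order-increasing strategy described above.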

Conclusion
Parameter inference for stochastic models of cellular processes demands huge computational resources.
The proposed approach based on the generalized method of moments relies on a matching of the statistical moments of the model and therefore does not require the computation of likelihoods. This makes the approach appealing for complex networks in which stochastic effects play an important role, since the integration of the moment equations is typically fast compared to other computations such as the evaluation of likelihoods. The method does not make any assumptions about the distribution of the process (e.g. Gaussian) and complements the existing moment-based analysis approaches in a natural way.
Here, we used a multistart gradient-based minimization scheme, but the approach can be combined with any global optimization method. We found that the weights of the cost functions computed by the GMM estimator yield clearly more accurate results than trivial (identity) weights. In particular, the variance of the estimator decreases when moments of higher order are considered. We focused on the estimation of reaction rate constants and, as future work, we plan to investigate how well Hill coefficients and initial conditions can be estimated.
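A multistart gradient-based minimization scheme of the kind we used can be sketched as follows; the objective below is a hypothetical stand-in for the GMM objective, not the one from the case studies:

```python
import numpy as np
from scipy.optimize import minimize

def multistart_minimize(objective, bounds, n_starts=20, seed=0):
    """Run a local gradient-based minimizer (L-BFGS-B) from several
    random starting points and keep the best local optimum found."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    best = None
    for _ in range(n_starts):
        theta0 = rng.uniform(lo, hi)  # random start within the bounds
        res = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

# Hypothetical smooth objective with its global minimum at (2, 3).
obj = lambda t: (t[0] - 2.0)**2 + (t[1] - 3.0)**2 + 0.5 * (t[0]*t[1] - 6.0)**2
best = multistart_minimize(obj, bounds=[(0.0, 10.0), (0.0, 10.0)])
print(np.round(best.x, 2))  # approximately [2. 3.]
```

Any global optimizer (e.g. one restart strategy per parameter regime) could replace the loop; the GMM objective itself stays unchanged.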
An important advantage of the proposed method is that the properties of GMM estimators have been investigated in detail in the economics literature over decades, and several variants and related statistical tests are available. We will also check how accurate approximations of the variance of the GMM estimator are [29] and whether other estimators that were proposed in the context of the GMM perform differently (e.g. the continuously updating GMM [48]). Since we found that the results become slightly worse when moments of order higher than three are included, we will in addition explore the usefulness of statistical tests for over-identifying moment conditions. In this way, we can ensure that only moment conditions that improve the estimation are included.

A Influence of Multiple Time Points
For certain pairs of chemical reactions the identifying condition is violated when only snapshot data from a single time point is considered. For example, in the very simple reaction system ∅ → A with rate λ and A → ∅ with rate µ, every combination of λ and µ with λ/µ = const leads to the same snapshot data for species A at a certain time point. In order to resolve this problem, more information is needed, i.e. snapshot data at several time points (of independent samples to avoid correlation), or one of the parameters has to be fixed. In Section 3.1 this problem already occurred for the exclusive switch: the corresponding rates are production p_i and degradation d_i as well as binding b_i and unbinding u_i. By fixing the degradation rates d_i, the estimation of the production rates becomes quite accurate, whereas b_i and u_i cannot be estimated due to the identification problem. For the following estimations the demean procedure was used; the 2-Step method showed a similar behavior. With no fixed parameters and only a single time point t = 200, nothing can be reliably estimated, as indicated in Fig. 5: the estimated values are often far away from the true ones and the variance is also quite high in all cases. The consideration of a second time point (t = 100, 200) resolves the issue in case of sufficient moment conditions, i.e. order 2 or higher. Four time points (t = 50, 100, 150, 200) do not further improve the estimation, but due to the higher total number of samples (500,000 per time point) the variance is decreased.
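The identifiability problem for this birth-death system can be verified directly from the mean equation dm/dt = λ − µm: near stationarity, two parameter pairs with the same ratio λ/µ produce (almost) identical means, while an earlier, second time point separates them. A minimal sketch (the numerical values are illustrative):

```python
import math

def mean_birth_death(lam, mu, t, m0=0.0):
    # Closed-form solution of the mean equation dm/dt = lam - mu * m.
    return lam / mu + (m0 - lam / mu) * math.exp(-mu * t)

# Two parameter pairs with the same ratio lam/mu = 5.
pairs = [(1.0, 0.2), (2.0, 0.4)]
late  = [mean_birth_death(lam, mu, t=200.0) for lam, mu in pairs]
early = [mean_birth_death(lam, mu, t=5.0)   for lam, mu in pairs]
print(late)   # indistinguishable: both essentially at the stationary mean 5.0
print(early)  # an earlier, second time point separates the two pairs
```

Since the transient distribution of this system is Poisson, all higher moments are functions of the mean as well, so adding higher-order moment conditions cannot repair the identifiability at a single (late) time point; a second time point can.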

Figure 2: Exclusive switch model: Comparison of estimations with the demean procedure for the standard moment closure and hybrid moments.

Figure 3: Gene expression model: Estimated parameters a, b and c for different numbers/orders of moments and 10,000 samples (a)-(c), and for different sample sizes based on 3 moments (d)-(f).

Figure 4: Exclusive switch model: Estimated parameters for maximal moment orders 1-4 based on 10,000 independent samples observed at times t = 100 and t = 200 (a)-(h), and at 1-4 different time points for the demean-based estimation of b_2 (i).

Figure 5: Exclusive switch model: Comparison of estimations with the demean procedure for single time point data and combined data for samples of two and four independent time points.

Table 2: Exclusive switch model [30]: Two different proteins P_1 and P_2 can bind to a promoter region on the DNA. If P_1 is bound to the promoter, the production of P_2 is inhibited, and vice versa. In the free state both proteins can be produced.

Table 3: Weight matrices for the two-step and demean procedures with moment order 3 for the gene expression model. The entries are normalized with respect to the weight for the mean and rounded (the original weight matrices are both positive semidefinite).