Bayesian Computation Methods for Inference in Stochastic Kinetic Models

In this paper we investigate Monte Carlo methods for the approximation of the posterior probability distributions in stochastic kinetic models (SKMs). SKMs are multivariate Markov jump processes that model the interactions among species in biological systems according to a set of usually unknown parameters. The tracking of the species populations together with the estimation of the interaction parameters is a Bayesian inference problem for which Markov chain Monte Carlo (MCMC) methods have been a typical computational tool. Specifically, the particle MCMC (pMCMC) method has been shown to be effective


Introduction
Stochastic kinetic models (SKMs) are multivariate systems that model interactions among species in ecological, biological, and chemical problems, according to a set of unknown rate parameters [1].The aim of this paper is to study computational methods for the approximation of the posterior distribution of the rate parameters and the populations of all species, provided a set of discrete, noisy observations is available.This inference problem has been traditionally addressed in a Bayesian framework using Markov chain Monte Carlo (MCMC) schemes [1][2][3][4].In [5], a particle MCMC (pMCMC) method [6] has been applied to Lotka-Volterra and prokaryotic autoregulatory models.The pMCMC technique relies on a sequential Monte Carlo (SMC) approximation of the posterior distribution of the populations to compute the Metropolis-Hastings (MH) acceptance ratio.
However, MCMC methods in general and pMCMC in particular suffer from a number of problems.The convergence of the Markov chain is hard to assess and the final set of samples presents correlations which can greatly reduce its efficiency.Besides, MCMC methods do not (easily) allow for 2 Complexity parallel implementation and turn out to be computationally intensive, even for relatively simple models.A classical technique to reduce the complexity of the existing MCMC methods when applied to SKMs is the so-called diffusion approximation of the underlying stochastic jump process [7].However, the diffusion approximation breaks down in low-concentration scenarios [5].Other techniques, including linear noise approximations [8][9][10][11] and Gaussian-process estimators of the likelihood [12], have also been proposed in the past few years to reduce the computational cost.The parameters of the MCMC proposal are also hard to choose and they determine the performance of the algorithm.
An appealing alternative to MCMC methods is the class of the population Monte Carlo (PMC) algorithms [13].PMC is an adaptive importance sampling (IS) scheme [14] that yields a discrete approximation of a target probability distribution.The PMC algorithm has important advantages with respect to MCMC techniques.It provides independent samples and asymptotically unbiased estimates at all iterations, which avoids the need of a convergence period.Additionally, PMC may be easily parallelized.
On the other hand, the main weakness of IS and PMC is their low efficiency in high dimensional problems, due to the well-known degeneracy problem [15].The recently proposed nonlinear PMC (NPMC) scheme [16,17] mitigates this difficulty by computing nonlinear transformations of the importance weights (IWs), in order to smooth their variations and avoid degeneracy.In [17] a rigorous proof of convergence of the nonlinear importance sampling scheme as the number of Monte Carlo samples increases is presented.Similarly to the pMCMC method in [5], the NPMC algorithm resorts to an SMC approximation of the posterior distribution of the populations to compute the IWs.
In this paper we apply the NPMC method to the estimation of both the parameters and the unobserved populations in a complex SKM modelling an autoregulatory feedback network with several species and noisy observations.We present numerical results to compare the performance of the state-ofthe-art pMCMC and the proposed NPMC in two scenarios of different dimension and with two different observation models.We show that the NPMC method outperforms the pMCMC method for the same computational cost.
The rest of the paper is organized as follows.In Section 2 we present an introduction to the basics of SKMs and the usual solutions to this Bayesian inference problem.In Sections 3 and 4 we describe the pMCMC and NPMC methods, respectively, when applied to the approximation of posterior distributions in SKMs.In Section 5 we numerically compare the performance of pMCMC and NPMC schemes when applied to an autoregulatory feedback network, an SKM that can be interpreted as a generalization of the paradigmatic Lotka-Volterra model.Finally, Section 6 is devoted to the conclusions.

Bayesian Inference for Stochastic Kinetic Models
where   > 0,  = 1, . . .,  are the random constant rate parameters and  V and  V ,  = 1, . . ., , V = 1, . . ., , denote the coefficients of the population of the V-th species, before and after the -th interaction, respectively.We will refer to  V and  V as reactant and product coefficients, respectively, by analogy with terminology commonly used in biochemical reactions.A matrix P of size  ×  contains the reactant coefficients  V and, similarly, Q contains the product coefficients  V .The stoichiometry matrix of size  ×  is defined as S = (Q − P) ⊤ .The vector c = [ 1 , . . .,   ] ⊤ contains the rate parameters.
See, e.g., [1] for a detailed description and discussion of the class of SKMs.

Bayesian Inference for SKMs.
We consider the log-transformed rate parameters  = [ 1 , . . .,   ] ⊤ , where   = log(  ), Complexity 3  = 1, . . ., , with prior pdf ().The prior pdf (for simplicity of notation, in this section we use  to denote the pdfs in the model.We write conditional pdfs as (y | x), and joint densities as () = ( 1 , . . .,   ).This is an argumentwise notation, hence ( 1 ) denotes the distribution of  1 , possibly different from ( 2 )) of the initial population vector x 0 is denoted by (x 0 ).We assume that a linear combination of the populations of a subset of species is observed at discrete time instants corrupted by Gaussian noise, i.e., where M is the observation matrix with dimensions  ×  and w  ∼ N  (w  ; 0, Λ) is a multivariate Gaussian noise component with zero mean and covariance matrix Λ.We denote the complete observation vector with dimension DN × 1 as y = [y ⊤ 1 , . . ., y ⊤  ] ⊤ .The dynamical behavior of an arbitrary SKM may be described in terms of the following set of equations (that describe a state-space model with unknown parameters): where (x  | x −1 , ) and (y  | x  ) denote the transition pdf and the likelihood function, respectively.The Gillespie algorithm [18] (see Appendix A for a detailed description) allows to perform exact forward simulations of arbitrary SKMs, drawing samples from the transition densities (x  | x −1 , ),  = 1, . . ., , given a set of log-rate parameters  and an initial population x 0 .While the assumption Gaussian observational noise may not be adequate for some applications of SKMs, it has been employed by other authors in the past [5,11] and, in particular, it has been used for computer experiments involving the prokaryotic autoregulation model [5] that we study numerically in Section 5.
In this paper, we aim at obtaining a Monte Carlo approximation of the full joint posterior distribution of the log-rate parameters  and the populations x, with density (5) given the prior distributions () and (x 0 ), the transition pdf (x | x 0 , ) = ∏  =1 (x  | x −1 , ), and the likelihood function (y | x) = ∏  =1 (y  | x  ) constructed from (3).We note that this state-space model can represent the SKM reaction equations of Section 2.1, yet it is abstract enough to make it applicable to other, relatively similar, models.For example, the assumption of mass action kinetics is not explicitly needed for this representation; hence other reaction rates could be incorporated into the state-space model (and the inference algorithms to be derived from it).
Bayesian inference based on exact stochastic simulations from (x  | x −1 , ) generated via the Gillespie algorithm often becomes practically intractable even for models of modest complexity [7].Thus, it is very common to resort to a continuous approximation of the underlying stochastic process, which is known as the diffusion approximation [1].This approximation is known to be poor in low-concentration scenarios and thus should be avoided for models involving species with a very low population.Alternatively, in [3] the authors propose a solution based on a moment closure approximation of the stochastic process.Other techniques for the approximate modelling of the underlying jump process that have become relatively popular in the past few years include Gaussian processes [12] and the linear noise approximation [8][9][10].
This inference problem has been traditionally addressed using MCMC methods, and IS based schemes have been avoided due to their inefficiency in high dimensional spaces [1].In [2] various MCMC algorithms are evaluated in datapoor scenarios.In [5] a likelihood-free pMCMC scheme [6] is applied to this problem.This method is, to the best of our knowledge, the most powerful, yet computationally expensive, method provided so far for this kind of applications.
In [16] a NPMC scheme is proposed for the approximation of the marginal posterior pdf ( | y).The performance of the NPMC method is tested in a simple SKM known as predator-prey model [20], providing excellent results with a low computational cost.
In this paper we compare the performances of the pMCMC and the NPMC methods in the approximation of the full joint posterior pdf (, x | y) in (5), which allows performing Bayesian inference for the rate parameters  and the full sample path x, including unobserved components.

Particle MCMC for SKM's
The particle marginal Metropolis-Hastings (PMMH) algorithm is a pMCMC method originally proposed in [6] for Monte Carlo sampling from the full posterior distribution (, x | y).The PMMH scheme suggests a proposal mechanism of the form ( ⋆ | ) p (x ⋆ | y,  ⋆ ).To be specific, a new candidate in the parameter space,  ⋆ , is drawn from an arbitrary proposal distribution ( ⋆ | ), while the new candidate in the variable space, x ⋆ , is generated using an approximation of the posterior marginal (x ⋆ | y,  ⋆ ) constructed by means of an SMC algorithm (i.e., a particle filter) with  particles and denoted p (x ⋆ | y,  ⋆ ).The probability of accepting the proposed pair
where p (y |  ⋆ ) is an unbiased approximation of the marginal likelihood of  ⋆ (i.e., (y |  ⋆ )), computed, again, by way of a particle filter with  particles.The PMMH algorithm is reproduced in Algorithm 1, and the SMC approximations of (y |  * ) and (x * | y,  * ) are described in Appendix B. Full details can be found in [6].Note that the forward simulation of the stochastic process in the particle filter may be performed exactly with the Gillespie algorithm or using a diffusion approximation.
In [5] the proposal is selected as a Gaussian random walk ( ⋆ | ) = N  ( ⋆ ; ,  2 I), ignoring the correlation structure of the target distribution.The variance  2 has to be tuned and partly determines the performance of the algorithm (see [21] for some practical advice on the choice of this parameter).Low values of  result in a poor exploration of the space of , while high  values yield low acceptance rates.In both situations the resulting chain presents highly correlated samples and slow mixing properties.To reduce the correlation among samples it is common to thin the obtained chain, keeping a subset of equally spaced samples and discarding the rest.After removing the initial burnin samples and thinning the output, we obtain a Markov chain { () , x () }  =1 with  correlated samples.Then, we may construct a sample approximation of the marginal posterior distributions of the parameters  and the populations x, as respectively, where   (푖) and  x (푖) denote the unit delta measure centered at  () and x () , respectively.The approximation of the full joint posterior has the form

Nonlinear PMC for SKM's
The PMC method [13] is an iterative IS scheme that generates a sequence of proposal pdf's  ℓ (⋅), ℓ = 1, . . ., , that approximate a target pdf  along the iterations.The NPMC scheme is proposed in [16] and it introduces nonlinearly transformed IWs (TIWs) in order to mitigate the numerical problems caused by degeneracy in the proposal update scheme.

NPMC Targeting 𝑝(𝜃 | y).
We first consider as a target density the marginal posterior pdf of the parameters  given the observation vector y, i.e., () = ( | y).As in [16], we construct the proposal pdf  ℓ (), ℓ = 2, . . ., , as a Gaussian approximation of the target pdf obtained at the previous iteration ℓ − 1, whose mean and covariance parameters are selected to match the moments of the previous sample set.
The NPMC algorithm is displayed in Algorithm 2. Details and some simple convergence results can be found in [16].See [17] for a rigorous analysis.Similar to the pMCMC algorithm, in the NPMC implementation the densities (x | y, ) and (y | ) required in steps 2 and 3 are replaced by their SMC approximations, which are given in Appendix B. The NPMC method may also use either exact or approximate samples of the stochastic process, depending on the computational capabilities.
For the clipping procedure performed in step 4 we consider, at each iteration ℓ, a permutation  1 , . . .,   of the indices in {1, . . ., } such that  ≥ T  ℓ ,  = 1, . . .,   − 1.This transformation leads to   flat TIWs in the region of interest of , allowing for a robust update of the proposal.The performance of the algorithm shows little sensitivity to the selection of the clipping parameter   [16].For simplicity, step 5 performs multinomial resampling.
At each iteration of the NPMC algorithm we may construct a discrete approximation of the posterior probability distribution described by the pdf ( | y), based on the set of samples and TIWs, as 1. Draw a set of  samples { () ℓ }  =1 from the proposal density  ℓ (): (i) at iteration ℓ = 1, let  1 () = ().
4. For  = 1, . . ., , compute normalized TIWs,  () ℓ , by clipping the original IWs as The choice of a Gaussian approximation of the proposal  ℓ+1 () in step 6 is arbitrary.We have adopted it here for simplicity.Other families of pdfs can be used without modifying the rest of the algorithm.

NPMC Targeting 𝑝(𝜃, x | y).
The NPMC method proposed in [16] may be readily applied to the approximation of the full joint posterior (, x | y), in an manner equivalent to the pMCMC algorithm.We consider a sampling mechanism of the form () p (x | y, ), where samples  () are again generated from the latest proposal () and x () are drawn from the SMC approximation p (x | y,  () ) obtained via particle filtering (the iteration index has been omitted for simplicity).Then, the standard, unnormalized IW associated with the pair ( () , x () ) is computed as and is independent of x.This reveals that, when samples x ()  ℓ are drawn from p (x | y, ), the algorithm yields a discrete approximation of the posterior distribution of the unobserved populations x constructed as Even though the proposed NPMC and the pMCMC require very similar computations for each pair of samples of {, x} and thus have an equivalent computational cost, the NPMC has a set of important advantages with respect to its MCMC counterpart.PMC methods provide independent sets of samples at all iterations and do not require a burnin period.On the other hand, the nonlinearity applied in the NPMC mitigates weight degeneracy, which is the main problem arising in conventional IS based methods, dramatically increasing its efficiency in high dimensional problems.As a consequence, we claim that the total number of samples,  (and thus, the computational complexity), required by the NPMC can be significantly lower than that of pMCMC, .Additionally, contrary to pMCMC, which requires a careful choice of the proposal tuning parameter, the proposed method does not require the accurate fitting of any parameters.
The NPMC method processes a set of  i.i.d.samples at each iteration, requiring a low number of iterations (around 10 for the type of problems addressed here) for convergence to the target distribution.The bulk of the computational cost of NPMC, as well as of pMCMC, is the SMC approximation of the likelihood function.In the pMCMC algorithm the samples  () are processed sequentially (one after the other), and this process cannot be parallelized.On the contrary, at each iteration ℓ of the NPMC method, the process of drawing  samples  ()  ℓ and computing the associated IWs  () * ℓ can be performed independently for each sample .Thus, steps 1, 2, and 3 can be easily parallelized, reducing the total execution time up to that of a single sample  ()  ℓ .On the other hand, steps 4, 5, and 6 require the complete set of samples and Complexity weights { ()  ℓ ,  () * ℓ }  =1 and must be performed in a centralized manner.However, these computations have a negligible cost in comparison with a single likelihood approximation.Thus, the parallelization of the NPMC method can allow for a reduction in execution time up to a factor ≈ 1/.Note that we refer here to the parallelization of the PMC method and not of the SMC filter used to approximate the likelihood function.
An extensive numerical comparison of pMCMC versus NPMC for a complex SKM that can be interpreted as a generalization of the paradigmatic Lotka-Volterra model is presented in Section 5.

Example: An Autoregulatory Feedback Network
In this section, we compare the performance of the pMCMC and the NPMC methods when applied to the problem of approximating the posterior distributions of the log-rate parameters ( | y) and the populations (x | y) in a complex SKM given some observed data y.This model is significantly more involved than the standard predatorprey system.It has been introduced in [7] and further analyzed in [1, 5] for biochemical processes.In particular, it has also been considered as a model for representing the mechanisms for autoregulation in prokaryotes.The model is minimal in terms of the level of details included and offers a simplistic view of the mechanisms involved in gene autoregulation.However, it contains many of the interesting features of general autoregulatory feedback networks and it does provide sufficient detail to capture the network dynamics.

Prokaryotic Autoregulatory
We construct the -dimensional vector containing the population of all species at time instant  as x() = [  (),   (),   2 (),  ⋅ 2 (),   ()] ⊤ .Thus, we obtain a stoichiometry matrix of the form and the hazard vector is given by where the time dependance of the population of each species is omitted for notational simplicity.This model involves a conservation law given by the relation  ⋅ 2 +   = , where  is the number of copies of this gene in the genome.We could use this relation to remove  ⋅ 2 from the model, replacing any occurrences of the latter in the hazard function with  −   , but in this paper we abide by the notation in equation (15).Further details of this model can be found in [1].

Simulation Setup.
We have selected most of the simulation parameters following [5].The true vector of rate parameters which we aim to estimate has been set to c = [0.1,0.7, 0.35, 0.2, 0.1, 0.9, 0. The initial populations and the conservation constant have been set to x 0 = [ 1 (0), . . .,   (0)] ⊤ = [8, 8, 8, 5, 5] ⊤ and  = 10, respectively.In all the simulations in this paper we have performed exact sampling from the stochastic model with the Gillespie algorithm to obtain the likelihood approximation via particle filtering.The number of particles for the SMC approximation p (x | , y) has been set to  = 100 for all the simulations (both the pMCMC and the NPMC algorithms converge (as  → ∞ and  → ∞, respectively) even for fixed  [6,17].We have tested numerically that for  > 100 performance hardly improves, while running times obviously become larger).Independent uniform priors U(  ; −7, 2) have been taken for each   = log(  ).Opposite to [5], the initial populations x 0 are assumed unknown for the inference algorithm, which employs independent Poisson priors ( V (0)) = P( V (0);  V ), with the  V parameters set to the true initial populations, that is,  V =  V (0), V = 1, . . ., .Following [5], the conservation constant  is assumed to be known in the simulations.
We consider two different observation scenarios.In the complete observation (CO) scenario we assume that all species  V , V = 1, . . ., , are observed at regular time intervals of length Δ = 1 and corrupted by Gaussian noise with variance  2 = 4 (assumed to be known).Thus, the observation matrix is of the form M = I  and the observations are given by y  = x  + w  ,  = 1, . . ., .
We remark that the assumption of the observation noise terms (either w  or   ) displaying a Gaussian distribution is not needed to apply neither the pMCMC method nor the NPMC algorithm.Both techniques can be applied as long as the joint likelihood (y  | x  , ) of the state and the parameters can be evaluated, numerically and point-wise (either exactly or approximately; see [22]).In particular, it is possible to plug a Gaussian-process estimate of the likelihood [12] into the NPMC algorithm while running the rest of the algorithm in the same way as described in Section 4 (and still rely on the theoretical guarantees introduced in [22]).

Performance Evaluation.
To evaluate the performance of the pMCMC and the NPMC methods we compute, in all the simulation runs, the mean square error (MSE) attained by the sample set that approximates the marginal posterior of , generated by both schemes.
For the pMCMC method, we compute the MSE of each parameter   based on the -size final output (after removing the burn-in period and thinning), as For the NPMC method, we compute the MSE associated with each parameter   ,  = 1, . . ., , based on the unweighted sample set at the ℓ-th iteration { θ() ℓ }  =1 , ℓ = 1, . . ., , as where  ℓ, is the -th component of the mean vector  ℓ and the variance term  2 ℓ, is the (, ) component of matrix Σ ℓ .We have chosen the MSE as a figure of merit because the observations are generated synthetically and, therefore, we have access to ground truth values for the rate parameters that can be used to compute MSEs.A direct comparison in terms of the posterior distributions (e.g., using the Kullback-Leibler divergence or the total variation distance) is not possible because the posterior distribution of  conditional on the observations y is intractable.
However, the MSE cannot be computed in real problems, where the true parameters   are unknown.To monitor the stability and the efficiency of the two sampling schemes based on the generated sample alone, we resort to the socalled normalized effective sample size (NESS), which is often defined differently for MCMC and IS schemes [23].
In the MCMC literature, the NESS gives the relative size of an i.i.d.(independent and identically distributed) sample with the same variance as the current sample and thus indicates the loss in efficiency due to the use of a Markov chain [23].For pMCMC we compute the NESS from the final chain (after removing the burn-in period and thinning) as where ρ() = corr( (0) ,  () ) is the average autocorrelation function (ACF) at lag .For the computation of the NESS, we truncate  when ρ() < 0.1.
For IS methods, the NESS may be interpreted as the relative size of a sample generated from the target distribution with the same variance as the current sample.Even when high values of the NESS do not guarantee a low approximation error, the NESS is often used as an indicator of the numerical stability of the algorithm [24].It cannot be evaluated exactly but we may compute an approximation of the NESS at each iteration of the NPMC scheme based on the set of TIWs as 2 , ℓ = 1, . . ., .
In the next subsection we present, together with other numerical results, some comparisons involving the NESS for the PMCMC and PMC methods.The comparison should be taken with some caution because, although the interpretation of the NESS is the same for both techniques, the estimators have a different form and they are derived in a different way.Nevertheless, we believe that the NESS is a good numerical indicator of degeneracy for the two schemes, because it directly reflects poor mixing (high correlation) of the Markov chain and concentration of the weight in importance samplers, and we plot it with that purpose.See [25] for more details on the NESS and related statistics.

Simulation Results
. We consider two simulation scenarios in which a different number of parameters is estimated.

Estimation of a Single Rate
Parameter  1 .In this section we present numerical results regarding the approximation of the posterior distribution ( 1 , x |  \1 , y) of a single rate parameter  1 = log  1 and the populations x, when the rest of parameters  \1 = [ 2 , . . .,   ] ⊤ , are assumed known.
We compare the pMCMC and the NPMC methods in this simple scenario in order to show that the two algorithms perform almost equivalently in low dimensional problems.This is in a clear contrast with the results presented in Section 5.4.2, which show that the NPMC method can be more efficient in higher dimensional settings.
We have performed  = 100 independent simulation runs of the pMCMC and the NPMC schemes in the CO and the PO scenarios, with independent population and observation vectors in each simulation  = 1, . . ., .Both in the CO and the PO cases, the same true population trajectories x () were used, but the observations in the CO scenario, y ()  , and in the PO scenario, y ()  , differ.The number of observation times has been set to  = 100 and exactly the same data has been used for NPMC and pMCMC.
Following [5], as a proposal pdf ( ⋆ | ) in the pMCMC scheme we consider a Gaussian random walk update with variance  2 .We have performed simulations with different  values and the best results were obtained with  2 = 1, which are presented here.A total number of  = 10 4 iterations have been run in each simulation.A final sample of size  = 10 3 has been obtained from each Markov chain by discarding a burn-in period of 10 3 samples and thinning the output by a factor of 9.
In the NPMC scheme, the number of iterations has been set to  = 10, the number of samples per iteration is  = 10 3 and the clipping parameter is   = 100.In this way, the computational effort of the two methods is approximately the same, as they both generate  =  = 10 4 samples in the space of .
In Figure 1 the final MSE obtained by the pMCMC (left) and the NPMC (right) algorithms for each simulation run is depicted versus the final NESS, in the CO and the PO scenarios.Note that the NESS is computed differently for pMCMC and NPMC.It can be observed that both algorithms perform similarly in this case, with an equivalent computational cost.Both algorithms attain on average lower MSE values in the CO scenario, as expected.However, the NESS also takes lower values in the CO case, which indicates a worse mixing of the Markov chains in the pMCMC algorithm and also higher degeneracy in the NPMC algorithm.
In Figure 2 the evolution of the NESS (left) and the MSE (right) along the iterations of the NPMC algorithm is represented, for the CO and the PO scenarios.It can be observed that both measures attain a steady value by the 5-th iteration, both in the CO and the PO case, which suggests that actually less iterations are sufficient for this problem.Again, we observe that in the CO scenario both the NESS and the MSE reach lower values.
Figure 3 (left) plots the average ACF of the final pMCMC sample, after removing the burn-in period and thinning the Markov chain by a factor of 9. Particularly high correlations are present in the CO case, leading to a poor NESS.Related to the ACF, the average sample acceptance probability in the pMCMC scheme in the PO scenario is 0.091, while in the CO scenario it is only 0.0034, which means that 910 samples are accepted out of  = 10 4 in the CO case and only 34 in the CO case.
In Figure 3 (right) the final pdf estimates p( 1 |  \1 , y) of the average simulation runs represented as big circles and crosses in Figure 1 are represented in the CO and the PO scenario, for the pMCMC and the NPMC schemes.For the pMCMC method we have built a Gaussian approximation of the posterior density ( 1 |  \1 , y) based on the final MCMC sample { ()  1 }  =1 .For the NPMC method, this approximation corresponds to the proposal pdf for the next iteration  + 1, i.e., p( 1 |  \1 , y) =  +1 ( 1 ) = N( 1 ;  ,1 ,  2 ,1 ), where the mean and variance terms  ,1 and  2 ,1 are computed as in ( * ).It can be observed in Figure 3    It can be observed that, in the PO scenario, the trend of the population of all the species is reasonably identified, even though only a linear combination of the proteins is observed.In the CO scenario the populations of all species are accurately estimated and are not shown for conciseness.Note that the populations of all species are very low, which suggests that the diffusion approximation may perform poorly in this scenario.
The results presented in this section reveal a very similar performance of the two methods in this simple scenario, which suggests that the parameters of pMCMC have been properly tuned.Also in terms of computational complexity pMCMC and NPMC perform very similarly.The execution time per 10 3 samples (one NPMC iteration and 10 3 pMCMC iterations) for the pMCMC scheme is 312 seconds, while for NPMC it is 325 seconds, both in the CO and in the PO cases, on a 3-GHz Intel Core 2 Duo CPU, with 4 GB of RAM.The stochastic forward simulation of the prokaryotic model with the Gillespie algorithm has been implemented in C, and the rest of the code in Matlab R2014b.Out of the execution time per iteration of the NPMC method, only 0.06 second correspond to batch steps 4, 5, and 6, and the rest of operations can be easily parallelized, as previously discussed.
However, the pMCMC method does not allow for an easy parallelization, provides a set of highly correlated samples (especially in the CO scenario) and requires the setting of the proposal variance  2 , the burn-in and the thinning parameters, which may not be straightforward, and determines the performance of the algorithm.On the contrary, the NPMC scheme provides uncorrelated sets of samples at each iteration and does not require the precise fitting of any parameters.Additionally, the computer simulations suggest that the convergence of the NPMC algorithm may be assessed observing the evolution of the NESS, which usually reaches a steady value simultaneously with the MSE.In this section we present simulation results to evaluate the performance of the pMCMC and the NPMC schemes in the approximation of the posterior distribution of the rate parameters and the populations of all species, (, x | y), assuming that all the rate parameters are unknown, again in the CO and the PO scenarios.

Estimation of All the
In this case,  = 200 observation times are assumed for all the simulations.Again,  = 100 independent simulation runs of each algorithm have been performed.The NPMC scheme has been run for  = 15 iterations, with  = 10 3 samples per iteration and clipping parameter   = 100.The pMCMC scheme has been run with  = 15 × 10 3 iterations in each simulation run, a burn-in period of 10 3 iterations, and thinning the output by a factor of 14.With this setup the computational effort is approximately the same in the two schemes.The variance of the random walk proposal has again been set to  2 = 1, since this value seems to yield the best performance.
In Figure 5 the MSE (in logarithmic scale), averaged over the parameters   , attained by the pMCMC (left) and the NPMC (right) algorithms, is represented versus the NESS, in the CO and PO scenarios.Simulation runs which attained a final MSE close to the global average value are indicated with big circles (CO) and squares (PO) on both plots.It can be observed that the pMCMC method performs similarly in both scenarios, in terms of MSE and NESS, yielding poor results in both cases.On the contrary, the NPMC method provides significantly better MSE results in the CO scenario, where a larger amount of information is available.The NPMC method does not present degradation due to the high degeneracy occurring in the CO scenario.
Figure 6 depicts the evolution along the iterations of the NESS (left) and the MSE (right) averaged over  = 100 independent simulation runs for the NPMC algorithm.Both indices converge to a steady value in a low number of iterations also in this complex scenario.As expected, a significantly higher final MSE is attained in the extremely data-poor PO scenario.
In Figure 7 (left) the average ACF attained by the pMCMC in the CO and the PO cases is represented.Even after thinning the output, the sample correlation is extremely high in both scenarios, which leads to a very low NESS.The acceptance rate is also very low and very long chains are required to obtain reasonable results.In the PO scenario ≈ 44 samples are accepted on average in a simulation run of  = 15 × 10 3 samples (acceptance rate 0.0029).In the CO case, only ≈ 23 samples are accepted on average (rate 0.0015).
Figure 7 (right) depicts the final Markov chain provided by the pMCMC method (after removing the burn-in period and thinning the output) in the average simulation run represented with a big square in Figure 5 (left).It can be observed that the mixing of the chain is very poor, with a total number of accepted samples of 46 (close to the average).Many other simulations, in both the PO and the CO scenarios, provide even lower number of accepted samples and thus very inconsistent results.
Figure 8 depicts the final Gaussian approximations of the marginal posteriors (  | y),  = 1, . . ., 8, obtained by the pMCMC and the NPMC methods, in the CO and PO scenarios, for the average simulation runs represented as big circles and squares in Figure 5.We can observe that the NPMC method provides a significantly better approximation of the log-rate parameters in the CO scenario, where a larger amount of data is available, which is also clear from Figure 5 (right).However, the pMCMC on average performs similarly in both scenarios, due to the low efficiency of the pMCMC    sampling scheme when the dimension of the problem (either  or ) increases.
In Table 2 the MSE of each parameter   averaged over  = 100 independent simulation runs is shown, as obtained with the pMCMC and the NPMC schemes, for the CO and the PO experiments.In the CO case, NPMC provides homogeneous results for all parameters.On the contrary, in the PO case, some of the parameters (specially  5 and  6 ) are significantly poorly estimated, presenting a final MSE close to the initial value (which corresponds to the Complexity 13  prior knowledge).The pMCMC scheme presents significantly higher MSE values than NPMC in both observation scenarios and for all parameters   .Figure 9 depicts the population posterior mean x =  (x|y) [x] corresponding to the average simulation runs of the pMCMC and the NPMC methods in the PO scenario, represented as big squares in Figure 5. Again, the NPMC method provides more accurate estimates of the unobserved populations than the pMCMC method, especially for  A .In the CO scenario both methods provide good approximations of the populations of all species.

Conclusion
We have investigated the use of Monte Carlo-based Bayesian computation methods for approximating posterior distributions of the parameters and the species populations in stochastic kinetic models.Specifically, we have applied both particle Markov chain Monte Carlo (pMCMC) and nonlinear population Monte Carlo (NPMC) methods, which rely on different sampling and approximation schemes.Both pMCMC and NPMC methods resort to a sequential Monte Carlo approximation of the posterior populations to estimate the unknown interaction parameters.However, the pMCMC constructs a Markov chain of correlated parameter samples while the NPMC algorithm is based on an importance sampling scheme with nonlinearly transformed weights to avoid degeneracy and the numerical problems typically arising in importance samplers when applied to high dimensional problems.
We have compared the performance of the two schemes for a challenging autoregulatory feedback network with 5 species and 8 unknown interaction rate parameters.Both methods have been applied without resorting to approximations of the underlying jump process (e.g., the diffusion approximation).We have shown how the NPMC algorithm outperforms the pMCMC method and requires only a moderate computational cost.Besides, the proposed method has a set of important features, common to all PMC schemes, as the sample independence, ease of parallelization, and compared to MCMC schemes, there is no need for convergence (burnin) periods.
Let us note that there is room for improvement of both algorithms in practical applications.To be specific, the pMCMC scheme we have assessed is constructed around a random walk Metropolis algorithm, but other MCMC schemes can be adopted; hence it is possible to further optimize its performance.The same is true for the NPMC

Figure 1 :
Figure 1: Performance of the pMCMC (left) and the NPMC (right) methods for the estimation of a unique rate parameter  1 : MSE (in logarithmic scale) obtained from the final output versus the NESS for each simulation run in the CO and the PO scenario.The big circles (CO) and squares (PO) represent simulation runs with a final mean MSE close to the global average.

Figure 2 :Figure 3 :
Figure 2: Evolution along the iterations of the NPMC algorithm of the average NESS (left) and MSE (right) in the CO and PO scenarios, estimating a single parameter  1 .

Figure 4
Figure 4 depicts the posterior mean of the populations, x =  p(x|y) [x], obtained with pMCMC (left) as x = (1/) ∑  =1 x () and with NPMC (right) as x = ∑  =1  ()  x ()  in the PO scenario.The results correspond to the particular simulation runs (different for pMCMC and NPMC) identified with big squares in Figure 1 and whose posterior approximations, p( 1 |  \1 , y), are shown in Figure 3 (right).It can be observed that, in the PO scenario, the trend of the population of all the species is reasonably identified, even though only a linear combination of the proteins is observed.In the CO scenario the populations of all species are accurately estimated and are not shown for conciseness.Note that the populations of all species are very low, which suggests

Figure 4 :
Figure 4: Posterior mean, x =  p(x|y) [x], of the populations obtained in a single simulation run of pMCMC (left) and NPMC (right) in the PO scenario (only a linear combination of the proteins is observed, corrupted by noise).

Figure 5 :Figure 6 :
Figure 5: Performance of the pMCMC (left) and the NPMC (right) methods for the estimation of the whole set of rate parameters : MSE (in logarithmic scale) versus the final NESS, for each simulation run in the CO and the PO scenario.The big circles (CO) and squares (PO) represent simulation runs with a final mean MSE close to the global average.

Figure 7 :
Figure 7: Left.Autocorrelations based on the final sample of size 10 3 of the pMCMC scheme in the CO and the PO scenarios, averaged over  = 100 simulation runs.Right.Markov chain provided by the pMCMC method in the PO scenario, corresponding to the average simulation run depicted with a big square in Figure 5 (left).

Figure 8 :
Figure 8: Marginal posterior pdf approximations of each parameter p(  | y),  = 1, . . ., , attained in an average simulation run by the pMCMC and the NPMC, in the CO and in the PO case.

Figure 9 :
Figure 9: Posterior mean x =  (x|y) [x] of the populations of all species obtained in the average simulation run of the pMCMC (left) and the NPMC (right) schemes, in the PO scenario.

Table 1 :
Final mean and standard deviation (std) values of the MSE for  1 in the CO and PO scenarios, for pMCMC and NPMC.The prior values are included for comparison.
in Table1, together with the MSE corresponding to the prior distribution.

Table 2 :
Final MSE for the parameters   ,  = 1, . . .,  in the CO and PO experiments, averaged over the simulation runs.The last two columns correspond to the mean and standard deviation (std) values of the global MSE (averaged over the parameters).The prior values are included for comparison.