Identification of models of heterogeneous cell populations from population snapshot data

Background: Most of the modeling performed in the area of systems biology aims at a quantitative description of the intracellular pathways within a "typical cell". However, in many biologically important situations even clonal cell populations can show a heterogeneous response. These situations require the study of cell-to-cell variability and the development of models for heterogeneous cell populations.

Results: In this paper we consider cell populations in which the dynamics of every single cell is captured by a parameter-dependent differential equation. Differences among cells are modeled by differences in parameters, which are subject to a probability density. A novel Bayesian approach is presented to infer this probability density from population snapshot data, such as flow cytometric analysis, which do not provide single cell time series data. The presented approach can deal with sparse and noisy measurement data. Furthermore, it is appealing from an application point of view, as in contrast to other methods the uncertainty of the resulting parameter distribution can be assessed directly.

Conclusions: The proposed method is evaluated using artificial experimental data from a model of the tumor necrosis factor signaling network. We demonstrate that the methods are computationally efficient and yield good estimation results even for sparse data sets.


Background
The main goals of research in systems biology are the development of quantitative models of intracellular pathways and the development of tools to support the modeling process. However, most of the available methods and models consider only a single "typical cell", whereas most experimental data used to calibrate the models are obtained in cell population experiments, e.g. western blotting. This causes problems, in particular if the studied population shows large cell-to-cell variability. In such situations, inferring a single cell model from cell population data can lead to biologically meaningless results. In order to understand the dynamical behavior of heterogeneous cell populations, it is crucial to develop cell population models, describing the whole population and not only a single individual [1][2][3][4].
This has already been realized by several authors, and it has been shown that stochasticity in biochemical reactions and unequal partitioning of cell material at cell division can lead to complex population dynamics [1][2][3][4][5], such as bimodal distributions. Besides these sources for heterogeneity also genetic and epigenetic differences have to be considered [6].
For the purpose of this paper, heterogeneity in populations is modeled by differences in parameter values and initial conditions of the model describing the single cell dynamics [4,7,8]. The network structure is assumed to be identical in all cells. The distribution of the parameter values within the cell population is described by a multivariate probability density function, which is part of the population model. This modeling framework is well suited for modeling genetic and epigenetic differences among cells [2,4,7].
In the following, the problem of estimating the probability density of the parameters is studied. Therefore, we employ population snapshot data (PSD), which provide single cell measurements at every time instance but which do not provide single cell time series data. A typical experimental setup which provides PSD is flow cytometric analysis. In general, PSD are a common data type in the experimental analysis of biological systems.
So far, there are not many methods available for the estimation of parameter distributions. In pharmacokinetic studies, mixed effect models [9] are used frequently. Unfortunately, as in the problem we consider the number of individuals is very large (> 10^4) and the amount of information per individual is very limited (often only one data point), these methods are computationally too demanding. Furthermore, as in this study we are particularly interested in intracellular signal transduction, methods which purely focus on the population balance [10][11][12] also cannot be employed. In [8,13,14] methods are proposed which can in principle deal with the problem at hand. There, the considered estimation problem has been formulated as a convex optimization problem. Unfortunately, these methods either require an extensive amount of measurement data [8,13] and/or do not allow considering prior knowledge [8,13,14]. Additionally, no methods to evaluate the reliability of the estimates are provided.
In this paper a novel Bayesian approach [15,16] for inferring the parameter density will be introduced. The approach is mainly based on the maximum likelihood methods presented in [13,14], but can deal with sparse and noisy single cell data in addition to realistic measurement noise models. Furthermore, one may directly access the remaining uncertainty of the estimation result and the prediction uncertainties via the calculation of Bayesian confidence intervals [17,18]. It is shown that the posterior distribution can be determined efficiently employing a parameterization of the parameter density in combination with commonly used Markov chain Monte Carlo (MCMC) sampling techniques [19].
To illustrate the properties of the proposed methods, a mathematical model of the tumor necrosis factor (TNF) pathway [20] is analyzed using artificial experimental data.

Problem statement Cell population model
For the purpose of this work we consider intracellular biochemical reaction networks which are modeled by systems of ordinary differential equations. This modeling framework allows the description of metabolic networks as well as signal transduction pathways, as long as spatial effects and stochasticity of the biochemical reactions can be neglected. Mathematically, the dynamic behavior of each single cell is determined by an ordinary differential equation in state space form,

    Σ(θ^(i)):  ẋ^(i) = F(x^(i), θ^(i)),  x^(i)(0) = x_0(θ^(i)),  y^(i) = c h(x^(i)),    (1)

with state variables x^(i)(t) ∈ R^n_+, output variables y^(i)(t) ∈ R^m_+, and parameters θ^(i) ∈ R^q_+. The vector field F : R^n_+ × R^q_+ → R^n is Lipschitz continuous, and the function h : R^n_+ → R^m_+ maps the states to the measured outputs, where c is a proportionality factor. The index i specifies the individual cells within the population. The parameters θ^(i) can be kinetic constants, e.g. reaction rates or binding affinities.
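As an illustration, a single cell of the form (1) can be simulated with a standard ODE solver. The two-state cascade below is purely hypothetical (the rate constants, the states, and the helper `simulate_cell` are not part of the model above); it only sketches how a parameter-dependent trajectory y(t) = c·x(t) might be computed:

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_cell(theta, t_eval, c=1.0):
    """Simulate one cell for parameter vector theta (hypothetical model).

    x1 is activated at rate theta[0]; x1 drives production of x2 at rate
    theta[1]; outputs are the states scaled by a proportionality factor c.
    """
    def F(t, x):
        return [theta[0] * (1.0 - x[0]),          # activation of x1
                theta[1] * x[0] - 0.1 * x[1]]     # x1-driven production of x2
    sol = solve_ivp(F, (0.0, t_eval[-1]), [0.0, 0.0], t_eval=t_eval, rtol=1e-8)
    return c * sol.y  # y(t) = c * x(t)

y = simulate_cell(theta=[0.5, 1.0], t_eval=np.linspace(0.0, 10.0, 50))
```

In a population setting, this simulation would be repeated for every parameter vector θ^(i) drawn from the density Θ.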
Employing the definition of the single cell dynamics (1), a cell population model is given by the collection of N cells,

    Σ_pop = { Σ(θ^(i)) }_{i=1,...,N}.    (2)

The heterogeneity within this population is modeled by differences in parameter values among individual cells. The parameters are distributed according to the probability density function

    Θ : R^q_+ → R_+.

This density function is part of the population model specification. The parameter vector of cell i is subject to the probability distribution θ^(i) ~ Θ. Note that interactions among individual cells influencing the analyzed pathway are not allowed. This is a restriction, but one which is indeed fulfilled in many in vitro lab experiments.

Measurement data and noise
In this paper we consider experimental setups where measurement data are obtained in the form of population snapshots, e.g. via flow cytometry. Population snapshots are taken at different times t_j, and the jth snapshot contains measurements of the output y for M_j cells. Due to the experimental setup, it can be assumed that each cell appears in at most one snapshot.
The cells in the individual snapshots are referenced through index sets: snapshot j contains all cells with indices in I_j. Let the data point for the cell with index i be denoted as

    D_i = (t^(i), ȳ^(i)),

where t^(i) is the time at which the measurement was taken, and ȳ^(i) is the measured, noise-distorted output as defined below. If cell i has been measured as part of the jth snapshot, then t^(i) = t_j. The snapshot j is the set of all data points D_i with i ∈ I_j, as depicted in Figure 1:

    D_{S,j} = { D_i }_{i ∈ I_j}.

In the parameter estimation, only the union of all snapshots is considered, and the parameter density function Θ is fitted to all snapshots simultaneously. To this end, we introduce the collection of all data, denoted as

    D = ∪_{j=1,...,n_S} D_{S,j},

in which M = M_1 + M_2 + ... + M_{n_S} is the total number of measured cells.
We emphasize that experimental setups are considered in which cells are not tracked over time. These setups are very common in studies on the population scale. Classical examples of measurement techniques yielding such data are flow cytometric analysis and cytometric fluorescence microscopy. These measurement techniques allow the determination of protein concentrations within single cells. As the population is well mixed when the measurement is performed and no cell is measured more than once, the individual single cell measurements D_i are independent. This independence of D_{i_1} and D_{i_2} (respectively ȳ^(i_1)(t^(i_1)) and ȳ^(i_2)(t^(i_2))), i_1 ≠ i_2, holds if both cells are measured during one snapshot (t^(i_1) = t^(i_2)) as well as if the cells are measured within different snapshots (t^(i_1) ≠ t^(i_2)).
Like most other experiments, the considered single cell experiments are subject to noise. We consider the noise model

    ȳ^(i) = η_× ∘ y^(i) + η_+,    (7)

in which ȳ^(i) is the measured output, y^(i) is the actual output from (1), and ∘ denotes the component-wise product. The multiplicative noise is denoted by η_× ∈ R^m_+ and the additive noise by η_+ ∈ R^m_+. Both η_× and η_+ are in the following assumed to be vectors of log-normally distributed random variables with probability density functions

    p(η_{j,k}) = 1/(√(2π) σ_{j,k} η_{j,k}) exp( -(log η_{j,k} - μ_{j,k})² / (2 σ_{j,k}²) ),    (8)

for j ∈ {×, +}, k = 1, ..., m. This noise model allows the study of relative and absolute measurement noise and describes the noise distributions commonly seen in biological experiments [21].
From the noise model (7) and the noise densities (8), the conditional probability of measuring ȳ^(i) given y^(i) can be determined. As the different output errors are assumed to be independent, the conditional probability density factorizes,

    p(ȳ^(i) | y^(i)) = ∏_{k=1,...,m} p(ȳ_k^(i) | y_k^(i)),

with

    p(ȳ_k^(i) | y_k^(i)) = ∫_0^{ȳ_k^(i)} p_{×,k}(s / y_k^(i)) (1 / y_k^(i)) p_{+,k}(ȳ_k^(i) - s) ds,

the integral over all combinations of multiplicative and additive noise which yield the measured value, which is illustrated in Figure 2. For this line integral no explicit solution can be given. In this paper its value is determined numerically using the adaptive Simpson quadrature method [22] implemented in MATLAB.
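This convolution-type integral can be evaluated with any adaptive quadrature routine. The sketch below uses SciPy's `quad` in place of MATLAB's adaptive Simpson rule; the noise parameters are placeholder values borrowed from the univariate example later in the paper, and `cond_density` is a hypothetical helper, not the authors' implementation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import lognorm

# Log-normal noise densities; parameters are placeholder values
p_mult = lognorm(s=0.1, scale=np.exp(0.0))    # eta_x: mu = 0, sigma = 0.1
p_add  = lognorm(s=0.3, scale=0.05)           # eta_+: mu = log(0.05), sigma = 0.3

def cond_density(y_bar, y):
    """p(y_bar | y) for y_bar = eta_x * y + eta_+, by adaptive quadrature.

    Integrates over the admissible splits of y_bar into a multiplicative
    part s = eta_x * y and an additive remainder y_bar - s.
    """
    integrand = lambda s: p_mult.pdf(s / y) / y * p_add.pdf(y_bar - s)
    val, _ = quad(integrand, 0.0, y_bar, limit=100)
    return val
```

As a sanity check, the conditional density should integrate to one over all measured values ȳ for a fixed true output y.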

Problem formulation
As mentioned previously, when studying heterogeneous populations the density of the parameters, Θ, is in general unknown but necessary to gain an in-depth understanding of the population dynamics. Therefore, we are concerned with the following problems:

Problem 1: Given the measurement data D, the cell population model Σ_pop, and the noise model (7), infer the parameter density Θ*.

Problem 2: Given the measurement data D, the cell population model Σ_pop, and the noise model (7), determine the uncertainty of the inferred parameter density.

Figure 1 Population snapshot data of a heterogeneous cell population. The single cell measurement (·) is denoted by D_i and the snapshot at a particular time instance t_j is denoted by D_{S,j}. The collection of all data is referred to as D.

Unfortunately, the number of cells considered in a standard lab experiment is on the order of 10^4 to 10^7. Thus, simulating the population model (2) is computationally expensive. Furthermore, it is hard from a theoretical point of view to deal with ensemble models such as (2). Density-based descriptions of the population dynamics are far more appealing for solving Problems 1 and 2. Therefore, in the next section a density description of the population is introduced.

Density-based modeling of cell populations
To simplify the inference problems on Θ, the population description is changed from an ensemble model (2) to a density model [13]. The variables of this new model are no longer states or outputs of the single cells but the density function ϒ of the output,

    ϒ : R^m_+ × R_+ × L^1 → R_+ : (y, t, Θ) ↦ ϒ(y | t, Θ),

defined such that

    Pr(y^(i)(t) ∈ Y) = ∫_Y ϒ(y | t, Θ) dy,

in which Y ⊂ R^m_+ is an arbitrary subset of the output space. Hence, y^(i)(t) can be interpreted as a random variable which is distributed according to ϒ(y|t, Θ).
To compute the cell population response ϒ(y|t, Θ) for a given Θ, S independent single cell trajectories y^(i)(t) of the cell population (2) are calculated. The parameters for these cells are drawn from Θ, and the initial conditions are computed according to x_0(θ^(i)). This yields

    ϒ(y | t, Θ) ≈ (1/S) Σ_{i=1,...,S} K(y, y^(i)(t), H),    (12)

in which K(y, y^(i)(t), H) is the density of the applied kernel density estimator, with ∫_{R^m_+} K dy = 1. This is illustrated in Figure 3. In this work multivariate log-normal kernels are used to conserve the property that all variables are positive. The positive definite matrix H is used to select the width of the kernel K and is chosen using the rule of thumb described in [23]. The selection of H is crucial for achieving a good approximation of the probability density and depends strongly on S.
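A one-dimensional version of (12) with a log-normal kernel might look as follows; the function name and the scalar bandwidth h (standing in for the bandwidth matrix H) are illustrative assumptions:

```python
import numpy as np

def output_density(y_grid, y_cells, h=0.1):
    """Kernel density estimate of the output density Upsilon(y | t, Theta).

    y_cells: simulated single-cell outputs y^(i)(t) at one time point.
    Uses a one-dimensional log-normal kernel, which preserves positivity;
    each kernel integrates to one in y, so the average is a density.
    """
    y_grid = np.asarray(y_grid, dtype=float)[:, None]
    mu = np.log(np.asarray(y_cells, dtype=float))[None, :]
    K = np.exp(-(np.log(y_grid) - mu) ** 2 / (2 * h ** 2)) \
        / (y_grid * h * np.sqrt(2 * np.pi))
    return K.mean(axis=1)   # average over the S simulated cells
```

The bandwidth h would in practice be chosen by a rule of thumb as in the text, since it trades off smoothness against resolution of the estimated density.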
Approaches similar to the one we use to approximate ϒ(y|t,Θ) are employed in [13,14], in [8] with a naive density estimator and in [24] in the context of cell migration.

Estimation of the parameter density
In the previous section an approach to determine the output density ϒ within the cell population for a given parameter density Θ is presented. Based on this an approach for estimating Θ from the available data D is developed next.

Bayes' theorem for heterogeneous cell populations
For learning the parameter density from the data, Bayes' theorem is used,

    p(Θ | D) = p(D | Θ) p(Θ) / p(D),    (14)

in which p(Θ) is the prior probability of Θ, p(D|Θ) is the conditional probability of observing D given Θ, p(Θ|D) is the posterior probability of Θ given D, and p(D) = ∫ p(D|Θ) p(Θ) dΘ is the marginal probability of D. As the different single cell measurements are independent, (14) can be written as

    p(Θ | D) = ( ∏_{i=1,...,M} p(D_i | Θ) ) p(Θ) / p(D),    (15)

in which p(D_i|Θ) is the conditional probability of observing D_i given Θ. Note that due to the independence of D_{i_1} and D_{i_2}, for i_1 ≠ i_2, it is not necessary to distinguish between the cases that (1) the two cells are measured at the same time instance (t^(i_1) = t^(i_2)) and that (2) the two cells are measured at different time instances (t^(i_1) ≠ t^(i_2)). Hence, merely the conditional probability of each individual single cell measurement has to be determined. For the considered process, p(D_i|Θ) can be determined using the output density ϒ,

    p(D_i | Θ) = ∫_{R^m_+} p(ȳ^(i) | y) ϒ(y | t^(i), Θ) dy.    (16)

As this equation cannot be solved explicitly, the integral has to be approximated numerically. This could be realized using importance sampling [19], but as drawing an independent sample from ϒ requires knowledge of ϒ in the first place, p(D_i|Θ) is for computational purposes expressed as an integral over θ,

    p(D_i | Θ) = ∫_{R^q_+} p(ȳ^(i) | y(t^(i), θ)) Θ(θ) dθ,    (17)

with y(t^(i), θ) being the output at time t^(i) for a cell having parameters θ. This reformulation of (16) is possible as ϒ depends directly on Θ. This step simplifies the evaluation of p(D_i|Θ) tremendously.
Based on (15) and (17), the calculation of the posterior probability for a given probability density of the parameters Θ is possible. Unfortunately, the inference problem nevertheless cannot be solved directly, as Θ is an element of a function space, and hence further steps are necessary.

Parameterization of parameter density
In order to avoid the infinite dimensional inference problem, the parameter density is parameterized. Θ is modeled by a finite weighted sum of multivariate ansatz functions Λ_j,

    Θ_ϕ(θ) = Σ_{j=1,...,n_ϕ} ϕ_j Λ_j(θ).

The ansatz functions Λ_j : R^q_+ → R_+ are themselves probability densities with ∫_{R^q_+} Λ_j(θ) dθ = 1. The weighting vector is denoted by ϕ ∈ [0, 1]^{n_ϕ}, where n_ϕ is the number of ansatz functions and Σ_{j=1,...,n_ϕ} ϕ_j = 1 to guarantee that Θ_ϕ is a probability density. The weightings ϕ_j can be interpreted as parameters determining the probability density Θ_ϕ and are in the remainder also called density parameters.
Note that the method presented in the following is independent of the choice of ansatz functions. Nevertheless, a clever choice of the ansatz functions may improve the approximation of the true parameter density dramatically. In this work, the ansatz functions are chosen to be multivariate Gaussians.
Given a parameterization of Θ_ϕ, the output density can be simplified to

    ϒ(y | t, Θ_ϕ) = Σ_{j=1,...,n_ϕ} ϕ_j ϒ(y | t, Λ_j),

in which ϒ(y|t, Λ_j) is the output density obtained for single cell parameters distributed according to Λ_j. This representation of the response is possible as the output density fulfills the superposition principle with respect to the parameter distribution Θ_ϕ. This reformulation has the advantage that computing the output density for arbitrary density parameters ϕ only requires the nonrecurring computation of the responses ϒ(y|t, Λ_j) and a summation.
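In a discretized setting this superposition is a single matrix-vector product. The basis responses below are made-up numbers on a 5-point output grid, purely for illustration:

```python
import numpy as np

# Hypothetical precomputed responses: Y_basis[j, k] = Upsilon(y_k | t, Lambda_j)
# for n_phi = 3 ansatz functions on a grid of 5 output values.
Y_basis = np.array([[0.1, 0.4, 0.3, 0.15, 0.05],
                    [0.0, 0.1, 0.3, 0.40, 0.20],
                    [0.3, 0.3, 0.2, 0.15, 0.05]])
phi = np.array([0.5, 0.3, 0.2])   # density parameters, summing to one

# Superposition: the population response for arbitrary phi is a weighted
# sum of the nonrecurring basis responses -- no new simulations needed.
Y_pop = phi @ Y_basis
```

Changing ϕ during the inference then costs only this product, while the expensive single cell simulations behind Y_basis are performed once.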

Reformulation of posterior probability
Having parameterized Θ_ϕ and ϒ(y|t, Θ_ϕ), the conditional probability p(D_i|ϕ) may be expressed as the weighted sum

    p(D_i | ϕ) = Σ_{j=1,...,n_ϕ} ϕ_j p(D_i | Λ_j),

in which p(D_i|Λ_j) is the conditional probability of observing D_i given the parameter density Λ_j. As the ansatz functions are predefined, the conditional probability p(D_i|Λ_j) can be evaluated,

    p(D_i | Λ_j) = ∫_{R^q_+} p(ȳ^(i) | y(t^(i), θ)) Λ_j(θ) dθ.    (24)

This in general high-dimensional integral is approximated employing Monte Carlo integration, yielding

    c_j^(i) := p(D_i | Λ_j) ≈ (1/S_c) Σ_{k=1,...,S_c} p(ȳ^(i) | y(t^(i), θ^(k))),    (25)

in which θ^(k) is drawn from Λ_j, θ^(k) ~ Λ_j, and S_c is the total size of the Monte Carlo sample {θ^(k)}_k. If Λ_j allows for efficient drawing of samples, the computational cost of determining c_j^(i) is reasonable, requiring S_c simulations of the single cell model (1).
Given these precomputed c_j^(i)'s, which are independent of the density parameters ϕ, the conditional probability can be simplified to

    p(D_i | ϕ) = ⟨c^(i), ϕ⟩,    (26)

in which c^(i) = [c_1^(i), ..., c_{n_ϕ}^(i)]^T ∈ R^{n_ϕ}_+ and ⟨·,·⟩ denotes the scalar product. Employing (26), the posterior probability can be written as

    p(ϕ | D) = ( ∏_{i=1,...,M} ⟨c^(i), ϕ⟩ ) p(ϕ) / p(D),    (27)

where the prior probability p(ϕ) enforces the satisfaction of the constraint that Θ_ϕ is a probability density, i.e. p(ϕ) = 0 whenever ϕ_j < 0 for some j or Σ_{j=1,...,n_ϕ} ϕ_j ≠ 1.    (28)

Note that for parameter estimation often only the shape of the posterior probability density is of interest, and not the normalization. Therefore, we only consider

    q(ϕ | D) = ( ∏_{i=1,...,M} ⟨c^(i), ϕ⟩ ) p(ϕ),

in which q(ϕ|D) is the unnormalized posterior probability. Sampling from q(ϕ|D) and p(ϕ|D) will yield the same distribution of sample members. Furthermore, q(ϕ|D) and p(ϕ|D) have the same extrema.
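Evaluating the unnormalized posterior then reduces to scalar products with the precomputed c^(i) vectors. A minimal sketch in log space (the matrix C collecting the c_j^(i) values and the helper name are assumptions):

```python
import numpy as np

def log_unnorm_posterior(phi, C, log_prior):
    """log q(phi|D) = sum_i log <c^(i), phi> + log p(phi).

    C is the M x n_phi matrix of precomputed c^(i)_j; evaluating the
    likelihood reduces to one matrix-vector product per candidate phi.
    Infeasible phi (negative entries or sum != 1) get probability zero.
    """
    phi = np.asarray(phi, dtype=float)
    if np.any(phi < 0) or not np.isclose(phi.sum(), 1.0):
        return -np.inf
    like = C @ phi                     # <c^(i), phi> for every data point
    if np.any(like <= 0):
        return -np.inf
    return np.log(like).sum() + log_prior(phi)
```

Working with logarithms avoids underflow when the number of measured cells M is large.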

Computation of maximum posterior probability estimate
Given the simplified unnormalized posterior probability q(ϕ|D), one important question is which parameter density Θ_ϕ maximizes q(ϕ|D). This is the most likely parameter density given the measured data and the prior knowledge.
This optimal vector of density parameters ϕ* can be computed by solving the optimization problem

    ϕ* = arg max_ϕ q(ϕ | D)  s.t.  ϕ_j ≥ 0 for all j,  Σ_{j=1,...,n_ϕ} ϕ_j = 1,    (30)

in which the two constraints ensure that the obtained density is positive and has integral one. Note that as each Λ_j is a probability density, Θ_{ϕ*} is positive if all ϕ_j are positive. Employing this and optimizing -log(q(ϕ|D)) instead of q(ϕ|D), (30) can be simplified to

    min_ϕ  -Σ_{i=1,...,M} log⟨c^(i), ϕ⟩ - log p(ϕ)  s.t.  ϕ_j ≥ 0 for all j,  Σ_{j=1,...,n_ϕ} ϕ_j = 1.    (31)

For log-concave priors p(ϕ), this minimization problem can be solved rather efficiently, as in that case (31) is a convex optimization problem [25]. For this problem class, solvers exist which guarantee convergence to the global optimum in polynomial time, e.g. interior point methods [25].
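A sketch of (31) with a flat prior, using SciPy's SLSQP solver in place of the interior point method mentioned in the text (the helper name, starting point, and the small regularizer inside the log are assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def map_estimate(C, n_phi):
    """Maximum posterior estimate with a flat prior: minimize
    -sum_i log <c^(i), phi> subject to phi >= 0, sum(phi) = 1.

    The objective is convex on the simplex, so a local solver such as
    SLSQP finds the global optimum.
    """
    obj = lambda phi: -np.sum(np.log(C @ phi + 1e-300))  # tiny offset avoids log(0)
    res = minimize(obj, np.full(n_phi, 1.0 / n_phi),
                   method="SLSQP",
                   bounds=[(0.0, 1.0)] * n_phi,
                   constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}])
    return res.x
```

For example, if every data point favors the first ansatz function, the estimate should concentrate almost all weight on the first component.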

Uncertainty of parameter density
In the previous section a method was presented which allows the computation of the maximum posterior probability estimate ϕ*. As measurement data are limited and noise-corrupted, this estimate will not be the true parameter density. Hence, the uncertainty of the parameter density has to be evaluated.

Sampling of posterior probability density

In order to analyze the uncertainty of the estimate, a sample of the posterior probability density q(ϕ|D) is generated. This is possible, as the unnormalized posterior probability of a candidate ϕ^(k) can be evaluated efficiently given (24) - (28). In this work the sampling is performed with a classical Metropolis-Hastings method [19]. Gibbs or slice sampling approaches may also be employed. Compared to importance and rejection sampling, these methods are well suited as they do not require the selection of a globally appropriate proposal density, a task which is difficult in this case.
The main point of concern when using MCMC sampling for the problem at hand is that the prior probability (28), and hence the posterior probability, is only non-zero on an (n_ϕ - 1)-dimensional subset of the density parameter space. This is due to the fact that the sum over the elements of ϕ has to be one for Θ_ϕ to be a probability density. If a standard proposal step were used, the acceptance rate would be zero.
This problem can be overcome by performing the sampling in the (n_ϕ - 1)-dimensional space of reduced vectors [ϕ_1, ..., ϕ_{n_ϕ-1}]^T ∈ R^{n_ϕ-1} and computing the remaining density parameter via the closing condition ϕ_{n_ϕ} = 1 - Σ_{j=1,...,n_ϕ-1} ϕ_j. Accordingly, the update step for ϕ consists of two steps:

1. Propose the reduced vector [ϕ_1^{k+1}, ..., ϕ_{n_ϕ-1}^{k+1}]^T from the reduced proposal density.
2. Determine ϕ_{n_ϕ}^{k+1} such that the closing condition holds, ϕ_{n_ϕ}^{k+1} = 1 - Σ_{j=1,...,n_ϕ-1} ϕ_j^{k+1}.

In this work, the reduced proposal density is chosen to be a multivariate normal distribution. This two-step proposal generation procedure is in the following denoted by ϕ^{k+1} ~ T(ϕ^{k+1}|ϕ^k). The proposed density parameter vector ϕ^{k+1} is accepted with probability

    p_a(ϕ^{k+1}|ϕ^k) = min{ 1, q(ϕ^{k+1}|D) / q(ϕ^k|D) }  if ϕ_j^{k+1} ≥ 0 for all j,  and  p_a(ϕ^{k+1}|ϕ^k) = 0 otherwise.

The distinction of the two cases is hereby crucial to ensure that only probability densities Θ_{ϕ^{k+1}} which are non-negative for all θ ∈ R^q_+ are accepted. By combining update and acceptance step, one obtains an algorithm which draws a sample of weighting vectors {ϕ^k}_{k=1,...,S_ϕ} from the posterior distribution. The number of sample members is thereby S_ϕ. The pseudo code for the routine is given in Algorithm 1.

Algorithm 1: MCMC sampling of the density parameters.
    Initialize the Markov chain with ϕ^0.
    for k = 0 to S_ϕ - 1 do
        Given ϕ^k, propose ϕ^{k+1} from the proposal density T(ϕ^{k+1}|ϕ^k).
        Calculate the posterior probability q(ϕ^{k+1}|D).
        Draw r uniformly from [0, 1].
        if r < p_a(ϕ^{k+1}|ϕ^k) then
            Accept the proposed parameter vector ϕ^{k+1}.
        else
            Restore the previous parameter vector, ϕ^{k+1} = ϕ^k.
        end if
    end for

Bayesian confidence intervals

The sample {ϕ^k}_{k=1,...,S_ϕ} generated by Algorithm 1 contains information about the shape of the posterior density p(ϕ|D). This information can be employed to determine Bayesian confidence intervals, also called credible intervals.
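The sampler, including the closing condition and the two-case acceptance rule, can be sketched as follows; step size, chain length, and the function names are illustrative choices, not the paper's implementation:

```python
import numpy as np

def sample_posterior(log_q, n_phi, n_steps=2000, step=0.05, seed=0):
    """Metropolis-Hastings for the density parameters phi.

    Proposals perturb the first n_phi - 1 components with Gaussian noise;
    the last component follows from the closing condition sum(phi) = 1.
    Proposals with any negative component are rejected outright (p_a = 0).
    """
    rng = np.random.default_rng(seed)
    phi = np.full(n_phi, 1.0 / n_phi)
    chain = []
    for _ in range(n_steps):
        red = phi[:-1] + step * rng.standard_normal(n_phi - 1)
        prop = np.append(red, 1.0 - red.sum())      # closing condition
        if np.all(prop >= 0.0):
            if np.log(rng.uniform()) < log_q(prop) - log_q(phi):
                phi = prop                           # accept
        chain.append(phi.copy())                     # else keep previous state
    return np.array(chain)
```

With a flat log-target every feasible proposal is accepted, and every chain member remains a valid probability vector.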
In this work an approach is presented which employs the percentile method [17] to analyze the uncertainty of Θ_ϕ. The 100α-th percentile of a random variable r is the value r̄^(α) below which 100α% of the observations fall. Accordingly, the 100(1-α)-th percentile interval of r is defined as [r̄^(α/2), r̄^(1-α/2)]. The Bayesian confidence interval is frequently defined as the 95% percentile interval [18]. Thus, the parameter is contained in [r̄^(0.025), r̄^(0.975)] with a probability of 95%, given the measurement data and the prior knowledge.
For the problem of estimating parameter densities, the 100α-th percentile is not simply a number but a function. As we are interested in the confidence intervals of Θ_ϕ(θ), the percentiles are defined point-wise for every θ. The 100α-th percentile of Θ_ϕ(θ) is thus the function Θ̄_ϕ^(α)(θ) with

    ∀θ ∈ R^q_+ : Pr(Θ_ϕ(θ) ≤ Θ̄_ϕ^(α)(θ)) = α.

Consequently, the 100(1-α)% Bayesian confidence interval CI^(1-α)(θ) of Θ_ϕ(θ) is defined as

    CI^(1-α)(θ) = [Θ̄_ϕ^(α/2)(θ), Θ̄_ϕ^(1-α/2)(θ)].

As the sample {ϕ^k}_{k=1,...,S_ϕ} is given, approximations of Θ̄_ϕ^(α/2) and Θ̄_ϕ^(1-α/2) can be obtained by studying the percentiles of the sample [26]. For instance, the nearest rank method or linear interpolation between closest ranks can be used to determine Θ̄_ϕ^(α).
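The point-wise percentile computation over the sample can be sketched as follows (the function name, the grid, and the basis matrix of evaluated ansatz functions are assumptions):

```python
import numpy as np

def credible_band(chain, basis, alpha=0.05):
    """Point-wise Bayesian confidence interval of Theta_phi(theta).

    chain: S_phi x n_phi posterior sample of the density parameters phi.
    basis: n_phi x G matrix, the ansatz functions Lambda_j evaluated on a
           grid of G theta values.
    Returns the point-wise alpha/2 and 1 - alpha/2 percentile functions.
    """
    dens = chain @ basis                     # each row: one sampled density
    lo = np.percentile(dens, 100 * alpha / 2, axis=0)
    hi = np.percentile(dens, 100 * (1 - alpha / 2), axis=0)
    return lo, hi
```

`np.percentile` uses linear interpolation between closest ranks by default, one of the two rank-based options mentioned above.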

Predictions of output density
As the parameter density is not known precisely, the model predictions also show uncertainties. To evaluate the reliability of the population model and its predictive power, these prediction uncertainties have to be quantified. Therefore, we consider the Bayesian confidence interval of the prediction around the output density with the highest posterior probability,

    CI_ϒ^(1-α)(y|t) = [Ῡ^(α/2)(y|t, ϕ), Ῡ^(1-α/2)(y|t, ϕ)],

in which the 100α-th percentile Ῡ^(α)(y|t, ϕ) of the predicted output ϒ(y|t, Θ_ϕ) is defined as

    ∀y ∈ R^m_+, ∀t : Pr(ϒ(y|t, ϕ) ≤ Ῡ^(α)(y|t, ϕ)) = α.    (39)

Computing Ῡ^(α)(y|t, ϕ) for an experiment is a three-step procedure. At first, the outputs ϒ(y|t, Λ_j) (12) are computed. Given ϒ(y|t, Λ_j) and the sample {ϕ^k}_{k=1,...,S_ϕ} from the posterior, a sample from the predicted output density p(ϒ|D) is given by

    ϒ(y|t, ϕ^k) = Σ_{j=1,...,n_ϕ} ϕ_j^k ϒ(y|t, Λ_j),  k = 1, ..., S_ϕ.

This sample can be used to approximate the prediction confidence interval CI_ϒ^(1-α)(y|t). As the population model has to be simulated only n_ϕ times, this calculation is computationally tractable.
To sum up, in this section a method for the estimation of parameter distributions in heterogeneous cell populations from population data has been presented. It has been shown that the optimal value as well as the Bayesian confidence intervals can be computed efficiently employing a parameterization of the parameter density. Also a method to determine prediction uncertainties has been presented. This allows an in-depth analysis of the reliability of the model. A summary of the procedure is shown in Figure 4.

Results and discussion
To illustrate the properties of the proposed methods, artificial measurement data of a cell population responding to a tumor necrosis factor (TNF) stimulus will be analyzed. For illustration purposes, we consider a case where only one parameter is distributed in a first step. In a second step, we show that the method is also applicable in the case of multi-parametric heterogeneity.
In multicellular organisms, the removal of infected, malfunctioning, or no longer needed cells is an important issue. Therefore, multicellular organisms have developed different mechanisms to externally enforce cell death. The signaling molecule TNF is one of the key players.
TNF can bind to specific death receptors in the cell membrane and is able to induce programmed cell death, also called apoptosis, via the activation of the caspase cascade. On the other hand, it promotes cell survival via the inflammatory response, specifically activation of the NF-κB pathway [27]. The balance between the activation of these two signaling pathways determines the fate of the single cell. In the following, a simple model for the caspase and NF-κB activation is studied which reproduces the main features of the single cell response to a TNF stimulus.

Model of TNF signaling pathway
The model considered in this study has been introduced in [20] and is based on known activating and inhibitory interactions among key signaling proteins of the TNF pathway. A schematic is shown in Figure 5. Besides active caspase 8 (C8a) and active caspase 3 (C3a), the nuclear transcription factor κB (NF-κB) and its inhibitor I-κB are considered in the model. The model is given by a four-dimensional ODE system built from activating and inhibiting interaction terms [20]. The state variables x_i, i = 1, ..., 4, denote the relative activities of the signaling proteins C8a, C3a, NF-κB and I-κB, in this order. The functions act_j(x_i) and inh_j(x_i) represent activating and inhibiting interactions, respectively. They are given by

    act_j(x_i) = x_i² / (a_j² + x_i²)  and  inh_j(x_i) = b_j² / (b_j² + x_i²).

The parameters a_j and b_j are activation and inhibition thresholds, respectively, and take values between 0 and 1. The external TNF stimulus is denoted by u. Initial conditions of the single cells are the steady states with C3a = 0 for u = 0. All nominal parameter values are given in Table 1.
It is known from experiments that the cellular response to a TNF stimulus is highly heterogeneous within a clonal cell population. Some cells die, others survive. The reasons for this heterogeneous behavior are unclear, but of great interest for biological research in TNF signaling, e.g. regarding the use of TNF or related molecules as anti-cancer agents.
To understand the biological process at the physiological and biochemical level, it is crucial to consider this cellular heterogeneity, using for example cell population modeling. Here, we model heterogeneity at the cell level via differences in the parameters b_3 and a_4. The parameter b_3 describes the inhibitory effect of NF-κB, via the C3a inhibitor XIAP, onto the C3 activity, and the parameter a_4 the activation of the I-κB dynamics by NF-κB. Studies showed that these two interactions show large cell-to-cell variability [4,7,28].

Univariate parameter density
For a first evaluation of the proposed method, an artificial experimental setup is considered in which the caspase 3 activity is measured at four different time instances during a TNF stimulus. At each time instance the C3a concentration in 150 cells is determined, y = x_2, with measurement noise according to (7), where μ_× = 0, σ_× = 0.1, μ_+ = log(0.05), and σ_+ = 0.3. This corresponds to an average noise level of about 15%. The generated artificial experimental data for a bimodal distribution in b_3 are depicted in Figure 6.
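Artificial data of this kind can be generated by drawing the two log-normal noise terms directly; the function below is a hypothetical sketch using the noise parameters just stated, applied here to a made-up constant true output:

```python
import numpy as np

def noisy_snapshot(y_true, rng, mu_x=0.0, s_x=0.1, mu_a=np.log(0.05), s_a=0.3):
    """Distort true single-cell outputs with the paper's noise model:
    y_bar = eta_x * y + eta_+, with both noise terms log-normally
    distributed (parameter values from the univariate C3a example).
    """
    eta_x = rng.lognormal(mu_x, s_x, size=y_true.shape)   # multiplicative noise
    eta_a = rng.lognormal(mu_a, s_a, size=y_true.shape)   # additive noise
    return eta_x * y_true + eta_a

rng = np.random.default_rng(1)
y_meas = noisy_snapshot(np.full(150, 0.8), rng)   # one snapshot of 150 cells
```

Because both noise terms are strictly positive, the simulated measurements remain positive, matching the positivity of flow cytometric readouts.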
As ansatz functions Λ_j for the estimation, we use n_ϕ = 15 truncated Gaussians with standard deviation σ = 1/30 and centers equidistantly distributed over the considered interval of b_3.    (45)

As prior we choose a product of beta distributions,

    p(ϕ) ∝ ∏_{j=1,...,n_ϕ} p_b(ϕ_j | a_j, b_j),    (46)

in which p_b is the probability density of the beta distribution. The parameters a_j and b_j are selected such that p_b(ϕ_j | a_j, b_j) has its extremum at 1/n_ϕ and variance s². The distribution of a sample {ϕ^k}_k drawn from this prior is shown in Figures 7 and 8. Note that the prior does not enforce a trend to smaller or larger parameter values of b_3. Furthermore, it does not enforce a trend to unimodal or bimodal distributions Θ_ϕ(b_3). Such distribution properties shall be inferred from the data. Given the ansatz functions Λ_j (45), the conditional probabilities c_j^(i) of observing D_i = (t^(i), ȳ^(i)) are determined using importance sampling according to (25). This computation takes about three minutes on a standard personal computer using a single CPU. Thereby, 32% of the computation time is required for the simulation of the individual cells y(t, θ^(k)) for individual parameter values θ^(k), and 59% for the evaluation of the conditional probability p(ȳ^(i) | y(t^(i), θ^(k))). The rest is spent on pre- and post-processing. Subsequently, MCMC sampling is performed to obtain samples of the prior (with s² = 0.05), of the conditional, and of the posterior probability distribution. Each sample has S_ϕ = 10^6 members and its generation takes only four minutes. The computation is very fast, as the proposed approach simplifies the evaluation of the conditional probability to a matrix-vector multiplication. Note that all steps during the computation of the conditional probabilities and the MCMC sampling can be parallelized, yielding a tremendous speed-up for more complex models.
The results of the sampling are illustrated in Figure 7 using parallel coordinates [29]. From this representation of {ϕ^k}_{k=1,...,S_ϕ} it can be seen that after the learning process most of the density parameters still show large uncertainties. The uncertainty in the posterior distribution is considerably smaller than the uncertainty in the likelihood function, due to the stabilization via the prior. Note that the visualization also uncovers pronounced correlations between some parameters, e.g. ϕ_10 and ϕ_11 are negatively correlated for ϕ^k ~ p(D|ϕ). This indicates that the model of the density of b_3 is over-parameterized with respect to the data. Thus, the number of ansatz functions could be reduced while still achieving a good fit.
To analyze the uncertainty of Θ_ϕ in more detail, the sample {ϕ^k}_{k=1,...,S_ϕ} is employed to determine the 80%, 90%, 95%, and 99% Bayesian confidence intervals. The results are depicted in Figure 8. It can be seen that the confidence intervals for some values of b_3 are rather small, indicating that the data contain much information about these regions. Unfortunately, in particular for b_3 > 0.6 the confidence intervals are very wide, showing that the parameter density in this area cannot be inferred precisely. But, although the amount of data is limited and the uncertainty of individual ϕ_j's may be large, the posterior distribution of Θ_ϕ already shows key properties of the true parameter density, e.g. the bimodal shape, which has not been provided as prior information. This bimodal shape is also seen in the likelihood function, but there the uncertainties are larger than in the posterior probability distribution.
Besides the uncertainty of Θ_ϕ, the predictive power of the model can also be evaluated. This is exemplified by studying the confidence intervals of Ῡ([C3a]|t, ϕ) and Ῡ([NF-κB]|t, ϕ) for the previously considered experimental setup. The bar indicates that the distribution of the noise-corrupted output ȳ instead of the true output y is considered. This allows the direct comparison of the prediction with the data. The predictions are shown in Figure 9.
It is obvious that, although the parameter distributions show large uncertainties, the predictions are rather certain. This is indicated by tight confidence intervals. Furthermore, the mean predictions E[Ῡ] and the predictions with highest posterior probability Ῡ* agree well with the true output distribution Ῡ_true, for both the measured output C3a and the predicted output NF-κB. The small prediction uncertainties can be explained by the sloppiness [30] of the density parameters ϕ_j parameterizing the distribution of b_3. A detailed analysis (not shown here) indicates that the number of ansatz functions can be decreased while still ensuring a good approximation of the distribution of b_3.

Multivariate parameter density
In many biological systems several cellular parameters are heterogeneous and different cellular concentrations can be measured [7]. Therefore, we show in this section that the proposed method can also be employed to estimate multivariate parameter densities from multidimensional outputs. Furthermore, the influence of the choice of the prior on the estimation result is analyzed.
To perform this study we consider the same experimental setup as above. The only difference is that two concentrations are measured, C3a and NF-κB, i.e. y = [x_2, x_3]^T. The considered artificial experimental data of 10^4 cells are depicted in Figure 10. The ansatz functions for Θ are n = 100 truncated multivariate Gaussians, analogously to (45). The covariance matrix is 0.06^2 · I_2 and the extrema are equidistantly distributed on a regular two-dimensional grid aligned with the axes.
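The ansatz-function setup above can be sketched in a few lines: the two-dimensional parameter density is written as a weighted sum of n = 100 Gaussian ansatz functions whose extrema (modes) lie on a regular 10×10 grid, with covariance 0.06² · I_2. This is a simplified illustration, assuming the unit square as the feasible region; for brevity it omits the truncation and renormalization of each Gaussian:

```python
import numpy as np

# Modes of the 100 ansatz functions on a regular 10x10 grid in [0, 1]^2.
centers_1d = np.linspace(0.05, 0.95, 10)
gx, gy = np.meshgrid(centers_1d, centers_1d)
centers = np.column_stack([gx.ravel(), gy.ravel()])   # shape (100, 2)
var = 0.06 ** 2                                       # isotropic variance

def density(points, phi):
    """Mixture density at `points` (m, 2) for weights phi (100,), sum(phi) = 1."""
    vals = np.zeros(len(points))
    for c, w in zip(centers, phi):
        d2 = np.sum((points - c) ** 2, axis=1)        # squared distance to mode
        vals += w * np.exp(-0.5 * d2 / var) / (2 * np.pi * var)
    return vals

phi = np.full(100, 1.0 / 100)                         # uniform example weights
pts = np.array([[0.5, 0.5], [0.05, 0.95]])
vals = density(pts, phi)
```

With this parameterization, estimating the density reduces to estimating the weight vector φ, exactly as in the univariate case.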
Given this setup, the convergence rate is studied in terms of the integrated mean square error (IMSE) between the true distribution and the distribution with highest posterior probability φ*. The IMSE is computed for different amounts of measured cells per time instance and different priors. The priors are again beta distributions (46). The extrema are chosen as in the last section such that the prior is flat. The standard deviation, on the other hand, is reduced step-wise from s = 0.285 (almost uninformative, as nearly uniform on the feasible interval φ ∈ [0, 1]^{n_φ}) to s = 0.001 (very informative). Given these requirements, the values a_i and b_i of the prior (46) are determined. The results for different numbers of measured cells sampled from the available data set are shown in Figure 11. Note that the IMSE is a stochastic quantity, as the selection of measured cells is a stochastic process and hence the estimated density φ* is also stochastic. To account for this stochasticity, several realizations are performed and the mean is computed.
From Figure 11 it becomes clear that the IMSE strongly depends on both the amount of data and the informativeness of the prior. For uninformative priors, the outcome for little data is highly uncertain, and the IMSE is large and shows large variations. Conversely, if the prior is very informative but wrong, the number of measurement data required to obtain a good estimate is tremendous. For the right choice of s, one observes fast convergence of φ* to Θ_true, as shown in Figure 12, and only little variation for small amounts of data. Hence, the use of prior knowledge, even if it is only partially correct, yields more stable estimates and faster convergence. Furthermore, this study suggests that the typical number of cells measured by flow cytometry (10^4) is informative enough to infer key features of population heterogeneity.
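The IMSE used as the convergence criterion above is simply the integral of the squared difference between the true density and the estimate. A minimal sketch of its grid-based approximation (the 1-D grid and the triangular toy density below are hypothetical examples, not the paper's distributions):

```python
import numpy as np

def imse(p_true, p_est, grid_weights):
    """Integrated mean square error between two densities on a grid,
    approximated by a weighted Riemann sum of (p_true - p_est)^2."""
    return np.sum(grid_weights * (p_true - p_est) ** 2)

# Hypothetical example on a 1-D grid over [0, 1]:
x = np.linspace(0, 1, 201)
w = np.full_like(x, x[1] - x[0])        # uniform quadrature weights
p_true = 2 * x                          # density of a triangular distribution
p_est = np.ones_like(x)                 # uniform density as a (bad) estimate
err = imse(p_true, p_est, w)            # approximates the integral of (2x - 1)^2, i.e. 1/3
```

Averaging this quantity over several random subsamples of measured cells yields the mean IMSE reported in Figure 11.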

Conclusions
In this paper a Bayesian approach for inferring the parameter density of heterogeneous cell populations with single cell resolution from population data is presented. We show that the proposed method can deal with limited and noisy measurement data as well as realistic noise models. The method utilizes a parameterization of the parameter density which, in combination with a reformulation of the conditional probability, allows a computationally efficient evaluation of the posterior probability. In contrast to other methods for cell populations, this approach does not rely on approximating the measured population response using density estimators.
For sampling from the posterior probability the Metropolis-Hastings algorithm is used. Here it has been adapted to be applicable to the considered constrained problem. Using this sampling strategy, a sample from the posterior probability density is determined. This sample is employed to compute Bayesian confidence intervals for the parameter distribution as well as for the model predictions. Summary statistics such as the mean parameter density and the mean predicted output density can also easily be determined. For concave prior distributions we could even show that the calculation of the parameter density with the highest posterior probability is a convex problem.
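A common way to adapt Metropolis-Hastings to such a constrained problem is to assign zero posterior probability to infeasible proposals, so they are always rejected and the chain never leaves the feasible set. The sketch below illustrates this idea under simplifying assumptions: only box constraints 0 ≤ φ_i ≤ 1 are enforced, the log posterior is a hypothetical toy function, and the step size is not tuned — it is not the paper's exact adaptation:

```python
import numpy as np

def constrained_metropolis_hastings(log_post, phi0, n_steps=2000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings restricted to {phi : 0 <= phi_i <= 1}.
    Proposals violating the constraints are rejected outright, which is
    equivalent to a posterior that is zero outside the feasible set."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi0, dtype=float)
    lp = log_post(phi)
    samples = []
    for _ in range(n_steps):
        prop = phi + step * rng.normal(size=phi.shape)
        if np.all(prop >= 0) and np.all(prop <= 1):    # feasibility check
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:   # MH acceptance step
                phi, lp = prop, lp_prop
        samples.append(phi.copy())
    return np.array(samples)

# Toy log posterior: independent Beta(2, 2) on each coordinate of [0, 1]^3.
log_post = lambda p: np.sum(np.log(p) + np.log1p(-p))
chain = constrained_metropolis_hastings(log_post, phi0=np.full(3, 0.5))
```

Additional constraints, such as the normalization of the weight vector, can be enforced in the same way by extending the feasibility check.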
The properties of the proposed scheme are evaluated using artificial data of a TNF signal transduction model. It could be shown that the proposed method yields good estimation results for a realistic experimental setup. Furthermore, although the remaining parameter uncertainties are large, the predictions show only small uncertainty, indicating sloppiness of the parameters.
Concerning the choice of the prior distribution, it could be shown that its regularizing effect is beneficial if only little data is available. On the other hand, if the amount of available data increases, informative but not carefully chosen priors slow down the convergence.

1 Institute for Systems Theory and Automatic Control, University of Stuttgart, Germany. 2 Institute of Cell Biology and Immunology, University of Stuttgart, Germany.