Inferring kinetic parameters of oscillatory gene regulation from single cell time-series data

Gene expression dynamics, such as stochastic oscillations and aperiodic fluctuations, have been associated with cell fate changes in multiple contexts, including development and cancer. Single cell live imaging of protein expression with endogenous reporters is widely used to observe such gene expression dynamics. However, the experimental investigation of regulatory mechanisms underlying the observed dynamics is challenging, since these mechanisms include complex interactions of multiple processes, including transcription, translation and protein degradation. Here, we present a Bayesian method to infer kinetic parameters of oscillatory gene expression regulation using an auto-negative feedback motif with delay. Specifically, we use a delay-adapted nonlinear Kalman filter within a Metropolis-adjusted Langevin algorithm to identify posterior probability distributions. Our method can be applied to time-series data on gene expression from single cells and is able to infer multiple parameters simultaneously. We apply it to published data on murine neural progenitor cells and show that it outperforms alternative methods. We further analyse how parameter uncertainty depends on the duration and time resolution of an imaging experiment, to make experimental design recommendations. This work demonstrates the utility of parameter inference on time course data from single cells and enables new studies on cell fate changes and population heterogeneity.


Introduction
The identification of regulatory mechanisms that control gene expression may have important implications in biological systems. Cell state transitions are a key contributor to many processes in healthy and diseased tissue, and as such they play a major role in development, regeneration and cancer. There is an increasing amount of literature uncovering the relationship between gene expression dynamics, i.e. dynamic changes in protein copy numbers from a single gene, and cell state transitions [1][2][3][4][5][6][7]. For example, Imayoshi et al. [1] used optogenetics to show that oscillatory expression of the transcription factor ASCL1 promotes cell proliferation of mouse neural progenitor cells, whereas sustained expression promotes differentiation. Manning et al. [2] linked aperiodic HES5 protein expression dynamics to murine neural progenitors, and declining oscillatory dynamics to differentiating neurons. Further evidence by Soto et al. and Phillips et al. [3,8] demonstrates the contribution of gene expression noise to tuning oscillatory dynamics and influencing dynamically driven cell state transitions.
Experimentally, the dynamics of gene expression can be studied using a variety of approaches. Accurate measurements of protein dynamics are made through live-imaging of transcription factors in single cells, which provides real-time information on gene regulation and identifies cell-to-cell heterogeneity. This can be achieved through fluorescent fusion reporters [9], where endogenously expressed proteins are attached to fluorescent reporter molecules. Fluorescence microscopy can then be used to obtain time-series data that quantify protein expression levels over time (figure 1a,b; electronic supplementary material, S.10). It may further be possible to translate the fluorescence intensity into exact protein copy numbers [2,3]. Fluorescent protein reporters are widely used to research the role of transcription factor dynamics in cell differentiation events, and have provided dynamic data on gene expression in various contexts, such as neural differentiation, circadian regulation and cell cycle regulation [1,2,[15][16][17][18].
Mechanistically, dynamic gene expression is controlled by multiple processes, including transcriptional pulsing (transcription occurring in pulses or bursts), stochastic fluctuations (due to a limited number of molecules), gene regulatory interactions and translational control. In order to understand how these processes interact to modulate gene expression dynamics, it is necessary to use mathematical models.
Within systems biology, mathematical models are often represented as a collection of gene regulatory motifs [19,20]. One very common motif is the delay-mediated, autorepressive negative feedback loop (figure 1c), which gives rise to oscillations and other dynamic patterns of gene expression that have been observed in somitogenesis, neurogenesis and in cancer cell lines [2,3,7,15,21]. In this motif, a protein represses the transcription of its own gene. In combination with delays that are intrinsic to biological systems, this admits a range of dynamic behaviours, most notably oscillations at the mRNA and protein level. Regulation of gene expression through the auto-negative feedback motif contributes to cell state changes in multiple systems, including neural differentiation [2,22,23].
Despite great advances in the collection of dynamic data on gene expression, and the modelling of these data, challenges remain when calibrating models to data. Even simple mathematical models, such as the auto-negative feedback motif (figure 1c), employ multiple model parameters that correspond to biophysical quantities. For example, the auto-negative feedback motif uses rates of transcription, translation, degradation and other parameters to predict protein and mRNA expression dynamics. Each of these parameters can take a large range of values (figure 1d). For many application areas, parameter inference, i.e. identifying which parameters correspond to a given experimentally obtained data set, remains an open problem, since it requires the 'inverse' of the model, which typically cannot be computed directly. However, solving this problem bears great potential for the research of gene expression dynamics and its links to cell fate. Identifying which parameter changes correspond to observed differences in protein expression dynamics may illuminate the molecular pathways that contribute to cell fate control, and identify new sources of heterogeneity within a cell population.
The need for parameter estimation in biological systems has motivated extensive research in recent years, with a variety of approaches being developed for different types of data [24][25][26][27]. Techniques using Bayesian inference have emerged as a preferred approach due to their ability to quantify uncertainty in the face of noisy data, which is a common feature of biological experiments [28], by representing parameters with  royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210393 distributions, rather than point estimates [11,[29][30][31][32][33][34][35]. Placing probability distributions over our parameters, rather than treating them as point estimates, allows us not only to determine the most likely values for each of the parameters, given some data but also to quantify our uncertainty in them. To achieve parameter estimation with uncertainty quantification, Bayesian inference aims to identify the posterior distribution of the model under consideration, denoted pðu j yÞ, where u and y are the model parameters and observed data, respectively. The posterior distribution describes the probability of the model parameters given observed data, and can be calculated using Bayes' rule pðu j yÞ ¼ pðy j uÞpðuÞ pðyÞ / pðuÞpðy j uÞ: ð1:1Þ Here, pðy j uÞ is referred to as the likelihood, and is a measure of the fit of a statistical model to the observed data, given specific values of the model parameters. The prior probability, pðuÞ, is a distribution which outlines one's beliefs in the parameters u before any new data are taken into account. These prior distributions can be informed using published data (figure 1d), as well as physical constraints (e.g. rate constants must be positive).
To visualize the posterior distribution and use it in further analysis it is common to work with computationally generated samples from this distribution. Posterior probabilities may be difficult to compute directly, hindering the efficient generation of these samples [36,37]. Specifically, it may not be possible to calculate posterior probabilities if the likelihood of the model is not available. In these cases, approximate Bayesian computation (ABC) can be used. However, ABC reduces the data to a small number of summary statistics, which inevitably decreases the accuracy of inference [38]. If an expression for the likelihood is available and can be calculated at given parameter points, the calculation of the marginal likelihood π(y) often poses a further challenge in Bayesian inference, since it may require the numerical integration of the likelihood and prior probability. To overcome this challenge, sampling from the exact posterior distribution can be achieved using Markov chain Monte Carlo (MCMC) techniques, such as the Metropolis-Hastings random walk [39].
MCMC methods can produce samples from the posterior distribution pðu j yÞ even if the integration factor π(y) is unknown. In many scenarios, the reconstruction of a posterior distribution using MCMC sampling can be slow, in particular if the parameter space is high-dimensional, if the calculations of the likelihood are computationally expensive, or if parameters are highly correlated within the posterior distribution [40]. In these scenarios, more efficient Hamiltonian Monte Carlo (HMC) or Metropolis-adjusted Langevin algorithm (MALA) methods are preferable [41][42][43]. HMC and MALA algorithms additionally require the gradient of the posterior probability with respect to the model parameters and can result in ordersof-magnitude faster convergence of the sampled distribution to the posterior distribution, especially for high-dimensional distributions or when parameter correlations are present [42,44].
For time-series data specifically, a common approach to calculating the likelihood is the Kalman filter. The Kalman filter is an algorithm which calculates the likelihood of the data at each time point, given a mathematical model of stochastic dynamics, and an observation noise model. It can generally be applied to Markov processes, where dynamic changes over time only depend on the current state of the system, and not past states. The Kalman filter is a powerful method to calculate posterior probabilities if delays are not present in the model [45], and can be extended to estimate the gradient of the likelihood function, making gradient-based sampling of the posterior distribution possible [46].
A number of recent methods focus specifically on time series of gene expression [2,26,[47][48][49][50][51]. For the study of oscillatory gene expression, a wide array of studies discuss timeseries data of protein concentrations, such as in figure 1a,b, as well as the description of these data through the autonegative feedback motif (figure 1c). Despite this, a reliable Bayesian inference method for this popular combination of data and model is still missing. Since the model includes delays, the widely used Kalman filter approaches are not applicable. Recently, Calderazzo et al. [52] have addressed this problem by identifying a method to introduce delays into the Kalman filter [52], indicating that accurate Bayesian inference for the auto-negative feedback motif on timeseries data of gene expression may be possible. However, this approach lacks the ability to calculate gradients of the posterior probability distribution, thus preventing the use of efficient gradient-based sampling methods. Furthermore, while Calderazzo et al. [52] applied their method to a motif containing negative feedback, this method has not yet been applied to the widely used motif in figure 1c, which includes mRNA in addition to protein.
Here, we present a Bayesian inference pipeline that can be used as a non-invasive method to measure kinetic parameters of gene expression emerging from the auto-negative feedback motif using protein expression time course data. We extend the Kalman filtering method presented by Calderazzo et al. [52] by introducing a recursive implementation to calculate the gradient of the likelihood. This enables us to embed the nonlinear delay adapted Kalman filter into a state-of-the-art MALA sampling algorithm. This extension enhances the robustness of the inference, making it more suitable for use in typical experimental settings.
Our method is able to capture multiple kinetic parameters of gene expression simultaneously using time course data from single cells, and outperforms previous approaches. We demonstrate the accuracy of our method on in silico data, provide an example on how the method can be applied to experimental data, and show how the method can be used to obtain experimental design recommendations. This work is paving the way for the use of Bayesian inference methods for the investigation of gene expression dynamics and their links to cell fate.

Methods
In this section, we give an overview of the key components of our method. First, we introduce the mathematical model for the autonegative feedback motif. Then, we discuss how we use a delay adapted nonlinear Kalman filter to approximate the likelihood function. Lastly, we provide details on data processing. Descriptions of our method that require longer derivations, as well as further details on data collection, are provided in the electronic supplementary material. This includes our implementation of two MCMC methods, Metropolis-Hastings random walk (MH) and MALA, as well as our proposed algorithm to compute the gradient of the likelihood function, which is a major technical advancement in this paper. The availability of this gradient royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210393 enables the use of a wider range of MCMC samplers, such as MALA, which we use throughout the paper.

The negative feedback chemical reaction network
Here, we consider a widely used model of gene expression, that incorporates knowledge of the auto-repressive negative feedback loop (figure 1c). Our model describes both protein and mRNA expression dynamics over time at the level of a single cell, accounting for transcription and translation, as well as degradation. We include a delay in the model, representing the time taken from the initiation of transcription until the production of a transcript and its removal from the nucleus. We further account for the effect of transcriptional auto-repression, where a high abundance of the target protein inhibits transcription of the mRNA [13,14,53].
Let p(t) and m(t) define the number of protein and mRNA molecules, respectively, at time t for a gene of interest. Gene expression is often subject to stochastic effects due to finite molecule numbers. To reflect this, we model the system with delayed chemical Langevin equations [54][55][56], where ξ m , ξ p denote Gaussian white noise, i.e.
hj m ðt 1 Þj m ðt 2 Þi ¼ dðt The parameters μ m , μ p , α m and α p describe the rate of mRNA degradation, protein degradation, basal transcription rate in the absence of protein, and translation rate, respectively. The transcriptional delay is given by τ, and auto-repression is taken into account via the use of a Hill function f ðpðt À tÞÞ ¼ 1 reducing the rate of transcription for increasing amounts of protein p at time t − τ [57]. Here, τ, the time delay, is the duration of the transcription process. The Hill function (equation 2.3) is close to one when the protein at time t − τ is much less than the repression threshold P 0 and close to zero when the protein at time t − τ is much more than the repression threshold. The steepness of the transition from one to zero can be regulated by the Hill coefficient h. The Hill coefficient reflects the extent of cooperativity between ligand binding sites for the gene of interest [58]. From equations (2.1) and (2.2), we can see that the instantaneous rate of transcription is determined by α m f ( p(t − τ)). This allows us to define an approximation for the average rate of transcription as a T ¼ a m fðpÞ, ð2:4Þ wherep is the average expression of protein. This average expression of protein may be obtained from simulated or experimental data. We simulate the stochastic differential equations (2.1) and (2.2) using the Euler-Maruyama method with a time step Δt = 1 min, which is chosen sufficiently small to ensure numerical accuracy of the scheme. A deterministic version of the model in equations (2.1) and (2.2) was first developed by Monk [13] in order to describe gene expression oscillations of Hes1, p53 and NF-κB, and various versions of the model have since been widely studied [2,8,[12][13][14]55,56]. In particular, when molecular copy numbers of mRNA and protein are low, we expect the rate processes of transcription, translation, and degradation to stochastically vary with time. This effect is accounted for by the noise terms in the chemical Langevin equation (2.1)

The likelihood function can be evaluated through Kalman filtering
The Kalman filter is an algorithm which calculates the likelihood function for linear stochastic differential equations describing time-series data [59]. The Kalman filter evaluates the likelihood of each time-point recording consecutively. The full likelihood is then the product of these individual likelihoods, exploiting the Markov property of the underlying stochastic process. The Kalman filter can be extended to nonlinear dynamical systems by using piecewise-linear Gaussian approximations [60].
Here, we implement a Kalman filter, extended to account for non-linearity and delay, in order to evaluate the likelihood that our observed data results from the model in equations (2.1) and (2.2) at a given parameter combination. This likelihood can then be used to infer model parameters for a given experimentally observed time-series recording of gene expression. The resulting posterior distribution may then represent testable predictions on the biophysical characteristics of the gene of interest, such as transcription, translation and degradation.
Our Kalman filter implementation uses a finer discretization on the time axis than that given by the observation interval. Specifically, we introduce z hidden states between consecutive observations. Introducing such hidden states is common when applying Kalman filters to nonlinear stochastic differential equations. It increases the accuracy of a piece-wise linear Gaussian approximation. In the following, the time variable t will assume integer values numbering all discretization time points, i.e. t = 0, 1, …, nz, where n is the total number of observations.
It is possible to show that the likelihood of a set of observations given specific model parameters can be expressed as [52] where the subscript i · z denotes multiplication of i and z and fðx; m, SÞ ¼ 1 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi detð2pSÞ p exp À 1 2 ðx À mÞ T S À1 ðx À mÞ , ð2:6Þ is the multivariate normal distribution. The true, unobserved state of the system at time t is given by and the relationship between x t and the observed data y t is given by y t = Fx t + ϵ t , where e t N ð0, S e Þ and F is a 1 × 2 matrix. Thus, F and ϵ represent our measurement model. Throughout, we use F = [0, 1], since we aim to apply our method to data on protein expression dynamics, where measurements of mRNA levels are not available. The value S e is called the measurement variance, and describes the observation noise introduced through the experimental measurement process. The variables ρ and P represent the state space mean and statespace variance, respectively. We define y 0:t = [y 0 , y z , y 2z , …, y t ] T , and write r t ¼ E½XðtÞ j y 0:tÀ1 and P t = Cov(X(t), X(t)| y 0:t−1 ). The Kalman filter calculates ρ t , and P t in equation (2.5) using an iterative process with two main steps. At iteration k, the first k observations have been used to infer a probability distribution over the true state of the system X(t) for all discretization time points up to t = kz. This probability distribution is characterized by it's mean r Ã kz ¼ E½XðtÞ j y 0 : kz and covariance P Ã kz ¼ CovðXðtÞ, XðtÞ j y 0 : kz Þ.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210393 In the Kalman filter prediction step, we then use the model to calculate the predicted probability distribution for protein and mRNA copy numbers at the next observation time point, X((k + 1)z). We use this prediction to evaluate the likelihood of the observed data at the k + 1 observation time point. Before the prediction for the next observation is made, the Kalman filter update step is applied, in which the probability distribution of the state space up to observation k + 1 is updated to take the measurement at t = (k + 1)z into account.
For our update step, we derive an expression for the mean and variance of the state space distribution π(x t−τ:t | y 0:t ), denoted r Ã tÀt : t and P Ã tÀt : t , respectively. That is, the likelihood of our state space estimates from the past time t − τ to the current time, t, given all of our current observations. This is necessary in order to accurately predict the state space distribution at the next observation time point, π(x t+Δt | y 0:t ), as past states can affect future states due to the presence of delays. We provide detailed derivations of our Kalman filter prediction and update steps in electronic supplementary material, S.1.

Implementation of MCMC sampling algorithms
The aim of our inference algorithm is to generate independent samples from the posterior distribution, pðu j yÞ. In this paper, we compare results from two different sampling methods, MH and MALA. The MH algorithm and MALA are two of the most widely used MCMC methods for drawing random samples from a probability distribution. For completeness, we provide their algorithms in electronic supplementary material, S.2 and S.3.
Drawing proposals using MALA requires the calculation of the gradient of the log-posterior UðuÞ, which we outline in electronic supplementary material, S.4. This is achieved by iteratively computing the derivatives of state space mean, ρ t , and state-space variance, P t , with respect to each parameter, as detailed in electronic supplementary material, S.5.

Trends in the data are identified by Gaussian processes
Before applying our inference method we detrend protein expression time series using Gaussian process regression, in order to identify and exclude data that show significant longterm trends [61,62] (see §3.3 for further motivation). Specifically, we make use of a scaled squared exponential Gaussian process combined with white noise, whose kernel is given by where ||x(t) − x(t 0 )|| is the Euclidean distance between x(t) and x(t 0 ), l is the lengthscale, and γ, η ∈ (0, ∞). In the Gaussian process regression, the hyperparameters γ, l and η are found using constrained optimization. The initial value of the lengthscale is 1000 min, and is bounded uniformly in the range (1000 min, 2000 min). The lower bound of this range, 1000 min, was chosen to ensure that detrending does not perturb ultradian dynamics in the data. The upper bound, 2000 min, was chosen sufficiently large to ensure that detrending is not affected by it. The initial value of the parameter γ is the variance of the data, s 2 data , and is restricted by a uniform prior to ð0:1s 2 data , 2s 2 data Þ. The parameter η has initial value 100, and is restricted by a uniform prior to (10 −5 , s 2 data Þ. Here, x(t) and x(t 0 ) represent our protein expression time course data at time t and t 0 respectively. We identified data without a significant long-term trend manually by visual inspection (see §3.3, figure 4) and removed any residual trend before applying our inference method.

Results
Single cells in a seemingly homogeneous population can change cell fate based on gene expression dynamics. The control of gene expression dynamics can be understood with the help of mathematical models, and by fitting these models to experimentally measured data. Here, we analyse our new method for parameter inference on single-cell time-series data of gene expression using the widely used auto-negative feedback motif. We first validate our method by showing the performance of our algorithm on in silico datasets. We then demonstrate the utility of our method by applying it to experimentally measured data and, finally, use our method to analyse how parameter uncertainty may depend on properties of the data, as well as the experimental design.

Sampled posterior distributions agree with analytical derivations for one-dimensional parameter inference
We first test our inference method on in silico data from the forward model of the auto-negative feedback motif ( figure  1c). This is done using chemical Langevin equations, as detailed in §2.1. Specifically, we emulate an in silico imaging experiment by selecting simulated data in sparse intervals of Δt obs mins and mimic measurement noise by adding random perturbations to each observation time point (figure 2a). These perturbations are drawn from a Gaussian distribution with variance S e . Testing the method on in silico data first is beneficial, since ground truth parameter values are known a priori for the generated in silico datasets, and can be compared to the obtained posterior distributions. We start by applying our inference method to simple test cases, where the true values of all but one parameter are known, and only the remaining, unknown, parameter value is inferred (figure 2). This allows us to compare our sampled posterior distributions to the exact likelihood, which can be calculated in these one-dimensional examples using equation equation (2.5). If our inference method is accurate, the sampled posterior distribution should closely match the exact likelihood if the Markov chain has converged (see electronic supplementary material, S.7). We find that this is indeed the case for example in silico datasets (Hill coefficient, transcription rate and transcriptional delay in figure 2b-d, repression threshold and translation rate in electronic supplementary material, figure S1). Additionally, ground truth parameter values lie well within the support of the posterior distribution (figure 2b-d; electronic supplementary material, figure S1, vertical black lines).
Our proposed inference method uses the MALA sampler, which relies on calculating likelihood gradients (see electronic supplementary material, S.4). The comparison with exact calculations in figure 2b-d and electronic supplementary material, figure S1 validates our implementation of MALA, and the associated computations of the likelihood gradient. In order to further test our implementation of MALA, and the associated computations of the likelihood gradients, we compare our results to posterior distributions sampled using the MH algorithm, which does not require gradient calculations. Despite an expected slower convergence of the MH algorithm, this comparison is feasible for one-dimensional posterior distributions, which typically can be well approximated royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210393 with a few thousand samples. The sampled means have a relative difference below 0.03%, and the standard deviations fall within 4% of each other (table 1; electronic supplementary  material, table S3). This comparison reveals that posterior distributions from both samplers agree well with each other (figure 2e-f; electronic supplementary material, figure S2), and further validates the implementation of the individual likelihood gradients.

Our method allows for simultaneous inference of multiple model parameters
Having validated the method on one-dimensional posterior distributions, we further test the performance of the method by simultaneously inferring multiple model parameters from a single in silico dataset and comparing the resulting posterior distribution to the ground truth parameter combination ( figure 3a,b). Since we cannot measure convergence of the sampled posterior through comparison to the true posterior distribution in the multi-dimensional case, we rely on typical MCMC convergence diagnostics (electronic supplementary material, S.7).
We choose a dataset that shares characteristics with typically collected time course data from single cells. Specifically, our in silico dataset is of similar length and observation intervals as previously analysed by Manning et al. [2]. In this paper, the degradation rates of protein and mRNA have been measured, so we assume these measurements as known values, leaving five unknown parameter values to infer. The prior distributions were uniform, defined by the range of values given in electronic supplementary material, table S1, and log-uniform for α m and α p (see electronic supplementary material, S.6 for details).
We find that the marginal posterior means, i.e. values of largest probability, all lie within maximally half a standard deviation of the ground truth values (table 2). This indicates that a high degree of accuracy in the inference can be achieved with the amount of data typically gathered from a single cell.
Simultaneous inference of multiple parameters further allows for the investigation of pairwise parameter correlations, using correlation coefficient ν (figure 3c). Pairwise correlations provide crucial information on how posterior distributions can be constrained further. Specifically, the 3.3. Parameter inference on single cell data outperforms previous approaches and may reveal underlying mechanisms for population heterogeneity Next, we seek to evaluate the performance and utility of our method by applying it to experimentally measured data. Specifically, we investigate data on gene expression oscillations in mouse spinal cord neural progenitor cells [2] (see electronic supplementary material, S.10.2), and compare our method to results on parameter inference from ABC (figure 4a). In this previous approach, inference was performed using population-level summary statistics of the collected time-course data. This resulted in posterior distributions with high parameter uncertainty. Specifically, the marginal posterior distributions for the Hill coefficient and the transcriptional delay were close to uniform, illustrating that the provided summary statistics did not contain sufficient information to constrain the uniform prior distribution. The remaining parameters had distinct modes. Nonetheless, parameter uncertainty was high since the spread of the posterior    distribution was comparable to that of the prior [2]. Importantly, this previous approach did not allow for comparison of posterior distributions between single cells. When applying our method to time-series data from fluorescence microscopy experiments, it is necessary to address that our model of the auto-negative feedback motif cannot describe long-term trends in data. Specifically, the model of the autonegative feedback loop considered here is designed to describe ultradian oscillations that typically have periods shorter than 10 h [12,13,55], and cannot describe variations in protein numbers on longer timescales, such as one would expect from a slow up-or downregulation of the gene in the tissue. Hence, we only apply our algorithm to protein expression time series that we expect to be accurately modelled by equations (2.1) and (2.2) by excluding data that show significant long-term trends. In order to identify such time series, we first remove trends from the time series that vary on lengthscales longer than 10 h by using Gaussian process  royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210393 regression (see §2.4). Then, we manually identify all time series for which the detrended and raw time series visually agree (figure 4b) and select these for inference.
In order to identify a suitable value for the measurement variance S e we rely on previous estimates [2]. Manning et al. [2] decomposed the measured time series into two contributions, one from a time-varying signal with finite autocorrelation time, and one from a time-varying signal for which consecutive observations are uncorrelated [2]. This second contribution follows an identical distribution as the measurement error in our model, and was estimated to contribute 10% of the total variance across all detrended time series. Hence, we set where N c is the number of considered traces, and s 2 i is the variance for the ith detrended dataset.
We find that our method can identify more accurate posterior distributions than the previous ABC-based approach by using single cell time series of gene expression only (figure 4c versus 4a.). For the single-cell gene expression time course in figure 4b, we find that there is still comparatively high uncertainty on the basal transcription rate (α m in figure 4c), as the support of the marginal posterior distribution reflects that of the uniform prior distribution. However, for all other model parameters that are inferred from this time course, the marginal posterior distributions are narrower than the prior, and than previously identified marginal posterior distributions from ABC (figure 4c).
Having investigated marginal posterior distributions from a single cell, we proceed to analyse to what extent these posterior distributions can vary across multiple cells in the population. Among the experimental data considered here, hierarchical clustering has previously identified two sub-populations (denoted as clusters 1 and 2) which have distinct gene expression dynamics and which also do not have strong long-term trends [2], such as downregulation of gene expression. For clusters 1 and 2, there are 19 and 22 cells, respectively, which we have selected for negligible trends (see §2.4).
We find that the posterior distributions inferred from multiple cells share features that are conserved across all cells and both populations ( figure 4d,e). Specifically, the marginal posterior distributions of the translation rate α p are all larger than exp (2)/min, and biased to larger values. Similarly, the marginal posterior distributions for the delay τ cover the entire range of considered values, and are biased towards smaller values, with most likely values below 10 min. These observations appear to hold true for both clusters considered here, and they highlight that parameter inferences obtained from our method are biologically reproducible, which is a necessary feature to enable the use of the method in practical applications.
By contrast, for the basal transcription rates α m and the Hill coefficient h, marginal posterior distributions vary between individual cells, suggesting that there is an underlying heterogeneity of these parameters across the cell population. However, the remaining parameter uncertainty is too high to reliably identify differences between cells and clusters of cells, raising the question of how imaging protocols may need to be changed in order to achieve lower uncertainty on typical parameter estimates.
3.4. Longer time course data improve accuracy of inference more effectively than more frequent sampling Typically, longer imaging time series can only be collected at the cost of a lower imaging frequency. When designing experiments, it may be desirable to choose an imaging protocol that optimizes the parameter inference towards high accuracy and low uncertainty. However, parameter uncertainty may not only be influenced by the imaging protocol, but also by the bifurcation structure of the underlying dynamical system [63]. Hence, we next analyse how posterior distributions depend on the frequency of sampling, on the length of the imaging interval, and on the position in parameter space. To evaluate the performance of our inference, we investigate the uncertainty using relative uncertainty, RU θ (electronic supplementary material, S.7, equation (S35)), which quantifies the spread of the posterior distribution. We use this metric to quantify the performance of our inference method on a number of synthetic datasets with different lengths and sampling frequencies, and for different locations in parameter space. We choose two locations in parameter space that correspond to two different values of oscillation coherence, thus producing qualitatively different expression dynamics (figure 5a; electronic supplementary material, table S2). The oscillation coherence is a measure of the quality of observed oscillations (electronic supplementary material, S.8). Choosing parameter combinations with different coherence thus ensures that these correspond to different positions within the bifurcation structure of the auto-negative feedback loop [55,64,65].
We first analyse to what extent collecting data for a longer sampling duration may reduce parameter uncertainty ( figure 5b,d). We find that a longer sampling duration can strongly decrease parameter uncertainty. Doubling the length of the time-series reduces the uncertainty by 19% on average for the high coherence parameter combination, and 7.1% on average for the low coherence parameter combination. A tripling of the available data leads to reductions in uncertainty by 29.8% and 18.3% and for high and low coherence, respectively.
By contrast, an increase in sampling frequency leads to a smaller decrease in parameter uncertainty on average (figure 5c,e). Specifically, doubling the amount of data only leads to a decrease by 11.3% in the case of the high coherence parameter combination, and 6.7% in the case of low coherence. A tripling of the available data leads to reductions in uncertainty of 13.2% and 9.1% for low and high coherence, respectively.
We find that analogous conclusions hold true if inference accuracy is analysed (ME θ , electronic supplementary material, S.7, equation (S36)), instead of uncertainty (electronic supplementary material, figure S3). Accuracy increases with longer sampling durations and shorter imaging intervals, and longer sampling durations have a stronger effect than shorter imaging intervals.

Additional measurements of mRNA copy numbers improve estimates of the average transcription rate
In the previous section, we have analysed the impact of changes in the imaging protocol on parameter uncertainty overall. Alternatively, it may be desirable to identify royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210393 interventions that reduce uncertainty for particular parameters of interest. For example, an important quantity of interest may be the average rate of transcription of the investigated gene, introduced as α T in equation (2.4). In many in silico examples of our parameter inference, this average rate of transcription α T is poorly inferred, with the mean of the posterior distribution being up to five times larger than the ground truth value. This is for example the case in figure 6a. In this and other examples, the ground truth value lies outside the 85% highest density interval (HDI) of the posterior distribution ( figure 6a-c). Intuitively, one may assume that estimates for the rate of transcription are improved if measurements of mRNA copy numbers, in addition to protein expression dynamics, are considered in the inference. Hence we next assume that, in addition to data on the dynamics of protein expression, measurements of mRNA copy numbers have been conducted on the observed cells. Specifically, we generate in silico data mimicking a single-molecule in situ hybridization (smFISH) experiment. Such smFISH experiments generate distributions of mRNA copy numbers, thus providing a snapshot of mRNA levels across a population at a fixed time point [66,67]. To account for these additional data, we incorporate the observed distribution of mRNA copy numbers into our likelihood function, such that it effectively penalizes parameters for which inferred copy numbers of mRNA are outside the experimentally observed range (see electronic supplementary material, S.9).
We find that this inclusion of mRNA information collected from a cell population leads to more accurate inference of the average transcription rate for single cells, using our algorithm. Observing example datasets from  royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210393 observed reduction in uncertainty on the inferred transcription rate is accompanied by a reduction in uncertainty on estimated mRNA copy numbers for individual cells, as inferred by the Kalman filter (see e.g. electronic supplementary material, figure S5A versus S5B). Investigating the uncertainty on the average inferred transcription rate across datasets introduced in figure 5, we observe a reduction in uncertainty of 61.8% for low coherence parameter combinations (figure 6g), and a reduction in uncertainty of 51.2% for high coherence parameter combinations (figure 6h). How does this improved estimate of transcription rate affect overall uncertainty across parameter space, as analysed in figure 5? Counterintuitively, we find that this inclusion of mRNA data into our parameter inference does not reduce overall parameter uncertainty (figure 6i,j). For datasets from the low coherence parameter combination, the relative uncertainty increases by 1.1% on average when mRNA information is included ( figure 6i). For datasets from the high coherence parameter combination, uncertainty decreases slightly (9.1% on average (figure 6j)). Importantly, this reduction of uncertainty is considerably smaller than the reduction of uncertainty observed when longer measurement durations are considered (cf. figure 5d). We make analogous observations as inference accuracy is analysed (electronic supplementary material, figure S4), instead of uncertainty. Inference accuracy is not reduced for high coherence datasets when data on mRNA copy numbers are included, and it is only slightly reduced for some of the low coherence datasets, with the effect being much smaller than the effect of considering longer time course data (cf. electronic supplementary material, figure S3).
The effect that overall uncertainty is not decreased as new data on mRNA copy numbers are included contradicts the intuition that more accurate inference of the average rate of transcription α T will also reduce uncertainty on model parameters regulating α T , such as the basal transcription rate, α m and the repression threshold, P 0 . This effect may be attributed to correlations between these parameters, which we typically observe in our posterior distributions (see figure 3f ). For the dataset in figure 6a, inference of α T is improved upon inclusion of the mRNA. This leads to a tighter coupling between the parameters α m and P 0 (electronic supplementary material, figure S6). However, this constraining of the posterior distribution is not reflected in either of the marginal posterior distributions. Thus, although the inclusion of in silico smFISH data reduces the spread of the posterior distribution overall, uncertainty within the marginal posterior distributions is not reduced, and individual parameter estimates are not improved. An additional factor is that data from smFISH experiments may be considered to reflect the time-averaged mRNA copy number distribution of single cells. Hence, these data might not reduce uncertainty on parameters that are expected to predominantly alter the dynamics rather than the level of expression, such as the transcriptional delay τ and Hill coefficient h. Hence, to better infer these parameters, other strategies, e.g. those discussed in figure 5, may be required.
We conclude that distributions of mRNA copy numbers from population-level measurements can be used to infer average transcription rates for individual cells, using our inference method, which may facilitate the study of transcriptional dynamics in the context of gene expression oscillations. Together with results from figure 5, this illustrates how our method may be used to evaluate the benefit of different experiments in silico, and highlights that our method can be naturally extended to use additional data of different types.

Discussion
The aim of this work was to develop a statistical tool that can be used to infer kinetic parameters of the auto-negative feedback motif, based on typically collected protein expression time-series data from single cells. Importantly, the stochastic nature of the involved processes demanded a method that enables uncertainty quantification. We have achieved our aim by embedding a nonlinear delay-adapted Kalman filter into the MALA sampling algorithm. Our method can generate accurate posterior distributions for the simultaneous inference of multiple parameters of the auto-negative feedback motif. The produced posterior distributions are more informative than those from previous approaches. Since our method can be applied to data from single cells, it enables the investigation of cell-to-cell heterogeneity within cell populations. It can further be used to make experimental design recommendations, which we demonstrated by investigating how parameter uncertainty may depend on the position in parameter space, the sampling frequency, and the length of the collected time-series data. Additionally, our method may be extended to account for the presence of different types of data, for example to improve estimates of the transcription rate for individual cells.
Often, new inference algorithms are presented on a single dataset, and due to necessary tuning requirements of the involved sampling methods, further datasets are not considered. However, it is important to understand the behaviour of a method for a range of datasets if we wish to make experimental design recommendations. It is an achievement of this paper that we provide a method that demonstratively can reliably infer parameters, even when the size and structure of the data are changed significantly.
The mathematical model underlying our method aims to describe the dynamic expression of a protein which is controlled by auto-negative feedback. The success of our inference relies upon how well this model approximates reality. Mathematical models for the oscillatory expression of transcription factors are informed by experimental research [57,68] and have been developed over time [3,8,13,14,64]. Existing model extensions include interactions with other genes or microRNAs [12] and future models could include effects of transcriptional bursting [69]. The simple model used here provides a starting point for developing inference algorithms for further models including nonlinear, stochastic interactions as well as delays, and future validation of experimental predictions can be used to guide data-driven model improvements. To this end, our algorithm may enable model selection to identify gene regulatory network properties, such as interactions between multiple transcription factors.
Chemical Langevin equations such as equations (2.1) and (2.2) approximate the full stochastic dynamics of the system by assuming Gaussian increments. Furthermore, our Kalman filter assumes that measurement errors follow a Gaussian distribution, and are not correlated between consecutive time points. The likelihood calculations within the Kalman filter assume that distributions of protein copy numbers, which are predicted by equations (2.1) and (2.2), can be approximated by Gaussian distributions.
The Gaussian approximation within the chemical Langevin equation can break down when molecule concentrations are very low, resulting in an inaccurate simulation of the dynamics. We do not expect this to be a problem for data analysed in this paper, since protein copy numbers throughout our analysis are around 50 000 protein molecules per nucleus. In other applications, the validity of the chemical Langevin equation may be explicitly tested on samples from the posterior distribution by directly comparing simulated expression time series with those obtained from an exact sampling algorithm, such as the Gillespie algorithm [70]. Similarly, simulations of the chemical Langevin equation can be used to test assumptions on the Gaussianity of the state space made within the Kalman filter. In cases where these assumptions do not hold, alternative inference algorithms, such as particle filter methods, may need to be developed.
For Bayesian inference problems, it is common to use MCMC samplers, such as MH or MALA. We have found that combining a delay-adapted nonlinear Kalman filter and MALA can allow us to infer parameters of the auto-negative feedback motif. This builds on previous approaches which applied a Kalman filter in the context of a different transcriptional feedback motif with delay [52]. MCMC algorithms typically require tuning which may be data specific. We have taken steps to reduce additional input from the user by using MALA, which proposes new samples based on the gradient of the target posterior, hence accounting for geometric properties of parameter space, which can result in faster, more robust performance on some distributions [44]. MALA also has fewer tuning parameters than other algorithms, such as HMC. This allows us to more easily incorporate adaptation into our algorithm [71]. Surprisingly, the MALA sampler did not result in faster convergence than MH on example posteriors from our model (see electronic supplementary material, figure S7). Hence, the added computational cost of calculating likelihood gradients will not be beneficial in all applications, especially since, in our model, gradient calculations increase the computational cost of individual parameter samples by a factor 12. We expect the availability of likelihood gradients to achieve a speed-up in highdimensional problems, where convergence speeds of MALA scale with d 1/3 , rather than d 1 for MH [72], for model dimension d. Note, that more efficient MCMC algorithms can eliminate the problem of tuning entirely [44]. These methods rely on the computation of the Hessian, i.e. the second derivative of the likelihood function. Deriving expressions for the Hessian and investigating the efficiency of the resulting algorithm is thus a potential avenue for future work.
In our applications of the algorithm to experimentally measured data, we detrended the data before applying our inference (figure 4b). Such detrending is commonly used when analysing time series of oscillatory signals [2,73,74]. The detrending removes signal fluctuations from the recorded time series that vary on a much longer timescale than the ultradian oscillations that are being considered. This is necessary, since our model cannot describe such long-term fluctuations. Specifically, independently of the model parameter chosen, simulated traces from the chemical Langevin equation (equations (2.1) and (2.2)) do not include long-term trends. Hence, detrending prevents any bias that the presence of a long-term trend in the data may introduce to the parameter inference. When the algorithm will be applied to data from other transcription factors, we recommend excluding data that contain trends with timescales that are longer than the fluctuations and oscillations that are expected to emerge from the auto-negative feedback, in line with previous detrending recommendations [73,74]. Presumably, variations in the long-term trend of the data stem from a time-dependence of one or multiple of the model parameters due to regulatory processes that our model does not account for. Hence, future improvements to our algorithm may be developed where the temporal variation of model parameters is inferred, instead of one static value.
When applying our inference method to experimental data (figure 4), we relied on previously reported values for the measurement variance, S e , in the dataset that we considered [2]. When users seek to apply our algorithm to other data where previously published values are not available for S e , this parameter can be inferred following the same procedure as reported in Manning et al. [2].
Our algorithm opens up the investigation of research problems, such as cell-to-cell heterogeneity in dynamic gene expression, which would previously not have been accessible. In future applications, our algorithm may provide a non-invasive method to measure the kinetic parameters of the gene of interest, such as the translation and transcription rates, or properties of the gene's promoter, which are described by the repression threshold and Hill coefficient parameters in our model. On experimental datasets where multiple, qualitatively different dynamics are observed [75][76][77], our algorithm may provide insights into the mechanistic origin of these different dynamics, by identifying differences in inferred parameter values between the observed cells or cell populations. In order to classify whether observed differences between posterior distributions are significant, one can construct the posterior distribution describing the difference between parameter values from both cells or populations, and test whether the credible interval of that distribution contains zero [78]. To facilitate such analysis, our method may for example be combined with clustering algorithms on the time series data, such as Gaussian mixture modelling. Since different dynamic patterns of gene expression have been observed in multiple studies of autorepressing transcription factors [2,3], we anticipate that these approaches will spark new scientific investigations.
Throughout, we have assumed that measurements in the form of protein copy numbers per nucleus are available over time. To collect such data, it is necessary to combine endogenous fluorescent reporters with FCS in order to translate reporter intensity values to molecule concentrations. Future versions of our algorithm may be applicable to data where FCS is not available, if one extends our measurement model (F, §2.2) to include an unknown, linear scaling parameter between protein copy numbers and imaged intensity values.
We highlight that the impact of this work is not limited to a single gene in a single model system. The conceptual framework and derivations described here are applicable to any system which can be described by delayed stochastic differential equations, although there may be computational limitations as model sizes increase.
Ethics. Animal experimentation: All animal work was performed under regulations set out by the UK Home Office Legislation under the 1986 United Kingdom Animal Scientific Procedures Act.
Data accessibility. All code and data used in this article are freely available on our github repository, https://github.com/kursawe/ hesdynamics.
Electronic supplementary material is available online [79].