Filtering and inference for stochastic oscillators with distributed delays

Abstract Motivation The time evolution of molecular species involved in biochemical reaction networks often arises from complex stochastic processes involving many species and reaction events. Inference for such systems is profoundly challenged by the relative sparseness of experimental data, as measurements are often limited to a small subset of the participating species measured at discrete time points. The need for model reduction can be realistically achieved for oscillatory dynamics resulting from negative translational and transcriptional feedback loops by the introduction of probabilistic time-delays. Although this approach yields a simplified model, inference is challenging and subject to ongoing research. The linear noise approximation (LNA) has recently been proposed to address such systems in stochastic form and will be exploited here. Results We develop a novel filtering approach for the LNA in stochastic systems with distributed delays, which allows the parameter values and unobserved states of a stochastic negative feedback model to be inferred from univariate time-series data. The performance of the methods is tested for simulated data. Results are obtained for real data when the model is fitted to imaging data on Cry1, a key gene involved in the mammalian central circadian clock, observed via a luciferase reporter construct in a mouse suprachiasmatic nucleus. Availability and implementation Programmes are written in MATLAB and Statistics Toolbox Release 2016 b, The MathWorks, Inc., Natick, Massachusetts, USA. Sample code and Cry1 data are available on GitHub https://github.com/scalderazzo/FLNADD. Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
The time evolution of molecular counts of chemical species in a reaction network is formally described by a Markov jump process (MJP). Interest usually lies in inferring the reaction rates and the unobserved molecule counts of the network's species, given experimental data observed at discrete time-intervals for some or all of the reactants. Biochemical reaction networks are complex and often involve many reactions and chemical species. This is in stark contrast to the fact that only small subsets of species can be observed, albeit indirectly through measurement processes involving e.g. fluorescent reporter protein imaging. Hence, model reductions of the full reaction network towards less parameter-intensive approaches that can feasibly be estimated from the experimental data are of considerable importance.
The introduction of time-delays, in fixed or distributed form, can approximate the network's species and reaction events which are not of primary interest and thus reduce model complexity (see e.g. Ananthasubramaniam et al., 2014;Heron et al., 2007;Koren ci c et al., 2012;Monk, 2003). Furthermore, it is well known that delays, as well as negative feedback, non-linearity and appropriate time-scales for the network's reactions are necessary for the onset of oscillations in mathematical models (Cao et al., 2016;Nová k and Tyson, 2008). Delays can account for non-observable intermediate species or for time-intervals during which there are no measurements relating to the products of reactions. From the mathematical modelling point of view, this implies that the Markov property holds on a longer time-interval which extends up to an assumed maximum delay time.
A further main challenge in this field is due to the intractability of the transition densities of the MJP. Approximations in continuous state-space are available when suitable assumptions on the system size hold, and in particular, inferential applications have been considered for the chemical Langevin equation (CLE) (Golightly andWilkinson, 2005, 2010;Heron et al., 2007) and the linear noise approximation (LNA) (Fearnhead et al., 2014;Finkenstädt et al., 2013;Komorowski et al., 2009;Stathopoulos and Girolami, 2013). For a recent detailed review of modelling and inferential methods for stochastic biochemical systems see Schnoerr et al. (2017). The CLE aims at matching the infinitesimal mean and variance of the original MJP, while the LNA performs a linearization which leads to tractable Gaussian transition densities. Fearnhead et al. (2014) find that in cases where the dynamics are non-linear, parameter inference via the LNA is improved by filtering, i.e. by replacing the mean and variance estimates of the process with their predicted value given the past observations.
The LNA for models with distributed delays has recently been derived by Brett and Galla (2013), but has not been suggested for filtering and inferential purposes, while it appears that a filtering methodology so far has only been suggested for fixed delays (Gopalakrishnan et al., 2011). Here we develop a novel filtering algorithm that is based on the LNA and is generally applicable to stochastic systems comprising distributed delays, with a focus on dynamic state-space models for chemical reaction networks.
We first introduce biochemical reaction networks, their exact mathematical description and the CLE and LNA approximations. We then consider their CLE and LNA approximation in the broader framework of state-space models, and illustrate the LNA updating algorithm in the context of non-delayed systems. We present our novel extension, where the methodology is tested on simulated data and then applied to experimental data. Here we focus on providing an example of a stochastic transcriptional translational feedback loop (TTFL) to describe the expression dynamics of the circadian gene Cry1. The methodology is used to infer parameters of the TTFL of Cry1 from experimental time-series data observed in a mouse suprachiasmatic nucleus (SCN) tissue (Brancaccio et al., 2013).

Reaction networks and their approximations
Consider a reaction network defined by a set of chemical species participating in a set of chemical reactions. Let p and r denote the total number of species and reactions, respectively. The state of the process X(t) can be defined as (Anderson and Kurtz, 2011) where the vector of random variables XðtÞ ¼ ðX 1 ðtÞ; . . . ; X p ðtÞÞ T defines the number of molecules at time t of the species participating in the reaction network, UðÁÞ is a vector of r independent inhomogeneous Poisson processes counting the occurrence of the r reactions and its argument defines its mean, S is the p Â r stoichiometry matrix whose elements s i;k denote the difference in the number of molecules of the i-th species produced and consumed by the k-th reaction. The vector hðXðsÞ; cÞ ¼ ðh 1 ðXðsÞ; c 1 Þ; . . . ; h r ðXðsÞ; c r ÞÞ T contains the reaction hazards where c k denotes the rate constant of the k-th reaction. The resulting process X(t) is a MJP in continuous time and discrete state-space, and its Kolmogorov's forward equation, or chemical master equation (CME) (Gillespie, 1992), provides an exact description of the system. As it can only be explicitly solved in rare cases (see the review in McQuarrie, 1967), approximations help to overcome the need for highly computationally demanding inferential procedures. The CLE, or diffusion approximation, exploits the multivariate normal approximation of the vector of independent Poisson random variables U. The approximation holds under suitable assumptions concerning the number of reaction occurrences, and leads to the stochastic differential equation form (Anderson and Kurtz, 2011) dXðtÞ ¼ gðXðtÞ; cÞdt þ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi AðXðtÞ; cÞ p dBðtÞ; (1) where dB(t) is a p-dimensional Wiener process, and gðXðtÞ; cÞ ¼ ShðXðtÞ; cÞ AðXðtÞ; cÞ ¼ S diagfhðXðtÞ; cÞgS T : The drawback of the CLE approximation is that explicit solutions for the transition densities are again rare (see e.g. Wilkinson, 2012).
Gaussian transition densities can be obtained with the LNA (Kurtz, 1972;Van Kampen, 1992) which also exploits the approximate normality of the underlying Poisson process, but replaces the hazard function by its first order Taylor expansion about the deterministic limit of the process, thus effectively eliminating non-linearities. Assuming Xð0Þ $ N ðqð0Þ; Pð0ÞÞ, where q denotes the deterministic limit, it can be shown that where J g is the Jacobian of gðÁÞ (see e.g. Anderson and Kurtz, 2011, for a rigorous derivation). The LNA matches exactly the first two moments provided by the CME for systems including reactions up to the first order, as well as for a subset of systems including secondorder reactions (Grima, 2015). In the other cases, the LNA also approximates the mean and variance, as they depend on the higher moments (see e.g. Gillespie and Golightly, 2016;Grima, 2012Grima, , 2015.

Filtering and inference for the LNA
Inference from experimental data requires the formulation of a measurement equation. If light intensities generated by fluorescent reporter constructs are recorded, we may assume that observations are proportional to the unobserved molecular abundance and subject to additive Gaussian measurement error with unknown variance (see Finkenstädt et al., 2013;Folia and Rattray, 2017;Hey et al., 2015;Komorowski et al., 2009) Filtering and inference for stochastic oscillators with distributed delays where t $ MVNð0; R Þ, F is a q Â p matrix (where q is the dimension of Y and p the dimension of X). The Markov structure of the process together with the measurement process in (4) formally leads to a statespace model, which provides a framework for inference. Let w denote the set of all parameters, namely the vector of reaction rates c, the variance matrix R and the scale parameters in F. For random variables X and Y, let pðxÞ define the probability density function of X, and pðxjyÞ the conditional probability density function of X given Y.
where D t is the time-interval of the observations, d t some suitable discretization interval for the unobserved states dynamics and T is the total observation time. Inference typically involves estimating the model parameters w and the hidden states X 0:T . According to Bayes' theorem, their joint posterior distribution is pðw; x 0:T jy 0:T Þ ¼ pðy 0:T jx 0:T ; wÞpðx 0:T jwÞpðwÞ pðy 0:T Þ : If the focus of inference lies on estimating the model parameters, their marginal posterior distribution pðwjy 0:T Þ can be obtained by integrating out the hidden states, while inference on the hidden states can be performed by computing the smoothing density pðx 0:T jy 0:T ; wÞ (see e.g. Doucet and Johansen, 2009;Wilkinson, 2012 and the Supplementary Section 1). In Gaussian dynamic linear models, all densities involved are Gaussian and the marginal likelihood pðy 0:T jwÞ is available in a closed form provided by the wellknown Kalman filtering methodology (Kalman, 1960). However, in reaction networks leading to the CLE representation of (1), Gaussianity is lost due to the dependence of the diffusion term AðÁÞ on the system state. Furthermore, when the hazard function hðÁÞ is non-linear, i.e. for reactions of order greater than one, only an approximate estimate of the mean and variance of the process is generally available.
In the filtering literature, non-linear and/or non-Gaussian systems have been approached with a variety of methods, e.g. the extended and second-order extended Kalman filter (see e.g. Jazwinski, 2007;Sä rkkä , 2013;Singer, 2002), the unscented Kalman filter (Julier and Uhlmann, 1997) and particle filters (see e.g. Andrieu et al., 2010;Doucet and Johansen, 2009;Golightly and Wilkinson, 2011). Here we focus on the extended Kalman filter for time-continuous unobserved states, namely the extended Kalman-Bucy filter (EKBF) (see e.g. Kulikov and Kulikova, 2014;Singer, 2002 and references therein). Such choice can be motivated by its link to the LNA as follows.
Assume, for ease of notation, that the parameters w are set to some fixed value. In practice, parameter estimation can be performed via e.g. a Bayesian Markov chain Monte Carlo (MCMC) algorithm. The EKBF performs an update (restart) of the LNA mean and variance estimates at each observation time-point. Suppose qðtÞ ¼ E½XðtÞjy 0:t and PðtÞ ¼ Var½XðtÞjy 0:t in (2) and (3), respectively. Assuming that pðx t jy 0:t Þ is approximately Gaussian, linearity of the LNA and of the measurement process implies that pðx tþDt jy 0:t Þ and pðy tþDt jy 0:t Þ are Gaussian, the latter with mean Fq tþDt , and variance FP tþDt F T þ R . Estimates of the mean and variance of pðx tþDt jy 0:tþDt Þ are obtained by a Kalman filtering step where C ¼ P tþDt F T ðFP tþDt F T þ R Þ À1 is the adaptive coefficient. The current mean and variance estimates q tþDt and P tþDt are then replaced by their optimal estimates q Ã tþDt and P Ã tþDt , and computations are iterated up to T. The update Equations (5) and (6) can be derived from approximate Gaussianity of the joint distribution pðx tþDt ; y tþDt jy 0:t Þ, and the subsequent conditioning upon Y tþDt . For further details see e.g. Jazwinski (2007), Särkkä (2013) and the Supplementary Section 2.

Extension to systems with distributed delays
Distributed delays can be introduced in the hidden state equation to account for a dependency on a (finite or infinite) collection of past states that are arbitrarily distant in time.
We focus on systems where a set of intermediate transformations of the species of interest can be well approximated by the Goodwin oscillator ordinary differential equations (ODEs) (Goodwin, 1965), which can be explicitly solved. The resulting system is a reduced reaction network, in which the hazard accounting for the transcriptional process receives as an input its past expression level, weighted according to the delay distribution. As a Hill function is assumed for transcription, this leads to a delay entering non-linearly in the transcriptional hazard. A formal derivation of the model is provided in the Supplementary Section 3. Further motivation or pursuing this modelling framework is provided in Section 3.1.
Brett and Galla (2013) derive a CLE and a (non-restarted) LNA for a modelling framework in which the delays are included linearly in the reaction hazards. Such scenario may arise under the assumption that the product of a reaction is not available for a non-negligible random time-interval, as well as under alternative modelling assumptions e.g. for an alternative target state in the Goodwin oscillator system.
To derive the LNA approach for our modelling scenario, we follow an approach analogous to Brett and Galla (2013), and divide the reactions into two groups: the set of w reactions with distributed delays, with stoichiometry S d of dimension p Â w and hazard vector h d of length w, and the set of z reactions not involving delayed species, with stoichiometry matrix S d of dimension p Â z, and hazard vector h d of length z. We have S ¼ ½S d jS d and for the hazards where KðÁÞ ¼ ½K 1 ðÁÞ; . . . ; K p ðÁÞ T is a vector of delay densities, one for each X i . If X i is not delayed, K i has a point mass density at 0. The hazard vector of the non-delayed reactions is simply h d ðXðtÞÞ ¼ ½h wþ1 ðXðtÞÞ; . . . ; h wþz ðXðtÞÞ T . We have dropped the dependence on the reaction rates c for ease of notation. It is practical to assume that the delay distribution is truncated at some maximum delay time s m . Given the knowledge of the past states of the system, the hazards are deterministically defined and the CLE for the reaction network is where dB(t) is as usual a p-dimensional Wiener process, and The full state-space model is hence given by the continuous approximation for the dynamics of the unobserved molecule counts in (7) together with the measurement Equation (4).
The derivation of the filtering algorithm poses two difficulties: (i) the linearization of the functions incorporating the delayed species, and (ii) the need to update, at each observation time t, all estimates in the past until time t À s m . Problem (i) is addressed by a first order Taylor expansion of gðXðtÞÞ and lðXðtÞÞ about qðtÞ, and of f Ð t tÀsm XðsÞ Á Kðt À sÞds and q Ð t tÀsm XðsÞ Á Kðt À sÞds about Ð t tÀsm qðsÞ Á Kðt À sÞds. The expansions are plugged into the mean and variance equations, thus allowing their propagation under linearity. A detailed derivation of the mean and variance equations is provided in the Supplementary Section 4. Optimal estimates of qðtÞ and P(t) for t 2 ½0; s m , required for the initialization of the algorithm, are generally not available in practice and require further modelling. Problem (ii) can be approached by performing a Kalman update analogous to (5) and (6). Assuming approximate Gaussianity of the joint distribution pðx tþDtÀsm:tþDt ; y tþDt jy 0:t Þ and conditioning on Y tþDt , implies that pðx tþDt Àsm:tþDt jy 0:tþDt Þ is approximately Gaussian with mean q Ã tþDtÀsm:tþDt and variance P Ã tþDt Àsm:tþDt , where q Ã tþDt Àsm:tþDt ¼ q tþDt Àsm:tþDt þ Cðy tþDt À Fq tþDt Þ P Ã tþDt Àsm:tþDt ¼ P tþDt Àsm:tþDt À CFP tþDt;tþDt Àsm:tþDt At t þ D t we therefore update, conditional on y 0:tþDt , the unobserved state mean and variance backwards until t þ D t À s m . At the end of the observation time T such updates effectively provide a 'partial' smoothing density, as the moments of x TÀsm:T are conditional on y 0:T , those of x TÀsmÀDt :TÀsm are conditional on y 0:TÀDt , etc.
The complexity introduced by the distributed delay comes at higher computational cost. When the dimension of X and, more crucially, the number of unobserved states included in the maximum delay time is large, the algorithm can be significantly slower than in the non-delayed case. This point is further investigated in Section 3.2.

Model
We consider a stochastic transcriptional and translational feedback loop (TTFL) for a circadian clock that is represented by the selfinhibition of transcription after a Gamma distributed delay time s p . The delay accounts for nuclear export, protein synthesis and nuclear import (see the Supplementary Section 3 for a formal derivation). We assume maximum delay time, s m ¼ 30 h, such that possible dependence on past states is limited to just over a day, i.e. the cycle length of the system. Furthermore, we assume that a Hill function can approximate the relationship between the amount of available inhibitor and the transcriptional rate output, i.e. where is thus the transcription function. The parameter R max is the maximum achievable transcription rate, n is the Hill coefficient and is related to the number of binding sites present in the promoter region of the regulated gene and K pc is called the dissociation coefficient, or threshold, and represents the amount of input required to decrease the output of the transcriptional function by 50%. The Hill function can be formally derived under the assumption that binding and unbinding reactions of the inhibitor to the promoter happen at a fast time-scale, if compared to the time-scale of the transcriptional reaction (see e.g. Tka cik and Walczak, 2011). The elimination of the intermediate species by means of the distributed delay and the use of a Hill type transcription function, leads to a reduced reaction network which consists of the following two reactions where X denotes the mRNA, KðÁÞ is the truncated Gamma probability density function and l is the mRNA degradation rate. Note that a key assumption has been made by considering the transcriptional process as the relevant source of intrinsic stochasticity. The continuous state-space approximation of this model, combined with a measurement equation, is where t $ N ð0; r 2 Þ, and j is the proportionality factor relating the unobserved molecular process to the light intensity. Here, the measurement Equation (4) has been modified to reflect a measurement process which involves integration of the light signal (see also Folia and Rattray, 2017). The model requires the specification of the initial condition, i.e. the mean and variance of pðx 0:sm jy 0:sm Þ. To keep the dimensionality of the parameter space to a minimum, we assume a step function for the transcription rate , thus eliminating the dependence on past mRNA (Hey et al., 2015;Jenkins et al., 2013). Further details are provided in the Supplementary Section 5.

Simulation study
We use the stochastic simulation algorithm (SSA) (Gillespie, 1977) to simulate approximately the dynamics of this reaction network where the hazard for the reaction in (8) is computed by evaluating the integral accounting for the delay up to the maximum delay time, each selected reaction is then assumed to take place immediately. As initial condition, we adopt the first 30 h of real observations for the Cry1-luc imaging data, which was rescaled, aggregated and detrended. Values of mRNA are simulated and stored at fixed timeintervals of duration of length 0.01 h, and, to mimic the integration of camera light, are summed over 0.5 h, divided by their mean level, and corrupted with normal measurement error for two chosen levels of signal to noise ratio, i.e. 20 and 100. Figure 1 shows the simulated time-series for 10 replicates of the simulation algorithm for signal to noise ratio 100. Supplementary Figure S1 shows the simulated trajectories for signal to noise ratio 20.

Filtering performance
We start by verifying the performance of the filtering methodology by setting the parameters to their true values. Computational practice requires some time-discretization of the ODEs involved. We adopt the Euler discretization method assuming d t to control the step-size of the approximation. Supplementary Figure S2 provides a graphical illustration of the filter performance for d t ¼ 0:1 on one sample simulated time-series. We note overall a good precision and, as expected, a decreased variability in the smoothing distribution. To compare the behaviour of the filter for different values of d t , we compute the empirical coverage of both the predictive and the partial smoothing densities at level 95%. Results for the two levels of measurement error are provided in Supplementary Table S1. We observe an improved precision, i.e. a decreased discrepancy between nominal and empirical coverage, as d t approaches its true value of 0.01 h, and particularly for the partial smoothing density under the lower measurement error scenario. The improvement is slower for values of d t below 0.1 h, at which coverages for all cases are already between 94 and 96%. Furthermore, any improved coverage comes at higher computational cost. Supplementary Table S2 provides average running times of the filter. We observe that the rate of increase of the computational expense increases as d t decreases, where the computation of the covariance of the unobserved states accounts for a significant proportion of the total running time. We note that alternative ODE solvers may be employed, and that the balance between precision and computational costs needs to be assessed in each case.

Inference performance
We study the performance of the proposed filter for the purpose of Bayesian inference by designing a MCMC algorithm and applying it to the last five cycles of the simulated data ( Fig. 1) with the aim of retrieving the values of the parameters used for the simulations. The MCMC scheme details are given in the Supplementary Section 6. Based on our results above and to balance the trade-off between realistic running times and filtering performance we choose to work with values d t ! 0:1 h. Visual investigation of the univariate loglikelihoods (see Supplementary Figs S3 and S4) suggests a minor effect of the choice of d t on most of the parameters involved, except r and, more marginally, j. This result motivates the design of a delayed acceptance MCMC algorithm (Christen and Fox, 2005;Golightly et al., 2015), which allows to exploit the fast likelihood computation provided by the filter for d t ¼ 0:5 h to explore the parameter space, but finally accepts values according to the likelihood provided by the filter for d t ¼ 0:1 h.
For more efficient sampling we reparameterize the model by moving j from the observation to the state equation. We generally adopt a N ð0; 10 2 Þ prior density for all model parameters on the logarithmic scale, with the exceptions of the Hill coefficient, where we assume logðnÞ to have prior N ðlogð1:5Þ; 5 2 Þ and the degradation rates during and after the initial condition, logðl 0 Þ and logðlÞ, for which we specify a N ðlogð0:58Þ; 0:5 2 Þ prior. The initial condition dispersion coefficient logðbÞ, variance logðbVar½X 0 Þ and logðjÞ are assumed to have a N ð0; 20 2 Þ prior (see Supplementary Section 5 for further details on the initial condition model specification). An informative prior centred at the true simulation value and with unit SD is assumed for the measurement error SD, logðr Þ. Finally, the delay mean and SD are assigned U (0, 23) and U (0, 20) prior densities, respectively, both expressed in hours. This can be justified by assuming that the cellular product of the previous circadian cycle is feeding into the dynamics of the next cycle. Hence, most priors have been set to be diffuse within biologically realistic ranges. A certain amount of prior information seems to be required to robustify the inferential process. In particular, prior information on the reporter protein half-life is provided by Yamaguchi et al. (2003) although here we assume a larger SD to cope with the approximate nature of our model in particular as the reporter process is not explicitly modelled. An informative prior for the SD of the measurement error has been deduced by measurements from an additional light channel (M.Unosson, personal communication) where we adopt again a larger variance to indirectly account for processes not explicitly modelled. The initial conditions for the parameter chains are randomly drawn from the prior densities. Figure 2 shows the estimated posterior densities of the model parameters for the simulated data seen in Figure 1. Two chains have been excluded due to lack of convergence, but inference seems overall satisfactory for the remaining chains. The results for the signal to noise ratio 20 scenario are provided in Supplementary Figure S5, and lead to analogous conclusions.
We have also investigated filter coverage and inference performance for increasing degrees of non-linearity by varying the Hill coefficient and setting it ¼7, 9, 11, and found that the estimation results are overall consistent with the results shown for n ¼ 5 (see Supplementary Tables S3-S5 and Figs S6-S11). We postulate that this is due to the use of the filtering step in the LNA, which seems to effectively counteract the effect of the increasing error in the LNA prediction that can be expected for higher degrees of non-linearity.

Inference for the circadian feedback loop in Cry1
The available data are time-series for the circadian gene Cry1, observed by mean of a transcriptional luciferase reporter construct Cry1-luc in a mouse SCN tissue over five days (Brancaccio et al., 2013) (see bottom panel of Fig. 1). Light intensities are recorded by  (8) and (9). Center: molecule counts are rescaled by their mean level, integrated over 0.5 h and corrupted with measurement error. The assumed levels of signal to noise ratio is 100. Bottom: experimental Cry1-luc imaging time-series, aggregated, detrended and normalized EM-CCD Camera (Hamamatsu) with an exposure time of 0.5 h, i.e. the image is the result of the photons hitting the camera in 0.5 h intervals (see Brancaccio et al., 2013). A mean trend due to consumption of the luciferin substrate is accounted for by dividing the observations by a time-varying proportionality factor which assumes a linear decrease of $30% over five days. We apply the proposed model to a sample location in the SCN where the data are aggregated over a 2 Â 2 pixel box which is broadly comparable to the size of a cell. Figure 2 provides the estimated posterior densities of the model parameters for the experimental data. Their interpretation has to be considered in the light of the fact that we have represented a complex genetic network of the circadian clock, which consist of several interwoven TTFs involving about 15 clock genes (Dibner and Schibler, 2015) by a delayed feedback loop fitted to imaging data of a single gene involved. The main achievement here over previous work is that we have provided the inferential methodology to identify such a model from time-series imaging data which can potentially serve as a realistic surrogate model to be fitted in a longitudinal or spatial fashion to quantify and compare intrinsic noise and to study between-cell variability. Of particular interest are the parameters associated with the delay distribution, where the mean influences the period of the oscillations, and the variance tunes the temporal precision of the clock period. The mean delay time here is estimated to be around 9.67 h [95% HPDI (8.46, 10.80)] which, although not statistically significant, is not far away from Koren ci c et al. (2012) who find that a value of 8.25 h achieves circadian periodicity in a deterministic feedback loop model with discrete delay for Per2 self-inhibition. Furthermore, we estimate a value of around 3.56 h [95% HPDI (2.01, 5.39)] for the SD of the delay distribution. We note that the inferred posterior distribution of the parameter is concentrated away from 0, which would correspond to a fixed delay, suggesting that the assumption of a distributed delay is indeed tenable. We also hypothesize that this estimate may be relatively small as the experimental data are from the SCN, which is known to be the main pacemaker of the mammalian clock, but may be spatially varying over the SCN. Such further questions can be addressed on the basis of the TTFL modelling and inference approach proposed here.
The model fit is checked through inspection of the data posterior predictive distribution and the standardized residuals, where we obtain samples using a thinned set of parameter samples from the MCMC algorithm. Results are displayed in Supplementary Figure  S12. The normal q-q plot reveals a slightly heavy upper tail, although this result does not seem of major concern nor to be due to a systematic under-performance on the filter, as also investigated in further yet unpublished work. Visual investigation and model diagnostics reveal no residual 24-h periodicity in the residuals after the first 30 h of initial condition (which we recall is only instrumental into obtaining a first estimate of the mean and covariance matrix), which indicates that the circadian oscillation is well explained by the model. However, a non-negligible residual periodicity of around 12 h can still be found. As pointed out to us by an anonymous Fig. 2. Left two columns: results for simulation study. Kernel densities' estimates of the model parameters posterior densities, excluding the parameters of the initial condition. E½s and SD½s denote the mean and SD of the delay density K. The prior density is shown as a dashed line while the vertical line marks the true value. Results for the last five cycles of the simulated data shown in the central panel of Figure 1 (two chains are excluded due to non-convergence). Right two columns: results for observed Cry1-luc shown in bottom panel of Figure 1. Same notations and definitions as in panels on left reviewer, one possible explanation is that the model may not be capturing part of the non-linearity observed in the data. This may be caused by processes which are not explicitly included in the model such as the influence of inter-cellular signalling. The latter is relevant for Per transcription, and therefore for PER protein, which forms the PER/CRY complex effectively repressing Cry1 transcription. The transcription function assumed above can only account for repression by means of a single transcription factor. It may also be of relevance that Zhu et al. (2017) recently found a cell-autonomous 12 h clock in mammals.

Discussion
In this work we have addressed filtering and inference for statespace models with distributed delays, with a special focus on models arising from stochastic biochemical networks with stochastic oscillatory behaviour. The methodology is derived with focus on a stochastic self-inhibitory feedback loop with distributed delay, noting that this kind of model is known to achieve a substantial model reduction of complex gene networks and is of particular interest for modelling oscillatory molecular clocks. As such complex networks are never fully observed, model reduction is also important to facilitate inference from experimental data. The introduction of a distributed delay, as compared to a fixed delay, is beneficial as it provides a more realistic description of the process. The type of oscillatory processes here considered can indeed possess a certain degree of variability in their period that a fixed delay would not be able to characterize. We have extended the LNA and EKBF to provide a methodology which allows for sequential computation of the likelihood in such models. The resulting likelihood has a closed form and can be incorporated in a Bayesian MCMC algorithm for parameter inference. The performance of the methodology is tested on simulated data and first real results are shown here for an experimental Cry1-luc time-series from a mouse SCN (Brancaccio et al., 2013).
The closest approach to date is perhaps the work of Heron et al. (2007), which performs inference on a closely related dynamical model for gene expression, where inference is based on the CLE description of the process. In contrast to our methodology, their approach does not allow to obtain a closed-form likelihood and requires sampling the parameters conditional on a sampled path of the unobserved states in the MCMC algorithm and vice-versa, rendering the inference problem very high-dimensional. As the parameters and unobserved states trajectories tend to be strongly correlated, sampling is likely to be less efficient (Golightly and Wilkinson, 2011).
It should be noted that moment-closure approximations (MA) represent an inferential approach of comparable computational cost which can also be of interest for our scenario. MA provide ODEs for the time evolution of the approximate moments of the process by truncation of higher order moments, and have been applied for inferential purposes in e.g. Zechner et al. (2012) andFrö hlich et al. (2016). The method can lead to physically implausible results, such as negative mean and variances, but Schnoerr et al. (2017) have recently shown an improved precision over the LNA for some simulation scenarios.
Alternative specifications for the transcription function can be investigated, e.g. Kim et al. (2014) assume that the inhibitory processes arise as a consequence of sequestration of the activator by the inhibitor. To date, modelling of the TTFL of circadian genes in the mammalian clock is either based on deterministic approaches, possibly comprising a simplification using a delay, or on stochastic descriptions which have not yet attempted statistical parameter inference from experimental time-series data (see Ananthasubramaniam et al., 2014;DeWoskin et al., 2015;Gonze et al., 2005;Kim and Forger, 2012;Koren ci c et al., 2012;Reló gio et al., 2011). Our approach extends over existing mathematical modelling in that it provides a novel stochastic inferential approach for such system.
We note that our approach does not specifically take into account the fact that we observe reporter protein rather than the protein of interest, which is possibly a strong but necessary simplification. Inference is challenging due to the strong parameter correlation structure, where we hope to show in future work that joint estimation of the parameters for data at many more spatial locations across the SCN may substantially aid the inferential process.
The ability of the proposed methodology to address intrinsic stochasticity is of particular importance in models of biochemical oscillators. Indeed, the idea that noisy systems are more easily entrained to an external input has been investigated both theoretically (Steuer et al., 2003) and experimentally (Ko et al., 2010). The latter have studied the role of intrinsic noise in the SCN, and provided experimental evidence from a Bmal1-null mutant mouse that noise and extracellular signalling are sufficient to produce oscillations when the TTFL is disrupted. Further insight into the stochastic biochemical oscillators and their synchronization may be achieved by performing inference on the spatial distribution of the model parameters over the SCN. The methodology presented here will form the basis to be able to perform such an analysis.