1 Introduction

Ordinary differential equations are widely used to model complex dynamical systems in many areas of science and technology. For example, ODE models have been used in the study of reconstruction of transcription factor activity for gene regulatory activities (Khanin et al. 2007; Rogers et al. 2007; Nie et al. 2019). Although ODE models are often proposed based on expert knowledge of the dynamical process of interest, the solver of the ODE relies heavily on the ODE parameters and the values of the ODE parameters are rarely known. It is an important but challenging statistical problem to estimate the ODE parameters from observational (noisy) data since that most ODEs have no analytic solutions and it is often computationally intensive to solve ODE numerically.

For estimating the ODE parameters, many efficient methods have been developed. For instance, Liang and Wu (2008) developed a two-step method using local polynomial regression. Ramsay et al. (2007), Cao and Ramsay (2007), and Cao et al. (2008) proposed a generalized profiling approach to estimate the ODE parameters. Cao et al. (2011) proposed a robust method for estimating ODE parameters when the data have outliers. Hall and Ma (2014) suggested a class of fast, easy-to-use, genuinely one-step procedures for estimating unknown parameters in dynamical system models. Brunel et al. (2014) developed a gradient matching approach for estimating ODE parameters. Zhang et al. (2015) proposed a least squares approximation method for ODE model selection. In the Bayesian framework, many researches have been made for estimating the ODE models; for example, Campbell and Steele (2012) proposed a Bayesian smooth functional tempering method for the ODE models; Bhaumik and Ghosal (2015) considered a Bayesian two-step estimation approach; Dass et al. (2017) suggested a Laplace approximation method for obtaining the posterior inference of ODE parameters; and Liu et al. (2018) proposed a stochastic approximation Monte Carlo method for estimating ODE models when the likelihood has multiple local modes, among others. However, all of the above methods assume that the data are collected for one individual dynamical process and cannot address the problem when multiple individuals dynamical processes are observed.

When the dynamical process are measured for multiple subjects, mixed-effects ODE models are developed to include random effects to account for the within-subjects correlations in the models. For instance, Huang and Wu (2006) studied a parametric hierarchical HIV dynamic model and provided an MCMC algorithm to sample from the posterior distribution of ODE parameters. Guedj et al. (2007) used the maximum likelihood approach directly to estimate unknown parameters in mixed-effects ODE models. Fang et al. (2011) proposed a fast two-stage estimating procedure for mixed-effects dynamical systems and applied it to analyze longitudinal HIV virus data. Wang et al. (2014) proposed a semiparametric method to estimate a mixed-effects ODE model for the HIV combination therapy study. A common fundamental assumption of aforementioned methods is that the observations for the dynamical process follow a normal distribution, but this assumption may lack robustness when abnormal data exist, which may lead to biased inference. Liu et al. (2019) proposed an MCMC method to estimate the mixed-effects ODE model using heavy-tailed distributions. On the other hand, the aforementioned papers only studied the parametric mixed-effects ODE models, while the semiparametric dynamic system which includes time-varying coefficients in ODE models can have many valuable applications, especially in gene transcription experiments in which the regulatory activity of the transcriptional factor is often modeled by a varying function over time [see Eq. (4.1) in Section 4].

To introduce the motivation of this paper, we revisit the experiment that studies statistical reconstruction of transcription factor activity (TFA) from time-course gene expression data (Khanin et al. 2007; Rogers et al. 2007). It is a central but difficult problem to link transcription factors (TFs) to their target genes in post-genomic biology, since that the changes in the expression of a TF are subtle and its activity is often controlled at levels of other than expression, for example, via post-transcriptional and post-transcriptional modifications.

Under the log-normal noise assumptions, Khanin et al. (2007) made statistical inference on the ODE parameters using a conjugate gradient method, and Rogers et al. (2007) developed a fully Bayesian approach to reconstruct the functional TFA from time-course gene expression data. However, this assumption may lack the robustness against departures from normality and/or abnormal data. Fig. 1a displays the histogram of the obtained residuals for the semiparametric mixed-effects gene regulation ODE model (4.1) by assuming that the observations errors and random effects both follow normal distributions. It seems that the obtained residuals present a heavy-tailed feature. This is also confirmed by the normal Q–Q plot of the residuals shown in Fig. 1b. In addition, the p-value from the Shapiro–Wilk test is \(4.232\times 10^{-11}\). Thus, a normal distribution assumption is quite doubtful and may be too restrictive to provide an accurate inference of ODE models.

Fig. 1
figure 1

The histogram and the normal Q–Q plot of the obtained residuals assuming normal distributions for observations and random effects in the gene regulation ODE model (4.1) with the fission yeast cells

To account for this departure from normality, we propose to apply the scale mixture of multivariate normal (SMN) distributions (Andrews and Mallows 1974), to model the observations errors of the dynamical process and random effects of ODE parameters. The SMN distributions are flexible heavy-tailed distributions which include the multivariate normal distribution as a special case. In the literature, the SMN distributions have been extensively applied to regression models (Lange and Sinsheimer 1993; Liu 1996), linear mixed-effects models (Choy and Smith 1997; Rosa et al. 2004) and nonlinear mixed-effects models (Meza et al. 2012; De la Cruz 2014), to obtain robust estimates against outlying observations. In this article, we intend to pursue the robust inference of the ODEs model based on the SMN distributions.

This article has three main contributions. (i) We develop a hierarchical semiparametric mixed-effects ODE model which considers the within-subject and between-subject variations simultaneously and makes the statistical inference by borrowing more information from all subjects. (ii) Our model applies a class of heavy-tailed distributions for random-effect ODE parameters and observations errors for the dynamical process, which is robust against the atypical subjects and the abnormal observations within individual subjects. (iii) Our proposed method can detect the subjects which are atypical or have abnormal observations by estimating latent parameters in the model.

The remainder of this article is organized as follows. Section 2 briefly reviews the scale mixture of multivariate normal distributions. Section 3 introduces our proposed hierarchical semiparametric mixed-effects ODE models and provides a Bayesian estimation method. Section 4 demonstrates our proposed method in comparison with the conventional method by studying the reconstruction of transcription factor activity from a set of real time-course gene expression measurements. Section 5 evaluates the finite sample performance of our proposed method using some simulation studies. We end this article with conclusions and some discussion of future research in Sect. 6. The MATLAB code for our simulation studies can be downloaded at https://www.github.com/caojiguo/SMODE.

2 Scale Mixture of Multivariate Normal Distributions

The class of SMN distributions consists of a group of heavy-tailed distributions and has been applied to make robust inference in the statistical analysis. In this section, we provide some brief knowledge of the SMN distributions. More details refer to the book of Azzalini and Capitanio (2014).

Let \({\varvec{\mu }}\) be an m-dimensional vector and \({\varvec{\Sigma }}\) be an \(m\times m\) positive definite symmetric matrix. An m-dimensional random vector \({\mathbf {Y}}\) is said to follow a scale mixture of multivariate normal distribution, if it has a probability density function

$$\begin{aligned} p({\mathbf {y}}|{\varvec{\mu }},{\varvec{\Sigma }},{\varvec{\nu }}) = \frac{1}{\sqrt{|2\pi {\varvec{\Sigma }}|}} \displaystyle \int _0^\infty u^{m/2} \exp (-\dfrac{u D^2({\mathbf {y}})}{2}) \mathrm{d}H(u|{\varvec{\nu }}), \end{aligned}$$
(2.1)

where \(D^2({\mathbf {y}})=({\mathbf {y}}-{\varvec{\mu }})^{\mathrm{T}}{\varvec{\Sigma }}^{-1}({\mathbf {y}}-{\varvec{\mu }})\), and \(H(u|{\varvec{\nu }})\) is a univariate probability distribution function which depends on a scalar- or vector-valued parameter \({\varvec{\nu }}\) and satisfies \(H(0|{\varvec{\nu }})=0\). We use the notation \({\mathbf {Y}}\sim SMN_m({\varvec{\mu }}, {\varvec{\Sigma }}, H)\) to indicate that \({\mathbf {Y}}\) has the density (2.1). This class of distributions is flexible and includes many symmetric distributions as special cases, for instance, the normal, the Student’s t, etc. When the mixture distribution function H is degenerate, \(SMN_m({\varvec{\mu }}, {\varvec{\Sigma }}, H)\) reduces to the usual multivariate normal distribution \(N_m({\varvec{\mu }},{\varvec{\Sigma }})\).

Let \({\mathbf {Z}}\sim N_m({\varvec{0}}, {\varvec{\Sigma }})\) be an m-dimensional random variable and \(U\sim H(u|{\varvec{\nu }})\) be an independent mixture variable. The SMN distributed random variables \({\mathbf {Y}}\) can be written as Azzalini and Capitanio (2014)

$$\begin{aligned} {\mathbf {Y}}= {\varvec{\mu }}+U^{-1/2}{\mathbf {Z}}. \end{aligned}$$
(2.2)

From (2.2), applying the iterative law of expectations, the mean and covariance of \({\mathbf {Y}}\) are given, respectively, by

$$\begin{aligned} \mathrm{E}({\mathbf {Y}}) = \mathrm{E}[\mathrm{E}({\mathbf {Y}}|U)]={\varvec{\mu }}, \end{aligned}$$

and

$$\begin{aligned} \mathrm{Cov}({\mathbf {Y}}) = \mathrm{E}[\mathrm{Cov}({\mathbf {Y}}|U)]+\mathrm{Cov}[\mathrm{E}({\mathbf {Y}}|U)]=\mathrm{E}(U^{-1}){\varvec{\Sigma }}. \end{aligned}$$
(2.3)

Trivially, when \(\mathrm{E}(U^{-1})<\infty \), \({\mathbf {Y}}\) has a finite positive definite covariance matrix.

In fact, a more convenient stochastic representation is to use the following hierarchical representation (De la Cruz 2014)

$$\begin{aligned} {\mathbf {Y}}|U \sim N_m({\varvec{\mu }}, U^{-1}{\varvec{\Sigma }}),\quad U \sim H(u|{\varvec{\nu }}). \end{aligned}$$
(2.4)

A special distribution of the SMN class is the multivariate t distribution (Lange et al. 1989) that has been extensively applied in robust regressions. In the rest of this paper, we use the notation \(Ga(\alpha ,\beta )\) to denote a Gamma distribution with shape parameter \(\alpha \) and rate parameter \(\beta \), which has the following density

$$\begin{aligned} p(x|\alpha ,\beta ) = \dfrac{\beta ^\alpha }{\Gamma (\alpha )}x^{\alpha -1}\exp (-\beta x),\quad x>0. \end{aligned}$$

Then, the multivariate t distribution is obtained by assuming that \(U\sim Ga(\nu /2,\nu /2)\) in (2.4), where the parameter \(\nu \) corresponds to the degrees of freedom of the multivariate Student’s t distribution. If further letting \(\nu \rightarrow \infty \), the Gaussian distribution is recovered.

3 Estimating Semiparametric Mixed-Effects ODE

3.1 Bayesian Framework

Consider the dynamical process \(X_i(t)\) for the i-th subject, \(i=1,\ldots ,n\), which is defined by

$$\begin{aligned} \dfrac{\mathrm{d}X_i(t)}{\mathrm{d}t} = F(X_i(t)|\eta (t),{\varvec{\theta }}_i), \end{aligned}$$
(3.1)

where t is continuous in some interval [0, T], F is a known function, \(\eta (t)\) is the population time-varying function, and \({\varvec{\theta }}_i\) is a q-dimensional vector of individual ODE parameters for the i-th subject. Without loss of generality, we assume that \(X_i(t)\) is one-dimensional dynamical curve in this article. We also assume that the subject-specific ODE parameters are expressed as \({\varvec{\theta }}_i = {\varvec{\xi }}+\mathbf {b}_i\), where \({\varvec{\xi }}\) is a q-dimensional fixed effect and \(\mathbf {b}_i\) is a q-dimensional random effect. Let \({\mathbf {Y}}_i = (y_{i1},\ldots ,y_{in_i})^{\mathrm{T}}\) denote the vector of observations or measurements for the i-th subject at the observation times \({\mathbf {t}}_i=(t_{i1},\ldots ,t_{in_i})^{\mathrm{T}}\) and \({\mathbf {X}}_i=(X_i(t_{i1}),\ldots ,X_i(t_{in_i}))^{\mathrm{T}}\) with \(X_i(t)\) being the solution of the ODE (3.1) conditional on \(\{X_i(0), \eta (t), {\varvec{\theta }}_i\}\). Generally, we assume that \({\mathbf {Y}}_i = h({\mathbf {X}}_i)+{\varvec{\epsilon }}_i\), where \(h(\cdot )\) is a known function (e.g., \(h(\cdot )=\log (\cdot )\) in many statistical analysis) and \({\varvec{\epsilon }}_i\) are measurement errors. In conventional methods, a common assumption is that the random effects \(\mathbf {b}_i\) and the data errors \({\varvec{\epsilon }}_i\) both follow the multivariate normal distributions. However, as discussed in Sect. 1, such normality assumptions are vulnerable in the presence of atypical observations, which can seriously affect the estimation accuracy of the mixed-effects ODE model. Thus, more flexible distributions are necessary to replace the normality assumption. To this aim, we propose to use the scale mixture of multivariate normal distributions for ODE random effects \(\mathbf {b}_i\) and within-subject data errors \({\varvec{\epsilon }}_i\). In other words, we assume \(\mathbf {b}_i \sim SMN_q({\varvec{0}}, {\varvec{\Sigma }}, H_1)\) and \({\varvec{\epsilon }}_i \sim SMN_{n_i}({\varvec{0}}, \sigma ^2_\epsilon {\mathbf {I}}_{n_i}, H_2)\) where \({\varvec{\Sigma }}\) is an unknown positive definite symmetric matrix.

Applying the stochastic representation (2.4), the ODE model of interest is presented as the following hierarchical structure

$$\begin{aligned} \begin{array}{ccl} {\mathbf {Y}}_i|{\varvec{\theta }}_i,U_i &{} {\mathop {\sim }\limits ^{ \mathrm{ind.} }} &{} N_{n_i}(h({\mathbf {X}}_i|{\varvec{\theta }}_i, \eta (\cdot )), U^{-1}_i\sigma ^2_\epsilon {\mathbf {I}}_{n_i})\;,\\ {\varvec{\theta }}_i|W_i &{} {\mathop {\sim }\limits ^{ \mathrm{ind.} }} &{} N_q({\varvec{\xi }}, W^{-1}_i{\varvec{\Sigma }}) \;,\\ U_i &{} {\mathop {\sim }\limits ^{ \mathrm{ind.} }} &{} H_1(\cdot |{\varvec{\nu }})\;,\\ W_i &{} {\mathop {\sim }\limits ^{ \mathrm{ind.} }} &{} H_2(\cdot |{\varvec{\kappa }})\;,\\ \end{array} \end{aligned}$$
(3.2)

where \(U_i\) and \(W_i\) are two latent variables with distributions \(H_1(\cdot |{\varvec{\nu }})\) and \(H_2(\cdot |{\varvec{\kappa }})\), respectively. One popular choice for the latent variable distribution \(H_1\) (and \(H_2\)) is to use the gamma distributions (Azzalini and Capitanio 2014). For example, in this article, we assume that \(U_i\sim Ga(\nu /2, \nu /2)\) with the shape parameter \(\nu /2\) and the rate parameter \(\nu /2\), which leads to a multivariate t distribution for \({\mathbf {Y}}_i\). Similarly, we assume that \(W_i\sim Ga(\kappa /2, \kappa /2)\) with the shape parameter \(\kappa /2\) and the rate parameter \(\kappa /2\), which leads to a multivariate t distribution for \({\varvec{\theta }}_i\). When \(U_i\) and \(W_i\) have degenerate distributions, model (3.2) reduces to the conventional model with the normal distribution assumption. However, when some \(U_i^{-1}\) has a large value, it indicates that the i-th subject may have outlying observations. When some \(W_i^{-1}\) has a large value, it indicates that the i-th subject may be an atypical subject with outlying ODE parameters. This outlier detection will be demonstrated in our applications in Sect. 4. Hence, our proposed model (3.2) is more flexible than the conventional model with the normal distribution assumption.

We estimate the time-varying parameter \(\eta (t)\) using a linear combination of spline basis functions:

$$\begin{aligned} \eta (t) = \sum _{l=1}^J \phi _l(t)\zeta _l = {\varvec{\zeta }}^{\mathrm{T}} {\varvec{\phi }}(t)\,, \end{aligned}$$
(3.3)

where \({\varvec{\phi }}(t) = (\phi _1(t),\ldots ,\phi _J(t))^{\mathrm{T}}\) is a vector of basis functions with dimension J, and \({\varvec{\zeta }}=(\zeta _1,\ldots ,\zeta _J)^{\mathrm{T}}\) is the corresponding vector of basis coefficients. In this article, we choose cubic B-splines as basis functions for expanding \(\eta (t)\), and the smoothness of \(\eta (t)\) is controlled by penalizing the difference of adjacent B-splines coefficients. In the Bayesian framework, this penalization is conveniently implemented by assuming a kth-order random walk prior on the splines coefficients \(\zeta _1,\ldots ,\zeta _J\) (Lang and Brezger 2004). The first-order random walk is defined as

$$\begin{aligned} \zeta _l = \zeta _{l-1}+e_l, \quad l\ge 2, \end{aligned}$$
(3.4)

with \(e_l \sim N(0, \lambda _\eta ^{-1})\) and a diffuse prior \(\zeta _1 \propto constant\). The second-order random walk is defined as

$$\begin{aligned} \zeta _l = 2\zeta _{l-1}-\zeta _{l-2}+e_l, \quad l\ge 3, \end{aligned}$$
(3.5)

with \(e_l \sim N(0,\lambda _\eta ^{-1})\), and a diffuse prior \(\zeta _1 \propto { constant}\) and \(\zeta _2 \propto { constant}\). The smoothness of \({\hat{\eta }}(t)\) is controlled by the additional parameter \(\lambda _\eta >0\). In this article, we apply the second-order random walk prior (3.5). To avoid the possibility of improper posteriors of \({\varvec{\zeta }}\), we assign weakly informative priors on \((\zeta _1,\zeta _2)\): \(\zeta _1 \sim N(0,100)\) and \(\zeta _2 \sim N(0,100)\). Then, the prior of \({\varvec{\zeta }}\) can be equivalently written in the following form

$$\begin{aligned} p({\varvec{\zeta }}|\lambda _\eta ) \propto \lambda _\eta ^{(J-2)/2} \exp \left( -\dfrac{1}{2}{\varvec{\zeta }}^{\mathrm{T}}{\mathbf {M}}(\lambda _\eta ){\varvec{\zeta }}\right) , \end{aligned}$$
(3.6)

where \({\mathbf {M}}(\lambda _\eta )=\mathrm{diag}(0.01, 0.01, \lambda _\eta {\mathbf {D}}_2{\mathbf {D}}^{\mathrm{T}}_2)\) with

$$\begin{aligned} {\mathbf {D}}_2 = \left( \begin{array}{ccccccc} 1 &{}\quad -2 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad \cdots &{}\quad 0\\ 0 &{}\quad 1 &{}\quad -2&{}\quad 1 &{}\quad 0 &{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \cdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad \cdots &{}\quad 1 \end{array} \right) _{(J-2)\times J}. \end{aligned}$$
(3.7)

The unknown parameter \(\lambda _\eta \) is assigned a hyper-prior \(Ga(a_\lambda ,b_\lambda )\) with the pre-specified values of the hyper-parameters \((a_\lambda , b_\lambda )\).

In nonparametric regressions, the number of spline coefficients, J, may potentially impact the fitted results. In the Bayesian scheme, there are two approaches to deal with the choice of J. One is fixing J, and the other is treating J as an unknown variable to be estimated (for example, using the reversible jump MCMC algorithm, see Biller and Fahrmeir 2001). Lang and Brezger (2004) compared these two methods and found that the approach that fixes J performed better for functions with moderate curvature, while the approach that treats J as random performed better for highly oscillating functions. In this article, J is fixed and pre-determined in our approach [generally from 20 to 40, see Lang and Brezger (2004)]. The simulation results showed that our method is robust on the choice of J, see Figure S5 in the Supplementary file.

Define \({\varvec{\Theta }}=({\varvec{\theta }}^{\mathrm{T}}_1,\ldots ,{\varvec{\theta }}^{\mathrm{T}}_n)^{\mathrm{T}}\) and let \({\mathbf {U}}=(U_1,\ldots ,U_n)^{\mathrm{T}}\) and \({\mathbf {W}}=(W_1,\ldots ,W_n)^{\mathrm{T}}\) be the latent variables. Then, the joint likelihood is given by

$$\begin{aligned} L({\mathbf {Y}},{\varvec{\Theta }},{\mathbf {U}},{\mathbf {W}}|{\varvec{\zeta }},{\varvec{\Sigma }},{\varvec{\xi }},\sigma ^2_\epsilon ,\nu ,\kappa ) = \prod _{i=1}^n L_i({\mathbf {Y}}_i,{\varvec{\theta }}_i,U_i,W_i|{\varvec{\zeta }},{\varvec{\Sigma }},{\varvec{\xi }},\sigma ^2_\epsilon ,\nu ,\kappa ), \end{aligned}$$

where \(L_i(\cdot |\cdot )\) is the likelihood function of the i-th subject, that is,

$$\begin{aligned} L_i({\mathbf {Y}}_i,{\varvec{\theta }}_i,U_i,W_i|{\varvec{\zeta }},{\varvec{\Sigma }},{\varvec{\xi }},\sigma ^2_\epsilon ,\nu ,\kappa ) = L_i({\mathbf {Y}}_i,U_i|{\varvec{\theta }}_i,{\varvec{\zeta }},\sigma ^2_\epsilon ,\nu ) L_i({\varvec{\theta }}_i,W_i|{\varvec{\xi }},{\varvec{\Sigma }},\kappa ), \end{aligned}$$

with

$$\begin{aligned} L_i({\mathbf {Y}}_i,U_i|{\varvec{\theta }}_i,{\varvec{\zeta }},\sigma ^2_\epsilon ,\nu ) = p({\mathbf {Y}}_i|U_i,{\varvec{\theta }}_i,{\varvec{\zeta }},\sigma ^2_\epsilon )H_1(U_i|\nu ), \end{aligned}$$

and

$$\begin{aligned} L_i({\varvec{\theta }}_i,W_i|{\varvec{\xi }},{\varvec{\Sigma }},\kappa ) = p({\varvec{\theta }}_i|{\varvec{\xi }},{\varvec{\Sigma }},W_i)H_2(W_i|\kappa ). \end{aligned}$$

To complete the Bayesian specification of the proposed model, the following prior distributions are assigned: \(\sigma ^{-2}_\epsilon \sim Ga(a_0,b_0)\), \({\varvec{\xi }}\sim N({\varvec{\xi }}_0, {\varvec{\Psi }}_0)\) and \({\varvec{\Sigma }}\sim IW({\mathbf {S}}_0, df)\), where the Gamma distribution \(Ga(a_0,b_0)\) has the shape parameter \(a_0\) and the rate parameter \(b_0\), and the inverse Wishart distribution \(IW({\mathbf {S}}_0, df)\) has the scale matrix \({\mathbf {S}}_0\) and degrees of freedom df. The hyper-parameters \(a_0\), \(b_0\), \({\varvec{\xi }}_0\), \({\varvec{\Psi }}_0\), \({\mathbf {S}}_0\) and df are pre-specified. The priors for \(\nu \) and \(\kappa \) depend on which distributions chosen for \(H_1(\cdot |\nu )\) and \(H_2(\cdot |\kappa )\). If both \(H_1(\cdot |\nu )\) and \(H_2(\cdot |\kappa )\) are chosen as gamma distributions, then the values of \(\nu \) and \(\kappa \) must be chosen to ensure \(\mathrm{E}(U^{-1})<\infty \) and \(\mathrm{E}(W^{-1})<\infty \) which further lead to both \({\mathbf {Y}}_i\) and \({\varvec{\theta }}_i\) have finite positive definite covariance matrices. For convenient conjugacy, in this article, we specify the priors of \(\nu \) and \(\kappa \) as truncated exponential priors, \(\lambda _\nu \exp (-\lambda _\nu \cdot \nu )I(\nu >2.0)\) and \(\lambda _\kappa \exp (-\lambda _\kappa \cdot \kappa )I(\kappa >2.0)\), with pre-specified values \(\lambda _\nu \) and \(\lambda _\kappa \), respectively.

The joint posterior distribution of the parameters of the model conditional on the data is obtained by combining the joint likelihood and the prior distributions using the Bayes’ theorem. The full conditional posterior distributions are sampled using the Monte Carlo methods which are presented in the Supplementary file.

3.2 Model Comparisons

To determine the best fitting model in a class of candidate models, in this article, we apply two common measures of model adequacy: the deviance information criterion (DIC; see, Spiegelhalter et al. 2002; Celeux et al. 2006) and conditional predictive ordinate (\(\mathrm{CPO}\); see, Chen et al. 2000). In this section, we briefly review the theory of DIC and \(\mathrm{CPO}\) under a general Bayesian hierarchical framework.

The DIC has been extensively applied in Bayesian statistics, which measures the fit and the complexity of the candidate model. Suppose that \({\mathbf {y}}=(y_1,\ldots ,y_n)^{\mathrm{T}}\) is a sample with the probability distribution function \(f({\mathbf {y}}|{\varvec{\vartheta }})\) which depends on the parameter vector \({\varvec{\vartheta }}\). Then, the deviance is given by

$$\begin{aligned} D({\varvec{\vartheta }})=-\,2\log f({\mathbf {y}}|{\varvec{\vartheta }})+2\log g({\mathbf {y}}), \end{aligned}$$

where \(g({\mathbf {y}})\) is some fully specified normalized term. Define the effective number of parameters

$$\begin{aligned} p_D=\overline{D({\varvec{\vartheta }})}-D({\bar{{\varvec{\vartheta }}}}), \end{aligned}$$

where \(\overline{D({\varvec{\vartheta }})}\) is the posterior mean deviance which is defined as

$$\begin{aligned} \overline{D({\varvec{\vartheta }})}=E_{{\varvec{\vartheta }}|{\mathbf {y}}}[D({\varvec{\vartheta }})]=E_{{\varvec{\vartheta }}|{\mathbf {y}}}[-2\log f({\mathbf {y}}|{\varvec{\vartheta }})]+2\log g({\mathbf {y}}), \end{aligned}$$

and \({\bar{{\varvec{\vartheta }}}}\) is a posterior mean estimate of \({\varvec{\vartheta }}\) depending on \({\mathbf {y}}\). Then, the DIC statistic is defined as

$$\begin{aligned} \mathrm{DIC}=\overline{D({\varvec{\vartheta }})}+p_D=2\overline{D({\varvec{\vartheta }})}-D({\bar{{\varvec{\vartheta }}}}). \end{aligned}$$
(3.8)

A smaller DIC value indicates a better model.

The CPO is a Bayesian diagnostic tool which is based on leave-one-out-cross-validation. The CPO can be used to detect the surprising observations. Let \({\mathbf {y}}_{-i}=(y_1,\ldots ,y_{i-1},y_{i+1}, \ldots ,y_n)^{\mathrm{T}}\) which is obtained from the sample \({\mathbf {y}}\) after omitting \(y_i\). The CPO for \(y_i\) is defined as

$$\begin{aligned} \mathrm{CPO}_i=f(y_i|{\mathbf {y}}_{-i})=\dfrac{f({\mathbf {y}})}{f({\mathbf {y}}_{-i})}=\int f(y_i|{\varvec{\vartheta }}, {\mathbf {y}}_{-i}) p({\varvec{\vartheta }}|{\mathbf {y}}_{-i})\mathrm{d}{\varvec{\vartheta }}, \end{aligned}$$
(3.9)

which gives the predictive distribution of each data point conditional on the remainder of the data. Since that \(\mathrm{CPO}_i\) involves a multiple integration, we estimate it based on the MCMC samples of \({\varvec{\vartheta }}\) (Carlin and Louis 2008). Let \({\varvec{\vartheta }}_1,\ldots ,{\varvec{\vartheta }}_M\) be the posterior samples from the posterior distribution \(p({\varvec{\vartheta }}|{\mathbf {y}})\) with the size M after the burn-in. A Monte Carlo estimate of \(\mathrm{CPO}_i\) is given by

$$\begin{aligned} \widehat{\mathrm{CPO}}_i = \left\{ \dfrac{1}{M}\sum _{\ell =1}^M \dfrac{1}{f(y_i|{\varvec{\vartheta }}_{\ell })}\right\} ^{-1}, \end{aligned}$$

where \(\{{\varvec{\vartheta }}_{\ell }\}_{\ell =1}^M\) are the posterior samples of \({\varvec{\vartheta }}\) (De la Cruz 2014). More conveniently, a common summary statistic of \(\mathrm{CPO}_i\)’s is defined as \(B = \sum _{i=1}^n \log (\widehat{\mathrm{CPO}}_i)\), which is often called the logarithm of the pseudo-Bayes factor. A larger value of B indicates a better model.

4 Application: Gene Regulation Study

In this study, we consider the cell-cycle microarray data of S. pombe or fission yeast (Rustici et al. 2004) which studied three transcription factors that were involved in regulating three different groups of genes in the fission yeast cell cycle. A single input motif (SIM) is a class of small regulatory subnetwork which consists of one transcription factor regulating a set of \(i=1,\ldots ,n\) target genes. In this article, we only analyze one SIM, but our proposed model can be extended to multiple SIMs. Rustici et al. (2004) studied a SIM with one transcription factor, Sep1p and 14 target genes. In addition, gene ace2, which is also coded as Ace2p, is known to be the target gene of Sep1p (Bahler 2005). Hence, we include gene ace2 in the SIM as another target gene. Thus, we have totally 15 target genes regulated by the same transcription factor, Sep1p.

The expressions of all genes are measured at 20 equally spaced time points on [0, 285]. Let \({\mathbf {X}}_i = (X_i(t_{i1}),\ldots ,X_i(t_{in_i}))^{\mathrm{T}}\) be the expressions of the i-th gene at the common observation time points \({\mathbf {t}}_i=(t_{i1},\ldots ,t_{in_i})^{\mathrm{T}}\). Let \({\mathbf {Y}}_i = (y_{i1},\ldots ,y_{in_i})^{\mathrm{T}}\) be the logarithms of noisy measurements of gene expressions with \(n_i=20\). As discussed in Sect. 1, it is doubtful that \({\mathbf {Y}}_i\) follows a multivariate normal distribution. Therefore, in this article, we assume that \( {\mathbf {Y}}_i\) follows the scale mixture of multivariate normal distributions \(SMN_{n_i}(\log {\mathbf {X}}_i, \sigma ^2_\epsilon {\mathbf {I}}_{n_i}, H_1)\), where \(H_1\) is a gamma distribution with the shape parameter \(\nu /2\) and rate parameter \(\nu /2\). This is equivalent to assume a hierarchical structure \( {\mathbf {Y}}_i|U_i \sim N_{n_i}(\log {\mathbf {X}}_i, U^{-1}_i\sigma ^2_\epsilon {\mathbf {I}}_{n_i})\) and \(U_i \sim Ga(\nu /2, \nu /2)\).

One of the often-used semiparametric gene regulation dynamical systems is given by Khanin et al. (2007) and Rogers et al. (2007)

$$\begin{aligned} \dfrac{\mathrm{d}X_i(t)}{\mathrm{d}t} = \alpha _i+ \beta _i \dfrac{\eta (t)}{\eta (t)+\gamma _i} - \delta _i X_i(t),\quad i=1,\ldots ,n, \end{aligned}$$
(4.1)

where \(\eta (t)\) is the activity function of the transcriptional factor, Sep1p. To remove the positive constraints on the ODE parameters \((\alpha _i,\beta _i,\gamma _i,\delta _i)^{\mathrm{T}}\), they are reparameterized in the logarithmic scales. Besides, the initial condition \(X_i(0)\) is assumed unknown and also estimated together with the ODE parameters. Let \({\varvec{\theta }}_i=(\log (X_i(0)), \log (\alpha _i), \log (\beta _i),\log (\gamma _i),\log (\delta _i))^{\mathrm{T}}\). We assume that \({\varvec{\theta }}_i\) follows the scale mixture of multivariate normal distribution \(SMN_5({\varvec{\xi }}, {\varvec{\Sigma }}, H_2)\), where \({\varvec{\xi }}=(\log (X(0)), \log (\alpha ), \log (\beta ),\log (\gamma ),\log (\delta ))^{\mathrm{T}}\) and \(H_2\) is a gamma distribution with the shape parameter \(\kappa /2\) and the rate parameter \(\kappa /2\). This is equivalent to the hierarchical structure: \({\varvec{\theta }}_i|W_i \sim N_5({\varvec{\xi }}, W_i^{-1}{\varvec{\Sigma }})\) and \(W_i \sim Ga(\kappa /2, \kappa /2)\).

For the identifiability problem, we rescale the transcriptional factor activity \(\eta (t)\) to satisfy \(\eta (0)=1\). We use cubic B-spline basis functions with 20 equally spaced knots on [0, 285]. Then, \(\eta (t)\) is approximated by a linear combination of B-spline basis functions:

$$\begin{aligned} \eta (t) = \exp \{\phi _0(t)\zeta _0+\sum \limits _{l=1}^J \phi _l(t)\zeta _l\} = \exp \{\phi _0(t)\zeta _0+{\varvec{\zeta }}^{\mathrm{T}} {\varvec{\phi }}(t)\}, \end{aligned}$$

where \({\varvec{\phi }}(t)=(\phi _1(t),\ldots ,\phi _J(t))^{\mathrm{T}}\), \(\zeta _0\) and \({\varvec{\zeta }}= (\zeta _1,\zeta _2,\ldots ,\zeta _J)^{\mathrm{T}}\) are the corresponding basis coefficients. The property of B-spline basis functions allows that \(\eta (0)=1\) is equivalent to \(\zeta _0 = 0\). We assign a prior for other basis coefficients \(\zeta _1,\ldots ,\zeta _J\): \(p({\varvec{\zeta }}|\lambda _\eta ) \propto \lambda _\eta ^{(J-2)/2} \exp (-\frac{1}{2}{\varvec{\zeta }}^{\mathrm{T}}{\mathbf {M}}(\lambda _\eta ){\varvec{\zeta }})\) where \({\mathbf {M}}(\lambda _\eta )=\mathrm{diag}(0.01, 0.01, \lambda _\eta {\mathbf {D}}_2{\mathbf {D}}^{\mathrm{T}}_2)\) and \({\mathbf {D}}_2\) is given by (3.7).

We set a gamma prior Ga(0.01, 0.01) for \(\sigma ^{-2}_\epsilon \), a gamma prior Ga(1, 0.01) for \(\lambda _\eta \), an inverse Wishart prior \(IW({\mathbf {S}}_0, f_0)\) for \({\varvec{\Sigma }}\) and a multivariate normal prior \(N_5({\varvec{\xi }}_0, {\varvec{\Psi }}_0)\) for \({\varvec{\xi }}\). Moreover, we chose the following values for the hyper-parameters: \({\varvec{\xi }}_0 = (0, 0, 0, 0, 0)^{\mathrm{T}}\), \({\mathbf {S}}_0 = \mathrm{diag}(10, 10, 10, 10, 10)\), \({\varvec{\Psi }}_0 = \mathrm{diag}(1000,1000,1000,1000,1000)\), and \(f_0 = 6\). For the hyper-priors of \(\nu \) and \(\kappa \), we choose \(\frac{1}{3}\exp (-\frac{\nu }{3})I(\nu >2.0)\) and \(\frac{1}{3}\exp (-\frac{\kappa }{3})I(\kappa >2.0)\).

We compare our proposed SMN model which assumes both the individual ODE parameters and the observations errors to follow the scale mixture of multivariate normal distribution with the conventional normal model which assumes both the individual ODE parameters and the observations errors to follow the normal distributions. To improve mixing convergence of the posterior samples, the adaptive proposals are applied in the Metropolis–Hastings algorithm. The details are provided in Section S1 of the Supplementary document. For each of SMN model and normal model, the proposed MCMC algorithm is run for 150,000 iterations. With the ‘burn-in’ of the first 50,000 iterations, the samples of \((\alpha ,\beta ,\gamma ,\delta )^{\mathrm{T}}\) are chosen with a lag of size 20 from the rest of Markov chains under each of the models. The convergence of the MCMC chains was monitored using trace plots and Gelman–Rubin convergence diagnostic (\({\hat{R}}\)). We report the results of the convergence diagnostic results in Table S1 of the Supplementary document. Gelman and Rubin (1992) suggested that diagnostic \({\hat{R}}\) values greater than 1.2 for any of the model parameters should indicate non-convergence. Here, we use a rule of \({\hat{R}}<1.1\) to declare convergence of the Markov chains. All these diagnostics indicate a good convergence of the Markov chains. We calculate the values of the DIC using the formula (3.8) and the pseudo-Bayes factor \(B = \sum _{i=1}^n \log (\widehat{\mathrm{CPO}}_i)\) based on the formula (3.9). The results are: For the SMN model, \(\mathrm{DIC}=-\,2186.90\) and \(B= 174.16\), whereas for the normal model, \(\mathrm{DIC}=-\,1502.30\) and \(B=146.04\). Obviously, our proposed model using the SMN model has a smaller value of DIC and a large value of the logarithm of the pseudo-Bayes factor \(B = \sum _{i=1}^n \log (\widehat{\mathrm{CPO}}_i)\) than the conventional model using the normal distribution, which indicates that our proposed model is better. The values of \(\mathrm{CPO}_i\) for each individual are calculated and reported in Table S2 of the Supplementary document. Obviously, the overall CPO is driven by a few overly influential values under the multivariate normal model, e.g., the 12th gene. In contrast, the influence of those points is greatly attenuated by assuming a scaled mixture of normal distributions. As the requirement of a referee, we also calculate \(\mathrm{CPO}_i\) using the bridge sampling method (Meng and Wang 1996) and the results are: For the SMN model, \(B=702.24\), whereas for the normal model, \(B=110.57\), which further demonstrated that the SMN model is better than the normal model.

Table 1 shows the posterior means, the standard errors and the corresponding 95% equal-tail credible intervals for the fixed effects \((\alpha ,\beta ,\gamma ,\delta )^{\mathrm{T}}\) in our proposed model. Table 2 displays the estimated latent parameters in the gene regulation mixed-effects ODE model (4.1) under the assumption that the individual ODE parameters and observations errors follow the SMN distributions. All of \({\widehat{W}}^{-1}_i\) and most of \({\widehat{U}}^{-1}_i\) are around 1.0, but Gene 12 has very large values of \({\widehat{U}}^{-1}_i\). This implies that Gene 12 may have abnormal observations. The left panel of Fig. 2 displays the measured gene expressions and the estimated \(X_i(t)\) for Gene 12. As expected, the estimate of gene expression under the normal model is much affected by the abnormal observations. In contrast, the estimate of gene expression under the SMN model is robust against the abnormal observations. Furthermore, from Fig. 1a, it seems that the residuals present a feature of skewness. We performed a mixed-effects ODE model by assuming that the ODE parameters and observations errors follow the scale mixture of multivariate skew normal distributions; however, it is found that the skewness is not significant. Finally, the estimated transcriptional factor activity is displayed in the right panel of Fig. 2. It shows that the transcriptional factor activity has two periodic cycles, which coincided with the results of Rogers et al. (2007).

5 Simulation Studies

In this section, we implement four simulation studies to evaluate the finite sample performance of our proposed semiparametric mixed-effects ODE models.

5.1 Scenario I

As stated in Sect. 1, the semiparametric ordinary differential equations have valuable applications in the gene transcription experiments. To mimic the transcription activity in gene transcription experiments, we consider the following semiparametric ODE (Barenco et al. 2006):

$$\begin{aligned} \dfrac{\mathrm{d}X_i(t)}{\mathrm{d}t} = \alpha _i + \beta _i \eta (t)-\delta _i X_i(t), \quad t\in [0, 8], \end{aligned}$$
(5.1)

where the changing level of a gene i’s expression, \(X_i(t)\), is given by a combination of a basal transcription rate, \(\alpha _i\), a sensitivity, \(\beta _i\), to its governing TF’s activity, \(\eta (t)\), and the decay rate of the mRNA, \(\delta _i\). We generate the logarithm of the individual ODE parameters \({\varvec{\theta }}_i = \log \{(\alpha _i, \beta _i, \delta _i)^{\mathrm{T}}\}\) from a multivariate normal distribution \(N({\varvec{\xi }}, {\varvec{\Sigma }})\) with the mean vector \({\varvec{\xi }}\) and the covariance matrix \({\varvec{\Sigma }}=0.04\cdot \!{\mathbf {R}}\), where the true values of \({\varvec{\xi }}\) are chosen as \({\varvec{\xi }}=\log \{(\alpha ,\beta ,\delta )^{\mathrm{T}}\}\) with \(\alpha =1.2\), \(\beta =3.5\) and \(\delta =1.0\), and the correlation matrix \({\mathbf {R}}=(r_{ij})_{3\times 3}\) with \(r_{12}=0.45\), \(r_{13}=-\,0.25\) and \(r_{23}=-\,0.45\). The activity function of the transcriptional factor is given by a mixture of three Gaussian-like bumps, \(\eta (t) = \exp [-(t-2)^2/1.44]+\exp [-(t-5)^2]+0.5\exp [-(t-6)^2/0.64]-0.062\), which was considered in Chapter 9 of Lawrence et al. (2010) with a similar formula.

Table 1 The posterior means, standard deviations (SD) and 95% credible intervals of the population ODE parameters \((\alpha ,\beta ,\gamma ,\delta )^{\mathrm{T}}\) in the gene regulation mixed-effects ODE model (4.1) under the SMN model and normal model, respectively
Table 2 The estimated latent parameters in the gene regulation mixed-effects ODE model (4.1) under the SMN model which assumes that the ODE parameters and observations errors follow the scale mixture of multivariate normal distributions
Fig. 2
figure 2

Example of inference applying the SMN model (solid lines) and normal model (slashed lines) with real microarray data from fission yeast. a The observations of gene expressions (circles) and the numerical solutions of the gene regulation mixed-effects ODE model (4.1) for Gene 12; b the posterior mean estimates of \(\eta (t)\); the shaded area denotes the 95% credible intervals of the TFA \(\eta (t)\) under the SMN model

The true initial condition \(X_i(0)\) is generated as \(X_i(0) \sim N(1.0, 0.04)\). Then, our simulated data are generated as \( Y_i(t_{ij})= X_i(t_{ij})+\epsilon _{ij}\), where \(X_i(t_{ij})\) is the numerical solution of ODE (5.1) via the fourth-order Runge–Kutta algorithm evaluated at 201 equally spaced time points on [0, 8], and \(\epsilon _{ij}\) are the measurement errors which are independently generated from a normal distribution N(0, 0.25). We then estimate the mixed-effects ODE (5.1) by assuming the ODE parameter \({\varvec{\theta }}_i\) and the data error \(\epsilon _{ij}\) follow the SMN distributions. This proposed model is also compared with the conventional model which assumes \({\varvec{\theta }}_i\) and \(\epsilon _{ij}\) follow the normal distribution. With the ‘burn-in’ of the first 10,000 iterations, we obtain 1000 posterior samples from the rest of the Markov chains with a lag of size 10. The above procedure is repeated for 100 simulation replicates.

To evaluate the performance of the estimate for the time-varying ODE parameter \(\eta (t)\), we define the following mean absolute deviation error (MADE)

$$\begin{aligned} \mathrm{MADE}({\hat{\eta }}) = \dfrac{1}{MN}\sum _{\ell =1}^M\sum _{j=1}^{N}|{\hat{\eta }}^{(\ell )}(t_j)-\eta (t_j)|, \end{aligned}$$

where \(t_j, j=1,\ldots ,N\), are \(N=201\) equally spaced time points in [0, 8], and \({\hat{\eta }}^{(\ell )}(t)\) is the estimate of \(\eta (t)\) in the \(\ell \)-th simulation replicate for \(\ell =1,\ldots ,M\). Table S3 of the Supplementary file shows the posterior means, standard deviations as well as the mean absolute deviation errors for the fixed effect \((\alpha ,\beta ,\delta )^{\mathrm{T}}\). In Table S3, for \(\alpha \) and \(\eta (\cdot )\), our proposed model based on the SMN distribution has a little larger MADE values but has smaller MADE values for \(\beta \) and \(\delta \) in comparison with the conventional model using the normal distribution for \(n=50\) and 100. We think that our proposed model using the SMN distribution is compatible with the conventional model using the normal distribution.

Table 3 The bias, standard deviation(SD) and mean absolute deviation error (MADE) of estimates for the fixed effects of the mixed-effects ODE model (5.1) under Scenario II based on 100 simulation replicates

5.2 Scenario II

To study the robustness of the proposed method, we generate the logarithm of the individual ODE parameters \({\varvec{\theta }}_i = \log \{(\alpha _i, \beta _i, \delta _i)^{\mathrm{T}}\}\) from a multivariate Student’s t distribution \(t_\kappa ({\varvec{\xi }}, {\varvec{\Sigma }})\) with the location parameter \({\varvec{\xi }}\), shape matrix \({\varvec{\Sigma }}=0.04\!\cdot \!{\mathbf {R}}\) which are the same as in Scenario I and degrees of freedom \(\kappa =3\). The time-varying fixed-effect function \(\eta (\cdot )\) is also the same as in Scenario I.

The true initial condition \(X_i(0)\) is generated from the \(X_i(0)=1.0+0.1b_{i0}\) where \(b_{i0}\) follows a Student’s t distribution, t(3), with degree of freedom 3. Then, our simulated data are generated as \( Y_i(t_{ij})= X_i(t_{ij})+\epsilon _{ij}\), where \(X_i(t_{ij})\) is the numerical solution of ODE (5.1) via the fourth-order Runge–Kutta algorithm evaluated at 201 equally spaced time points on [0, 8], and \(\epsilon _{ij}\) are the measurement errors. We consider two cases for generating the measurement errors: (i) \(\epsilon _{ij} \sim 0.3 \times t(3)\); (ii) \(\epsilon _{ij} \sim 0.6 \times t(3)\). With the ‘burn-in’ of the first 10,000 samples, we obtain 1000 posterior samples from the rest of the Markov chain with a lag of size 10. Table 3 displays the simulation results for the estimates of fixed-effect \((\alpha , \beta , \delta )^{\mathrm{T}}\) and time-varying function \(\eta (t)\) based on 100 simulation replicates. It shows that our proposed model using the SMN distribution is preferred than the conventional model using the normal distribution.

5.3 Scenario III

To study the robustness of the proposed method for mis-specified ODEs model, we consider the following two experiment designs:

Fig. 3
figure 3

The MISEs for \(X_i(t)\) in the Scenario III by performing the SMN model and normal model, respectively

III-(a): Suppose that the true ODEs are given by

$$\begin{aligned} \frac{\mathrm{d}X_i(t)}{\mathrm{d}t} = -\delta _i X_i(t)+\frac{D_i \delta _i Ka_i}{Cl_i(\delta _i - Ka_i)} \exp (-Ka_i t) + \beta _i \frac{(t-5)^2+119}{144},\quad t\in [0, 12],\nonumber \\ \end{aligned}$$
(5.2)

where \(D_i=400, 1\le i \le 50\), and the logarithms of the individual ODE parameters \(\{Ka_i,\beta _i,\delta _i,Cl_i\}\) follow a multivariate normal distribution \(N({\varvec{\xi }}, {\varvec{\Sigma }})\) with \({\varvec{\xi }}=\log \{(1.0,0.5,0.3,21.0)^{\mathrm{T}}\}\) and \({\varvec{\Sigma }}=\mathrm{diag}\{0.1,0.1,0.1,0.25 \}\). The true initial condition \(X_i(0)\) is generated from a normal distribution N(1.0, 0.01). Then, our simulated data are generated as \( Y_i(t_{ij})= X_i(t_{ij})+\epsilon _{ij}\), where \(X_i(t_{ij})\) is the numerical solution of ODE (5.2) via the fourth-order Runge–Kutta algorithm evaluated at 201 equally spaced time points on [0, 12], and the measurement errors \(\epsilon _{ij} \sim N(0, 0.25)\).

III-(b): Suppose that the true ODEs are given by

$$\begin{aligned} \frac{\mathrm{d}X_i(t)}{\mathrm{d}t} = -\delta _i X_i(t)+\frac{D_i \delta _i Ka_i}{Cl_i(\delta _i - Ka_i)} \exp (-Ka_i t),\quad t\in [0, 12], \end{aligned}$$
(5.3)

where the values of individual parameters are taken the same as in Scenario III-(a).

To implement the traditional method which is based on the normal distributions assumptions, the working ODEs model is specified by

$$\begin{aligned} \frac{\mathrm{d}X_i(t)}{\mathrm{d}t} = -\delta _i X_i(t)+\frac{D_i \delta _i Ka_i}{Cl_i(\delta _i - Ka_i)} \exp (-Ka_i t), \end{aligned}$$
(5.4)

which is a mis-specified model in Scenario of III-(a) and is a correct model in Scenario of III-(b). For our proposed method, the working ODEs model is specified by

$$\begin{aligned} \frac{\mathrm{d}X_i(t)}{\mathrm{d}t} = -\delta _i X_i(t)+\frac{D_i \delta _i Ka_i}{Cl_i(\delta _i - Ka_i)} \exp (-Ka_i t)+\beta _i \eta (t), \end{aligned}$$
(5.5)

We use the following mean integrated square error (MISE) criterion:

$$\begin{aligned} \mathrm{MISE}= \dfrac{1}{50}\sum _{i=1}^{50} \int _0^{12} [{\hat{X}}_i(t)-X_i(t)]^2\mathrm{d}t, \end{aligned}$$

to evaluate the performances of our proposed method and the traditional method. The results are displayed in Fig. 3, which show that our proposed method is robust on the mis-specified ODEs model.

6 Conclusions and Discussion

Ordinary differential equation models are elegant and popular models for describing the mechanism of complex dynamical systems. In this paper, we propose a semiparametric mixed-effects ODE model that contains both constant and time-varying parameters and considers the within-subject and between-subject variations simultaneously. We propose to use a class of scale mixture of multivariate normal distributions to model the random effects of ODE parameters and measurement errors in the data to obtain a robust estimation for the ODE parameters when the abnormal subjects and the abnormal measurement errors exist in the data.

Our proposed model can be framed in a Bayesian hierarchical model in which two latent variables are introduced to identify abnormal individuals and abnormal measurement errors. As one reviewer points out, one interesting feature of the proposed hierarchical model is the potential interpretability of the latent parameters as indicators of abnormal individuals. Another benefit of the proposed hierarchical model is feasibly performing an MCMC method to estimate the ODE parameters and latent variables. To improve the mixing convergence of the algorithm, an adaptive proposal was applied in the MH step. Finally, our proposed method is demonstrated by studying a genetic regulation experiment in which the regulatory activity of the transcriptional factor is modeled by a varying function over time. We show that our proposed model using the scale mixture of multivariate normal distribution is preferred in comparison with the conventional model using the normal distribution. Our simulation studies also show that our proposed model can obtain a more robust estimation for ODE parameters when using the scale mixture of multivariate normal distributions. As an example of applications, a class of often-used gene regulatory dynamic systems was studied in real data analysis; however, our proposed approach is very flexible and can be applied to provide robust inference results for other semiparametric mixed-effects dynamic systems.

Furthermore, the ODEs are generally designed based on the experts knowledge or the prior research. Our simulation results showed that there will have serious bias when applying a mis-specified ODEs model. A conservative approach is to consider a flexible semiparametric ODEs model:

$$\begin{aligned} \frac{\mathrm{d}X_i(t)}{\mathrm{d}t} = F(X_i(t); {\varvec{\theta }}_i)+r_i(t), \end{aligned}$$
(6.1)

where the parameters \({\varvec{\theta }}_i\) belong to a parameter space \({\varvec{\Theta }}\), the function F is known based on the experts knowledge, and \(r_i(t)\) describes the uncertainty of the ODEs. In the simulation studies, we have investigated the case of \(r_i(t)=\alpha _i+\beta _i\eta (t)\) and demonstrated the efficiency of our proposed method. We will go on to study this issue in the future. It is possible that the heavy-tailed residuals arise from stochastic transcriptional factor activity rather than how it is currently modeled as a smooth deterministic trend. We will study how to distinguish them in our future study.