Modeling a Mixture of Linear and Changepoint Trajectories for Longitudinal Time-Series Data

Longitudinal changepoint data naturally arise in many applications. Examples include transition of core body temperature following the hypothermia therapy and prostate-specific antigen levels following treatment. Note that the trend change occurs due to a shock (e.g., treatment) to the system. Thus, an individual exhibiting a linear trend could be an indication of insignificant effects of the shock. One of the goals of this type of study is to investigate whether the shock is significantly associated in changing the trend of a trajectory. The bent-cable model characterizes the shock-through data using three phases: (a) an incoming phase characterizing the trend before the shock comes into effect, (b) a transition due to the shock, and (c) an outgoing phase due to the aftershock effects. In this article, we develop bent-cable methodology accounting for trajectories exhibiting either a linear trend or a trend change characterized by gradual or abrupt transition.


Introduction
Longitudinal changepoint trajectories with three phases -an incoming phase followed by a transition and an outgoing phase -are common in many applications including medical, biological and environmental sciences. Examples include the atmospheric concentrations of chlorofluorocarbons (CFCs) monitored from different stations all over the globe following the Montrèal Protocol's ban on CFC products (Khan et al., 2009;, transition of core body temperature following the hypothermia therapy to prevent brain damage of patients who have survived a cardiac arrest (Khan et al., 2013;Reynolds & Chiu, 2010), and prostate-specific antigen levels of the prostate cancer patients following treatment (Bellera et al., 2008). This type of data may be referred to as shock-through data, as the trend change occurs due to a shock (e.g., treatment) to the system. Note that an individual exhibiting a linear trend could be an indication of insignificant effects of the shock. Thus, among other related questions, one of the goals of this type of study is to investigate whether the shock has significant effects in effectively changing the trend of a trajectory. Therefore, proper characterization of the trajectories is of paramount importance to the researchers.
To illustrate the changepoint problem discussed above, we reintroduce here the example presented by Khan et al. (2013) to understand the effects of the hypothermia therapy on core body temperature (T c ). Hypothermia is a fatal condition which can occur when T c falls below 35 o C. By reducing a patient's T c , systematic metabolism, and therefore tissue oxygen demand, can be lowered. This is believed to attenuate the effects of ischemia and the subsequent reperfusion at the cellular level (Gordon, 2001;Rincon & Mayer, 2006). Therefore, hypothermia is frequently used by the clinicians as a therapeutic tool for cardiac arrest, stroke and brain injury. In contrast, since T c is typically associated with a critical threshold associated with a breakdown in the compensatory homeostatic mechanisms following severe hemorrhage (Rincon & Mayer, 2006;Connett et al., 1986), quantification of the transition of T c to early hypothermia is of great clinical interest. With this objective in mind, an experiment was conducted on 38 rats to collect information about the state of hypothermia and resuscitation strategy (Reynolds & Chiu, 2010;Reynolds et al., 2007). The rats were hemorrhaged from the carotid catheter with a constraint of a mean arterial pressure threshold of 40 mm Hg. The experiment was continued until the target shed blood volume (60% of the total blood volume) was achieved. In this process, T c was logged by automated remote data collection every 15 seconds for the duration of the trial. Figure 1 shows three T c profiles. We see that some rats may exhibit a seemingly linear trend (e.g., Figure 1a). In addition to roughly linear incoming and outgoing phases at either end of a profile, we also see that some rats may exhibit an abrupt transition in T c (e.g., Figure 1b), while others, a gradual transition (e.g., Figure 1c). That is, we have samples potentially coming from three different populations, say, L (linear), A (abrupt) and G (gradual), respectively, according to the type of transition for the underlying T c trend.
Tc Time (each one unit increment is 15 seconds) Time (each one unit increment is 15 seconds) Time (each one unit increment is 15 seconds) Figure 1. Core body temperature of three rat profiles out of 38, recorded during hemorrhage in a study to collect information about the state of hypothermia and resuscitation strategy.
The so-called broken-stick is a popular piece-wise linear model, and is a natural candidate to describe a continuous trend with an abrupt change (Hall et al., 2003;Lange et al., 1992;Kiuchi et al., 1995;Slate & Turnbull, 2000). Broken-stick models, however, may not be realistic in many applications where the longitudinal trajectories exhibit gradual changes (e.g., atmospheric concentration of CFCs (Khan et al., 2012)), or simply linear trends. So, it is desirable to relax the assumption of abruptness beforehand, and to formulate a model that is flexible enough to handle either type of changes, gradual or abrupt, as well as linear trends over time.
The challenge in longitudinal changepoint methodology lies not only in deriving a model that allows useful information about the population of interest, but also in developing method for statistical inference in the absence of standard regularity conditions (Chiu et al., 2006). In addition, existence of different types of transition (e.g., gradual and abrupt) across the individuals makes it an even more challenging problem to characterize the individual profiles. This is because it necessitates a flexible formulation of the model and inference technique to identify the type of transition exhibiting by a particular profile and to estimate the parameters associated with it.
Developed by Chiu et al. (2006) and Chiu & Lockhart (2010), bent-cable is an appealing changepoint model to characterize shock-through data with greatly interpretable regression coefficients that provide commonly sought information about a changepoint problem. It models a trajectory using three phases: a linear incoming phase characterizing the trend before the shock comes into effect, followed by a transition (gradual or abrupt) due to the shock, and a linear outgoing phase due to the after-shock effects (see Section 2 for details). Because of its appealing features in terms of flexibility, parsimoniousness and great interpretability, Khan et al. (2013) extended the bent-cable regression for longitudinal data by taking into account gradual and abrupt transitions as components of a mixture model; the outcome is the flexible mixture bent-cable methodology for samples that potentially come from two populations defined by the two types of transition: gradual and abrupt. They developed Bayesian inference implemented via Markov chain Monte Carlo (MCMC) techniques (Smith & Roberts, 1993). In analyzing the rat data, Khan et al. (2013) recognized that a few rats may exhibit neither an obviously gradual nor abrupt transition, but rather a linear decreasing trend. They reported that the rats exhibiting linear decreasing trends throughout were estimated to have arisen from Population A. Even if a few profiles exhibit linear decreasing trend in T c , improper characterization of Population L by A could be misleading to the clinicians in taking appropriate resuscitation strategy. Therefore, In addition to the identification and characterization of the populations A and G, characterization of the profiles which exhibit simply linear trends would be valuable to the clinicians. Another assumption of Khan et al. (2013)'s longitudinal bent-cable methodology is that the populations share common intercept and slope parameters to describe the linear incoming and outgoing phases. Because of this assumption, the methodology is not expected to properly characterize the populations, especially in describing the linear phases. In fact, our preliminary simulation study reveals that serious estimation bias can occur under the assumption of sharing common www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3; linear parameters when, in reality, those differ across the populations.
In this article, we show that the simple linear regression model is a special case of the bent-cable model. We then develop our longitudinal bent-cable methodology to simultaneously account for trajectories exhibiting either a linear trend or a trend change characterized by gradual or abrupt transition. In particular, we take into account (a) samples potentially coming from three different populations: Populations G, A and L; (b) population-specific linear parameters to allow proper inference at the population levels; and (c) hierarchical Wishart priors to model the between-individual covariance structure.
Extensions (a)-(c) to the Khan et al. (2013)'s model constitute our flexible mixture bent-cable model. Note that Khan et al. (2013) assumed Wishart priors to model the between-individual covariance structure. As pointed out by Daniels & Kass (1999), our simulation study (Section 4) reveals that Wishart priors can be sensitive to the choices of the hyperparameters, while implementing (c) with sensible choices of the hyperparameters provides robustness in estimating the between-individual covariance structure.
The rest of the paper is organized as follows. We describe the formulation of our longitudinal bent-cable model in Section 2. Technical details of the MCMC algorithm are described in Section 3. In Section 4, we present extensive simulation studies to demonstrate that (i) the proposed methodology performs well to characterize the three populations, (ii) the method based on population-specific linear parameters performs quite well even when the populations share common intercept and slope parameters in reality, and (iii) the Wishart priors can be sensitive to the choices of the hyperparameters, whereas the hierarchical Wishart priors provide robust estimates of the covariance matrics under sensible choices of the hyperparameters. We then illustrate our methodology with an application to the aforementioned rat data (Section 5). We conclude this article with a discussion in Section 6.

The Flexible Mixture Bent-Cable Model
Suppose we have m individuals. For the j th individual ( j = 1, 2, . . . , m), let there be n j measurements, and let t jk denote the k th measurement occasion, k = 1, 2, . . . , n j . Without loss of generality, we assume that all individuals enter the study at time t j1 = 0. We model the corresponding response at time t jk , denoted by y jk , by the relationship where θ j is the vector of regression coefficients for the jth individual, f (·) is a function of t jk and θ j which characterizes the trend in the response over time, and ϵ jk is the random error component. The formulation of the model is completed using the bent-cable function (Chiu et al., 2006;Chiu & Lockhart, 2010), defined by where β j = (β 0 j , β 1 j , β 2 j ) ′ , α j = (γ j , τ j ) ′ and θ j = (β ′ j , α ′ j ) ′ . Here, β j is the vector of the linear parameters with β 0 j being the y-intercept in the incoming phase, and β 1 j and β 1 j + β 2 j , the slopes in the incoming and outgoing phases, respectively. The function q(.) characterizes the transition via two parameters τ j and γ j , and can be expressed as where 1(·) is an indicator function. Note that both τ j and γ j are positive for gradual transition, representing the center and the half-width of the bend, respectively; γ j = 0 reduces the bent-cable function to a piece-wise linear model for which q(t jk , α j ) = (t jk − τ j )1{t jk − τ j ≥ 0}; and τ j = γ j = 0 reduces the bent-cable function to a simple linear regression model for which q(t jk , α j ) = t jk and f (t jk , θ j ) = β 0 j +(β 1 j +β 2 j )t jk (see Figure 2). Chiu & Lockhart (2010) also defined the critical time point (CTP) at which the slope of the bent-cable function changes signs: the CTP is given by τ j − γ j − 2β 1 j γ j /β 2 j for gradual transition (Figure 2a), and τ j for abrupt transition (Figure 2c). For gradual transition, the CTP does not represent anything meaningful when the slope of the bent-cable function does not change signs, and therefore is not marked in Figure 2b. For abrupt transition, τ j is the time at which the slope changes signs (Figure 2c) or the slope changes magnitudes but not signs (Figure 2d).  Bayesian modeling of longitudinal data involves three levels of hierarchy (Davidian & Giltinan, 1995;Wakefield et al., 1994): (a) Level 1 that characterizes the longitudinal trajectories using the regression equations (1)-(3), (b) Level 2 that characterizes the between-individual variability via the models assigned to θ j 's, and (c) Level 3 that quantifies prior information about the random quantities of Levels 1 and 2. Note that the random error components, ϵ jk 's in (1) account for within-individual variability in the measurements around the regression f (t jk , θ j ). We assume at Level 1 that ϵ jk ∼ N(0, σ 2 j ). Under this model, the correlation of the repeated measurements from the same individual is induced by the individual-specific random coefficients θ j 's, and is called the conditionally independent specification of the within-individual variation (Laird & Ware, 1982). At Level 2, we assume that an individual potentially comes from one of three populations: G, A and L. Let each individual has probability p i to have come from Population i, where i = 1, 2, 3 refer to G, A , and L, respectively. Under this setting, the individuals form a sample from a mixture model. We can express the mixture model under a hierarchical structure, using the unobserved data where P(z i j = 1) = p i . Note that the artificial random variables z i j 's are introduced to facilitate implementing the Gibbs sampling for Bayesian inference (Robert , 1996).
Based on our choices of distributions for the relevant quantities at Levels 1-3, we write the hierarchical model as follows: where and Σ are, respectively, the i th population mean and the common covariance matrix (shared by all three populations) of β j ; ξ = (ξ 1 , ξ 2 ) ′ and Ψ are the mean and covariance of log (α j ) (for Population 1); η and σ 2 η are the mean and variance of log (τ j ) (for population 2); N r , LN r , W r , G and U stand for r-variate normal, r-variate lognormal, r-dimensional Wishart, gamma and uniform distributions, respectively; and δ stands for the Dirac delta measure (Previato, 2002), defined as In Equation (6), the vectors h are Gaussian means and matrices H are Gaussian covariances; d 0 2 and d 1 2 are, respectively, the shape and the rate parameters of the gamma distribution G( d 0 2 , d 1 2 ); and ν and A −1 are, respectively, the degrees of freedom and the mean of the Wishart distribution W p (ν, (νA) −1 ). Note that we prefer the Wishart distribution in the formulation of our model, because it is the conjugate prior of the inverse of a Gaussian covariance matrix. We must have ν ≥ r for a proper Wishart distribution, and setting ν = r is common in Bayesian inference which makes the Wishart prior nearly flat (Wakefield et al., 1994). However, Daniels & Kass (1999) pointed out that the specification of A can be quite influential; since A −1 is the mean of the Wishart distribution, A should be a close approximation of the covariance matrix. Our simulation study also reveals that the estimation of the covariance matrix is highly sensitive to the choices of A. In order to overcome this problem, we propose a hierarchical Wishart prior in (6), where A is a diagonal matrix with each of the diagonal elements modeled using a uniform distribution.
In our formulation of the model, Levels 1 and 2 are (4) and (5), and Level 3 is (6) with all the hyperparameters at Level 3 assumed known. Note that µ 1 and µ 2 characterize the linear parameters for Populations 1 and 2, respectively. For example, µ 11 and µ 11 + µ 12 are, respectively, the slopes of the incoming and outgoing phases for Population 1. Our choices of distributions also lead to the fact that (β 0 j , β 1 j + β 2 j ) ′ for Population 3 has a normal distribution with mean (µ 30 , µ 31 + µ 32 ) ′ ; therefore, we use µ 30 and µ 31 + µ 32 to characterize the linear trend for Population 3. Since our assumption for the α j 's involves lognormal distributions, we can use Level 2 medians, namely exp {ξ 1 } and exp {ξ 2 } for Population 1 and exp {η} for Population 2, to describe the transition locations (Khan et al., 2013).

Bayesian Inference
Bayesian inference is made based on the posterior distribution of a parameter; the posterior mean or median can be considered a point estimate of the parameter, whereas the posterior standard deviation measures the uncertainty of the parameter a posteriori. The Bayesian analogue of a frequentist confidence interval is usually referred to as a credible interval. A 100(1 − 2a)% credible interval is (r 1 , r 2 ), where r 1 and r 2 are the a th and (1 − a) th quantiles of the posterior density, respectively.
Statistical inference is carried out by MCMC, where we sample from the posterior distribution by the Metropolis within Gibbs algorithm (Smith & Roberts, 1993). We denote all the model parameters collectively by Θ, with Θ (0) being an arbitrary starting point. We generate an instance from the full conditional distributions of each of the components of the model parameters conditional on the current values of the remaining components. This completes a transition from Θ (0) to Θ (1) . Iterating through this cycle of generating data from the full conditional distributions gives a Markov chain {Θ (s) , s = 1, 2, . . . , T } (Davidian & Giltinan, 1995 (Lemieux, 2009) is then applied to the Markov chain for Bayesian inference. For example, the marginal posterior means of the intercept and the slope parameters for Population 3 can be approximated bŷ where the initial S − 1 iterations are considered as burn-in samples.
The implementation of the Metropolis within Gibbs algorithm for our flexible mixture bent-cable model is described as follows.
} is an arbitrary starting point, and {Θ (s) , z (s) } is its update at iteration s. Given {Θ (0) , z (0) }, our choice of order to update the components leads to the following sequence of drawing samples from the full conditional distributions π(·|·) (the full conditional distributions are presented in Appendix).
can be expressed up to a proportionality constant; is gamma.
2) Repeat step 1 for j = 2, 3, . . . , m, which lead to β (s+1) , α (s+1) , z (s+1) and σ (s+1) ; The above cycle completes a transition from Iterating through this cycle of generating data from the full conditional distributions leads to a Markov chain. Note that the full conditional distributions for all the components except (α j , z j ) can be expressed in closed-form expressions, and therefore we employ a Metropolis step to draw sample for (α j , z j ). We have written our own code in C to generate MCMC samples, which we subsequently analyze using the coda package (Plummer et al., 2012) in R (R Development Core Team, 2014). The interface between C with R is achieved using Rtools (R Development Core Team, 2014).

Simulations
We present three scenarios to evaluate the performance of the proposed methodology in analyzing longitudinal samples coming from three different populations: G, A and L.
• Scenario 1. We generate data using (1a) p 1 = 0.40, p 2 = 0.40 and p 3 = 0.20, (1b) p 1 = 0.40, p 2 = 0.20 and p 3 = 0.40, and (1c) p 1 = 0.20, p 2 = 0.40 and p 3 = 0.40. Note that L is dominated by G and A in scenario 1a, A is dominated by G and L in scenario 1b, and G is dominated by A and L in scenario 1c. We then fit the proposed model to the simulated data to investigate the performance of our methodology when one population is dominated by the other two populations. • Scenario 2. Recall that our proposed methodology takes into account population-specific linear parameters to allow proper inference at the population level; we present scenario 2 to demonstrate that the proposed methodology performs well even if the populations share common intercept and slope parameters, which is one of the assumptions of Khan et al. (2013)'s methodology. As of scenario 1, we consider (2a) p 1 = 0.40, p 2 = 0.40 and p 3 = 0.20, (2b) p 1 = 0.40, p 2 = 0.20 and p 3 = 0.40, and (2c) p 1 = 0.20, p 2 = 0.40 and p 3 = 0.40 • Scenario 3. We present scenario 3 to illustrate that the Wishart priors can be sensitive to the choices of the scale matrices A (scenario 3a), whereas the hierarchical Wishart priors lead to robust estimates of the covariance matrices under sensible choices of the hyperparameters (scenario 3b). We only consider simulated p 1 = 0.40, p 2 = 0.40 and p 3 = 0.20 for both scenarios 3a and 3b. We investigate the performance of our methodology with fairly vague, minimally informative priors. For multivariate normal, the hyperparameters -mean vector and covariance matrix -are commonly chosen to be a zero vector and a diagonal matrix, say, H, respectively, such that H −1 ≈ O, where O is a matrix with all its elements zero. We follow this convention which leads to a flat prior (Davidian & Giltinan, 1995). Similarly, we consider flat prior for the univariate normal distribution: a 0 = 0 and a 1 = 10 5 . For the hierarchical Wishart priors in scenarios 1 and 2, we choose (A 1 ) 11 ∼ U(100, 150), (A 1 ) 22 ∼ U(0, 10), (A 1 ) 33 ∼ U(0, 10), (A 2 ) 11 ∼ U(0, 10) and (A 2 ) 22 ∼ U(0, 10). For a gamma prior, small values of the hyperprior parameters lead to a flat prior; we consider d 0 = e 0 = 0.01 and d 1 = e 1 = 0.01 in all simulation scenarios. Finally, for the mixture probabilities (p 1 , p 2 , p 3 ), we choose the dirichlet prior Dirichlet(ω 1 , ω 2 , ω 3 ) with ω 1 = ω 2 = ω 3 = 1.
Simulations for each scenario are performed using both a large number of repeated measurements (n = n j = 100) and a moderate number of repeated measurements (n = n j = 24). Model parameter values are chosen to allow reasonable generalization and are given in Tables 1-4. Note that σ 2 j 's ( j = 1, 2, . . . , 40) were randomly drawn from www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3; the U(0, 0.01) distribution. For each simulation, 500 data sets were generated, and 100, 000 MCMC iterations after burn-in were used to approximate posterior distributions per set. Posterior summaries were averaged over the 500 sets for each parameter, and the coverage probability of 95% credible intervals (proportion of such credible intervals out of 500 that capture the truth) was calculated.

Results for Scenario 1
Numerical results for scenario 1a are presented in Table 1, with the top three panels depict the results for the parameters relevant to only Populations G, A and L, respectively, the second last panel presents the results for the covariance matrix Σ which is shared by all the three populations, and the last panel shows results for the mixing proportions (p 1 , p 2 , p 3 ). We see that our methodology performs well with respect to all the parameters for both large and moderate number of repeated measurements (i.e., n = n j = 100 and n = n j = 24): averages of posterior medians are all close to the true parameter values, and coverage probabilities are all reasonably close to the nominal 0.95. In particular, coverage probabilities range from 0.938 to 0.992 for n = 100 and from 0.946 to 0.996 for n = 24. We observed similar results for scenarios 1b and 1c (numerical results for scenarios 1b and 1c are not presented in this article).

Results for Scenario 2
As mentioned in Section 1, the assumption of the Khan et al. (2013)'s methodology that the populations share common intercept and slope parameters is restrictive, because the bent-cable methodology under this assumption cannot properly characterize the populations, especially in describing the linear phases. Our preliminary simulation study supports this fact: serious estimation bias can occur under this assumption when, in reality, those differ across the populations. Simulation scenario 2 is introduced to evaluate the performance of our methodology under the assumption that the linear parameters do not differ across the populations, that is, we analyze the simulated data using our methodology which are generated assuming that the populations share common intercept and slope parameters. Numerical results for scenario 2a are summarized in Table 2. We see that our methodology performs www.ccsenet.org/ijsp Vol. 4, No. 3; well with respect to all the parameters for scenario 2a: averages of posterior medians are all close to the true parameter values, and coverage probabilities are all reasonably close to the nominal 0.95, ranging from 0.940 to 0.982 for n = n j = 100 and 0.938 to 0.996 for n = n j = 24. We observed similar results for scenarios 2b and 2c (numerical results for scenarios 2b and 2c are not presented in this article). Table 3. Simulation scenario 3a (Wishart priors) results with simulated p 1 = 0.40, p 2 = 0.40 and p 3 = 0.20, and n j = 24 for all j and m = 40: average of 500 posterior medians, and coverage of 95% credible intervals.

Results for Scenario 3
First, we present simulation results when the covariance matrices are modeled using the Wishart distributions (scenario 3a). To make the Wishart priors for Σ −1 and Ψ −1 nearly flat, we choose ν 1 = 3 and ν 2 = 2. The scale matrices A 1 and A 2 are considered diagonals in each simulation. Note that the diagonal elements of true Σ and Ψ are (125.00, 0.03, 0.03) and (0.02, 0.03), respectively. To investigate the sensitivity of the specification of Wishart distributions in modeling the covariance matrices, we consider three sets of priors: the diagonal elements of A 1 and A 2 are all relatively close to those of the true Σ and Ψ, respectively, with the closest one specified as Prior 1 1 :  Table 3. We see that our methodology with Prior 1 1 performs well with respect to all the parameters: averages of posterior medians are all close to the true parameter values, and coverage probabilities are all reasonably close to the nominal 0.95. However, as the priors for A 1 and A 2 deviate slightly from those of Prior 1 1 but still reasonably close to the truth, under-coverage and large bias result for at least one of the diagonal elements (variance parameters) of Σ and Ψ. For example, under Prior 1 1 and Prior 1 2, the coverage probabilities for (Ψ) 11 are 0.996 and 0.752, respectively; thus, the coverage under Prior 1 1 is 77% closer (Note 1) to the nominal 0.95 when compared with Prior 1 2. We see the worst scenario under Prior 1 3: the coverage probabilities for (Ψ) 11 , (Ψ) 22 , (Σ) 22 and (Σ) 33 are 0.052, 0.718, 0.494 and 0.532, respectively, each of which is substantially lower than the nominal 0.95. The above results support the fact that the bent-cable methodology can be sensitive to the choices of the hyperparameters for the Wishart priors: the model performs well when A 1 and A 2 are very close to the truth, whereas slight deviations from the truth can result in drastically lower coverage probabilities for the variance parameters in Σ and Ψ. Scenario 3a with n = n j = 100 also leads to the same conclusion (numerical results for n = n j = 100 are not presented in this article).
www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3; Now, we consider the case when the covariance matrices are modeled using the hierarchical Wishart priors (scenario 3b). We consider three sets of priors as presented in Table 4, with Prior 2 1 is relatively the most informative, Prior 2 3 is the least informative, and Prior 2 2 is in between Prior 2 2 and Prior 2 3. Numerical results for n = n j = 24 are summarized in Table 4. Under all the three sets of priors, our methodology performs well with respect to all the parameters: coverage probabilities range from 0.946 to 0.996 for both Prior 2 1 and Prior 2 2, and from 0.948 to 0.998 for Prior 2 3. Averages of the posterior medians are all very close to the true parameter values as well. Scenario 3b with n = n j = 100 also leads to similar results (numerical results for n = n j = 100 are not presented here). In summary, the proposed methodology based on the Wishart priors for the covariance matrices performs well as long as A 1 and A 2 almost accurately approximate the covariance matrices Σ and Ψ, respectively, (e.g., Prior 1 1 in Table 3). Note that such accurate approximation can rarely be achieved in practical situations. Slight departures of A 1 and A 2 from the truth may result in severe under-coverage for the variance parameters in Σ and Ψ. In contrast, the proposed methodology based on the hierarchical Wishart priors performs well under sensible choices of the priors for A 1 and A 2 .
Numerical results for the σ 2 j 's are not presented in this article. For each simulation scenario considered in this study, the average of the posterior medians for each σ 2 j were very close to the truth, and the coverage probabilities were all close to the nominal 0.95.

Example: Rat Data Analysis
In Section 1, we introduced an experiment on 38 rats to understand the effects of the hypothermia therapy on core body temperature, T c . This dataset was analyzed by Khan et al. (2013) using their longitudinal bent-cable methodology which accounts for only gradual and abrupt transitions. In this section, we re-analyze this dataset using our proposed methodology which simultaneously accounts for transitional (gradual and abrupt) as well as linear trends over time. We also compare the fits of the proposed model with that of Khan et al. (2013)'s using the deviance information criterion (DIC) (Spiegelhalter et al., 2002).
As in Khan et al. (2013), we define time by t jk , k = 0, 1, . . . , n j where t j1 = 0 refers to the starting point of the www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3; study for rat j ( j = 1, 2, . . . , 38), and each subsequent time increment is 15 seconds. Since any function h(Θ) of the parameters has its own posterior distribution, we consider the posterior median of h(Θ) as its estimate. For example, the CTP for Population G is h(Θ) = exp {ξ 2 } − exp {ξ 1 } − 2µ 11 exp {ξ 1 }/µ 12 , and therefore the posterior median of {exp {ξ (l) 2 } − exp {ξ (l) 1 } − 2µ (l) 11 exp {ξ (l) 1 }/µ (l) 12 , l = S , S + 1, . . . , T } is considered as its estimate. Similarly, the MCMC sample median of f (t jk , θ j ) is considered as the fitted valueŷ jk at time t jk for individual j. A fitted population curve is produced based on the estimates of the theoretical medians for β j and α j from Level 2.

Results
The DIC for the proposed model and the Khan et al. (2013)'s model fit to the rat data are approximately −16, 739 and −14, 729, respectively, suggesting that our methodology provides substantial improvement in fit, with DIC considerably less than that reported by Khan et al. (2013).   Rat 34 exhibits a linear increasing trend in T c , followed by a gradual transition and a linear decreasing trend thereafter, whereas rat 13 exhibits linear decreasing trends at either end of the profile, with a continuous transition between the two phases (i.e., the fitted slope of the bent cable changes only its magnitudes but not signs). In www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3; total, sixteen rats are characterized by gradual transition of which twelve exhibit sign changes in the fitted slope of the bent cable function (i.e., an incoming phase with linear increasing trend and an outgoing phase with linear decreasing trend are joined by a quadratic bend; see rat 34 in Figure 3), and only four show no sign change in the fitted slope of the bent cable function (e.g., rat 13). Note that rat 32 exhibits a gradual drop in T c from the beginning of the study period. The missing incoming phase for this profile suggests that it does not obviously follow the shape of the bent cable. Our methodology picks up this anomaly adequately -a gradual transition is in progress for this rat from the beginning of the study period. Note that one of the appealing features of the multilevel model is that pooling information from many individuals leads to shrinkage towards the population, so that mild deviations of observed profiles from the bent cable structure do not hinder model fitting (Khan et al., 2013). This feature leads the estimates of the parameters characterizing a missing phase as in rat 32 to shrink towards their population counterparts. Rats 33 and 6 exhibit abrupt transition for whichγ = 0 andτ > 0. In total, eighteen rats are characterized by abrupt transition of which sixteen exhibit sign changes in the fitted slope of the bent cable function (i.e., an incoming phase with linear increasing trend is joined by an outgoing phase with linear decreasing trend; see rat 33 in Figure 3), and only two exhibit no sign change in the fitted slope of the bent cable function (e.g., rat 6). Finally, only four rat profiles show linear decreasing trend throughout (e.g., rat 8). Note that about 40%, 50% and 10% of the rats belong to Populations G, A and L, respectively, based on the posterior medians of p 1 , p 2 and p 3 (p 1 = 0.40,p 2 = 0.50 andp 3 = 0.10; see Table 5). In summary, the fits appear very reasonable as the observed data agree closely with the respective fitted curves, and the estimated τ and γ demonstrate that our methodology picks up the three types of trend adequately.     Our analysis reveals very similar results for Σ as reported by Khan et al. (2013). Small variation in the slope parameters (the posterior medians for the standard deviations of β 1 j and β 1 j + β 2 j are 0.008 and 0.011, respectively) suggests that all rats exhibit very similar rates of increase/decrease before/after the transition period. The posterior www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3; median for the standard deviation of the intercept parameter β 0 j is 0.541, suggesting some variation in T c 's at the time of administering hemorrhage. Significant negative correlation between β 1 j and β 2 j ( corr(β 1 j , β 2 j ) = −0.425 with 95% credible interval (−0.695, −0.101)) indicates that the steeper the incoming slope going up, the bigger the drop in slope for the outgoing phase.
We summarize here a few interesting findings of our analysis.
• If a rat exhibits a significant increase in T c at the rate of approximately 0.005 o C per 15 seconds in the incoming phase, it is highly likely that it belongs to Population G. On the other hand, a rat exhibiting approximately a constant trend in T c in the incoming phase is more likely to have come from Population A or L. These phenomena are supported by the facts that the incoming slope for Population G is statistically significant, but the incoming slope for Population A and the slope for Population L are both statistically insignificant.
• If a rat exhibits approximately a constant trend in T c in the incoming phase, the T c will either remain roughly constant over time (Population L for which the rate of decrease is not significant) or will decrease at a significant rate of approximately −0.009 o C per 15 seconds after around 14.11 minutes from hemorrhage (Population A for which the rate of decrease after around 14.11 minutes from hemorrhage is significant).
• If a rat exhibits a gradual but nonlinear drop in T c after hemorrhage (i.e., missing incoming phase), it is more likely that it belongs to Population G.
• The hypothermia therapy appears to have no significant effects in reducing T c for about 10% of the rats which belong to Population L.

Conclusion
In many longitudinal studies, an individual may exhibit simply a linear trend over time or may initially exhibit a linear increasing/decreasing trend, followed by a transition period and then a linear decreasing/increasing trend. As far as the transition period is concerned, some individuals may exhibit a gradual transition, whereas for others an abrupt transition. Statistical techniques for this context revolve around broken-stick (piecewise linear) regression, typically ignoring temporal autocorrelation. Reynolds & Chiu (2010) provided a brief critique of the restrictiveness of an abrupt transition as imposed by the broken stick, and of the shortcomings of ignoring autocorrelation. They subsequently attempted to improve the statistical inference by applying the bent-cable time series model (Chiu & Lockhart, 2010). . This approach is a time-series extension of frequentist bent-cable regression of Chiu et al. (2006). For both models, asymptotic theory was developed; the focus was to quantify the nature of the transition between approximately linear phases by modelling the transition as a quadratic phase with unknown, non-negative width. However, inference is only based on a single observational unit at a time. Moreover, the authors admitted that the segmented nature of the model leads to poor asymptotic approximation in many practical settings involving finite samples. This shortcoming partially caused Reynolds & Chiu (2010) to discard various observational units from their analysis. Khan et al. (2013) extended the single-profile bent-cable model for longitudinal data which allows the transition of each individual's underlying temporal trend to be either abrupt or gradual, through inference for whether the individual belongs to Population A or Population G. Thus, inference for transition type is data driven rather than pre-assumed, the latter being required by many changepoint analysis methods. A Bayesian framework was considered for statistical inference, so that the concern about unsatisfactory performance of bentcable asymptotics is irrelevant.
In this manuscript, we develop a substantial extension of the Khan et al. (2013)'s bent-cable methodology for longitudinal data by taking into account samples potentially coming from three different populations: Populations G, A and L. We have tailored our work for the scientific context of the rat model as described in Section 1 to understand hypothermia therapy for human populations. Note that abruptness of change modeled by either a piecewise linear model or a smooth function is unrealistic as discussed in Khan et al. (2013). Instead, it would be more meaningful to distinguish the three types of trends as being samples from three different populations. The populations may also differ with respect to the rates of linear changes in the incoming and outgoing phases. As opposed to the Khan et al. (2013)'s bent-cable methodology, the extended bent-cable methodology presented in this article by taking into account the population-specific intercept and slope parameters in addition to the three types of transitions is expected to more appropriately characterize the populations under investigation. Moreover, the proposed hierarchical Wishart prior is extremely important to achieve robust inference, even when little is reliably known about the between-individual covariance structure. The performance of our methodology is evaluated www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 3; using extensive simulation study: our simulation study reveals that the proposed methodology performs well to allow proper inference at the population level by relaxing the assumption of universal abruptness, gradualness or linearity. Note that there is also penalized spline regression (Ruppert et al., 2003) to flexibly model the shape of the individual trends, though the flexible mixture bent-cable model is more appealing in the context described in this paper due to its parsimoniousness and greatly interpretable regression coefficients.
Our methodology provides a general framework to analyze longitudinal data that exhibit either a transitional trend (gradual or abrupt) between two approximately linear phases or simply a linear trend over time. No previous work has been done under a general regression modelling framework to classify observational units in the same longitudinal study as exhibiting either an abrupt or gradual transition or simply a linear trend. Such classification provides not only inference that is more realistic, but also insights into the underlying behaviour within a population. Aside from crucial information at the population level, another aspect of our longitudinal framework which researchers would find valuable is the inference for the temporal trend exhibited by individual observational units. Although our work is motivated by a case study in therapeutic hypothermia, our model can be applied to other longitudinal data that exhibit similar statistical characteristics as the rat study.
Some caution is required in using our proposed methodology for the following reasons: (1) the proposed methodology should be used if there is strong reason to believe that the sample potentially comes from the three populations mentioned above; Khan et al. (2013)'s bent-cable methodology with extensions (b) and (c) mentioned in Section 1 is the more appropriate technique when the sample potentially comes from populations G and A only, and Khan et al. (2009)'s bent-cable methodology with extension (c) of Section 1 is the appropriate technique when all the trajectories exhibit gradual transition, and (2) the proposed methodology is intended for longitudinal timeseries data, and may not be efficient to identify the three types of trends for longitudinal data with a small number of repeated measures.