A MIXED-EFFECTS JOINT MODEL WITH SKEW-T DISTRIBUTION FOR LONGITUDINAL AND TIME-TO-EVENT DATA: A BAYESIAN APPROACH

,


INTRODUCTION
In follow-up studies, a group of subjects can be followed over time to determine the outcome of exposures, processes, or effects of a characteristic. In such studies, longitudinal and eventtime data can be recorded simultaneously. In the literature, there are numerous approaches for modelling and evaluating the time-to-event and longitudinal processes separately. When the two processes are related, however, modelling them separately may give biased results. A joint modelling technique is needed to obtain an unbiased result, make valid statistical inferences, and assess the association between them [1].
Modelling the time-to-event and longitudinal data jointly is therefore becoming essential in many follow-up studies. In medical research, joint models of longitudinal biomarkers and time-to-event data have been a crucial statistical methodology and active research area [2,3,4,5,6,7,8]. In the formulation of joint models, the most common approaches postulated as submodels are mixed-effects models [1,9,10,11,7] for longitudinal measurements and a Cox's proportional hazard submodel [12,13,14,15,16] for survival data.
This study was motivated by a follow-up data on chronic kidney disease (CKD), which is a major global health problem that affects around half a billion people worldwide [17]. The majority of the prevalence of CKD is in low-and middle-income countries, and according to Shiferaw et al. [18], an estimated 35.52 percent of people with diabetes in Ethiopia had CKD prevalence. Several biomarkers of the progress of a patient's renal function can be measured over time, and event times can also be recorded. For instance, serum creatinine, albuminuria, and other biomarkers can be measured at each follow-up visit, and the glomerular filtration rate (GFR) of the patient can be estimated. In addition, the time to an event, i.e., time to endstage renal disease (ESRD), death, or transplant, can be recorded if the patient experiences an event during the follow-up period. Here, the longitudinal outcome (eGFR, for example) and the survival outcome (time-to-ESRD) are biologically related. That is, the chance of developing ESRD can rise when a CKD patient's eGFR declines. As a result, a joint modelling approach can be more appropriate to model the two outcomes and evaluate their associations.
Numerous studies have been conducted in the literature to analyse and assess the kidney function of CKD patients using cross-sectional data. However, joint models on CKD follow-up data have not been extensively studied in the literature. Armero et al. [19] developed a joint model for longitudinal measurements and competing event times to analyse CKD data in children. They proposed Cox PH and linear mixed-effects submodels in the construction of their joint model. A study conducted by Yang et al. [20] proposed a joint model with mixed-effects and linear regression submodels for longitudinal and event-time CKD data, respectively. Teixeira et al. [21] also postulated a joint model for longitudinal and competing-events peritoneal dialysis data. The majority of these earlier studies postulated mixed-effects models with normal distributions for longitudinal outcomes.
However, longitudinal outcomes in some applications may have asymmetric (skewed) distributions. For instance, in this paper's application data, the longitudinal outcome eGFR exhibits a skew distribution (Figure 1). Figure 1 also demonstrates the existence of between-and withinpatient variations and the asymmetric distribution of eGFR. Proposing a normal distribution for such skewed longitudinal data may yield biased results and invalid statistical inference [22]. As a result, recent studies suggest that more flexible distributional assumptions of model errors may be needed in order to accurately describe and model such complex longitudinal outcomes and make a valid statistical inference [23,24]. In this paper, therefore, we propose a mixed-effects joint model with a skew-t distribution for longitudinal and time-to-event data under the Bayesian approach.
The reminder of this paper is organized as follows. Section 2 briefly describes the methods. In this section, the construction of the joint model, parameter estimation and inference, and model comparison are briefly described. In Section 3, the simulation studies are presented. Analysis of the CKD data is presented in Section 4. Section 5 includes conclusion and suggestions for future work. and t i = (t i1 , ...,t i j , ...,t im i ) T , respectively. Let S i , T i and C i denote the observed event-time, true event-time, and censoring time, respectively, for the i th subject. Where T i can be computed as T i = min(S i ,C i ). Let ρ be an event indicator that can be 1, the event of interest, or 0, the censoring event.
The construction of the joint model consists of the longitudinal submodel (1) and the eventtime submodel (2). A skewed mixed-effects submodel is proposed for the longitudinal outcome y i and is defined by Where X i = (x 1i , ..., x pi ) T and R i = (r 1i , ..., r li ) T represent fixed-effects and random-effects covariate design matrices, respectively. Where p and l are dimensions of the design matrices.
β and ξ i are associated parameter vectors of the fixed-and random-effects covariates. Due to its computational simplicity, in most previous studies, the model errors were considered to follow the commonly used Gaussian distribution. However, as shown in the introduction section ( Figure 1), the real longitudinal outcome data of this study follow an asymmetric distribution. As a result, we propose a multivariate skew-t distribution [25] for model errors Where ϑ ε , µ ε = 0, Σ ε = σ 2 ε I m i , and δ ε = δ ε 1 m i represent the degree freedom , mean vector, covariance matrix, and skewness vector of ε i , respectively. Σ ξ represents the covariance matrix of ξ i ; and 1 m i = (1, ..., 1) T represents an identity vector.
For the time-to-event process, we propose an extended proportional hazard model: Where ξ i represents the effects of subject-specific longitudinal outcomes (random effects), Z i is the baseline covariates vector with coefficient vector γ, λ i (t; .) represents the risk rate of the event of interest at time t, and λ 0 (t) is the baseline hazard function. Given ξ i , the longitudinal outcome y i and the event-time T i are assumed to be conditionally independent. η is a parameter vector that denotes the level of association between the hazard rate of the event of interest and the subject-specific longitudinal outcome.
The survival function, the likelihood that subject i won't experience the event of interest after time t, is given by: In this paper, the piecewise constant function is used to specify the baseline hazard function λ 0 (t) and model the event-time more flexibly [26]. By partitioning the event-time into P intervals, 0 = l 0 < l 1 < . . . < l p−1 < l P < ∞, the baseline hazard can be specified as

Estimation and Inference.
The likelihood (ML) and Bayesian methods are the two approaches that are most commonly employed for parameter estimation in the literature. Since we proposed a joint model with multivariate skew-t distribution, estimating the parameters from the joint likelihood using the ML-method can be quite time-consuming. Hence, in this paper, to estimate all the parameters simultaneously, we adopted the Bayesian technique, which can reduce computational load and allow for the inclusion of prior knowledge for the parameters.
For Markov chain Monte Carlo (MCMC) computation, it is necessary to specify the proposed mixed-effects longitudinal submodel (1) with a skew-t distribution. As a result, to represent the skew-t distribution based on the stochastic representation [25], we introduced a random vector W εi = (W εi1 , . . . ,W εim i ) T and a random variable v ε . Thus, the hierarchical reformulation of the joint model can be given by: Let D D D be the overall observed longitudinal and event-time data, and Ω = {β , σ 2 ε , Σ ξ , δ ε , ϑ ε , λ , γ, η} be the set of all parameters of the joint model. Then, the joint likelihood function of D D D is given by The prior distribution for each parameter, hence, must be specified in order to approximate the posterior distribution of the hierarchically constructed joint model. Independent normal , and N(0, κ δ ε ) prior distributions are assumed for β , γ, η, and δ ε , respectively. Inverse-Wishart IW l (D ξ , ν ξ ) prior is assumed for Σ ξ and Inverse-Gamma IG(ρ ε1 , ρ ε2 ) prior is assumed for σ 2 ε . Independent gamma G(φ 1 , φ 2 ) priors are assumed for λ p . For the degree freedom ϑ ε , a truncated exponential Exp(ϑ ε0 )I(ϑ ε > 3) prior distribution is assumed. The hyperparameter matrices Ω β , D ξ , Ω γ and Ω η are assumed diagonal for convenient implementation.
Given the joint likelihood f (D D D|Ω) and joint prior distribution π(Ω), the joint posterior density can be derived as where π(Ω) is the product of the prior distributions of the parameters, To draw a sample from the conditional posterior distribution and estimate the posterior mean and standard deviation of each of the parameters, we utilised the Metropolis-Hastings algorithm within Gibbs sampler and implemented the MCMC algorithm in WinBUGS software.

Model comparison.
We compare joint models by considering different distributional specifications of the model errors from the longitudinal submodel: • Model II: A joint model with multivariate skew-normal (SN) distribution of ε i .
• Model III: A joint model with multivariate normal (N) distributions of ε i .

SIMULATION STUDIES
To assess and compare the performance of the joint models, simulation studies are carried out. To conduct the simulation, 400 individuals with eleven measurement times (equallyspaced), i.e., a total of 4,400 observations, are taken into account. The mixed-effects longitudinal submodel (1) with two binary covariates was used to simulate the longitudinal outcome data. In order to obtain skewed longitudinal data, each component of the model error vector ε i is first generated from a gamma distribution G(2, 1), and then deducted by two [17]. We set The Cox proportional hazard model (2) with constant baseline hazard (λ 0 (t) = 0.2) and four predictors (one continuous and three binary predictors) was used to simulate event-time data.
Censoring time C i is generated from an exp(0.5) distribution. We generated the random effects vector ξ i from a four-variate normal distribution with a mean of 0 and an identity diagonal covariance matrix.
Weakly-informative prior distributions for the parameters were taken into consideration when performing the Bayesian inference. That is, for each component of β , δ e , γ, λ , and η, a N(0, 100) prior is assumed. For σ 2 e , Σ ξ , and ϑ ε , an IG(0.01, 0.01), an IW (diag(0.01, 0.01), 2), and Exp(0.5) prior distributions, respectively, are assumed.    To specify and apply the proposed models to the CKD data, log-transformed eGFR is used as the longitudinal outcome, and diabetes, hypertension, and log-transformed measurement time are taken into account as associated covariates for the longitudinal submodel (1). Both the random intercept and random slope of time-effects are also considered. For the time-to-ESRD joint model (2), the baseline covariates age, sex, diabetes, hypertension, and the subject-specific eGFR process (the random effects) are taken into account to predict the hazard rate of ESRD for each patient. The time that a CKD patient does not yet experience an ESRD event until the end of the data collection period or withdraws from the follow-up is considered as a right-censoring time. We considered four intervals based on the quantiles of the observed event time to specify the baseline hazard function using piecewise constant functions.
In order to fit and compare the three joint models using the real CKD data, we employed the same specifications of the priors, MCMC computation techniques, and convergence assessment tools as in the simulation section (Section 3). The summary results of the fitted joint models' parameters with different distributions of model errors are presented in Table 2.   Since the 95% credible intervals of most of the parameters do not include zero, as we can clearly see from Table 2, each joint model results in slightly different but statistically significant estimates. In general, compared to the skew-t joint models (JModels-I), the Gaussian joint model (JModels-III) yield larger parameter estimates. In particular, the last two joint models yields relatively large estimates of the within-and between-subject variations of the longitudinal outcome. For example, the estimated values of σ 2 ε (within-patient variation of eGFR) and σ 2 ξ 1 (inter-patient variation of eGFR) are relatively large in JModel-III.
The estimate of the skewness parameter ( Table 2) is statistically significantly different from zero (δ ε = −0.54; 95%CI : −0.60, −0.48), confirming that the joint model with a skew-t distribution is more appropriate than the Gaussian joint model. In addition, we also used DIC to select the best-fitting joint model. As we can see from Table 2, the proposed joint model (JModel-I), in comparison to the other joint models, has a relatively lower DIC value. Thus, the smaller variance estimates, lower DIC value, and existence of significant skewness lead us to conclude that JModel-I is the best Bayesian joint model that fits the CKD data well, and we use its findings to interpret the results and draw conclusions.
Convergence diagnostic checking was done before interpreting the fitted chosen joint model's results and drawing conclusions. Figure 3 presents the trace plots, and Figure 4 shows the plots of the BGR, ACF, and density of the parameters from the proposed joint model with skew-t distribution. All the figures clearly show convergence. We assessed the proposed joint model's performance using simulation studies and applied the model to real chronic kidney disease data. The proposed joint model with a skew-t distribution was compared with joint models with skew-normal and normal distributions of model errors. The relative bias, root mean square error, coverage probability, and DIC were used as performance evaluation tools. A joint model with skew-t distribution better fitted the data compared to the other joint models. We then evaluated the association between or the impact of the patient-specific eGFR process on the time to ESRD and other covariates and interpreted the results.
The association parameters of the subject-specific longitudinal outcome eGFR and the timeto-ESRD processes were statistically significant, indicating that proposing a joint modelling approach for the two processes is reasonable. The findings of this study also suggest that the specifications of the distributional assumptions of model errors require special attention. According to the application's findings, diabetes, hypertension, and measurement time were significant predictors of and had a negative association with kidney function. Hypertension and diabetes are also significantly associated with high risks of experiencing end-stage renal disease.
We note that one can use our methodology by considering other additional biomarkers of kidney function and assessing their associations with the time-to-events and making valid statistical inferences and predictions (which are out of the scope of this study). In addition to the motivating CKD follow-up data, our methodology has broader applications whenever continuous outcomes and associated biomarkers are repeatedly measured, the time-to-event is recorded, and the basic submodels and joint model specifications are met. Our simulation and application studies revealed that our work contributed to this interesting study area by making use of a more flexible methodology to model skewed longitudinal and time-to-event data.
The methodology proposed in this paper has some extensions for future research. (i) In this paper, we considered only the time-to-ESRD as an event of interest, assuming independent censoring. That is, other failure events, such as a patient's death, are taken into account as independent censoring of the time-to-ESRD process. However, death can be regarded as a competing risk for ESRD because its occurrence may prevent the occurrence of ESRD. Thus, our methodology can be expanded by taking into account multiple failure types and proposing competing risk failure time submodels for the time-to-event processes. (ii) For the longitudinal outcome, eGFR, we proposed a fully parametric (linear) mixed-effects submodel. However, in some applications, the exact form of the relationship between the longitudinal outcomes and the time effects may be non-linear (irregular). For instance some CKD patients in our application data have non-linear trajectories of the outcome eGFR over time. As a result, a fully parametric modelling approach may not be flexible enough to model such types of complex longitudinal data. Thus, by considering non-parametric smoothing functions of time, our modelling approach can be extended to a more flexible semi-parametric modelling approach and ensure future work. We note that the investigation of these two issues has also been completed and is currently available. (iii) The methodology of this work could also be extended to a multivariate setting to accommodate multiple longitudinal outcomes that are repeatedly measured for each subject.