Journal Pre-proof in trials

Objective: Correlated longitudinal and time-to-event outcomes, such as the rate of cognitive decline and the onset of Alzheimer’s disease, are frequent (co-)primary and key secondary endpoints in randomized clinical trials (RCTs). Despite their biological associations, these types of data are often analyzed separately leading to loss of information and increases in bias. In this paper, we set out how joint modelling of longitudinal and time-to-event endpoints can be used in RCTs to answer various research questions. Study Design and Setting: The key concepts of joint models are introduced and illustrated for a completed trial in amyotrophic lateral sclerosis. Results: The output of a joint model can be used to answer different clinically relevant research questions, where the interpretation of effect estimates and those obtained from conventional methods are similar. Albeit joint models have the potential to overcome the limitations of commonly used alternatives, they require additional assumptions regarding the distributions as well as the associations between two endpoints. Conclusion : Improving the uptake of joint models in RCTs may start by outlining the exact research question one seeks to answer, thereby determining how best to prespecify the model and defining which parameter should be of primary interest.


Introduction
Time-to-event and longitudinal outcomes are common endpoints in randomized clinical trials (RCTs). Clinical events may vary from the onset of a particular disease to the occurrence of disease-related events such as hospitalization, adverse reactions or death. As the time to reach the event may be considerable, it is common practice in RCTs to collect longitudinal information on intermediate outcomes such as physical functioning and quality of life alongside the time-to-event endpoint. In many instances, the longitudinal outcomes themselves are important (co-)primary or key secondary endpoints for evaluating the benefit of treatment. These outcomes are often strongly related to the event of interest; for example, cognitive decline and the onset of Alzheimer's disease [1], or a poorer quality of life and an increased risk of death in oncology trials [2]. Nevertheless, time-to-event and longitudinal endpoints obtained in RCTs are often analyzed separately without consideration of their interrelationships. This may not only lead to a lower consistency and efficiency of estimating treatment effects in RCTs, but also to increased bias and suboptimal use of information [3,4].
Combined assessment of longitudinal and time-to-event data using a joint model is a powerful approach to better characterize the effect of treatment [3][4][5]. Joint modelling refers to the simultaneous assessment of two or more outcomes while accounting for their interrelationships using a statistical model [6]. Typically, time-to-event endpoints are assessed using Cox models, whereas linear mixed effects models are commonly used for longitudinal data. In the joint modelling framework, the linear mixed model is linked (or joined) to the risk for the event. Joint models thereby explicitly address the association between the longitudinal and time-to-event endpoint [6]. One can use joint models not only to investigate how a longitudinal variable is related to the event of interest, but also utilize its value in RCTs to address informative missing data in longitudinal endpoints [7,8], obtain J o u r n a l P r e -p r o o f more efficient estimates of the treatment effect [2,3,9], or get a better understanding of the interaction between treatment and the disease [1,10,11].
Despite these clear benefits, and the widescale adoption of joint models in observational studies [12,13], their prospective use in RCTs remains negligible. Table 1 provides an overview of the statistical analysis conducted in 61 consecutive RCTs published in three major medical journals. Only three RCTs (4.9%) considered a combined analysis of endpoints, though none of them used a joint model. The main barriers for implementation in RCTs may be attributed to perceived difficulties in the interpretation of effect estimates and the apparent complexity of the model [14,15]. In this paper, therefore, we aim to explain the key concepts of joint models, and set out how joint modelling of longitudinal and time-toevent endpoints can be used in RCTs to answer various research questions, illustrated for Amyotrophic Lateral Sclerosis (ALS).

The valproic acid trial
The valproic acid (VPA) study in ALS will be used to illustrate the joint modelling framework [16]. ALS is a progressive neurological disorder leading to severe muscle weakness and eventually death. Median survival is three to five years, but can range from a few months to over several decades [17]. The extent of physical functioning loss, and its rate of progression, are strongly related to survival time [17,18]. Physical functioning is commonly quantified by a 12-item questionnaire, the ALS functional rating scale (ALSFRS-R) [19], with lower scores reflecting poorer function. While improving overall survival is the ultimate objective [20][21][22], preventing functional loss may be of equal importance to patients.
Moreover, one could hypothesize that a treatment that reduces or prevents functional decline would also prolong survival time. In Fig. 1 we provide the observed individual ALSFRS-R J o u r n a l P r e -p r o o f trajectories and survival curves in each treatment arm of the VPA study. The primary objective was to evaluate the effect of VPA on overall survival compared to placebo, with the ALSFRS-R as key secondary endpoint. The trial followed a sequential design; it was stopped early for futility and possible harm due to VPA.

The joint modelling framework
In Fig. 2A we provide the ALSFRS-R scores for three placebo-allocated patients in the VPA study. As can be seen, the ALSFRS-R trajectories over time are strongly diverging: while one patient remains stable over the course of 12 months, the other patient deteriorates rapidly.
This variability matches the variability observed in survival time, and the natural and biological association between the two endpoints [17]: a patient with minimal functional loss has, on average, a longer survival time than a patient with a more aggressive disease trajectory [9]. In the joint modelling framework, we explicitly model this connection between functional loss and the risk of death, and directly account for their interrelationship. As the name suggests, the joint model consists of two models: (1) a model that describes the patientspecific ALSFRS-R trajectory over time, and (2) a model that describes the risk of death based on the modelled ALSFRS-R trajectories.
This process is illustrated in Fig. 2A, in which the observed ALSFRS-R scores (dots) are used to estimate an underlying longitudinal trajectory for each individual patient (solid lines).
These modelled trajectories are subsequently used as input, or covariate, in a second model to determine the risk of death (or probability of survival). As a consequence, the patient's ALSFRS-R trajectory alters the instantaneous risk of death over time (Fig. 2B), which subsequently changes their probability of survival (Fig. 2C). In other words, the joint model links the ALSFRS-R model with the risk function that models survival time. N.B., Cox models with time-varying covariates are frequently used as alternatives to joint models. Cox models, however, are not appropriate in these settings as they assume that (1) measurement of the longitudinal outcome is not affected by the occurrence of death, (2) the longitudinal outcome remains constant between two visits or until death, and (3) the longitudinal outcome contains minimal measurement error or biological variation [6]. The joint model improves on this by estimating an underlying 'true' trajectory for the longitudinal endpoint, estimating its value at the exact time of death and linking this to the risk function for the event of interest.

Defining treatment response and the primary objective
In RCTs, we can use a joint model to disentangle the effect of treatment [23]. In the VPA study, for example, treatment may reduce the progression rate on ALSFRS-R and, as a consequence, lead to an improvement in survival time. In addition, VPA treatment may also affect important prognostic mechanisms that are not captured by the ALSFRS-R (e.g., weight loss or cognitive decline). As such, treatment may improve survival through two distinct pathways: (1) a pathway that is driven by a change in ALSFRS-R, and (2) a pathway that is driven by a mechanism independent of the ALSFRS-R. From this it follows that we may use the same joint model to address different research questions regarding the effect of treatment, namely: 1. What is the effect of treatment on ALSFRS-R progression? 2. What is the overall effect of treatment on survival time?
Importantly, the answer to each of these questions may be different, possibly leading to different conclusions from the same study results. In the following sections, we will provide the joint model output for the VPA study, discuss how these different questions can be answered and how results differ from common conventional methods. The exact research question that is being answered closely coincides with the estimand framework [24], meaning 'what needs to be estimated to address the research question'. The estimand of an RCT defines, among other things, the primary outcome of interest (e.g., ALSFRS-R, survival, or both) and the strategy for handling events such as death [25]. As we will discuss, the above three questions loosely match the while-alive, treatment policy and composite estimand, respectively [26].

Joint modelling of the VPA study
In Table 2 we provide an overview of the variables required for the joint model; the data consist of two datasets: (1) ALSFRS-R data and (2) survival information. Defining the joint model starts with a linear mixed model for the ALSFRS-R data. To illustrate, we assume that VPA linearly reduces the ALSFRS-R progression rate, which can be modelled as Here, ( ) is the modelled ALSFRS-R score for the i th patient at time t, 0 the subjectspecific baseline score, and 1 and 2 a subject-specific quadratic trajectory over time (reflecting the individual curves in Fig. 2A). In addition, 0 is a common intercept for both J o u r n a l P r e -p r o o f treatment arms (due to randomization); it reflects the average ALSFRS-R score at time of randomization; 1 is the average progression rate in the placebo arm in ALSFRS-R points per month; and 3 is the mean difference in progression rates between treatment arms; it reflects the treatment effect on the ALSFRS-R.
The second step requires a model for the risk of death that contains the ALSFRS-R model as covariate. In addition, we need a term to model any additional effect of treatment on survival that is not captured by the ALSFRS-R. The survival model, or hazard function, can be defined as where ℎ 0 ( ) is a baseline risk function, the additional effect of treatment not captured by the ALSFRS-R, and the effect of ALSFRS-R on the risk of death. Similar to a Cox model, and are simply expressed as log hazard ratio. Exp ( ) indicates how the risk of death changes with each unit increase in ALSFRS-R. It is important to note that η i (t) corresponds to the ALSFRS-R score predicted by the longitudinal model, and is not the actual observed ALSFRS-R score at time t. The baseline risk, ℎ 0 ( ), can be left unspecified similar to a Cox model, but could also be defined using parametric survival distributions. Leaving ℎ 0 ( ) unspecified avoids the need for assumptions, but makes obtaining absolute measures of risk less convenient [6,15].
After defining the ALSFRS-R and survival sub-models, these are then fitted simultaneously, thereby jointly optimizing all model parameters and adjusting for the interrelationship between ALSFRS-R and survival. As such, model parameters are estimated such that the J o u r n a l P r e -p r o o f model optimally represents both the ALSFRS-R and survival data [6,27]. This can be achieved by straightforward application of readily available software packages (e.g. the JM or JMBayes package in R [28], the jmxtst, merlin or stjm commands in Stata, or the JMfit macro in SAS [29]). The Supplementary Appendix provides the source code and datasets to replicate the model output for the VPA study presented in Table 3. As can be seen, the joint model output confirms the highly significant association between ALSFRS-R and survival time (-0.08 or a HR of 0.92, p < 0.001). In the following sections we will discuss how to address the three research questions outlined above.

Question 1: What is the effect of treatment on ALSFRS-R?
This question coincides with the "while alive" estimand where we are primarily interested in the patient's functional decline during life. We aim to determine the between-group difference in mean rate of functional loss, as measured by the ALSFRS-R total score, between VPA and placebo, over the follow-up period or until death (whichever occurs first).
From the model output in Table 1  The key challenge addressed by the joint model here is that the ALSFRS-R scores for patients who die, do not exist, and are 'truncated due to death' [30]. Other methods that ignore the informative missing data often lead to biased results [31]. For example, limiting the analysis to only patients who survived until week 20 (N = 24, Fig. 1A) would result in an average progression rate of -0.71 ALSFRS-R points per month in the placebo arm and -0.59 in the VPA arm. As the missing data mechanism (death) is associated with the ALSFRS-R, such an analysis not only underestimates the actual progression rate by more than 25%, but also reverses the effect of VPA, which now seems beneficial due to the imbalance in death rate between treatment arms (Fig. 1B).

Question 2: What is the overall effect of treatment on survival?
In this setting, we are primarily interested in VPA's overall treatment effect on survival, irrespective of the patient's functional decline, and estimate a "treatment policy" estimand [24]. By substituting ( ) in the hazard function (Section 3.1), it follows that VPA's overall treatment effect on survival at time t is the effect mediated through the ALSFRS-R plus the effect not captured by the ALSFRS, or + 3 , and reflects the log hazard ratio of VPA at a certain time point. Significantly, the hazard ratio of VPA is time-dependent in our model. This is a positive feature of joint models and distinctive from conventional survival models, which assume a constant treatment effect over time [1]. From Table 1  Compared to conventional survival models, it seems straightforward that simply comparing the survival curves in Fig. 1B should also reflect the overall treatment effect , or at least the J o u r n a l P r e -p r o o f average effect on survival during follow-up. If we fit a Cox model, however, we obtain a HR of 1.68 for VPA; 10-20% smaller than estimated by the joint model. This is due to fitting an incorrect model, where the difference between a Cox and joint model depends on the degree of association between the two endpoints [3,4]. This has important consequences for RCTs: detecting a hazard ratio of 1.68 with 80% power requires 388 patients (assuming a survival probability of 70%), whereas detecting a hazard ratio of 1.80 requires only 302 patients [32]; a difference in sample size of 22.1%.

Question 3: What is the combined effect of treatment on ALSFRS-R and survival?
Finally, in some settings, rather than having particular interest in one endpoint, we could also state two co-primary endpoints without a specific preference for either. In this setting, 'trial success' could be defined as a change in either the longitudinal and/or time-to-event endpoint, which coincides with a composite estimand. In a joint model, this could be accommodated by comparing a model with and without the treatment terms. In our example, we could compare a model where we remove the terms 3 and (Table 3) versus a model that includes these terms. This results in an ANOVA-like p-value, which tests whether the combined effect of treatment improves survival and/or ALSFRS-R, independent of their association (see Supplementary Appendix for an example) [9].
This approach is similar to combining both endpoints into a composite endpoint and testing whether this differs between treatment arms. An example is the Combined Assessment of Function and Survival [33]; the patient who died first is ranked worst, whereas the patient who survived and had the least ALSFRS-R loss, ranked best. In contrast to the joint model,

Final remarks
In this paper, we have illustrated how joint models can address different research questions in RCTs and have the potential to overcome the limitations of some commonly used conventional methods. Despite the considerable number of publications, and the common cooccurrence of correlated time-to-event and longitudinal data in RCTs, the uptake of joint modelling in actual trials remains negligible [34]. As illustrated in this paper, the interpretation of joint model parameters and those obtained by conventional methods are very similar, can be linked to clinically relevant research questions and, as shown by others, have nowadays been implemented in many software solutions [15].
The main challenges when formulating a joint model are the assumptions required regarding the distributions as well as the associations between the two endpoints [3,35]. As with any other statistical model, attention will need to be given to potential biases due to the inaccuracy of what we assume, and how this could impact our estimate of the treatment effect [34]. At the design stage of a trial it would be important, for example, to evaluate the impact of changing the trajectory of the longitudinal outcome, the survival pattern, the correlation structure between patients, or how treatment interacts with each endpoint [35].
Many of these considerations start with the exact research question one seeks to answer.
Positioning the joint model into the estimand framework, therefore, may better guide one in defining the exact research question, in how best to prespecify the analysis, and in determining which parameter should be of primary interest. This decision process should be a J o u r n a l P r e -p r o o f collaborative effort between experts, statisticians, clinicians, and regulatory scientists to increase mutual understanding of joint models, further mediate their implementation in realworld settings and, ultimately, better utilize the value of joint modelling in clinical trials.