Personalized predictive modeling for patients with Alzheimer’s disease using an extension of Sullivan’s life table model

Background Alzheimer’s disease (AD) progression varies substantially among patients, hindering calculation of residual total life expectancy (TLE) and its decomposition into disability-free life expectancy (DFLE) and disabled life expectancy (DLE) for individual patients with AD. The objective of the present study was to assess the accuracy of a new synthesis of Sullivan’s life table (SLT) and longitudinal Grade of Membership (L-GoM) models that estimates individualized TLEs, DFLEs, and DLEs for patients with AD. If sufficiently accurate, such information could enhance the quality of important decisions in AD treatment and patient care. Methods We estimated a new SLT/L-GoM model of the natural history of AD over 10 years in the Predictors 2 Study cohort: N = 229 with 6 fixed and 73 time-varying covariates over 21 examinations covering 11 measurement domains including cognitive, functional, behavioral, psychiatric, and other symptoms/signs. Total remaining life expectancy was censored at 10 years. Disability was defined as need for full-time care (FTC), the outcome most strongly associated with AD progression. All parameters were estimated via weighted maximum likelihood using data-dependent weights designed to ensure that the estimates of the prognostic subtypes were of high quality. Goodness of fit was tested/confirmed for survival and FTC disability for five relatively homogeneous subgroups defined to cover the range of patient outcomes over the 21 examinations. Results The substantial heterogeneity in initial patient presentation and AD progression was captured using three clinically meaningful prognostic subtypes and one terminal subtype exhibiting highly differentiated symptom severity on 7 of the 11 measurement domains. Comparisons of the observed and estimated survival and FTC disability probabilities demonstrated that the estimates were accurate for all five subgroups, supporting their use in AD life expectancy calculations. Mean 10-year TLE differed widely across subgroups: range 3.6–8.0 years, average 6.1 years. Mean 10-year DFLE differed relatively even more widely across subgroups: range 1.2–6.5 years, average 4.0 years. Mean 10-year DLE was relatively much closer: range 1.5–2.3 years, average 2.1 years. Conclusions The SLT/L-GoM model yields accurate maximum likelihood estimates of TLE, DFLE, and DLE for patients with AD; it provides a realistic, comprehensive modeling framework for endpoint and resource use/cost calculations. Electronic supplementary material The online version of this article (doi:10.1186/s13195-017-0302-6) contains supplementary material, which is available to authorized users.


L-GoM Extension of Sullivan's Model
Under Sullivan's model [2], the ω-period (e.g., 10-year) DLE is computed for each individual i in the study cohort for discrete equally-spaced follow-up time intervals (e.g., half-years in Predictors 2), indexed by t (or τ), 0,1, , t   , using: 11   x i j t using individual-specific convex combinations of K time-varying probabilities (implicitly introducing, e.g., The two assumptions are used to specify the associated irreversible disability model depicted in Figure  . These assumptions are sufficient to generate the conditional DFLEs and DLEs, given the GoM scores for individual i, as shown in the Appendix. The model yields clinically interpretable conditional mortality and disability probabilities, and associated conditional timing measures, for individual AD patients. Section 2.5 discusses how these assumptions might be relaxed.

Conditional Probabilities of Disability and Death
Table S.1 displays the means, standard errors, and ranges of the 10-year mortality and disability probabilities for all participants and for subgroups 0-4 under the associated irreversible disability model, assuming that all survivors at the end of year 10 (Exam 21) died immediately thereafterconsistent with the 10-year censoring assumption in Table 1. Column 2 contains the average probabilities of being free of FTC at study intake among individuals in the indicated subgroups. Column 3 contains the average joint probabilities of being free of FTC at study intake and dying without ever transitioning to FTC. Column 4 contains the average joint probabilities of being free of FTC at study intake and transitioning to FTC prior to death. Column 6 contains the average probabilities of FTC at any point prior to death. Column 7 contains the average probabilities of FTC at study intake while column 8 contains the average probabilities of transitioning to FTC after study intake but prior to death. The probabilities in columns 4 and 8 represent the same two-stage 5 outcome: No FTC at intake followed by FTC followed by death. Column 5 repeats column 4 for later alignment with Table S.2. Overall, 91.8% of the sample was free of FTC at intake and 55.4% experienced FTC at some time during the 10-year follow-up. Subgroup 4 was distinguished by its relatively high average probability of FTC (68.4%), most of which was present at study intake (54.3%). This contrasted with subgroup 2 which also had a relatively high average probability of FTC (62.7%), but most FTC occurred after study intake (56.8%).

Conditional DFLEs and DLEs
Table S.2 displays the corresponding means, standard errors, and ranges of the 10-year conditional DFLEs, conditional DLEs, and related conditional times to death and FTC for all participants and for subgroups 0-4. Conditional DFLE (col. 2) is the average time lived between (1) study intake and (2) death, onset of disability, or time ω (i.e., 10 years), whichever occurred first, given that disability occurred after study intake. Among those with no FTC at study intake (col. 2), death and FTC represent competing risks [1]: Time to Death (col. 3) represents the case that death occurred prior to FTC and Time to FTC (col. 4) the case that FTC occurred prior to death. Conditional DLE (col. 6) is the average time lived between (1) the onset of disability and (2) death or time ω, whichever occurred first, given that disability occurred prior to time ω. The Conditional DLEs were further restricted to include (Conditional DLE_1; col. 7) or exclude (Conditional DLE_2; col. 8) disability present at study intake. Total Time to Death (col. 5) is the sum of Time to FTC (col. 4) and Conditional DLE_2 (col. 8).

Decomposing the Sullivan Life Expectancies
The irreversible disability model was designed to uniquely decompose the Sullivan life expectancies. Conversely, the Sullivan life expectancies can be regenerated from the associated irreversible disability model. Specifically, the DFLEs and DLEs in Table 1 Table 1 are the sums of the corresponding DFLEs and DLEs, it also follows that the TLEs are probability-weighted sums of the corresponding Conditional DFLEs and Conditional DLEs in Table S.2. Alternatively, the TLEs in Table 1 can be decomposed as probability-weighted averages of Time to Death, Total Time to Death, and Conditional DLE in Table S.2 (cols. 3, 5, and 7), using the corresponding probabilities in Table S.1. Using the above relationships, the average Sullivan DFLE of 4.03 can be decomposed into the product of a 91.8% average probability of no FTC at intake and the average Conditional DFLE of 4.39 years; the average Sullivan DLE of 2.06 can be decomposed into the product of a 55.4% average probability of FTC and the average Conditional DLE of 3.72 years; and the average TLE of 6.09 years can be decomposed into the probability-weighted average of 5.49, 6 Table 1. The associated irreversible disability model extracts them from the SLT/L-GoM model-important because Time to FTC among patients free of FTC at intake is a major endpoint in AD research [15][16][17].

Discussion
The Sullivan [2] life table calculations were designed to generate estimates of DFLE, DLE, and the associated survival functions, without having to specify an underlying transition model or to restrict the analytic sample to AD patients free of disability at the intake examination. The SLT/L-GoM extension allows these same estimates to be generated on an individual-specific basis. Moreover, because the L-GoM extension uses all relevant covariates from the longitudinal panel data for all AD patients meeting the intake criteria, it represents a highly efficient mode of analysis.
The associated irreversible disability model provides a unique and simple decomposition of the Sullivan life expectancies and it clarifies the relationship of the Sullivan life expectancies to other life-expectancy calculations that might be obtained from application of multi-state competing-risk models [1]. The assumptions of no recovery and conditional independence may be oversimplifications, but any resulting biases must be compensating because the Sullivan estimates of the DFLEs and DLEs are unique. Hence, the associated irreversible disability model provides a readily-generated baseline for evaluation of more complex disability models.
The conditional independence assumption is not required for the standard form of the Sullivan [2] model. It is required for the L-GoM application. For sufficiently large J with K ≥ 4, the assumption is expected to be valid within the set of J covariates measured at each examination. The quality of the assumption for each longitudinal sequence of morbidity/disability covariate values should be validated. In the case of FTC, our validation was based on the goodness of fit of the estimated vs. observed values within the five rational subgroups shown in Figure 3.
If the conditional independence assumption cannot be validated for a specific disability covariate, one can estimate separate L-GoM models for the disability states shown in Figure S.1, stratifying the intake population according to their initial disability status. In the case of FTC, the specification for the disabled subpopulation would look like the current specification except that FTC would be dropped as an internal variable. The specification for the nondisabled subpopulation be would similar except that the mortality/survival covariate would include transition to FTC as a competing risk for termination of the disability-free process [4]. The separate L-GoM models could be linked by joint (i.e., constrained) estimation of the individual GoM scores or by constraining the separately estimated GoM scores to be correlated for individuals observed in both components of the stratified analysis [18].
Finally, we emphasize that the irreversible disability assumption is not required for the standard form of the Sullivan [2]

APPENDIX
This Appendix has two sections. Section 1 presents the technical details of the L-GoM extension of the Sullivan life table model including treatment of genetic and other unchanging fixed covariates such as sex and ApoE status; this section also describes the weighted maximum likelihood estimation procedure and presents the estimated parameters and related test statistics. Section 2 characterizes the relationship of the DFLEs and DLEs obtained from the Sullivan life table model to the corresponding conditional DFLEs and DLEs obtained from the associated irreversible disability model. Thus, to generate nontrivial ML estimates, the observations must be pooled over individuals in the study population. Two methods are considered herein; both involve products over individuals of the likelihood contributions in eqn. (S.3).

L-GoM EXTENSION OF SULLIVAN MODEL
The first method assumes that the parameters it t pp  and it t   are constant over individuals; this is the defining assumption for Sullivan's cohort model [2,20]. ML estimates for the survival parameters it t pp  were generated using Kaplan and Meier's product-limit method [21]; ML estimates for the disability parameters it t   were generated using the observed prevalence rates at time t. Imai and Soneji [20] proved that Sullivan's cohort estimator is both consistent and unbiased while Zehna's theorem [5] implies that it is an MLE. The observed survival and disability rates, and their products, were calculated as indicated above for each time t for all participants and for select subgroups of AD patients in Predictors 2 (see . The observed survival and disability rates were needed to validate the corresponding model-based estimates produced by the SLT/L-GoM model. The second method assumes that the parameters it p and it  vary over individuals, which, with additional assumptions described below, yields the SLT/L-GoM model. This initial assumption was implemented using two sets of "hidden" parameters to generate it p and it  -with one set (denoted kt p and kt  ) shared over individuals and the other (denoted ik g ) not shared over individuals, where the subscripts k, k = 1,…, K, index the AD subtypes within the L-GoM model.

15
The hidden parameters are assumed to satisfy two fundamental bilinear relations: , which are the defining assumptions for L-GoM.
To apply the second method, we had to extend the scope of eqn. (S.3) beyond a single selected morbidity/disability variable to include other relevant covariates as internal variables. The extension was justified because L-GoM provides a realistic and comprehensive model of the AD process, as demonstrated in the current and prior papers [13,22]. Moreover, some extension was required for identifiability (see below); once identifiability is achieved, the accuracy of the estimates of the hidden parameters improves as additional covariates are introduced (Stallard and Sloan, 2016).
We require new notation both to extend the scope of eqn. (S.3) to include J > 1 internal covariates and to bridge from the Sullivan life table notation to the standard longitudinal GoM notation: 1. Define X(i, 1, t) as the dichotomous morbidity/disability covariate, with outcomes denoted by x(i, 1, t), and with corresponding conditional outcome probabilities denoted by 1 ( ,1, )    By reserving the index j = 0 for the prospective mortality/survival indicator, we highlight the special role of this variable within the extended Sullivan life table model. It must always be included within the set of internal variables in the SLT/L-GoM model. In contrast, specific morbidity/disability covariates indexed by j = 1,…, J may or may not be included in different versions of the SLT/L-GoM model.
Under these conditions, we can extend the scope of eqn. (S.3) as follows: The final assumption needed for the L-GoM model is that the temporal progression of AD is governed solely by a finite set of K×K upper triangular cumulative transition matrices Vt = [vkht], t = 0, …, ω, covering the time intervals from 0 to ω, that irreversibly change the probabilities faced by individuals at each time t from those of lower-numbered subtypes to higher-numbered subtypes. By convention, V0 = I, an identity matrix, and is a K×K upper triangular transition matrix associated with the unit time interval (t, t + 1). The Vt's govern the cumulative changes in the probabilities over time which induce a natural ordering on the subtypes with the K th subtype representing the endpoint of the AD process. Because the Vt's contain the shared u-parameters, they fully subsume the temporal variability of the ( , , ) jx i j t kt  's and this dramatically reduces the number of parameters needed to fit the model to the study data. Hence, eqn. (A.1c) can be rewritten as: 's constitute a second set of shared parameters. MLEs of the g-, v-, u-, and λ-parameters can be obtained using eqn. (A.2), as detailed in [9] and as modified in Section 2. The large sample statistical properties of L-GoM [9] apply directly to the Sullivan extension.
i JT  , was 520, which was far in excess of that needed for identifiability. The value of K must be sufficiently large that the conditional independence assumption holds, in which case the GoM scores constitute a set of underlying (latent) causal variables [23, 24]. K = 4 was found empirically to be the smallest K value that meets this condition for AD (see Section 2 and [13]).
The representation of the temporal progression of AD via eqn. (A.2) implies a duality wherein the temporal changes may be associated with either the g-or λ-parameters. Under the treatment above, the conditional probability that X(i, j, t) = x(i, j, t) = l, given the initial vector gi, is: . Thus, the temporal changes in jl kt  reflect those captured by the matrix . t V An equivalent alternative treatment introduces a set of time-varying GoM scores [13], defined as , which leads to the following expression: Under this treatment, the temporal progression of AD is governed solely by the cumulative transition matrices, Vt, which irreversibly move individuals (vs. probabilities in eqn. A.3) from lower-numbered to higher-numbered subtypes at rates determined by the vector gi. 3) also implies that the gik's are implicit functions of the measured covariates X(i, j, t), making L-GoM substantially more flexible than alternative models that rely on explicit functional forms (e.g., logistic regression) to meet the 0-1 boundary constraints on estimated probabilities. The parametrization in eqn. (A.4) emphasizes the role of the temporal changes in it γ as the primary drivers of AD progression; the outcome probability structure is represented as constant over time.

Fixed Covariates
Six fixed covariates required special treatment to satisfy L-GoM's conditional independence assumption. They were included in the likelihood only for Exam 1; the corresponding terms for subsequent exams can be deleted by rewriting eqn. (A.2) as:

Weighted Maximum Likelihood Estimation and Testing
Three versions of L-GoM were estimated for the Predictors 2 Study (Table A.1). For all three we set K = 4 which was established as the minimum acceptable size for K in prior analyses of the Predictors 1 and 2 cohorts where K = 4 was overwhelmingly favored over alternatives K = 1, 2, or 3 using AIC and BIC model selection rules [13]. K = 4 was also strongly preferred over K = 5 using AIC and BIC rules for suspected AD cases in the National Long Term Care Survey in our (unpublished) reanalysis of data compiled by Kinosian et al. [28], who had selected K = 5 over K = 4 based on Wilks' [29] chi-squared test (Wilks' test). Another application of GoM to crosssectional AD registry data using Wilks' test reported K = 5 for neuropsychological data and K = 6 for clinical data [30]. Stallard and Sloan [9] noted that Wilks' test effectively cuts the AIC penalty (i.e., twice the degrees of freedom; or, equivalently, twice the number of free parameters) in half for large J-which explained the different K-values resulting from Wilks' test and the AIC rule. Our choice of K = 4 for the Predictors Study was strongly supported by the AIC and BIC rules and is consistent with the principle of parsimony. There was no support for K < 4.
Model 1 (M1) was based on 80 internal variables which included the mortality/survival indicator, six fixed covariates, and 73 time-varying covariates (all had p < 0.05; all variablespecific p-values were computed using Wilks' test for K = 1 vs. K = 4). In addition, six timevarying summary variables were fitted separately as an external set; the associated λ-parameters were estimated conditionally on the g-and v-parameters previously estimated with the 80 internal variables for Model 1; all six were statistically highly significant (p < 0.001). The 73 internal timevarying covariates constituted a comprehensive set of variables spanning 11 symptom domains, including cognition, functioning, dependence, and behavior (see Table A.2). This set included all covariates that had been established as significant L-GoM predictors in prior analyses of the Predictors 1 and 2 cohorts [13]. The overall log-likelihood for Model  Model 3 (M3) was designed to improve the fit for Exam 1 over the fit obtained from Model 1 for two reasons: (1) to exploit Exam 1's status as the single examination whose GoM scores were most likely to span the full range of the GoM-score continuum between the prognostic subtypes; and (2) to facilitate the use of the results in subsequent personalized predictive modeling applications for which the only available data were from a single exam at or shortly after diagnosis which would substitute for Exam 1 in the GoM-score estimation. Model 3 was estimated using weighted MLEs [31, 32] for which the relative weights for Exam 1 vs. Exams 2-21 were increased from 1.000 to 3.993, at which point the loss in both the weighted and unweighted log-likelihood fit for Exams 2-21 (1,175) exactly equaled the number of parameter constraints implicitly induced by the unequal weighting in Model 3. Assuming that these conditions are maintained as the sample size increases, we can ensure that Model 3 has the same limit point as Model 1, and, hence, the same asymptotic properties as Model 1; see [9] for discussion of these properties.

Comment 1-Akaike
[33] showed that ML estimates are biased, i.e., the estimated log-likelihood exceeds the expected log-likelihood by an amount approximately equal to the number of free parameters (k) in the model, leading to his celebrated formula: AIC 2ln 2 Lk    . When multiple models are considered, the optimal estimates are obtained from the model with the largest expected log-likelihood or, equivalently, with the smallest AIC statistic. Hu and Zidek [32] presented a general approach for selecting optimal data-dependent weights in the context of weighted ML estimation for multiple independent but similar populations using Akaike's [33] approach as their starting point. We used Akaike's [33] approach to constrain the expected log-likelihood for Model 3 to exceed the expected log-likelihood for Model 1 using the data-dependent weights defined above. Comment 3-The Bayesian approach to mixed membership modeling [35] can be applied to L-GoM estimation if sufficient information is available to specify the Bayesian prior distributions. Wang [36] showed that weighted ML estimation can approximate Bayesian estimation when the information needed to specify the prior distributions is lacking. Our application of L-GoM aimed to estimate the GoM scores for individual AD patients enrolled in Predictors 2 and to do so using weighted ML estimation without having to specify a prior distribution for the GoM scores. The resulting GoM-score estimates (Table A.6) can be used to inform specification of the prior distribution of these scores for subsequent Bayesian analyses of new AD patient populations, providing an alternative to the standard Bayesian assumption that the prior distribution is Dirichlet-an assumption typically made because of its mathematical tractability rather than its external validity [37, 38]. The top ranked predictor using the BIC or AIC statistics, or the p-values, was Equivalent Institutional Care, which was the source variable for FTC. Among the external variables, the largest BIC and AIC statistics (and smallest p-values) were for Dependence Scale Score, BDRS Score, CDR Rating, MMSE Score, and Psychiatric Symptoms, in that order.  (Table 3).  4) show how these matrices may be combined with either the λ-parameters or the GoM scores to represent the temporal progression of AD. The GoM-score trajectories, defined using T it t i  γ V g , are weighted combinations of the K pure-subtype trajectories. Hence, the K puresubtype trajectories (Table A.5) are in one-to-one correspondence with the K rows of the transition matrices, i.e., the sequence of top rows, taken as a set, forms the pure-subtype trajectory for subtype 1; the sequence of second rows, taken as a set, forms the pure-subtype trajectory for subtype 2; etc.

GoM Scores
Table A.6 displays, for the 229 participants in Predictors 2, the age group at intake, sex, subgroup assignment, GoM scores, and the complete set of individual-specific LE and probability measures summarized in Tables 1, S.1, and S.2. These may be used in combination with the parameters in Tables A.3 and A.4 to reproduce the predicted outcomes in this paper as well as to simulate other outcomes not reported herein but covered by the variables in Table A.3.

Subgroups vs. Subtypes
Subgroups are distinct from subtypes in L-GoM. Subgroups 1-4 were defined by high proximity to L-GoM subtypes 1-4; subgroup 0 was defined by low proximity to all subtypes. Subgroup 2 was the largest ( 87 N  ) and subgroup 4 the smallest ( 11 N  ). Table A.7 displays the means and standard deviations of the GoM scores by subtype, both overall and by subgroup. The subgroup means ranged from 0.008 to 0.774, with the highest values along the main diagonal of the table where the subgroup and subtype indexes are equal. The subgroup standard deviations ranged from 0.027 to 0.173. The subtype means ("Total" row) were largest (0.397) for subtype 2 and smallest (0.099) for subtype 4. The subtype standard deviations ("Total" row) ranged from 0.169 for subtype 4 to 0.324 for subtype 1. The subgroup standard deviations were substantially smaller because the subgroups were substantially more homogeneous, due to their rational design [41], than the overall study sample.

Mortality and Disability Calculations
The observed survival values in Figure 2 were computed using the Kaplan-Meier product-limit estimator [21] with mortality recorded at the time of the next scheduled exam following the date of death. The estimated survival probabilities were computed by averaging the individual survival probabilities within subgroups, where the individual survival values at time t were computed as products of the pit's, using . The individual-specific estimated FTC rates increased monotonically over time; the occasional reversals for subgroups in Figure 3 were due to temporal changes in the composition of the subgroups due to different patterns of missing FTC assessments among survivors over time.
The estimated average survival probabilities on the left side of Figure 4 were computed as in Figure 2. The estimated average survival probabilities on the right side of Figure 4 are averages of individual-specific survival probabilities each of which was computed using Sullivan's method to apply the individual's estimated FTC-disability-free probabilities to his/her overall survival 24 function values at each examination; i.e., for each individual i at time t, the FTC-disability-free survival probability was computed as

Standard Error Calculations
The mean values in Table S.2 were computed as probability-weighted expected values of the indicated conditional life expectancy estimates for individuals, using the corresponding probabilities defined in Table S.1 as the weights. Potthoff et al. [44] showed that the usual calculation of the standard errors for the case of differential weighting was downwardly biased. We used their eqn. (1.11) to remove this bias.

ASSOCATED IRREVERSIBLE DISABILITY MODEL
The conditional DFLEs and DLEs are supplementary outputs from the SLT/L-GoM model; they are not required for calculating the Sullivan DFLEs or DLEs. They are, however, important outputs from standard transition models. We present the technical details of their calculations for completeness and to facilitate comparisons with similar outputs from other models.  which shows that the Sullivan DFLE (e.g., Table 3; col. 3) is a weighted average of (1) the conditional DFLE for individuals free of disability at intake (e.g., Table S.2; col. 2) and (2) zero for individuals who are disabled at intake.

Conditional DLE
The Sullivan DLE and the conditional DLE are similarly related. Let where e x denotes the exponential function exp( ) x , and where one can show via integration by parts that Di e is the continuous-time version of the quantity Di e , defined in eqn. (S.1), with the upper limit of integration extended from ω to ∞. Hence, 12) which shows that the Sullivan DLE (e.g., Table 3; col. 4) is a weighted average of (1) the conditional DLE for individuals who are disabled at intake or become disabled some time later (e.g., Table S.2; col. 6) and (2) zero for individuals who never become disabled (because they die prior to onset of disability).       Notes: j denotes variable number; l indexes response levels; N jl is the number of respondents for (j,l ); λ 1 -λ 4 are the hidden outcome probabilities for the K =4

Alternative Decomposition of Sullivan DLE
subtypes; E(π jl i ) is the model-based mean response rate; Obs. Rate is the observed mean response rate. Boldface red font indicates H jk > 0.5.  Notes: Time t runs from 0 to 20 half-years (0-10 years); all V t are upper triangular 4x4 matrices; V 0 is the identity matrix; V 10 is repeated at top of right-hand column. Boldface red font highlights every 5 th matrix. The 4 rows of the transition matrices are in one-to-one correspondence with the 4 pure-subtype trajectories shown in Table A.5, obtainable by re-sorting the rows of the 21 matrices so that t changes more rapidly than k .  Notes: The pure-subtype trajectories are sets of time-varying GoM scores that change from one examination to the next. Each pure-subtype has its own trajectory beginning with its unique unit value at time 0. Time t runs from 0 to 20 half-years (0-10 years). Boldface red font highlights every 5 th set of GoM scores. The pure-subtype trajectories are in one-to-one correspondence with the transition matrices shown in Table A.4, obtainable by re-sorting the rows of the trajectories so that k changes more rapidly than t .