Predicting Clinical Binary Outcome Using Multivariate Longitudinal Data: Application to Patients with Newly Diagnosed Primary Open-Angle Glaucoma

Primary open angle glaucoma (POAG) is a chronic, progressive, irreversible, and potentially blinding optic neuropathy. The risk of blindness due to progressive visual field (VF) loss varies substantially from patient to patient. Early identification of those patients destined to rapid progressive visual loss is crucial to prevent further damage. In this article, a latent class growth model (LCGM) was developed to predict the binary outcome of VF progression using longitudinal mean deviation (MD) and pattern standard deviation (PSD). Specifically, the trajectories of MD and PSD were summarized by a functional principal component (FPC) analysis, and the estimated FPC scores were used to identify subgroups (latent classes) of individuals with distinct patterns of MD and PSD trajectories. Probability of VF progression for an individual was then estimated as weighted average across latent classes, weighted by posterior probability of class membership given baseline covariates and longitudinal MD/PSD series. The model was applied to the participants with newly diagnosed POAG from the Ocular Hypertension Treatment Study (OHTS), and the OHTS data was best fit by a model with 4 latent classes. Using the resultant optimal LCGM, the OHTS participants with and without VF progression could be accurately differentiated by incorporating longitudinal MD and PSD. J o ur na l o f B iometrics & Bistatis t i c s ISSN: 2155-6180 Journal of Biometrics & Biostatistics Citation: Gao F, Miller JP, Beiser JA, Xiong C, Gordon MO (2015) Predicting Clinical Binary Outcome Using Multivariate Longitudinal Data: Application to Patients with Newly Diagnosed Primary Open-Angle Glaucoma. J Biom Biostat 6: 254. doi:10.4172/2155-6180.1000254 J Biom Biostat ISSN: 2155-6180 JBMBS, an open access journal Page 2 of 8 Volume 6 • Issue 4 • 1000254 field (VF) tests. This dataset constitutes the largest cohort of POAG with known date of diagnosis and provides a unique opportunity to evaluate disease progression over time. Detailed information on OHTS is described in Gordon et al. [13]. Our analysis cohort consisted data from the first eye of 277 participants who developed POAG and had at least 7 follow-up visits. This study has been approved by Washington University Institutional Review Board. The overall median follow-up was 13 years, with a median pre-diagnosis follow-up of 8 years and median post-diagnosis follow-up of 4.8 years. We used mean deviation (MD) and pattern standard deviation (PSD) as the longitudinal predictors. MD reflects the deviation of a subject’s vision from the age-matched reference population, while PSD reflects the perturbation of vision across different testing locations and is more sensitive for early visual damage. The primary endpoint was the status of VF progression at the end of follow-up, as determined by post-diagnosis VF using the 2-omitting algorithm [14], and 57 eyes (20%) were labeled as “progressive”. Table 1 showed the demographic and clinical characteristics at diagnosis for progressive and stable cohorts separately. The categorical data were summarized as counts and frequencies, while the continuous variables were summarized in means and standard deviations or median and inter-quartile range as appropriate. As an illustration, in the proposed prediction model we only included following 5 variables at diagnosis as the baseline covariates: age (Age, decade), intraocular pressure (IOP, mmHg), central corneal thickness (CCT, μm), pattern standard deviation at randomization (PSD0, dB), and horizontal cup/ disc ratio (HCD). These 5 covariates were chosen because they have been identified and validated as independent predictors for the risk of POAG development [13]. The baseline covariates were standardized to have mean 0 and variance 1 throughout the remainder of this paper. As such, for these variables the odds ratios (OR) from the regression models actually represented the effect per 1-SD change. data and clinical outcomes is fully captured by the latent class structure. Due to its flexibility in modeling even complex dependence and its ability to incorporate covariates, many methods have been proposed in recent years to incorporate longitudinal biomarkers under the framework of latent class models [8]. Garre et al. [9] fitted a joint latent class model for the prediction of time to graft failure using longitudinal serum creatinine concentration. They used two latent classes to capture the heterogeneity of serum creatinine trajectory. The serum creatinine in the first class was described by an intercept-only random effect model, while the trajectories in the second class was modeled using a segmented random-effect model with a random change-point indicating the initiation of rapid deterioration in patient’s renal function. Proust-Lima and Taylor [10] developed a dynamic prognostic tool that incorporates the evolution of prostate-specific antigen (PSA) to predict the risk of prostate cancer recurrence for patients who completed radiation therapy. The within-class PSA trajectory was described by a linear mixed model including two known parametric functions for shortand long-term PSA trajectories respectively. Their research also found that the predictive accuracy was not markedly influenced by the choice of different number of latent classes. In a recent report, Maruyama et al. [11] also applied a joint latent class analysis approach to identify distinct patterns of longitudinal craving scores and to predict the binary clinical outcome of smoking cessation. LCGM-based predictions are also computationally attractive. Once a joint model has been fitted, there is no need to integrate out the random effects and all the individualized prediction can be calculated analytically. This feature makes it especially appealing for individualized real-time dynamic prediction [8]. In this article, we developed a latent class growth model (LCGM) using multiple longitudinal biomarkers to predict visual field (VF) progression in patients with newly diagnosed POAG. We used the data from ocular hypertension treatment study (OHTS), a phase III randomized trial of the safety and efficacy of topical hypotensive medication in preventing the development of POAG. The proposed LCGM was developed taking a stage-wise approach and based on longitudinal mean deviation (MD) and pattern standard deviation (PSD). Specifically, a functional principal component (FPC) analysis [12] was first performed to the longitudinal MD and PSD separately, and FPC scores were estimated to approximate the trajectories of MD and PSD. Subgroups (latent classes) of participants with distinct patterns of MD and PSD trajectories were then identified based on the resultant FPC scores. The remainder of this paper was structured as follows. Section 2 described the OHTS data in more detail. Section 3 specified the latent class model for monitoring VF progression using longitudinal MD and PSD. Section 4 applied the method to data from OHTS and we concluded with a discussion in Section 5. Study Cohort: Ocular Hypertension Treatment Study (OHTS) OHTS is the largest randomized trial to date to test safety and efficacy of topical hypotensive medication in preventing the development of POAG. In OHTS, 1636 subjects were randomized to either observation or treatment with ocular hypotensive medication and followed for a median of 13 years. Outcome ascertainment was performed by specialized resource centers where readers were masked as to randomization assignment and information about the participant’s clinical status, and the attribution of abnormality due to POAG was performed by a masked Endpoint Committee. In OHTS, 362 eyes from 279 participants developed POAG during study. Disease progression was monitored regularly every 6 months using Humphrey 30-2 visual Variables Stable Eyes (N=220) Progressive Eyes (N=57) p-values race African America Others 68 (30.9) 152 (69.1) 23 (40.3) 34 (59.7) 0.176 gender M F 125 (56.8) 95 (43.2) 30 (52.6) 27 (47.4) 0.570 age (decades) 6.57 ± 0.95 6.64 ± 0.97 0.641 mean deviation (MD, dB) -1.28 ± 2.06 -2.90 ± 2.53 <0.001 pattern standard deviation (PSD, dB) 2.89 ± 1.26 3.68 ± 1.57 <0.001 intraocular pressure (IOP, mmHg) 21.6 ± 5.61 24.4 ± 7.77 0.003 central corneal thickness (CCT, μm) 561 ± 38.22 552 ± 39.0 0.123 horizontal cup/disc ratio (HCD) 0.53 ± 0.19 0.51 ± 0.20 0.373 median number of total follow-up visits (IQR) 25 (22-27) 23 (19-26) 0.127 median number of pre-diagnosis visits (IQR) 13 (8-19) 12 (7-17) 0.322 median number of post-diagnosis visits (IQR) 10 (6-15) 9 (5-13) 0.349 Table 1: Summary statistics of predictors at POAG diagnosis, where categorical variables are summarized as counts and frequencies, while continuous variables are summarized as means ± standard deviations (STD) or medians and interquartile range (IQR) as appropriate. Citation: Gao F, Miller JP, Beiser JA, Xiong C, Gordon MO (2015) Predicting Clinical Binary Outcome Using Multivariate Longitudinal Data: Application to Patients with Newly Diagnosed Primary Open-Angle Glaucoma. J Biom Biostat 6: 254. doi:10.4172/2155-6180.1000254 J Biom Biostat ISSN: 2155-6180 JBMBS, an open access journal Page 3 of 8 Volume 6 • Issue 4 • 1000254 Latent Class Growth Model (LCGW) for Monitoring VF Progression Figure 1 shows the diagram of the proposed LCGM for monitoring VF progression. The longitudinal series of MD and PSD are centered at the time of diagnosis, with black and red lines representing stable and progressive eyes respectively. Subgroups (latent classes) of participants are determined based on distinct patterns of MD and PSD trajectories. Rather than directly modeling the longitudinal change of MD and PSD in the LCGM, we first perform a functional principal component (FPC) analysis in MD and PSD separately to approximate its evolution over time. FPC analysis is chosen here because the trajectories in both MD and PSD, especially in these progressive eyes, tend to be non-linear and a co


Background
Primary open angle glaucoma (POAG) is a chronic progressive and potentially blinding optic neuropathy. Visual loss is often monitored using visual field (VF) test. Depending on different algorithms used, the VF test divides retina into 74 or 52 locations and the sensitivity of vision at each location is assessed. Glaucomatous damage is irreversible because nothing yet can restore the optic nerve cells once they are dead. However, the risk of blindness due to progressive VF loss varies substantially from patient to patient. A natural history study on a cohort of patients newly diagnosed with glaucoma found out that the mean deviation (MD) index, a global summary measure for visual field test, in some patients can deteriorate at an alarming rate of 10 decibels (dB) per year, while in others the MD virtually did not change in 6 years without any treatment [1]. Early identification of those patients destined to rapid progressive visual loss is crucial to prevent further irreversible visual field loss. Conversely, for patients whose visual fields remain stable, the cost and morbidity associated with over-treatment could be avoided. Unfortunately, differentiating progressive and stable visual function is challenging, partially due to the high intra-individual variability in visual field data [2]. As in many other chronic diseases, multiple longitudinal biomarkers are often collected. The usefulness of longitudinal biomarkers for early detection of disease or to monitor disease progression has been long recognized in many chronic diseases such as cancer or cognitive aging. It is therefore worth while to pursue whether the prediction of VF progression in patients with POAG could be enhanced using periodically recorded biomarkers.
In the last decade, many methods have been proposed to incorporate longitudinal biomarkers for the prediction of clinical outcomes in biomedical research. Henderson et al. [3] proposed a flexible joint model that connects longitudinal and survival models with two correlated latent Gaussian processes. They assumed that the longitudinal and survival data are conditionally independent given the linking latent process and covariates. Guo and Carlin [4] developed the above joint model using a fully Bayesian approach through MCMC methods. Rizopoulos et al. [5] developed a joint model via so called shared-parameter model. They assumed that all associations between longitudinal and survival processes are induced by random effects and that the two processes are conditionally independent given the random effects. Similar shared parameter models have also been used for the joint analysis of longitudinal data and binary outcomes [6] or multivariate longitudinal biomarkers and time-to-event data [7]. In all the above methods, univariate or multivariate linear mixed models are often used to describe the change of longitudinal biomarkers over time. These models usually assume that all individuals are drawn from a single homogenous population and that all the trajectories are smoothly distributed around the population average. An alternative approach for the joint modeling of longitudinal data and clinical outcome is a joint latent class model which uses a latent class growth model (LCGM) to describe trajectories of longitudinal biomarkers. It assumes that the population of subjects consists of heterogeneous but unobserved sub-populations (latent classes). A joint latent class model assumes that the dependence between longitudinal field (VF) tests. This dataset constitutes the largest cohort of POAG with known date of diagnosis and provides a unique opportunity to evaluate disease progression over time. Detailed information on OHTS is described in Gordon et al. [13].
Our analysis cohort consisted data from the first eye of 277 participants who developed POAG and had at least 7 follow-up visits. This study has been approved by Washington University Institutional Review Board. The overall median follow-up was 13 years, with a median pre-diagnosis follow-up of 8 years and median post-diagnosis follow-up of 4.8 years. We used mean deviation (MD) and pattern standard deviation (PSD) as the longitudinal predictors. MD reflects the deviation of a subject's vision from the age-matched reference population, while PSD reflects the perturbation of vision across different testing locations and is more sensitive for early visual damage. The primary endpoint was the status of VF progression at the end of follow-up, as determined by post-diagnosis VF using the 2-omitting algorithm [14], and 57 eyes (20%) were labeled as "progressive". Table 1 showed the demographic and clinical characteristics at diagnosis for progressive and stable cohorts separately. The categorical data were summarized as counts and frequencies, while the continuous variables were summarized in means and standard deviations or median and inter-quartile range as appropriate. As an illustration, in the proposed prediction model we only included following 5 variables at diagnosis as the baseline covariates: age (Age, decade), intraocular pressure (IOP, mmHg), central corneal thickness (CCT, μm), pattern standard deviation at randomization (PSD0, dB), and horizontal cup/ disc ratio (HCD). These 5 covariates were chosen because they have been identified and validated as independent predictors for the risk of POAG development [13]. The baseline covariates were standardized to have mean 0 and variance 1 throughout the remainder of this paper. As such, for these variables the odds ratios (OR) from the regression models actually represented the effect per 1-SD change.
data and clinical outcomes is fully captured by the latent class structure. Due to its flexibility in modeling even complex dependence and its ability to incorporate covariates, many methods have been proposed in recent years to incorporate longitudinal biomarkers under the framework of latent class models [8]. Garre et al. [9] fitted a joint latent class model for the prediction of time to graft failure using longitudinal serum creatinine concentration. They used two latent classes to capture the heterogeneity of serum creatinine trajectory. The serum creatinine in the first class was described by an intercept-only random effect model, while the trajectories in the second class was modeled using a segmented random-effect model with a random change-point indicating the initiation of rapid deterioration in patient's renal function. Proust-Lima and Taylor [10] developed a dynamic prognostic tool that incorporates the evolution of prostate-specific antigen (PSA) to predict the risk of prostate cancer recurrence for patients who completed radiation therapy. The within-class PSA trajectory was described by a linear mixed model including two known parametric functions for short-and long-term PSA trajectories respectively. Their research also found that the predictive accuracy was not markedly influenced by the choice of different number of latent classes. In a recent report, Maruyama et al. [11] also applied a joint latent class analysis approach to identify distinct patterns of longitudinal craving scores and to predict the binary clinical outcome of smoking cessation. LCGM-based predictions are also computationally attractive. Once a joint model has been fitted, there is no need to integrate out the random effects and all the individualized prediction can be calculated analytically. This feature makes it especially appealing for individualized real-time dynamic prediction [8].
In this article, we developed a latent class growth model (LCGM) using multiple longitudinal biomarkers to predict visual field (VF) progression in patients with newly diagnosed POAG. We used the data from ocular hypertension treatment study (OHTS), a phase III randomized trial of the safety and efficacy of topical hypotensive medication in preventing the development of POAG. The proposed LCGM was developed taking a stage-wise approach and based on longitudinal mean deviation (MD) and pattern standard deviation (PSD). Specifically, a functional principal component (FPC) analysis [12] was first performed to the longitudinal MD and PSD separately, and FPC scores were estimated to approximate the trajectories of MD and PSD. Subgroups (latent classes) of participants with distinct patterns of MD and PSD trajectories were then identified based on the resultant FPC scores. The remainder of this paper was structured as follows. Section 2 described the OHTS data in more detail. Section 3 specified the latent class model for monitoring VF progression using longitudinal MD and PSD. Section 4 applied the method to data from OHTS and we concluded with a discussion in Section 5.

Study Cohort: Ocular Hypertension Treatment Study (OHTS)
OHTS is the largest randomized trial to date to test safety and efficacy of topical hypotensive medication in preventing the development of POAG. In OHTS, 1636 subjects were randomized to either observation or treatment with ocular hypotensive medication and followed for a median of 13 years. Outcome ascertainment was performed by specialized resource centers where readers were masked as to randomization assignment and information about the participant's clinical status, and the attribution of abnormality due to POAG was performed by a masked Endpoint Committee. In OHTS, 362 eyes from 279 participants developed POAG during study. Disease progression was monitored regularly every 6 months using Humphrey 30-2 visual  Figure 1 shows the diagram of the proposed LCGM for monitoring VF progression. The longitudinal series of MD and PSD are centered at the time of diagnosis, with black and red lines representing stable and progressive eyes respectively. Subgroups (latent classes) of participants are determined based on distinct patterns of MD and PSD trajectories. Rather than directly modeling the longitudinal change of MD and PSD in the LCGM, we first perform a functional principal component (FPC) analysis in MD and PSD separately to approximate its evolution over time. FPC analysis is chosen here because the trajectories in both MD and PSD, especially in these progressive eyes, tend to be non-linear and a conventional linear mixed model may be inadequate to fully capture the changes of MD and PSD over time.

Functional principal component (FPC) analysis of longitudinal MD and PSD
Functional principal component (FPC) analysis for sparse longitudinal data is performed following methods developed by Yao et al. [12] and is briefly summarized in this sub-section. FPC analysis provides a powerful tool of dimension-reduction for functional data (i.e., curves) [15]. It examines the sample covariance structure and finds the set of orthogonal principal component functions that maximize the variance along each component. Taking MD as an example, let m ij denote the observation of MD for i th participant at the j th timepoint and X i (t) be the corresponding smoothed trajectory, then FPC analysis specifies that ( ) t µ represents the overall mean trajectory of MD; { } ik ξ is a vector of FPC scores for i th participant and acts as random effects in statistical models with FPCs. It is a set of uncorrelated random coefficients having mean 0 and variance λ k (i.e., eigenvalues); is the corresponding k th eigenfunction and describes the direction of deviation from the overall mean trajectory ( ) t µ , while an individual's FPC score ξ ik quantifies the extent to which their trajectory correlates with ( ) k t Φ ; Since the eigenvalues often decrease rapidly, the infinitedimensional curves can be well approximated by a very small number of FPCs.
In our FPC analysis, both MD and PSD series are smoothed using cubic B-splines with equally spaced knots, and the number of basis are selected based on an approximate leave-one-curve-out crossvalidation [12]. The first eigenfunction 1 The second principal component function can be obtained by first subtracting 1 ( ) t Φ from the original smoother function, * . This process can be performed iteratively for k=1, 2, …, K principle components.
Once the estimates of eigenfunction ( ) k t Φ and eigenvalue k λ are obtained, the FPC scores ik ξ of individual participants can be calculated via numerical integration by plugging in the estimated mean and eigenfunction. However, the conventional FPC procedure assumes that the density of measurements is sufficiently large. For sparse and irregularly measured longitudinal data like those in OHTS, the conditional expectation method [12] can be applied to obtain individual FPC scores, which are estimated as the best linear unbiased prediction (BLUP) given the observed trajectories and parameters, In this paper, we only extract the first FPC to approximate the longitudinal MD and PSD.

Developing LCGM based on the 1 st -FPC scores of longitudinal MD and PSD
Let Y i ={Y i1 , Y i2 } denote the resultant 1 st -FPC scores of MD and PSD for i th participant, Z i ={1, AGE i , PSD0 i , IOP i , CCT i , HCD i } be the vector of baseline covariates, C i represent the latent class membership of i th individual, and θ g be the vector of class-specific parameters that differentiate the G latent classes, with i =1, 2, …, N, and g =1, 2, …, G, respectively. Then the distribution of Y i is a mixture distribution defined as, The probability Pr( ) i C g = represents the size (mixing proportion) of latent class g in the mixture distribution and is modeled as a multinomial logistic regression, i i g f Y C g θ = describes the class-specific distribution of Y i and is assumed to follow a bivariate normal distribution. A common variance-covariance structure across all classes is often assumed in a typical latent class model. Due to high variability in both MD and PSD trajectories, we assume a class-specific variance in this model, but we also assume that the correlation between MD and PSD is fully captured by the latent classes, i.e., Given the estimated parameters g θ  and Y i , each individual can be assigned to the most likely class based on the probability of class membership (often termed as posterior class-membership probability), The best LCGM is selected by enumerating and comparing a set of competing models differing only in the number of classes. The model comparison is based primarily on the log likelihood values, including the Bayesian Information Criteria (BIC, with a smaller BIC indicating a better fit) and the Lo-Mendell-Rubin adjusted likelihood ratio test (LMR-LRT). A significant test of LMR-LRT indicates that the model with G-1 classes should be rejected in favor of the G-class GMM. In addition, we also specify that each class should include at least 5% of patients to ensure reliable within-class estimation.

Probability of VF progression and its 95% confidence interval
Once an optimal LCMG has been developed, the class-specific risk of VF progression can be readily obtained, then the probability of VF progression for a new patient with given MD and PSD series can be calculated in the following 3 steps: Estimating the 1 st -FPC score 1 i ξ and its variance 1 var( ) i ξ for the given MD and PSD series as outlined in Section 3.1; • Calculating the probability of class membership (posterior class-membership probability) given the baseline covariates and estimated 1 st -FPC scores; • Estimating the probability of VF progression as weighted average of class-specific risk, weighted by posterior class-membership probability.
Due to the complexity of modeling, we have to rely on numerical bootstrap resampling method to construct the 95% confidence interval of VF progression. Specifically, 1000 samples of the 1 st -FPC scores for MD and PSD are generated from a normal distribution, 1i ξ~N( In this article, FPC analysis is performed using the library "fpca" in the statistical package R [12], while the parameter estimation for LCGM is implemented using statistical package Mplus which is called via the library "MplusAutomation" in R [16].

Results: Application to Ocular Hypertension Treatment Study (OHTS) FPC analysis of longitudinal MD and PSD
Based on a leave-one-curve-out cross-validation [12], the trajectories in both MD and PSD could be best smoothed using 7 basis functions. The 1 st -FPC accounted for 89.7% and 90.8% of total variance in MD and PSD trajectories, respectively, and Figure 2 visualized the estimated trajectories based on FPC analysis. Figure 2A showed the mean trajectory of MD (solid line) and mean trajectory ± 1 steigenfunction of FPC (broken lines). In general, MD decreased over time, especially after POAG diagnosis. Figure 2A also indicated that the 1 st -FPC primarily captured the post-diagnosis variability [15]. To illustrate the performance of 1 st -FPC in approximating MD trajectories, we randomly selected 4 individuals who had different magnitude of estimated 1 st -FPC MD scores. All the observed trajectories (solid lines) were well approximated by the estimated curves (broken lines) based on the 1 st -FPC MD scores ( Figure 2B). A positive FPC score corresponded to a higher than average MD trajectory while a negative score corresponded to a lower than average MD trajectory.
Similarly, the 1 st -FPC of PSD captured the post-diagnosis variability and PSD increased over time in general ( Figure 2C). A positive FPC score corresponded to a lower than average PSD trajectory while a negative score corresponded to a higher than average PSD trajectory. Among these randomly selected 4 individuals, all the observed PSD trajectories (solid lines) were well approximated by the estimated curves (broken lines) using the 1 st -FPC PSD scores ( Figure 2D). Table 2 showed the fitting statistics of 7 competing LCGMs based on the resultant 1 st -FPC scores. The first 5 models differed only in the number of latent classes and they all assume class-specific variance with no correlation. Model 2 (3 classes) and Model 3 (4 classes) had similar performance based on Bayesian Information Criteria (BIC) and Lo-Mendell-Rubin adjusted likelihood ratio test (LMR-LRT). Since McLachlan and Peel [17] show that BIC tends to underestimate the number of classes, we chose Model 3 as the best fit to the OHTS data. Two additional models (both with 4 classes) were further considered, one (Model 6) assuming class-specific correlation and the other (Model 7) assuming a common variance-covariance structure across all classes, but none of them showed improved fitting statistics. Table 3 presented estimated parameters for the best model. The first class contained 84 eyes (30%), had the highest mean FPC score in MD and the lowest mean FPC score in PSD, and only 2 out of 84 eyes (2.4%) were progressed at the end of study. In contrast, the last class (16 eyes, 6%) had the lowest mean MD and the highest mean PSD, and all 16 eyes (100%) were progressed at the end of study. The 2 nd -and 3 rdclass were somewhere in between and accounted for 44% and 19% of the participants, with 10.6% (13 out of 123 eyes) and 48.1% (26 out of 54 eyes) being progressive, respectively. We therefore labeled Classes 1 to 4 as "Stable, low risk", "Stable, high risk", "Progressive, moderate", and "Progressive, rapid", respectively. Table 3 also showed the effects of baseline covariates on the classification of MD and PSD trajectories. Only age and PSD at diagnosis were predictive of MD and PSD trajectories. Treating Class 1 as the reference, for example, participants with old age were more likely to fall into the Classes 2, 3, or 4 (with OR=1.99, 2.27, and 5.93, respectively). Similarly, participants with high PSD at diagnosis were also more likely to be in the high risk groups (with OR=13.74, 35.87, and 78.26 for Classes 2, 3, and 4, respectively). Figure 3 presented the individual trajectories of MD and PSD within each class. Both MD and PSD series are centered at the time of diagnosis, with red and black lines representing progressive and stable eyes, respectively.

Individualized prediction of VF progression
Once the eigenfunction and eigenvalue of the 1 st -FPC were estimated and the optimal LCMG were developed, the probability of VF progression for an individual with given series of MD and PSD could be calculated analytically. Figure 4 illustrated the predicted probability of VF progression at the end of study for two randomly selected individuals, one with a progressive eye and one with a stable eye. The upper panel showed the observed (solid lines) and estimated (broken lines) MD series measured 5 years prior to diagnosis ( Figure 4A), the observed and estimated PSD series ( Figure 4B), as well as the predicted probabilities and corresponding 95% confidence intervals of VF progression at the end of study (Figures 4C-4H). The two individuals could not be differentiated given MD and PSD series measured 5 years prior to diagnosis. Given a series of MD and PSD measured 2 years prior to diagnosis, the progressive participant had a relatively higher probability of VF progression, but there was substantial overlap in the 95% confidence intervals (Figure 4, Middle panel). However, the two participants were well separated given MD and PSD series up to 1 year post diagnosis (Figure 4, Lower panel).
To investigate the overall discriminating ability of the prediction model, the method was then applied to all individuals given different length of MD and PSD series, and the ROC curves and area under ROC (AUC) from the estimated probabilities were compared. A logistic regression including the 5 baseline variables plus MD at diagnosis was also fitted as a comparison. Figure 5 showed that the ROC using prediagnosis trajectories had a slightly greater area (AUC=0.721) than the ROC using covariates at diagnosis only (AUC=0.705). It also showed that the predictive accuracy was improved in ROCs including more post-diagnosis VFs (with AUC=0.777, 0.845 and 0.879 for up to 1-, 2-, 3-year post-diagnosis MD/PSD series respectively). We admit that the predictive accuracy in Figures 4 and 5 may be optimistic because these individuals were also used in the model development. However, an external validation is in progression using the fellow eye of the OHTS participants as well as data from the Collaborative Initial Glaucoma Treatment Study (CIGTS), another randomized phase III trial to assess the treatment efficacy on patients with newly diagnosed glaucoma [18].

Discussion and Conclusion
In this paper, we developed a model to predict a binary outcome using multivariate longitudinal data. We used a latent class growth model (LCGM) to capture the dependence between longitudinal data and binary outcome as well as the association among multiple biomarkers themselves. LCGM is a semi-parametric statistical technique that describes individual trajectories over time after accounting for unobserved heterogeneity within the population. This model allows us to predict the probability of clinical outcome for a subject with particular covariates and series of biomarkers. We apply the method to dynamically predict visual field (VF) progression in patients with newly diagnosed POAG. The results showed that these participants with progressive visual loss could be accurately differentiated from those with stable vision by incorporating longitudinal biomarkers.
Many methods have been proposed in recent years to incorporate longitudinal biomarkers under latent class model framework [8][9][10][11] and most of them take a full parametric approach to describe the change of longitudinal biomarkers over time. In contrast, we performed a dimension reduction to convert repeated measurements into a scalar by a functional principal component (FPC) analysis, one of basic tools in functional data analysis (FDA). FDA is a collection technique about the analysis of information on curves or functions [15]. Unlike conventional methods such as ANOVA that only focus on a limited time points, FDA analyzes data about curves, surfaces, or anything else varying over a continuum. Measured data in a traditional FDA are often recorded over a densely sampled grid, i.e., in the format of infinite-dimension functions or curves such as those generated from automated sensing equipment. In a typical longitudinal study like OHTS, however, the data are often measured sparsely at discrete time points. The emerging techniques such as conditional expectation (PACE) [12] have provided a valuable bridge between functional data and classical longitudinal data. As illustrated in the OHTS data, PACEbased estimates of 1 st -FPC scores well approximated the trajectories of MD and PSD, and provided a flexible and yet powerful tool to incorporate multivariate longitudinal data.
There were several limitations in this study. One limitation was that it took a multi-stage approach to calculate the probability of VF progression and failed to account for the uncertainty associated with the estimated FPC scores. One particular feature of the OHTS is that both longitudinal predictors and clinical outcome are based on VF tests. That is, the status of VF progression is determined using VF tests by a point-wise linear regression, while the longitudinal predictors (MD and PSD) are two global indices derived from the same VF tests. A regression approach results in circular use of information and may be inappropriate for OHTS data. We therefore took a stage-wise approach to develop prediction model and the LCGM was only based on baseline covariates and longitudinal MD/PSD. Another limitation of this study was that, due to relatively short  post-diagnosis follow-up, VF progression was only determined as a single snapshot at the end of study and failed to take the onset time of progression into consideration. To deal with this issue, an external validation is in progression using data from the Collaborative Initial Glaucoma Treatment Study (CIGTS). In the validation study (with a median follow-up of 9 years), the status of VF progression will be determined not only at the end of study but also by a moving window over time.
In summary, despite its limitations, the proposed method provided a flexible and yet powerful tool to incorporate even complex multivariate trajectories for predicting distal clinical outcomes. Although the model in this paper is developed for the prediction of VF progression in patients with newly diagnosed POAG, the methodology is applicable to any chronic diseases with multiple longitudinal predictors. When the clinical outcome are defined independent of longitudinal predictors, however, we expect that a joint modelling approach that classifies participants based on the patterns of clinical outcome and longitudinal biomarkers simultaneously could better account for the association between two random processes and improve the predictive accuracy.