Estimating correlation between multivariate longitudinal data in the presence of heterogeneity

Background Estimating correlation coefficients among outcomes is one of the most important analytical tasks in epidemiological and clinical research. Availability of multivariate longitudinal data presents a unique opportunity to assess joint evolution of outcomes over time. Bivariate linear mixed model (BLMM) provides a versatile tool with regard to assessing correlation. However, BLMMs often assume that all individuals are drawn from a single homogenous population where the individual trajectories are distributed smoothly around population average. Methods Using longitudinal mean deviation (MD) and visual acuity (VA) from the Ocular Hypertension Treatment Study (OHTS), we demonstrated strategies to better understand the correlation between multivariate longitudinal data in the presence of potential heterogeneity. Conditional correlation (i.e., marginal correlation given random effects) was calculated to describe how the association between longitudinal outcomes evolved over time within specific subpopulation. The impact of heterogeneity on correlation was also assessed by simulated data. Results There was a significant positive correlation in both random intercepts (ρ = 0.278, 95% CI: 0.121–0.420) and random slopes (ρ = 0.579, 95% CI: 0.349–0.810) between longitudinal MD and VA, and the strength of correlation constantly increased over time. However, conditional correlation and simulation studies revealed that the correlation was induced primarily by participants with rapid deteriorating MD who only accounted for a small fraction of total samples. Conclusion Conditional correlation given random effects provides a robust estimate to describe the correlation between multivariate longitudinal data in the presence of unobserved heterogeneity (NCT00000125).


Background
Estimating the correlation coefficients between outcome variables is one of the most important analytical tasks in epidemiological and clinical research. When independent observations are available for each outcome variable, Pearson's correlation coefficients are often used. In many epidemiological and clinical studies, however, the same individuals are also followed repeatedly over time for a series of measurements with regarding to the collection of outcome variables. Such multivariate longitudinal data provide a unique opportunity to study the joint evolution of these outcomes over time. A simple Pearson's correlation coefficient is no longer applicable for assessing correlation because a multivariate longitudinal model has to account for two types of correlations simultaneously, namely the serial correlation between observations at different time points within a subjects and the cross correlation between observations on different outcome variables at each time point [1].
During the last few decades, many statistical models have been proposed in statistical literature for the analysis of multivariate longitudinal data and the most popular one is the joint mixed model which links separate linear mixed models by allowing their model-specific random effects to be correlated [2]. The advantages of this approach include well-established theory [1], efficiency gains [3], and commercially available software packages for model fit [4,5].
More importantly, a joint random-effect model allows assessing correlation between different outcomes. It can provide a succinct summary for not only how the evolution of one outcome variable is correlated to the evolution of another outcome ("association of evolution"), but also how the correlation between outcomes changes over time ("evolution of association") [6].
Various extensions have been made in recent years in joint mixed models for a wide range of research fields [2,7]. A typical joint mixed model often consists very few (usually 2 or 3) continuous outcome variables, takes a linear parametric functional form over time, holds a multivariate normality distribution for the vector of random effects, and assumes that the outcome variables are mutually independent given random effects. Fieuws and Verbeke [6] relaxed the conditional independence assumption by allowing the error components of the outcomes to be correlated. Gueorguieva and Sanacora [3] developed a joint mixed model to incorporate different types of longitudinal outcomes and analyzed repeated measurements on ordinal and continuous variables measuring the same underlying disease severity over time. Fieuws and Verbeke [8] took a pairwise fitting approach to construct the variance-covariance matrix for the joint distribution of random effects and assessed the hearing threshold profiles (22 outcomes) in a nature aging study. Putter et al. [9] proposed a latent class joint model (e.g., assuming the vector of random effects following a multinomial rather than normal distribution) to identify lung cancer patients with distinct patterns regarding their evolvement of denial over time and to assess the association between denial patterns and the trajectories of other physical and emotional longitudinal measurements. Recently Luo et al. [10] extended the model to non-longitudinal setting and proposed a bivariate linear mixed model to estimate correlation coefficients in cross-sectional data from a family-type clustered design. For other approaches in the analysis of multivariate longitudinal data as well as for more details in recent development, see the comprehensive reviews by McCulloch [7], Bandyopadhyay et al. [11], Verbeke et al. [2] and the references therein.
Almost all of the aforementioned joint mixed models (except Putter et al. [9]) also assume that subjects are drawn from a single homogenous population and that the individual trajectories are smoothly distributed around the population average. In this article, we intended to address the issue when such a one-size-fitall assumption is violated, using participants with newly diagnosed primary open angle glaucoma (POAG) from the Ocular Hypertension Treatment Study (OHTS). POAG is a chronic progressive optic neuropathy and the rate of vision deterioration can vary substantially from patient to patient. For example, a natural history study on a cohort of patients newly diagnosed with glaucoma found out that the mean deviation (MD) index, a global summary measure for visual field test, in some patients can deteriorate at an alarming rate of 10 dB (dB) per year, while in others the MD virtually did not change in 6 years without any treatment [12]. It is therefore important to take potential heterogeneity into consideration when correlation is assessed.
In this article, we demonstrated strategies to assess the correlation between multivariate longitudinal data in the presence of potential heterogeneity. Specifically, bivariate linear mixed model (BLMM) was fitted to assess both the "association of evolution" (i.e., correlation between random effects) and "evolution of association" (i.e., marginal correlation over time), and then conditional correlation (i.e., marginal correlation given random effects) was calculated to describe how the association between longitudinal outcomes evolved over time within a "true" but unobserved subset of individuals. The disease heterogeneity was also approximated by subgroups (latent classes) from latent class analysis (LCA) where population variability is captured by differences across subgroups in the shape and level of their trajectories [13]. These subgroups were incorporated into BLMMs to further understand the impact of heterogeneity on correlation. Our method was similar to the latent class joint model by Putter et al. [9] except that our primary goal focused on correlations rather than trajectories. The remainder of this paper was structured as follows. Section 2 described the OHTS data in more detail. Section 3 specified the bivariate linear mixed model (BLMM) model to assess correlation among multivariate longitudinal data. The method was applied to data from OHTS in Section 4 and a simple simulation study was also performed to assess the impact of heterogeneity on correlation in Section 5. Finally, we concluded with a discussion in Section 6.

Study cohort: Ocular hypertension treatment study (OHTS)
Between 1994 and 2009, the OHTS enrolled 1636 participants with ocular hypertension but with no evidence of glaucomatous damage and randomized to either observation or treatment with ocular hypotensive medication. The participants were followed for a median of 13 years and the disease progression was monitored regularly every 6 months. In OHTS, 362 eyes from 279 participants developed POAG during study and constituted the largest cohort of POAG with known date of diagnosis, with a median pre-diagnosis follow-up of 8 years and median post-diagnosis follow-up of 4.8 years. The design and methods of the OHTS have been described in detail elsewhere [14]. This paper only considered data measured during postdiagnosis period. As in many clinical trials on chronic diseases, multiple longitudinal outcomes were collected in OHTS to fully explore the multidimensional impairment caused by POAG. Our analysis restricted to 2 longitudinal outcomes, namely mean deviation (MD) and clinical visual acuity (VA). MD is a global index from Humphrey 30-2 visual field (VF) tests and reflects functional damage. VA is measured by the standard ETDRS acuity testing (ETDRS ranged 0 to 110, with the conventional 20/20 vision corresponding to ETDRS score of 100) and a low ETDRS score indicates a poor visual ability. MD was measured every 6 months while VA was measured annually. This study has been approved by Washington University Institutional Review Board. Our analysis cohort consisted the first eye of 269 OHTS participants who developed POAG and had at least 2 post-diagnosis measurements in both MD and VA. To reduce the potential influence of cases with greater attrition rates, we excluded all the measures taken 7.5 years after diagnosis and ended up with at least 30 observations in each outcome at any given time points. Figure 1 showed the raw data of 50 randomly selected participants and there was a considerable variability in the trajectories for both MD and VA. Besides these two longitudinal outcomes, following demographic and clinical characteristics were also included in the data: age at diagnosis (years), gender, race (African American vs. Others), randomization groups (Observation vs. Treatment), intraocular pressure (IOP, mmHg), central corneal thickness (CCT, μm), and horizontal cup/disc ratio (HCD).

Methods
Since MD is arguably the most important index for monitoring POAG progression in clinical practice, in this article we focused on the heterogeneity of this pivotal variable and its impact on the correlation between MD and VA.

Bivariate linear mixed model (BLMM) between MD and VA
Let Y 1i (t) and Y 2i (t) denote the MD and VA for i th participant at time t, Z i = {AGE, Gender, Race, Observation group, CCT, IOP, HCD} be the vector of baseline covariates of i th participant, then each outcome can be described by a linear mixed model, Where {α 0 , β 0 } is the vector of intercepts, {α 1 , β 1 }is the vector of slopes during post-diagnosis period, while {α Z , β Z } is the vector of fixed effects for all other covariates at diagnosis. {a 0i , a 1i }and{b 0i , b 1i , b 2i }represent random intercept and slope for longitudinal MD and VA respectively, and the BLMM is constructed by linking the two outcomes via a joint distribution of the random effects, The error terms assume following bivariate normal distribution and independent of random effects, Once the BLMM has been fitted and the variancecovariance matrix is obtained, the correlation between

Association of evolution (correlation between random effects)
The correlation between random effects (intercepts and post-diagnosis slopes) summarizes how the evolution of MD is associated with the evolution of VA [6],

Evolution of association (marginal correlation)
The marginal correlation during post-diagnosis period allows answering the question how the association between MD and VA evolves over time [6], Conditional correlation (marginal correlation given random effects) The conditional marginal correlation given specific random effects allows us to assess how the association between MD and VA evolves over time within certain clinically relevant subgroups. Since a majority of POAG participants tends to have a relatively stable vision function for a long period over time after initial diagnosis, the correlation between MD and VA in the participants with stable vision would provide a useful tool to assess the impact of heterogeneity. Let γ 0i = Z i α Z + α 0 + a 0i and γ 1i = α 1 + a 1i denote the random intercept and slope of MD for i th participant, let Y 1i (t) and Y 2i (t) denote MD and VA for i th participant at time t, then the joint distribution of {Y 1i (t), Y 2i (t), γ 0i , γ 1i } at given time t also follows a normal distribution, Once the bivariate linear mixed model has been fitted and its variance-covariance matrix is obtained, the joint distribution of {Y 1i (t), Y 2i (t), γ 0i , γ 1i } at given time t can be obtained by plugging in the estimated parameters. The conditional correlation at a given time t is denoted as ρ(Y 1i (t), Y 2i (t)| γ 0i > c 1 , γ 1i > c 2 ), where c 1 and c 2 represent clinical relevant thresholds for the intercept and slope of MD respectively. In the clinical practice of POAG management, an MD level of -5 dB or above marks a mild damage and MD < −5 dB is deemed as moderate/advanced damage. An MD slope of −1 dB/year is often regarded as clinically significant deterioration. Since this paper only includes participants with newly diagnosed POAG, it is believed that a change of −0.5 dB/year is also worthy close attention [15]. We therefore choose c 1 = −5 dB and c 2 = −0.5 dB/years throughout the remainder of this paper. Once the parameters from bivariate mixed models have been determined, the conditional correlation ρ(Y 1i (t), Y 2i (t)| γ 0i > c 1 , γ 1i > c 2 ) can be estimated using parametric bootstrap resampling method with the following 4 steps.

Pearson's correlation coefficient between MD and
VA is calculated among these subjects who satisfy the conditions (i.e., with MD at diagnosis > −5 dB and slope of MD > −0.5 dB/year). 3. The above 2 steps are repeated 10,000 times to obtain the average conditional correlation at time t. The 95% confidence interval is also estimated as the 2.5 and 97.5 percentiles of the resultant correlations. 4. The above steps are repeated over different time points to describe the change of marginal correlations over time.
The assumption of independent errors in the above model could be relaxed if necessary [6]. In the case of correlated errors, , the formula for correlations between random effects remain unchanged, but the marginal correlation over time is calculated as follows and the term v 12 (t) for conditional marginal correlation also needs to be updated accordingly, and In this article, the 95% confidence interval (CI) of conditional marginal correlation was estimated using 10,000 parametric bootstrap samples, while the 95% CIs of other correlations were obtained from 500 empirical bootstrap samples using participants as resampling units. All the BLMMs were fitted using the MIXED procedure in SAS statistical package 9.4 (SAS Institutes, Cary, NC).

Results
Application to ocular hypertension treatment study (OHTS) Table 1 showed the summary statistics for baseline demographic characteristics, the estimated regression coefficients, and the estimated parameters of the variance-covariance from univariate mixed model to each outcome. Among these baseline covariates, age was significantly associated with both outcomes and these elderly participants had poor outcomes. Those participants with higher intraocular pressure (IOP) at diagnosis or with lower central corneal thickness (CCT) had significantly worse MD. Participants self-identified as African-American also tended to be worse in both outcomes though the difference was only significant in VA. Note that all the continuous variables (Age, CCT, IOP, and HCD) were standardized to have mean 0 and variance 1 in all the models throughout this paper and hence the regression coefficients actually represented the effect per 1-SD change.
Primary analysis for the correlation between longitudinal MD and VA Two bivariate linear mixed models (BLMM) were fitted to describe the relationship between MD and VA, one assuming correlated error terms and the other with independent errors. The likelihood ratio test showed that the model with independent errors would provide an adequate fit (X 2 = 2.9, df = 1, p = 0.09). Hence, the BLMM with independent error terms was selected as the final model. The estimated fixed and random effects from this BLMM were presented in Table 1. The results showed that the univariate and bivariate mixed models produced very close estimates, especially the fixed effects. Following Fieuws et al. [6], the correlation between longitudinal MD and VA was summarized in terms of the association between subject-specific evolutions (as measured by random intercepts and slopes) as well as the evolution of association (as measured by marginal correlation over time). Table 2 presented the estimated correlation coefficients for random intercepts and slopes and their 95% confidence intervals (CI). There was a significant positive correlation in both random intercepts (ρ = 0.278, 95% CI: 0.121-0.420) and random slopes (ρ = 0.579, 95% CI: 0.349-0.810). Figure 2 plotted the marginal correlations (solid lines) and the corresponding 95% confidence intervals (broken lines). Pearson correlation coefficient estimated at each time point was also used as a naïve approximation to the marginal correlation over time. In general, the marginal correlation at diagnosis (time 0) approximated the correlation between random intercepts, and the marginal correlation converged to the correlation between random slopes as the time departures from 0 [6]. As comparing to the unconditional correlation, the conditional correlation in participants with stable vision function (i.e., given MD at diagnosis above −5 dB and MD slope > −0.5 dB/year) reduced substantially, indicating that the observed correlation between MD and VA was mainly induced by these participants with more deteriorating conditions. Figure 2 also revealed a considerable variation from time to time in the pointwise Pearson correlation, even after we had imposed the restriction so that the analysis cohort contained at least 30 participants at any given time point.

Sensitivity analyses
Several sensitivity analyses were further performed to better understand the impact of heterogeneity on assessing correlations. The disease heterogeneity was approximated by 4 subpopulations from a previous study using the same OHTS dataset [16]. Briefly, the trajectories of MD and pattern standard deviation (PSD, another global index of visual field tests) were summarized by a nonparametric functional principal component (FPC) analysis [17], and a latent class analysis (LCA) was performed to the first FPC scores to identify subgroups (latent classes) of individuals with distinct patterns of MD and PSD trajectories. Unlike a linear mixed model that described the individual trajectories by assuming them as random effects following a continuous distribution, the LCA approach used latent classes (unobserved trajectory groups) as an approximation of the unknown distribution. In the other words, the latent classes could be thought of as longitudinal strata where population variability was captured by differences across groups in the shape and level of their trajectories [13]. Table 4 summarized the class-specific intercepts and slopes for MD and VA respectively, after adjusting the baseline covariates. It showed that the profiles in the 4 thclass were substantially different from the other classes. Since MD < −5 dB marks a moderate/advanced vision damage and a deteriorating rate of −0.5 dB/year is deemed clinically significant in newly diagnosed POAG [15], we therefore labeled Classes 1 to 4 as "Stable, High MD", "Stable, Low MD", "Progression", and "Rapid Progression", respectively. Following 3 BLMMs were fitted as sensitivity analyses. The average correlation coefficients between random effects and their 95% confidence intervals from each model were presented in Table 2. The estimated parameters of variance and covariance from each model were also presented in Table 3.
Sensitivity Analysis #1: adjusting the severity of vision damage at diagnosis. The first BLMM incorporated an indicator for moderate/advanced glaucoma (MD < −5 dB) into the primary analysis. The model lead to similar estimates as these from primary analysis, though the strength of estimated correlation in the sensitivity analysis decreased slightly towards null. Sensitivity Analysis #2: excluding those participants who are labeled as "Rapid Progression". The 2 nd BLMM was fitted after excluding these 16 participants who were labeled as "Rapid Progression". Although these participants only accounted for 6% of total sample size, excluding them lead to a substantial decrease in the estimated correlation, especially between random slopes. Sensitivity Analysis #3: full adjustment of heterogeneity. The 3 rd sensitivity analysis fit a BLMM including both class-specific intercepts and classspecific slopes. The results showed that none of the correlation was significantly different from 0, indicating a conditional independence between MD and VA given the heterogeneity of MD trajectory.
In summary, the results revealed that the assumption of homogeneous population in BLMM is important for Table 2 Estimated correlation coefficients and their 95% confidence intervals in the random intercepts and random slopes, from the primary analysis and three sensitivity analyses using bivariate linear mixed models assessing correlation between multivariate longitudinal data. As illustrated in different sensitivity analyses, the estimated correlations could be substantially distorted in the presence of unobserved heterogeneity. In general, the adjustment of heterogeneity tended to reduce the strength of correlations, but all the estimates had a relatively wide range of 95% confidence interval, especially between random slopes. In such a real-world data analysis, however, we do not know the "true" population parameters and the estimates could be influenced by sampling errors. In the section follows, simulated data were used to quantify to what extent the "true" correlation between longitudinal outcomes could be distorted due to heterogeneity.

Simulation study
The simulated data were a mixture of stable and progressive subjects, generated from following bivariate linear mixed models,   and The group indicator (g = 1, 2) represented stable and progressive participants, and the fixed-effect parameters α g 0 ; α were retrieved from the estimated parameters in the 2nd-and 4th-class (Table 4) respectively. The parameters for random effects Σ and Σ e were obtained from BLMM in the 3rd sensitivity analysis and assumed common across the two mixture groups. t = {0, 0.5, …, 5} denoted the semi-annual measuring times and we assumed that all individuals had exactly the same follow-up time in this simulation. Unless otherwise specified, we assumed independent error terms (i.e.,ρ = 0) in all the simulations.
Each simulation generated 500 random samples and each sample consisted 300 subjects. In each simulation, the average correlation coefficients were calculated for both marginal correlation (unconditional and conditional) and correlation between random effects. Three scenarios were considered to assess to what extent the true correlation could be distorted in the presence of heterogeneity or model misspecification.
1. Scenarios A: Impact of size of heterogeneity. In the simulated data, the progressive subjects (i.e., those with g = 2) accounted for 5%, 10%, 15% and 20% of total sample size respectively. Figure 3 A1 showed the estimated average marginal correlation over time under different proportions of progressive cases. We saw that even the presence of a small fraction of heterogeneity could dramatically distort the estimated correlation and, as expected, greater proportion of progressive cases imposed stranger influence. In contrast, the influence of heterogeneity on the conditional correlation was much smaller, and the impact was almost ignorable in the presence of only small fraction of progressive cases (Fig. 3  A2). The results also showed that the presence of heterogeneity could lead to substantial overestimation of correlations between random effects, especially in the random slopes (Table 5). 2. Scenarios B: Impact of magnitude of heterogeneity.
In each simulation, the progressive subjects accounted for only 5% of the total sample size, but the slopes in the progressive subjects varied as 50%, 75%, 125% and 150% of what were used under Scenario A. Figure 3 B1 showed that a higher magnitude of deterioration (like that of 4th-class in the OHTS data) could substantially distort the marginal correlations over time while a lower deterioration (like that of 3rd-class in the OHTS data) would have little impact on the estimated correlation. Again, the estimated conditional correlation was relatively robust to the presence of rapid progressive cases (Fig. 3 B2). The results also showed that higher deteriorating rate could lead to substantial overestimation of correlations between random slopes (Table 5). 3. Scenarios C: Impact of correlation between error terms. Since a candidate BLMM in the primary analysis showed a weak correlation between error terms (ρ = 0.07) with a trend towards significance (p = 0.09), we also assessed the potential impact of independence assumption of error terms on correlation. In each simulated dataset, the progressive subjects accounted for only 5% of the total sample size and the slopes of progressive subjects were the same as those under Scenario A. The data were generated under a variety degree of correlations (ρ = 0.2, 0.4, 0.6, 0.8) between error terms, but analyzed assuming independent errors. The result showed that both the unconditional and conditional marginal correlations were rather insensitive to the degree of correlation between error terms (Fig. 3 C1 and C2). However, ignoring the correlation between error terms could lead to a substantial overestimation of correlations between random effects, especially in the random slopes. The impact was trivial when there was only a weak correlation between error terms, but became much strong in the presence of moderate/high betweenerror correlations ( Table 5).
In summary, the simulated data confirmed the relative robustness of conditional correlation to potential heterogeneity and indicated that these participants in the Class 4 were more likely strong influential cases for assessing the correlation between longitudinal MD and VA. The sensitivity analysis excluding these 16 participants (who only account for 6% of the total samples) resulted in substantial decrease of correlation.

Discussion
In clinical trials and epidemiologic studies, it is quite common to have two or more outcomes measured repeatedly over time. These multivariate outcomes are likely to be correlated and the statistical analysis often requires taking such associations into account. A number of approaches for analyzing multivariate longitudinal data have been proposed in the statistical literature, ranging from the most naïve approach of ignoring the association to the full joint model that specifies the joint distribution and correlation structure among different outcomes [1]. The choice for a specific type of model is often guided by the specific characteristics of data such as the structure of data (balance or unbalanced), the scale of outcome measures (continuous, ordinal, or binary), as well as the research questions of interest (the average evolution over time or the association structure) [11]. When the average evolution over time (fixed effects) is of primary interest, for example, specification of the full joint distribution may be avoided by using a generalized estimating equation (GEE) approach where association structure is treated as a nuisance, and a valid inference regarding fixed effects can still be obtained even when the within-subject associations are misspecified. When the primary interests focus on the association structure itself, a full joint model of outcomes is often preferred [2]. Among various joint models,   multivariate random-effect mixed models provide a versatile tool in estimating the association among different outcomes as well as how the association evolves over time. These models do not require a balanced data structure and thus allow each subject to have different number of observations taken at different time points. They also enable reducing the effect of measurement error attenuation in estimating the correlation between the rates of change over time in different growth curves [1]. However, as illustrated by the OHTS data and simulated data, the assumption of homogeneous population in such a multivariate random-effect mixed model is important for estimation and inference about correlation. The estimated correlation could be substantially distorted even in the presence of a small proportion of heterogeneity.

Unconditional: % of progressive cases
In this paper, we demonstrated various strategies to relax the rather restrictive one-size-fit-all assumption in the bivariate linear mixed models. We showed that conditional correlation given random effects provides a robust estimate to describe the correlation over time in the presence of unobserved heterogeneity. The conditional correlation works better when there is only small fraction of strong influential cases and/or when the difference in trajectories is not so massive among heterogeneous cases. The study also revealed that latent class analysis approach provides a useful tool to explore disease heterogeneity and to assess whether the individual trajectories are smoothly distributed around the population average. Since the existence of latent classes can be because of real heterogeneity (mixture of distinct subpopulations) or simply due to non-normality distribution [13], we therefore expect that the incorporation of latent classes into mixed model could also improve normality assumption. In the presence of massive heterogeneity, however, the assumption of multivariate normality distribution in BLMM is violated and all the estimated correlations including conditional ones will be substantially distorted. In such a case, the dual-trajectory model [18] may provide a better alternative to describe the connections between developmental trajectories of two longitudinal outcomes. It allows the trajectories to evolve contemporaneously or over different non-overlapping time periods. The correlation between different outcomes is represented by the conditional probabilities (i.e., the chance of a given trajectory in one outcome conditional on the trajectory of another) or the joint probabilities (i.e., the chance for a given cotrajectory).
Substantively, our study showed that there is a significant positive correlation between longitudinal MD and VA, and that the correlation constantly strengthens as the time increased. However, further analysis revealed Table 5 Averages and 95% confidence intervals for correlations between random intercepts and correlation between random slopes based on simulated data, where data were generated under 3 different scenarios (Scenarios A: different proportions of progressive cases; Scenarios B: different magnitudes of deteriorating rates in progressive cases; Scenarios C: various strength of between-error correlations) that the correlation is induced primarily by participants with rapid deteriorating MD who only accounted for a small fraction of the total sample size. This finding was consistent to the recent observation that the overall average of VA in participants with newly diagnosis POAG was not significantly different from those without POAG [19]. Our study had some limitations. In this study we only assessed the potential impact of heterogeneity. As long as the estimation and inference on correlation are of primary interest, the results can also be influenced by a variety of other assumptions such as linearity and homoscedasticity. For this consideration, our analysis cohort was only restricted to the post-diagnosis data to better fulfil the linearity assumption, and the baseline demographic and clinical characteristics were included in the mixed models to reduce the unexplained variance. Another potential issue is the influence of missing data. Although these joint mixed models do not require a balanced data structure, they assume that data are missing at random. Inclusion of cases with greater attrition rates may weaken statistical precision and potentially introduce bias if such an assumption is incorrect. To reduce the potential influence of cases with greater attrition rates, in this study we only included data from these visits with at least 30 subjects.

Conclusion
Bivariate linear mixed model (BLMM) is a versatile tool with regard to assessing correlation between multivariate longitudinal data and the conditional correlation given random effects provides a robust estimate to describe the correlation in the presence of unobserved heterogeneity.