Analysis of asynchronous longitudinal data with partially linear models

: We study partially linear models for asynchronous longitudinal data to incorporate nonlinear time trend eﬀects. Local and global estimating equations are developed for estimating the parametric and nonpara- metric eﬀects. We show that with a proper choice of the kernel bandwidth parameter, one can obtain consistent and asymptotically normal parame- ter estimates for the linear eﬀects. Asymptotic properties of the estimated nonlinear eﬀects are established. Extensive simulation studies provide nu- merical support for the theoretical ﬁndings. Data from an HIV study are used to illustrate our methodology.


Introduction
Asynchronous longitudinal data refers to the data structure that measurement times for the longitudinal response and longitudinal covariates are mismatched within individuals. For example, educational studies associate subjective evaluations of students' performance and objective test results. However, subjective information is usually collected through interviews or phone calls at different times from the exams. In clinical epidemiology, one may study the links between biomarkers, sampled repeatedly at lab visits, with self-reported measures of function and quality of life, captured via outpatient phone interviews. With the adoption of electronic medical records, an individual's information may be pooled from different sources to make treatment decisions, creating an asynchronous longitudinal data structure.
While regression analysis for synchronous longitudinal data (Liang and Zeger, 1986;Diggle et al., 2002) has been widely studied, there has been limited work on the analysis of regression models using asynchronous longitudinal data. Xiong and Dubin (2010) employed an ad hoc binning step to synchronize covariates and response measurements to use existing methods for synchronous data. Sentürk et al. (2012) explicitly addressed the asynchronous setting for generalized varying coefficient models with one covariate but did not provide the theoretical properties of the estimators. Cao et al. (2015) proposed a nonparametric kernel weighting approach for the generalized linear models to explicitly deal with the asynchronous structure and rigorously established the consistency and asymptotic normality of the resulting estimates. This was extended to a more general set up in Cao et al. (2016).
Despite these progresses, one limitation of marginal generalized linear models associating longitudinal response with corresponding longitudinal covariates is that non-linear time trends cannot be taken into account. Partially linear models provide a convenient vehicle to retain the parsimonious description of the relationship between the response variable and its covariates and to use a flexible nonparametric function to capture the nonlinear time trend effects. In this paper, we study the partially linear models for asynchronous longitudinal data: E{Y (t)|X(t)} = α(t) + X(t) T β, (1.1) where t is a univariate time index, Y (t) is the response scalar, X(t) is the p × 1 covariate vector at time t, α(t) is an unspecified baseline function of t and β is vector of unknown regression coefficients. Model (1.1) does not require data analysts to parameterize the baseline function which may be difficult in practice. It keeps the flexibility of the nonparametric models for the baseline function, while maintaining the exploratory power of parametric models. Therefore, model (1.1) and its variations have been extensively studied in the literature for synchronous longitudinal data (Zeger and Diggle, 1994;Moyeed and Diggle, 1994;Lin and Carroll, 2001;Wang et al., 2005). Lin and Ying (2001) proposed estimation procedures under the framework of counting processes. Fan and Li (2004) developed difference-based method and profile least squares approach for the estimation of β. Due to the asynchronous data structure, their approaches cannot be directly used for our purpose.
In this paper, we propose a system of global and local estimating equations for model (1.1). Furthermore, under appropriate regularity conditions, we show that within a range of choices of the smoothing parameter (the kernel bandwidth), the estimator of β has the same rate of convergence as in the generalized linear model (Cao et al., 2015), slower than the parametric rate n −1/2 for synchronous data (Lin and Ying, 2001;Fan and Li, 2004). The asymptotic normality is also obtained for the estimates of the linear parameter and nonlinear function. Inferences on β are based on a resampling approach and we use bootstrap for the inference of the nonparametric function α(·). What distinguishes our work from Cao et al. (2015) is that a non-parametric time trend effect is explicitly modeled with associated theoretical properties rigorously established. Unlike Lin and Ying (2001); Fan and Li (2004) for synchronous longitudinal data, a global estimating equation is proposed to down-weigh those observations which are distant in time and enables the use of all covariate observations for each observed response with asynchronous longitudinal data. Our approach is related to some existing quasi/partial-likelihood based algorithms for partially linear models (Cai et al., 2007(Cai et al., , 2008Lu and Zhang, 2010). However, the asynchronous longitudinal data structure poses new challenges for both model estimation and inferences.
The remainder of the paper is organized as follows. In Section 2, we describe the global and local estimating equations for the parameter β and function α(·) in model (1.1), and provide the corresponding theoretical findings. In Section 3, we conduct simulation studies to examine the finite-sample properties of the new procedure and apply our method to dataset from an HIV study. Concluding remarks are given in Section 4. All technical proofs are relegated in the Appendix.

Model estimation and inference
In this section, we propose a system of estimating equations to simultaneously estimate β and α(t). The main idea is to use a global estimating equation for β by kernel weighting and the local polynomial technique (Fan and Gijbels, 1996) to construct a set of locally weighted estimating equations for α(t).

Estimating equations
Suppose there are n subjects in the study. We formulate the observation process using a bivariate counting process for the observation times of the covariate and response variables, similar to Martinussen and Scheike (1999); Lin and Ying (2001); Cao et al. (2015Cao et al. ( , 2016. Specifically, for subject i = 1, . . . , n, the observation times of the longitudinal covariate process X i (t) and response process Y i (s) are generated from a bivariate counting process.
counts the number of observation times up to t on the response and up to s on the covariates, where {T ij , j = 1, . . . , L i } are the observation times of the response and {S ik , k = 1, . . . , M i } are the observation times of the covariates.
If the nonparametric function α(t) were known, partially linear models reduce to the linear models, and thus we can adopt methods in Cao et al. (2015) to estimate β. Motivated by this, we first propose an estimating equation to solve β, with α(s) being fixed at its current value. The estimating equation is is a symmetric kernel function, usually taken to be the Epanechnikov kernel K(t) = 0.75(1 − t 2 ) + and h is the bandwidth. In (2.1), longitudinal measurements of the nonparametric function α(·) are the same as the longitudinal covariates X(·) without loss of generality. The proposed method equally works if longitudinal measurements of the nonparametric function α(·) are the same as the longitudinal response Y (·).
The basic idea behind the foregoing algorithm is intuitive: estimate α(s) locally via (2.3) and (2.4), and then use all of the data and (2.1) to estimate β, withα(s) replacing α(s). The proposed algorithm above is similar in spirit to Cai et al. (2007Cai et al. ( , 2008; Lu and Zhang (2010) for partially linear hazards regression models with multivariate survival data and transformation models with univariate survival data. All these algorithms, in their own contexts, alternatively optimize the global and local quasi-likelihood functions or estimating equations until convergence.
In the synchronous case where s = t, our algorithm is different from that in Fan and Li (2004), where a difference-based method or a profile least squares approach was used to get a closed form estimate of β with parametric rate of convergence. The baseline function α(·) was estimated by smoothing the partial residuals using a local linear fit. As the rate of convergence forβ is faster than that for the nonparametric estimator, the errors in estimating β were negligible and the value of β can be considered as known in the nonparametric estimation of α(·). The asynchronous structure greatly complicates the estimation and inference procedure and we cannot obtain a closed form expression for β even in the linear model case. Therefore, inferential procedures in Fan and Li (2004) are not applicable.
The estimation procedure involves choosing smoothing parameters on two quite different levels. In step 1 of the algorithm the aim is estimation of the nonparametric part α(t). However, in step 2, the goal is to estimate the parametric part β. In step 1, we use bandwidth that falls into the theoretically allowable range specified in Theorem 2.2 below. In step 2, due to the asynchronous structure of the observed longitudinal data, commonly used approaches such as cross-validation do not work here. We propose to minimize the mean squared errors by calculating the bias and variance ofβ separately. Simulation studies show that our procedure is quite robust with different choices of bandwidths.

Asymptotic properties
In this section, we establish the asymptotic properties of the estimatorsβ and γ 0 (t). For subject i, we allow the observations of X i (·) and the observations of Y i (·) to be arbitrarily correlated. We specify our assumptions on the covariance structure as follows. For Observe that the conditional variances and correlations of Y are completely unspecified and may depend on X. We need the following conditions.
has positive Lebesgue measure with probability 1 and for s ∈ F, X(s) is not a constant function with probability 1.
Condition (A1) requires that the observation process to be independent of both the response and covariates. We require that λ(t, s) to be positive when s = t for non-zero measure of time s. The requirement on the intensity function λ(t, s) is quite mild. Condition (A2) ensures identifiability of β and condition (A3) posits smoothness assumptions on the expectation of some functionals of X(s) and α 0 (s). It provides additional regularity conditions on the observation intensity λ(·, ·). These assumptions can be relaxed similar to Cao et al. (2016) that allows the intensity function to depend on covariate process X(·). Condition (A4) and (A5) specify valid kernels and bandwidths.
The following theorem, which is established in the appendix, states the asymptotic properties ofβ.
The asymptotic result has the same rate of convergence as that in Cao et al. (2015) when the identity link function is used but larger variance. This extra variability is due to estimation of the nonparametric function α 0 (·). In the special case that E{X(t)} = 0, ∀t ∈ [0, 1] our results coincide with those in Cao et al. (2015). The asymptotic bias is of order h 2 from the proof in the Appendix and vanishes under condition (A5). Compared with partially linear models with synchronous longitudinal data (Fan and Li, 2004), we have a slower rate of convergence.
We need additional conditions to establish the limiting distribution of the nonparametric function α(·).
The bias is of order h 2 1 + h 2 2 and vanishes through condition (A5 * ). The rate of convergence is slower than the regular nonparametric n 2/5 rate. This is the price to pay for the asynchronous data structure. The variance has a similar form to that in Fan and Li (2004) except we have a bivariate counting process and intensity function.

Practical implementation
As shown in Theorem 1, the asymptotic variance ofβ has a standard sandwich form A −1 ΣA −1 . However, it is challenging to estimate parameters A and Σ without imposing additional assumptions on the covariate process and intensity function of the observation times. Estimating equation based method used in Cao et al. (2015) is not applicable as A cannot be estimated with available data. Therefore, it is desirable to have a feasible computation approach to approximate the asymptotic variance ofβ. In this section, we propose to use a resampling scheme similar to that in Lu and Zhang (2010) to approximate the asymptotic distribution ofβ.
The resampling algorithm proceeds as follows. First, we generate n iid exponential random variables {ξ i , i = 1, . . . , n} with mean 1 and variance 1. Fixing data at their observed values, we solve the following ξ i -weighted estimating equations and denote the solutions as β * and α * (s):

L. Chen and H. Cao
The estimates β * and α * (s) can be obtained using the same iterative algorithm proposed in Section 2.1. In the following theorem, we establish the validity of the proposed resampling method.
Based on Theorem 3, by repeatedly generating {ξ 1 , . . . , ξ n } many times, we may obtain a large number of realizations of β * . The variance estimate ofβ can be obtained by referring to the empirical distribution of β * . Point-wise inference onγ 0 (t;β) can be similarly obtained by the resampling method and we omit the details here. The proposed Monte Carlo scheme is similar to the bootstrap in terms of sample space, convergence, etc. Specifically, the sample space for β * is conditional on the data whereas that ofβ is unconditional; the distribution of (nh) 1/2 (β * − β) converges to the same limit as that of (nh) 1/2 (β − β) for almost all realizations of the data. Bootstrap would be a possible alternative to the proposed approach. However, it is not clear how to justify that the bootstrap is valid for the present problem.
Our method depends on the selection of bandwidth. To estimate β 0 , theoretically speaking, condition (A5) says that the bandwidth cannot be too small; otherwise, the variance will be quite large. On the other hand, to eliminate the asymptotic bias, one requires a small bandwidth. Theorem 2.1 indicates that the allowable range of bandwidth is (n −1 , n −1/5 ). Our numerical studies show that the proposed method has robust performance in the range (n −0.7 , n −0.6 ). In the estimation of α 0 (s), if we let h 1 = h 2 = h α , based on condition (A5 * ), a valid bandwidth is in the range (n −1/2 , n −1/6 ). Our numerical studies show that bandwidths in the range (n −0.4 , n −0.3 ) have decent performance. In general, asynchronous estimators converge more slowly and are less efficient than those based on synchronous data.
An automatic bandwidth selection procedure forβ can be carried out by calculating the bias and variability ofβ separately. To be specific, as the bias of β is of order h 2 , we first regressβ(h) on h 2 in a reasonable range of h to obtain the slope estimateĈ. To obtain the variance, we split the data randomly into two parts and obtain regression coefficient estimatesβ 1 (h) andβ 2 (h) based on each half sample. The variance ofβ(h) is estimated byV (h) = (β 1 (h)−β 2 (h)) 2 /4. We thus calculate the mean squared error asĈ 2 h 4 +V (h) and select the bandwidth h minimizing this mean squared error.

Simulation
In this section, we examine the finite sample performance of the proposed estimators. We first study the performance of the parametric regression coefficient β. We generated 1, 000 datasets, each consisting of n = 200, 900 or 1, 600 subjects. The numbers of observation times for the response Y (t) and covariate X(t) were Poisson distributed with intensity rate 6. With these two numbers of measurements, the observation times for the response and covariate were generated from uniform distribution U(0, 1) independently. The covariate process was Gaussian, with values at fixed time points being multivariate normal with mean 0, variance 1 and correlation e −|tij −t ik | , where t ij was the jth measurement time and t ik was the kth measurement time for the covariates, both on subject i. At the data generating stage, in order to generate the response, we included the response observation times with the covariate observation times when generating the covariates needed for simulating response at the response observation times. The responses were generated from where β is the regression coefficient, (t) is Gaussian, with mean 0, variance 1, and cov{ (s), (t)} = 2 −|t−s| , and α(t) is the nonparametric baseline function. Once the response was generated, we removed the covariate measurements at the response times from the observed covariate values. In this simulation, we used the Epanechnikov kernel K(t) = 0.75(1 − t 2 ) + , β = −2, α(t) = sin(2πt), √ t and 0.4t + 0.5 and the bandwidth for estimating α(·) was set to be h 1 = h 2 = h α = n −0.4 . Our goal was to assess the performance ofβ. The results were very similar for other choices of kernel functions, βs, functional forms of α(·) and h α . For comparison, we implemented last value carried forward approach to synchronize covariates and response measurements and used method in Fan and Li (2004). Specifically, for a response observed at time t ij , the covariate at time t ij was taken to be the covariate observed at time s = max(x ≤ t ij , x ∈ {s i1 , . . . , s iMi }). This corresponds to the most recent observation time relative to the response. For a response, if no covariate was observed prior to the response's observation time, the observed response was omitted from the analysis. Table 1 summarized the simulation results. We can see that as sample size increases, biases and standard deviations decrease, across different functional forms of α(·). The standard deviation estimate is close to the empirical one and coverage probabilities are close to the nominal one. This is evident even for moderately large sample size. In contrast, last value carried forward approach exhibits large bias, which does not attenuate with increased sample size. On the other hand, with last value carried forward approach, as sample size increases, variance estimate gets smaller, and consequently, the coverage probability deteriorates.
We next examine the performance ofα(t). Local linear regression is used to estimate the baseline function. Simulation set up is exactly the same as before.  Note: "BD" represents different bandwidths in the estimation of β, "Bias' is the empirical bias, "SD" is the sample standard deviation, "SE" is the average of the standard error estimates, "CP"/100 represents the coverage probability of the 95% confidence interval for β, "auto" represents the results by automatic bandwidth selection procedure, and "LVCF" represents last value carried forward approach.
We assess the performance ofα(·) by the square root of average squared errors (RASE), where {t k , k = 1, . . . , n grid } are the grid points at which the baseline function α(·) is estimated. In our simulation n grid = 100 with t k , k = 1, . . . , n grid equally spaced between (0, 1). The bandwidth for estimating β is set to be h = n −0.6 . The RASEs for different functional forms of α(·) and bandwidths based on 100 replications are listed in Table 2. The results are very typical. RASE gets smaller with smaller bandwidths. The performance improves with increased sample sizes. Figure 1 depicts the typical estimated curves of α(t) along with its point-wise confidence interval. The standard error of α(·) is calculated through bootstrap (Efron and Tibshirani, 1993). Specifically, we draw 100 independent bootstrap samples from the generated data and compute α(t) based on the bootstrap, denoted asα * (b) (t), b = 1, . . . , 100. The standard error estimate of α(t) is obtained througĥ Note: "BD" represents different bandwidths in the estimation of α(·), "MEAN" represents empirical average of RASE, and "SD" represents empirical standard deviaiton of RASE. whereα * (·) (t) = 1 100 100 b=1α * (b) (t).
We look at bias, variance and pointwise coverage probabilities at t = 0.25, 0.5 and 0.75 with different functional forms of α(·) and bandwidth h α . Bandwidth for the estimation of β is set to be n −0.6 .
The results are summarized in Table 3. We can see that the biases are small, standard error is close to the sample standard deviation and coverage probabilities are close to the nominal 95%. This pattern is consistent across different time points, functional form of α 0 (t) and sample sizes. As sample size increases, both bias and variability attenuate. Therefore, we can conduct valid point-wise inference for the function α(t).  Note: "BD" represents different bandwidths in the estimation of α(·), "Bias" is the empirical bias, "SD" is the sample standard deviation, "SE" is the average of the standard error estimates and "CP"/100 represents the coverage probability of the 95% confidence interval forβ.

An application to HIV dataset
In this section, we apply our method to an HIV dataset. A total of 191 HIV patients were followed from July 1997 to September 2002 in a University hospital. Details of the study design, methods and medical implications are given in Wohl et al. (2005). During this study, all patients were scheduled to have their measurements taken during the semi-annual visits, with HIV viral load and CD4 counts obtained separately at different labs. Because many patients did not visit the clinic at the scheduled time or even missed those scheduled visits, and the HIV infection occurred randomly during the study, there are unequal numbers of repeated measurement on HIV viral load and CD4 counts and the measurement times for these two variables are different. Under missing at random assumption (Little and Rubin, 2002), we eliminate patients who do not have any records on either HIV or CD4 counts and use 181 subjects in our analysis. We take log transformed CD4 counts as covariate and log transformed HIV viral load as response. The bandwidth for nonparametric estimate of α(t) is set at h α = 2(Q 3 − Q 1 )n −0.35 = 226, where Q 3 is the 0.75 quantile and Q 1 is the 0.25 quantile of the pooled sample of measurement times for the covariate and response and n is the number of patients. This adjusts the range of observation times to be compatible with the simulations studies. To estimate β, we use bandwidths h = 2(Q 3 −Q 1 )n −γ , where γ = 0.5 or 0.6. Results based on last value carried forward implemented using methods proposed in Fan and Li (2004) are presented as a comparison. From Table 4, we can see the statistically significant negative relationship between CD4 counts and HIV viral load through the proposed method. This negative association is consistent with medical literature (Phillips et al., 2001). LVCF also produced a statistically significant negative association between CD4 counts and HIV viral load. Figure  2 depicts the estimated baseline function α(t) using data adaptive bandwidth h = 62 in the estimation of β. It also plots the estimated baseline function plus/minus 1.96 standard errors, which can serve as a point-wise confidence interval ignoring the bias of the nonparametric fit. From Figure 2, we can see that there is no significant time trend effect within the first 3 years. However, there is a significant decreasing trend starting from around 3 years until the end of the study. Nonparametric estimation of the baseline function removes such time trend to avoid biased estimates of the regression coefficient β for the association between CD4 counts and HIV viral load.

Concluding remarks
In this paper, we propose a semiparametric partially linear model for asynchronous longitudinal data analysis. This allows the exploration of curvature and nonlinear time trend effect of longitudinal response. Local polynomial is used to approximate the nonparametric function. Our analysis is based on a set of global and local estimating equations with an iterative algorithm that has good convergency properties. The rate of convergence for the parametric part remains the same as in the parametric model with asynchronous longitudinal data but the variance is larger due to the estimation of nonparametric baseline function. For asynchronous longitudinal data, longitudinal covariate and response are mismatched even for the same subject. Essentially for each longitudinal response, kernel smoothing approach only uses covariate in its neighborhood controlled by bandwidth h. Consequently, we have (nh) 1/2 rate of convergence. It is not clear whether n 1/2 rate of convergence could be achieved under stronger assumptions.
Numerical studies support our theoretical predictions and show the newly proposed method yield valid inference in contrast to the ad-hoc last value carried forward approach. On the other hand, we can simply use a weighted last value carried forward method to obtain valid inference. The main idea is that the further the last observed longitudinal covariate is from the current longitudinal response, the less it should contribute to the estimating equation. This can be handled formally by weighting the last observation as a decreasing function of the time between the most recently observed longitudinal covariate and current longitudinal response.
Our method is based on the working independence assumption similar to GEE (Liang and Zeger, 1986). Both GEE with synchronous data and our proposed analytical strategy for asynchronous longitudinal data are valid when the data are missing at random (Little and Rubin, 2002). In GEE, with time-dependent covariates, Pepe and Anderson (1994) showed that parameter estimates are generally biased unless (i) the mean for the response at time t given all past, present, and future covariate values is equal to that given the covariate values observed at t or unless (ii) independence estimating equations are used. The condition (i) is a strong assumption. When data are missing at random, (i) cannot be verified with the observed data and (ii) is a conservative approach which ensures valid inference using complete data observations regardless whether (i) holds. When (ii) is adopted, it is challenging to improve efficiency, since the correlation structure in the data cannot be exploited in the working covariance matrix.
Similar issues arise in our asynchronous longitudinal data set-up, with further work warranted to understand the extend to which valid estimation may be achieved with non-diagonal working covariance matrices and whether efficiency gains might be achievable.
In some applications, both time-independent and time-dependent covariates exist. Cao et al. (2015) requires that the observation times of time-dependent covariates are the same and treat time-independent covariates the same as timedependent covariates. It is of great interest to explore whether improved rate of convergence could be obtained for the time-independent covariate in such mixed covariate models. To test whether α(t) is really time-varying, a simultaneous confidence band will be needed. Cao et al. (2017) studied the construction of smooth simultaneous confidence band for varying coefficient models with sparse longitudinal data. Simultaneous inference for the nonparametric function α(t) for asynchronous longitudinal data remains open and warrant additional development. Consequently, Under assumption (A1) and (A2), β 0 is the unique solution to u(β). Asβ solves the estimating equation U (β) = 0, it follows from the convexity lemma (Andersen and Gill, 1982) thatβ → β 0 in probability. We next show the asymptotic normality ofβ. The key idea is to establish the following relationship sup |β−β0|<M (nh) −1/2 (nh) 1/2Ũ (β) − (nh) 1/2 [Ũ (β 0 ) − E{Ũ (β 0 )} − (nh) 1/2 A(β − β 0 )] = Dn 1/2 h 5/2 + o p (n 1/2 h 5/2 ) + o p (1), (A.12) where A is given in Theorem 1 and D is a constant.
To obtain (A.12), first, using P n and P to denote the empirical measure and the true probability measure respectively, we obtain (nh) 1/2Ũ (β) = (nh) 1/2 (P n − P) − X(s) T β dN (t, s) For the second term on the right hand side of (A.13), we have II = −(nh) 1/2 A(β − β 0 ) + Cn 1/2 h 5/2 + o p (n 1/2 h 5/2 ). (A.14) From (A2), A is a positive-definite matrix, thus non-singular. For the first term on the right hand side of (A.13), we consider the class of functions for a given constant . It can be shown that this class is a P-Donsker class (van der Vaart and Wellner, 1996). As a result, we obtain that the first term on the right hand side of (A.13) for |β − β 0 | < M(nh) −1/2 is equal to (nh) 1/2 (P n − P)

A.3. Proof of Theorem 3
The proof of Theorem 3 can be similarly derived as that for Theorem 1 and hence is omitted here.