On last observation carried forward and asynchronous longitudinal regression analysis

In many longitudinal studies, the covariates and response are often intermittently observed at irregular, mismatched and subject-specific times. Last observation carried forward (LOCF) is one of the most commonly used methods to deal with such data when covariates and response are observed asynchronously. However, it can lead to considerable bias. In this paper, we propose a weighted LOCF estimation using asynchronous longitudinal data for the generalized linear model. We further generalize this approach to utilize previously observed covariates in addition to the most recent observation. In comparison to earlier methods, the current methods are valid under weaker assumptions on the covariate process and allow informative observation times which may depend on the response even conditional on covariates. Extensive simulation studies provide numerical support for the theoretical findings. Data from an HIV study are used to illustrate our methodology.


Introduction
Longitudinal data arise in many scientific inquiries, such as epidemiological studies, clinical trials and educational studies, among others. In such studies, data are often collected at subject-specific time points and the number of measurements varies across subjects. Last observation carried forward (LOCF) is one of the most commonly used approaches for analyzing incomplete longitudinal data. This method imputes the most recent observation as the current observation and then employs standard analyses treating the imputed covariate as the true covariate. Such analyses are problematic; see Lavori [4] and Molenberghs et al. [9]. First, it is assumed that the longitudinal measurement does not change from the time of the last measurement. Second, no distinction is made between those subjects who had a valid measurement and those subjects with imputed values, artificially increasing the amount of information in the data. These issues can induce substantial biases in parameter estimates and lead to inaccurate inferences; see Verbeke and Molenberghs [18]. To circumvent these problems, likelihood based approaches such as Verbeke and Molenberghs [18] and Cook et al. [2], inverse probability weighting such as Robins et al. [12] and Robins et al. [13], and multiple imputation such as Rubin [14] have been proposed as more principled and preferred methods for analysis. However, such methods impose stringent modeling assumptions, and the inferences they produce are typically highly dependent on untestable and often implicit assumptions regarding the distribution of the unobserved measurements given the observed measurements.
In this paper, our focus is regression with so-called asynchronous longitudinal data as in Cao et al. [1], where the measurement times for a longitudinal response and a longitudinal covariate are mismatched. We propose an intuitively appealing weighting approach which retains the simplicity of LOCF imputation for the current value of the covariate, enabling the use of methods for synchronous data where response and covariate are measured at the same time points. As an example, in a dialysis study of end-stage renal disease patients, infection-related hospitalization status and serum C-reactive protein are obtained at distinct time points within each patient. In clinical epidemiology, measurements of vital signs and lab tests are often conducted at different times for the same individual. In electronic medical records, a subject's information may be pooled from different sources to make treatment decisions, creating an asynchronous longitudinal data structure. We consider the estimation of generalized linear models which relate the current value of a longitudinal outcome to the current value of a longitudinal covariate. We modify estimating equation techniques in Liang and Zeger [5] for synchronous data to obtain unbiased inferences based on LOCF imputation of the current value of the longitudinal covariate with mismatched measurement times.
While regression analysis using estimating equations for synchronous longitudinal data such as Liang and Zeger [5] has been widely studied, there has been limited work on the analysis of regression models using asynchronous longitudinal data. Xiong and Dubin [20] employed an ad hoc binning step to synchronize covariate and response measurements so that existing methods for synchronous data could be used. Sentürk et al. [15] explicitly addressed the asynchronous setting for the generalized varying coefficient model with one covariate, but did not provide the theoretical properties of the estimators. Cao et al. [1] proposed a nonparametric kernel weighting approach for the generalized linear model to explicitly deal with the asynchronous structure and rigorously established the consistency and asymptotic normality of the resulting estimates. In this paper, we formalize simple LOCF in a rigorous manner using weighting techniques similar to those in Cao et al. [1]. We show that the weighted LOCF is also consistent and asymptotically normal, but is valid under weaker assumptions on the covariate and observation time processes, as detailed in the sequel.
The main idea of the weighting is that the further the last observation is from the current observation, the less it should contribute to the estimating equation. This is handled formally by weighting the last observation as a decreasing function of the time between the observed and missing measurement occasions. We show that this may be generalized using half kernel weighting to utilize all previously observed covariates in addition to the most recent observation. In contrast to the full kernel method in Cao et al. [1], which uses both previous and future covariate measurements, the proposed estimators are valid under weaker assumptions on the covariate processes and allow observation times to depend on the response even conditionally on covariates. We also relax the conditions for the validity of the full kernel methods, permitting covariate processes with independent increments and allowing observation times to depend on covariates but not responses, similarly to the conditional independence approach to longitudinal data in Lin and Ying [7], Lin et al. [6] and Sun et al. [16]. Interestingly, under independent increments, the rate of convergence of the estimators differs from that without independent increments.
The paper is organized as follows. In Section 2, we recap results from Cao et al. [1] and discuss the proposed weighted LOCF and half kernel estimators and the corresponding theoretical findings. Section 3 reports simulation studies that compare the proposed methods with Cao et al. [1]. The new methods demonstrate improved performance in situations where the assumptions of the previously proposed estimator are violated, particularly with informative observation times. Interestingly, there is little loss of information in weighted LOCF versus using all previously observed covariate values in the estimating equations. Application to an HIV dataset illustrates the practical utility of the methods. Concluding remarks are given in Section 4. Proofs of results from Section 2 are relegated to the Appendix.

Full kernel estimation
This subsection presents the main results from [1]. We consider the generalized linear model

E{Y(t) | X(t)} = g{X(t)^T β},   (2.1)

where g is a known, strictly increasing and continuously twice-differentiable function, t is a univariate time index, X(t) is a vector of time-varying covariates plus an intercept term, Y(t) is a time-varying response and β is an unknown time-invariant regression parameter. For subject i = 1, ..., n, the observation times of the longitudinal covariate process X_i(t) and response process Y_i(t) may be generated from a bivariate counting process as in [7],

N_i(t, s) = Σ_{j=1}^{L_i} Σ_{k=1}^{M_i} I(T_ij ≤ t, S_ik ≤ s),

which counts the number of observation times up to t on the response and up to s on the covariates, where {T_ij, j = 1, ..., L_i} are the observation times of the response and {S_ik, k = 1, ..., M_i} are the observation times of the covariates. In order to use existing methods for synchronous longitudinal data, where L_i = M_i and T_ij = S_ij, j = 1, ..., L_i, for each observed response, one may carry forward the most recently observed covariate. This ad hoc approach incurs substantial bias, as shown in [1]. Furthermore, [1] proposed an estimating equation for β in (2.1),

U_f_n(β) = Σ_{i=1}^n Σ_{j=1}^{L_i} Σ_{k=1}^{M_i} K_h(T_ij − S_ik) X_i(S_ik) [Y_i(T_ij) − g{X_i(S_ik)^T β}],   (2.2)

where K_h(t) = K(t/h)/h, K(t) is a symmetric kernel function, usually taken to be the Epanechnikov kernel K(t) = 0.75(1 − t^2)_+, and h is the bandwidth. The response Y_i(t) may be a continuous, categorical, or count variable, while the covariate X_i(t) may include time-independent covariates, such as an intercept term, in addition to time-varying covariates. The main requirement for the validity of (2.2) is that, if the time-varying covariates in X_i(t) are multivariate, the different covariates are measured at the same time points. The kernel weighting accounts for the fact that the covariate and response are mismatched, and permits contributions to U_f_n(β) from all possible pairings of response and covariate observations. We solve U_f_n(β) = 0 to obtain an estimate for β, denoted β̂_f. We next present the asymptotic properties of β̂_f. We specify our assumptions on the covariance structure as follows. For s, t ∈ [0, τ], let var{Y(t) | X(t)} = σ{t, X(t)}^2 and cov{Y(s), Y(t) | X(s), X(t)} = r{s, t, X(s), X(t)}, where τ is the maximum follow-up time.
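To fix ideas, the kernel weighting in (2.2) can be sketched in a few lines of code. The snippet below is an illustration, not the authors' implementation: it evaluates U_f_n(β) for the identity link g(u) = u with the Epanechnikov kernel, and the data layout (per-subject arrays T, S, Y, X) is our own convention.

```python
import numpy as np

def epanechnikov(u):
    """Symmetric Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+."""
    return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

def full_kernel_ee(beta, T, S, Y, X, h):
    """Full kernel estimating function U_f_n(beta) of (2.2), identity link.
    T[i], Y[i]: response times/values for subject i; S[i]: covariate
    times; X[i]: M_i x p matrix of covariate vectors at those times."""
    U = np.zeros(len(beta))
    for Ti, Si, Yi, Xi in zip(T, S, Y, X):
        # K_h(T_ij - S_ik) for every pairing of response and covariate times
        W = epanechnikov((Ti[:, None] - Si[None, :]) / h) / h   # L_i x M_i
        # Y_i(T_ij) - g{X_i(S_ik)^T beta} with g the identity
        resid = Yi[:, None] - (Xi @ beta)[None, :]
        U += (W * resid).sum(axis=0) @ Xi                       # sum over j, k
    return U
```

Solving full_kernel_ee(β) = 0 in β then yields β̂_f; for a general link g, only the residual line changes.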
We need the following conditions.
(A1) N_i(t, s) is independent of (Y_i, X_i) and, moreover, E{dN_i(t, s)} = λ(t, s) dt ds, where λ(t, s) is a twice continuously differentiable function for any 0 ≤ t, s ≤ τ. In addition, the set G = {t ∈ [0, τ] : λ(t, t) > 0} has positive Borel measure. (A2) If there exists a vector γ such that γ^T X(s) = 0 for any s ∈ G with probability one, then γ = 0. (A3) For any β in a neighborhood of β_0, the true value of the regression parameter, moment and smoothness conditions on the covariate process hold. The following theorem states the asymptotic properties of β̂_f.

Weighted LOCF estimation
In this subsection, we propose a weighted LOCF for asynchronous longitudinal data. For the LOCF approach using generalized estimating equations for synchronous data as in [3], for a response at time t_ij, the covariate at time t_ij is taken to be the covariate observed at time s = max{x : x < t_ij, x ∈ {s_i1, ..., s_im_i}}. This method assumes that either the subject's response or the subject's covariate is constant from the most recent observation time, and does not account for the variability inherent in this imputation. These assumptions may not hold in practice, and violations can confound covariates with time, which in turn can bias estimates of covariate effects and their standard errors. As a result, the magnitude and even the direction of the bias from LOCF are extremely difficult, if not impossible, to determine a priori.
We propose to remedy this bias by adopting a simple weighting strategy, downweighting imputed values which are far in time from the current response. To be specific, for a sample of n independent subjects, the weighted generalized estimating equation for β is

U_n(β) = Σ_{i=1}^n Σ_{j=1}^{L_i} K_h(T_ij − S_iĵ) X_i(S_iĵ) [Y_i(T_ij) − g{X_i(S_iĵ)^T β}] = 0,   (2.5)

where S_iĵ = max{S_ik : S_ik < T_ij} is the most recent covariate observation time before T_ij, K_h(t) = K(t/h)/h for a kernel function K(t) supported on [0, 1], and h is the bandwidth. Covariates are aggregated into the estimating equation (2.5), and across responses, multiple covariates contribute to the estimating equation with different weights. If there is no covariate measured before a response, that response does not contribute to the estimating equation (2.5). As the measurement times for the covariates and response are random and asynchronous, it is unclear how to incorporate the correlation structure into the estimating equation to improve efficiency.
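As a concrete sketch, assuming the identity link and a per-subject array layout of our own choosing (not the paper's implementation), the weighted LOCF estimating function can be coded as:

```python
import numpy as np

def k_half(lag, h):
    """One-sided Epanechnikov weight K_h(t) = K(t/h)/h with K supported
    on [0, 1], applied to the nonnegative time lag t - s."""
    u = lag / h
    return np.where((u >= 0.0) & (u <= 1.0), 0.75 * (1.0 - u ** 2), 0.0) / h

def wlocf_ee(beta, T, S, Y, X, h):
    """Weighted LOCF estimating function, a sketch of (2.5), identity
    link: each response uses only its most recent earlier covariate,
    downweighted by the elapsed time."""
    U = np.zeros(len(beta))
    for Ti, Si, Yi, Xi in zip(T, S, Y, X):
        for j, t in enumerate(Ti):
            prev = np.flatnonzero(Si < t)
            if prev.size == 0:
                continue                   # no earlier covariate: response drops out
            k = prev[np.argmax(Si[prev])]  # index of last covariate time before t
            w = k_half(t - Si[k], h)
            U += w * (Yi[j] - Xi[k] @ beta) * Xi[k]
    return U
```

A response observed long after the last covariate (lag > h) gets weight zero, which is exactly the downweighting described in the text.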
For s < t, the measurement times are allowed to depend on covariates through

E{dN(t, s) | X(s), Y(t)} = λ{t, s; X(s)} dt ds.   (2.6)

This assumption permits dependence on Y(t) at times t < s; that is, future covariate observation times may depend on previous values of the response. The estimator presented above is valid under such informative observation times, which differs from [1], where dependence of the bivariate observation process on Y(t) was not allowed at any time points, as can be seen from (A1). Additional regularity conditions are stated in condition (C0) below.
Before we present our asymptotic results, we need some notation and assumptions. The observations of X(·) can be arbitrarily correlated. We specify our assumptions on the covariance structure as follows. For s, t ∈ [0, τ], let F_β(s, t) = E[X(s) g{X(t)^T β} λ{t, s; X(s)}]. Its first order partial right and left derivatives with respect to t at t = s are denoted by Ḟ_β(s, s+) and Ḟ_β(s, s−). By the same token, its second order partial right and left derivatives are denoted by F̈_β(s, s+) and F̈_β(s, s−). Moreover, denote K_β(s, t) = E[X(s) g{X(s)^T β} λ{t, s; X(s)}]; its first order partial right and left derivatives are defined in exactly the same way and denoted by K̇_β(s, s+) and K̇_β(s, s−), and let K̈_β(s, s+) and K̈_β(s, s−) be its second order partial right and left derivatives. In addition, denote G_β(s_1, s_2, t_1, t_2) = E[X(s_1) X(s_2)^T g{X(t_1)^T β} g{X(t_2)^T β} λ{t_2, s_2; X(s_2)}], with first order partial right and left derivatives Ġ_β(s_1, s_2, s_1+, s_2+) and Ġ_β(s_1, s_2, s_1−, s_2−), and denote by J_β(s_1, s_2, t_1, t_2) the analogous functional involving λ{t_1, s_1; X(s_1)}, with first order partial right and left derivatives J̇_β(s_1, s_2, s_1+, s_2+) and J̇_β(s_1, s_2, s_1−, s_2−).
We need the following conditions.
(C2) For any β in a neighborhood of β_0, |ġ{X(t)^T β}| ≤ q(‖X(t)‖) for some function q(·) satisfying that E{‖X(t)‖^4 q(‖X(t)‖)^2} is uniformly bounded in t. Moreover, E[X(s) X(s)^T σ{t, X(t)}^2 λ{t, s; X(s)}] is continuous and has partial right and left derivatives with respect to t. Additionally, E{‖X(t)‖^4} is uniformly bounded in t.

For s < t, condition (C0) requires conditionally independent observation times, in which the expectation of the bivariate counting process at times t and s is conditionally independent of the responses at times t and s given the covariates at times t and s. No assumptions are needed for t < s, unlike that specified in (A1). In addition, this condition assumes the existence of right continuous derivatives of F_β(s, t). Such assumptions are satisfied by X(t) with independent increments, such as the Poisson process and Brownian motion. These stochastic processes do not satisfy (A3). We show below that the estimator's asymptotic behavior, in particular the rate of convergence, may depend critically on the smoothness of the covariate processes. The other assumptions (C1)-(C3) are similar to those in Theorem 1.
The following theorem, which is proved in the Appendix, states the asymptotic properties of the estimator β̂ obtained from U_n(β) in (2.5) under the weaker conditions specified above.

Theorem 2. Under (C0)-(C3), the estimator β̂ is consistent and asymptotically normal.
From Theorem 2, the bias is generally of order h and we achieve a rate of convergence of n^{1/3}, slower than the n^{2/5} rate in Theorem 1, where the bias is of order h^2. This is the price to pay for only requiring right continuous differentiability of certain functionals, as specified in (C0). The increased bias resembles the boundary bias phenomenon in classical nonparametric regression, arising from the asymmetric kernel at the boundary. To reduce the bias, one might employ boundary adjustment approaches, which have been well studied in the nonparametric literature.
Regarding the computation, once the kernel function K has been chosen and the bandwidth has been fixed, the estimating equation can be solved using a standard Newton-Raphson implementation for generalized linear models, with good convergence properties.
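The Newton-Raphson step itself is standard. A minimal root-finder with a finite-difference Jacobian (our own sketch, not the paper's implementation) suffices for a kernel-weighted estimating equation of this kind:

```python
import numpy as np

def newton_solve(U, beta0, tol=1e-8, max_iter=50, eps=1e-6):
    """Solve the estimating equation U(beta) = 0 by Newton-Raphson,
    using a forward-difference approximation to the Jacobian dU/dbeta."""
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(max_iter):
        u = U(beta)
        if np.max(np.abs(u)) < tol:
            break
        # forward-difference Jacobian, one column per coordinate of beta
        J = np.column_stack([(U(beta + eps * e) - u) / eps
                             for e in np.eye(beta.size)])
        beta -= np.linalg.solve(J, u)
    return beta
```

For the identity link the estimating function is linear in β, so a single Newton step solves it exactly; for other links a few iterations typically suffice.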
Along the lines of [1], the variance of the estimators may be obtained using a sandwich formula. The consistency proof of the variance estimator Σ̂ is in the Appendix. Automatic bandwidth selection may be achieved as in [1], with bias of order h, as described in the Appendix.

Half kernel estimation
To improve efficiency, the weighted LOCF can be extended to include information on all previously observed covariates, not only the most recently observed covariate. This is achieved by applying kernel weighting to all covariates observed before the response. The half kernel estimating equation is

U*_n(β) = Σ_{i=1}^n Σ_{j=1}^{L_i} Σ_{k=1}^{M_i} K_h(T_ij − S_ik) I(S_ik < T_ij) X_i(S_ik) [Y_i(T_ij) − g{X_i(S_ik)^T β}] = 0,   (2.8)

where K_h(t) = K(t/h)/h. This accounts for the fact that the covariates and response are mismatched, and only covariates that are observed before the response are used. If the observation times for covariates and response are close to each other, the kernel weight is close to 1; on the other hand, if they are far apart, the contribution to the estimating equation (2.8) may be 0. We solve U*_n(β) = 0 to obtain an estimate for β, denoted by β̂*. We modify the assumption on the bivariate counting process for simple weighted LOCF to accommodate half kernel estimation.
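As an illustration (identity link, with a hypothetical per-subject array layout of our own), the half kernel estimating function differs from weighted LOCF only in summing over every earlier covariate rather than just the most recent one:

```python
import numpy as np

def half_kernel_ee(beta, T, S, Y, X, h):
    """Half kernel estimating function, a sketch of (2.8), identity link:
    every covariate observed strictly before a response contributes, with
    a one-sided Epanechnikov weight; later covariates get weight zero."""
    U = np.zeros(len(beta))
    for Ti, Si, Yi, Xi in zip(T, S, Y, X):
        u = (Ti[:, None] - Si[None, :]) / h            # L_i x M_i scaled lags
        W = np.where((u > 0.0) & (u <= 1.0),
                     0.75 * (1.0 - u ** 2), 0.0) / h   # half kernel: lag > 0 only
        resid = Yi[:, None] - (Xi @ beta)[None, :]
        U += (W * resid).sum(axis=0) @ Xi
    return U
```

As the bandwidth h shrinks, only covariates just before each response retain nonzero weight, which mirrors the discussion in Section 4 of half kernel estimation approaching weighted LOCF.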
For s ≤ t, the bivariate counting process N(t, s) satisfies

E{dN(t, s) | X(s), Y(t)} = λ*{t, s; X(s)} dt ds.   (2.9)

Similar to condition (2.6), (2.9) allows an informative observation process for times t < s. Hence, the half kernel estimation procedure enjoys a robustness similar to that of weighted LOCF, which is not shared by the full kernel approach of Theorem 1.

Theorem 3. The half kernel estimator β̂* is consistent and asymptotically normal, with limiting distribution given in (2.10),
where A*(β_0) is obtained by replacing λ{s, s; X(s)} by λ*{s, s; X(s)} in A(β_0), and Ḟ*_β0 and K̇*_β0 are obtained by replacing λ{t, s; X(s)} by λ*{t, s; X(s)} in Ḟ_β0 and K̇_β0, respectively. For the half kernel approach, the bias and variance are generally of the same order as in the weighted LOCF. Improved bias properties are also possible under the following condition, which may be satisfied by processes with independent increments: (C4) Ḟ*_β0(s, s+) = 0 and K̇*_β0(s, s+) = 0. When X(t) follows a homogeneous Poisson process or Brownian motion and g is the identity function, Ḟ*_β(s, s+) = 0 holds for all β. If λ*{t, s; X(s)} in (2.9) is constant, K̇*_β(s, s+) = 0 for all β. Consequently, C* = 0 and the estimation bias of the half kernel based estimator β̂* is of order O(h^2), as specified in the following corollary.
Corollary 2. Under the same conditions as in Theorem 3, with the addition of (C4), we have an improved rate of convergence, where F*_β0 and K*_β0 are obtained by replacing λ{t, s; X(s)} by λ*{t, s; X(s)} in F_β0 and K_β0, respectively, and Σ* is the same as in Theorem 2.
This improvement in the convergence rate is shared by the weighted LOCF estimator. One might expect half kernel estimation to have smaller variance than the weighted LOCF, owing to its use of all previously observed covariates. While it is not possible to show that the half kernel estimator generally has smaller theoretical variance than weighted LOCF, the simulations reported in Section 3 show some improvement. Interestingly, these differences are fairly small and diminish with large sample sizes. Variance estimation and bandwidth selection for half kernel estimation follow those for the weighted LOCF case and are omitted.

Revisiting full kernel estimation
For further efficiency improvement, a full kernel approach may be employed, as in Section 2.1. We here consider the properties of this estimator under weaker conditions than those specified in Section 2.1. We relax (A1) by allowing the measurement times to depend on covariates through

E{dN(t, s) | X(s), Y(t)} = λ_f{t, s; X(s)} dt ds for all 0 ≤ s, t ≤ τ.   (2.12)

We solve U_f_n(β) = 0 to obtain an estimate for β, denoted by β̂*_f. We require a stronger assumption than (C0) for weighted LOCF and for half kernel estimation, as specified below.
(C0*) The intensity function of the counting process N(t, s) is specified by (2.12) for all s, t. In addition, regularity conditions hold for any β in a neighborhood of β_0, the true regression coefficient. This assumption is stronger than those for weighted LOCF and half kernel estimation: it does not permit dependence of the bivariate observation process on the response Y(t) at any times s and t. It is weaker than (A1) in that it permits the observation process to depend on the covariate process. It also weakens (A3) by relaxing the smoothness conditions on X(s), covering the important special case of independent increments. The following theorem, which is proved in the Appendix, states the asymptotic properties of the full kernel estimator under these more general conditions.

Theorem 4. Under (C0*) and (C1)-(C3), the full kernel estimator β̂*_f is consistent and asymptotically normal.
For the full kernel approach, the bias is of the same order as in the weighted LOCF and half kernel approaches. However, the variance is smaller if λ{s, s; X(s)} = λ*{s, s; X(s)} = λ_f{s, s; X(s)}. This is accomplished by utilizing both lagged and forward observations.
A special case of Theorem 4 gives the result in Theorem 1.

Corollary 3. In the special case that λ_f{t, s; X(s)} = λ(t, s), free of X(s), the result of Theorem 4 is the same as that of Theorem 1.

Simulation studies and a real example
We conducted extensive simulation studies to evaluate the properties of the proposed estimators in practical settings. We first study the performance of the LOCF estimate, the proposed weighted LOCF estimate, the half kernel estimate and the full kernel estimate when the assumptions in Cao et al. [1] hold. We generate 1,000 datasets, each consisting of n = 400 or 1000 subjects. The numbers of observation times for the response Y(t) and covariate X(t) are generated from a Poisson distribution with intensity rate 5. The observation times for the response and covariates are generated independently from the uniform distribution U(0, 1). The covariate process is Gaussian, with values at observed time points being multivariate normal with mean 0, variance 1 and correlation e^{−|t_ij − t_ik|}, where t_ij and t_ik are the jth and kth measurement times for the response, both on subject i. The response process is generated from

Y(t) = β_0 + X(t) β_1 + ε(t),

where β_0 is the intercept, β_1 is the regression coefficient and ε(t) is Gaussian, with mean 0, variance 1 and cov{ε(s), ε(t)} = 2^{−|t−s|}. Once the response is generated, we remove the covariate measurements at the response observation times to create the asynchronous data structure. In this simulation, we set β_0 = 1.5 and β_1 = 1.5 and assess the performance of β̂_1. The results are very similar for other choices of the βs.
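For concreteness, the generation of one subject's data can be sketched as follows. This is a minimal illustration of the design described above, with a function name and interface of our own; zero-count subjects are clamped to one observation for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2023)

def simulate_subject(rate=5.0, beta0=1.5, beta1=1.5):
    """One subject's asynchronous data: Poisson(rate) numbers of U(0, 1)
    observation times, a mean-zero Gaussian covariate process with
    correlation exp(-|s - t|), and Y(t) = beta0 + beta1 * X(t) + eps(t)
    with cov{eps(s), eps(t)} = 2^{-|s - t|}."""
    L = max(rng.poisson(rate), 1)          # response observation times
    M = max(rng.poisson(rate), 1)          # covariate observation times
    T = np.sort(rng.uniform(0, 1, L))
    S = np.sort(rng.uniform(0, 1, M))
    # realize the covariate process at both T and S; its values at T are
    # used to generate Y and then dropped, leaving asynchronous data
    times = np.concatenate([T, S])
    cov_x = np.exp(-np.abs(times[:, None] - times[None, :]))
    Xall = rng.multivariate_normal(np.zeros(times.size), cov_x)
    cov_e = 2.0 ** (-np.abs(T[:, None] - T[None, :]))
    eps = rng.multivariate_normal(np.zeros(L), cov_e)
    Y = beta0 + beta1 * Xall[:L] + eps
    return T, Y, S, Xall[L:]               # covariates kept only at times S
```

Repeating this for n subjects and stacking the per-subject arrays reproduces the asynchronous structure analyzed in the tables.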
For weighted LOCF, half kernel and full kernel estimation, the kernel function is the Epanechnikov kernel K(x) = 0.75(1 − x^2)_+, with the automatic bandwidth selection described in the Appendix used in the estimation. Similar results were obtained using other kernel functions. A total of 1,000 simulated datasets were analyzed.
Tables 1A and 2A summarize the results of these simulations. Additional results with n = 200 can be found in the Appendix. For standard LOCF, the biases and coverage probabilities are −0.192 and 4.4% when n = 400, and −0.189 and 0% when n = 1000. Weighted LOCF, half kernel estimation and full kernel estimation perform satisfactorily in terms of bias, variance and coverage probability, particularly with larger sample sizes. In this setting, where the assumptions in Theorem 1 are satisfied, full kernel estimation exhibits smaller bias and variance than either weighted LOCF or half kernel estimation.
We next study the case in which the covariate follows a Poisson process with intensity 3. This independent increments set-up violates the assumptions in Theorem 1, and our theory suggests an improved rate of convergence for half kernel estimation versus full kernel estimation. The data generation scheme is otherwise the same. For LOCF, the biases and coverage probabilities are −0.053 and 69.6% when n = 400, and −0.055 and 38.0% when n = 1000. From Tables 1B and 2B, we observe that the full kernel approach has substantially larger bias than the weighted LOCF and half kernel approaches, as predicted by Corollary 2. The empirical variances and variance estimates are in good agreement. The coverage probabilities for the weighted LOCF and half kernel approaches are close to the nominal level. Those for the full kernel are much lower than the nominal level, owing to the large biases.
We then study informative observation times depending on responses. We first generate the observation time of one response, t_0, which is U(0, 1) distributed. The number of raw covariate observation times follows a Poisson distribution with rate exp(3), with the times uniformly distributed in the 0.3-neighborhood of t_0. The rest of the data generation is exactly the same as before. We use a thinning algorithm to determine whether to keep the covariate observation times, where the probability of keeping a covariate observed before the response is 0.2 and after the response is min[1, 15 exp{Y(t)/3}]. For LOCF, the biases and coverage probabilities are −0.123 and 14.2% when n = 1000, and −0.123 and 0% when n = 5000. The assumptions in Theorems 2 and 3 on the bivariate intensity process for weighted LOCF and half kernel estimation are satisfied, whereas those for full kernel estimation in Theorem 4 are violated. This is evidenced in Tables 1C and 2C, where the bias is larger and the coverage probability is poor for the full kernel. Weighted LOCF and the half kernel approach have small bias, good agreement between estimated and empirical standard errors, and coverage probabilities close to the nominal level. As the full kernel approach uses more data, it has smaller standard error compared with the weighted LOCF and half kernel approaches.

Note (Table 1): "BD" represents different bandwidths, "Bias" is the empirical bias, "RB (%)" is the "Bias" divided by the true β_1, "SD" is the sample standard deviation, "SE" is the average of the standard error estimates and "CP (%)" represents the coverage probability of the 95% confidence interval for β̂_1.

Note (Table 2): "Bias" is the empirical bias, "RB" is the "Bias" divided by the true β_1, "SD" is the sample standard deviation, "SE" is the average of the standard error estimates and "CP (%)" represents the coverage probability of the 95% confidence interval for β̂_1.
The assumptions required for inverse probability weighting (IPW) are not satisfied: for any subject, the covariate and response observation times are mismatched, and the probability of observing complete data is 0. The data generation and implementation are detailed in the Appendix and the results are summarized in Table 3. We can see that IPW incurs substantial bias, which does not attenuate as the sample size increases, and therefore IPW should not be used to analyze asynchronous longitudinal data.

We now illustrate the proposed inferential procedures on a dataset from an HIV study [19], previously analyzed in [1]. A total of 190 patients were followed from July 1997 to September 2002 in a university hospital. There are unequal numbers of repeated measurements of viral load and CD4 count, and the two variables are measured at different times. In our analysis, we take log-transformed CD4 count as the covariate and log-transformed HIV viral load as the response. We use estimating equations (2.5) and (2.8) with bandwidths h = 2(Q_3 − Q_1) n^{−γ}, where Q_3 is the 0.75 quantile and Q_1 is the 0.25 quantile of the pooled sample of measurement times for the covariate and response, n is the number of patients and γ = 0.3, 0.5, 0.7. The results are summarized in Table 4 with fixed bandwidths and the data adaptive bandwidth. Results based on Theorem 1 are presented for comparison.
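The interquartile-range bandwidth rule h = 2(Q_3 − Q_1) n^{−γ} used in the data analysis is easy to reproduce; a sketch, with argument names of our own:

```python
import numpy as np

def iqr_bandwidth(resp_times, cov_times, n, gamma=0.5):
    """Bandwidth h = 2 * (Q3 - Q1) * n^(-gamma), where Q1 and Q3 are the
    0.25 and 0.75 quantiles of the pooled response and covariate
    measurement times, n is the number of patients, and gamma is taken
    as 0.3, 0.5 or 0.7; resp_times/cov_times are lists of per-patient
    arrays of measurement times."""
    pooled = np.concatenate(list(resp_times) + list(cov_times))
    q1, q3 = np.quantile(pooled, [0.25, 0.75])
    return 2.0 * (q3 - q1) * n ** (-gamma)
```

Larger γ gives a smaller bandwidth, trading variance for bias, which is the pattern examined across γ = 0.3, 0.5, 0.7 in Table 4.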
Earlier work has shown that LOCF produces a weak positive association between CD4 count and HIV viral load, which is in the opposite direction to the known relationship between these variables. From Table 4, we see that weighted LOCF and half kernel estimation produce similar point estimates and standard deviations, especially when the bandwidths are small (n^{−0.5} or n^{−0.7}). The full kernel approach in [1] has similar point estimates but smaller standard deviation, due to the fact that it uses both forward and lagged covariates. Analyses based on the newly proposed procedures and the earlier method in [1] all show a statistically significant association between CD4 count and HIV viral load, consistent with findings in the medical literature [11].

Concluding remarks
In this paper, we provide an intuitively appealing and rigorous formalization of LOCF for regression analysis with asynchronous longitudinal data. The resulting estimators are consistent and asymptotically normal, but with a rate of convergence which is slower than the usual parametric rate. The procedure performed well in simulations, evidencing substantial improvements in bias and coverage properties over the naïve LOCF. Its ease of implementation, requiring only the addition of a weight to the generalized estimating equations, suggests that it has the potential to be practically useful in applications where LOCF is currently the method of choice.
Interestingly, the simulation studies demonstrated only a small loss of efficiency relative to half kernel estimation, which utilizes all previous covariate observations. Our intuition is that, without stronger assumptions than those in the current paper, the most recently observed covariate contains the majority of the information about the previous covariate values. As the sample size increases and the bandwidth shrinks, the half kernel estimation procedure only uses the most recent values of the covariates, similarly to weighted LOCF. Further work is needed for a rigorous comparison of the theoretical variances of these estimators.
Both weighted LOCF and half kernel estimation performed well with informative observation times, while full kernel estimation exhibited large biases and poor coverage. This lack of robustness should be weighed against the improved efficiency which may occur when the necessary regularity conditions are satisfied. A related loss of efficiency may occur with covariates having independent increments, in which case both theoretical and simulation results point to the superior performance of weighted LOCF and half kernel estimation. Additional numerical work would be valuable in further elucidating these issues.
Both GEE with synchronous data and our proposed approach for asynchronous data are valid when the data are missing completely at random, as in [5, 8]. In GEE with time-dependent covariates, [10] showed that parameter estimates are generally biased unless (i) the mean of the response at time t given all past, present and future covariate values is equal to that given the covariate values observed at t, or (ii) independence estimating equations are used. Condition (i) is a strong assumption. When data are missing at random, (i) cannot be verified with the observed data, and (ii) is a conservative approach which ensures valid estimation using complete data observations regardless of whether (i) holds. When (ii) is adopted, it is challenging to improve efficiency, since the correlation structure in the data cannot be exploited in the working covariance matrix. Similar issues arise in our asynchronous data set-up, and further work is needed to understand the extent to which valid estimation may be achieved with non-diagonal working covariance matrices and whether efficiency gains might be achievable. The efficiency issue is complicated by the fact that the asynchronous data estimators converge more slowly than the usual parametric rate obtained by GEE with synchronous data.
A key assumption for the proposed estimators is that the measurement times for previous covariates are independent of the current and future observed responses. This assumption excludes certain missing at random settings under which GEE with synchronous data might yield valid estimation based only on complete observations. On the other hand, GEE does not allow non-ignorable missingness. This contrasts with our approach, in which the missingness mechanism is specified by the bivariate intensity for the measurement times of the response and covariates.

Note that F_β0(s, t) = E[X(s) g{X(t)^T β_0} λ{t, s; X(s)}] and K_β(s, t) = E[X(s) g{X(s)^T β} λ{t, s; X(s)}]. After Taylor expansion of F_β0(s, s + hz) and K_β(s, s + hz), we obtain the stated expansion, where we use the assumptions that F_β0(s, t) and K_β(s, t) are continuous functions for (s, t) ∈ [0, 1]^{⊗2} and that they have the continuous left and right derivatives specified in (C0). Letting h → 0 and using ∫_0^1 K(z) dz = 0.5, we extract the main terms. Moreover, A(β_0) is a positive-definite matrix by (C1), and thus non-singular.
Similarly, we can show that Σ̂* → Σ* in probability and Σ̂_f → Σ_f in probability.

A.4. Automatic bandwidth selection
Our method depends on the selection of the bandwidth. We propose a data adaptive bandwidth selection procedure, since traditional cross-validation methods are not applicable with asynchronous measurement times for the covariates and response. Based on (2.7), (2.10) and (2.13), we first regress

Table 2
Results of 1000 simulations under different scenarios. A: assumptions in Cao et al. [1] are satisfied; B: covariates follow a Poisson process; C: informative observation times.

Table 3
Results of 1000 simulations based on IPW. A: last observed covariate; B: nearest covariate.