Varying coefficient models having different smoothing variables with randomly censored data

The varying coefficient model is a useful alternative to the classical linear model, since it is much richer and more flexible. We propose estimators of the coefficient functions for the varying coefficient model in the case where different coefficient functions depend on different covariates and the response is subject to random right censoring. Since our model has an additive structure and requires multivariate smoothing, we employ a smooth backfitting technique, which is known to be an effective way to avoid the "curse of dimensionality" in structured nonparametric models. The estimators are based on synthetic data obtained by an unbiased transformation. The asymptotic normality of the estimators is established, a simulation study illustrates the reliability of our estimators, and the estimation procedure is applied to data on drug abuse.


Introduction and model
Investigating the relation between a response and a set of covariates is a key issue in many statistical problems. Among others, mean regression models extract central trends of data by specifying the conditional mean function of a response variable given values of the covariates. A number of regression models and estimation methods have been proposed in the literature. The most traditional and simplest way to model the relation is to employ the classical linear regression model. However, this model is often too restrictive and unable to capture complicated characteristics that might exist in the data. From this point of view, the varying coefficient model is a very useful alternative. It was first proposed by [9] and takes the form

m(X, Z) = Z_1 α_1(X_1) + · · · + Z_d α_d(X_d),   (1)

where X = (X_1, . . . , X_d)^⊤, Z = (Z_1, . . . , Z_d)^⊤ and m(x, z) is the conditional mean function of some response given X = x and Z = z. The α_j's are unknown coefficient functions. This model allows each coefficient function to depend on a different covariate, which is not the case for many other models available in the literature. It makes the model much more flexible than the classical linear model, since each coefficient function is modelled nonparametrically. Moreover, the model can incorporate nonlinear interaction effects. Its structure remains simple, since the conditional mean function is still linear in the Z_j variables. If all coefficient functions are constant, the model reduces to the classical linear model.

On the other hand, situations in which a response is not fully observed due to random right censoring are often encountered, for example in medical research, where patients may leave a study for various reasons. In this case, well-known regression techniques are not directly applicable since the response is only partially observed. To deal with random right censoring, the synthetic data approach based on unbiased transformations has been studied by many authors. [11] and [15] first proposed estimators based on different types of transformations in the classical linear model, and these were further studied by [27, 28, 23, 12] and many others. [6] extended these results to nonparametric regression models and considered a more general transformation that includes the transformations of [11] and [15] as special cases. [5] further generalized the transformation and adapted the method to dependent censored data. With a synthetic data method, one first transforms the data while preserving the conditional mean, and then applies existing regression techniques as if the responses were not censored.
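To make the structure of (1) concrete, the following small sketch simulates covariates and evaluates the conditional mean for d = 3; the coefficient functions are the ones later used in the simulation study of Section 5, and all names and values are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3

# illustrative coefficient functions alpha_1, alpha_2, alpha_3
alpha = [lambda x: 1.0 + np.exp(2.0 * x - 1.0),
         lambda x: 0.5 * np.cos(2.0 * np.pi * x),
         lambda x: x ** 2]

X = rng.uniform(0.0, 1.0, size=(n, d))                 # smoothing covariates X_1, ..., X_d
Z = np.column_stack([np.ones(n),                       # Z_1 = 1 gives a nonparametric intercept
                     rng.normal(size=(n, 2))])         # Z_2, Z_3

# conditional mean of model (1): m(X, Z) = sum_j Z_j * alpha_j(X_j)
m = sum(Z[:, j] * alpha[j](X[:, j]) for j in range(d))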
In this paper, for a response variable Y that is subject to random right censoring, we consider the problem of estimating the conditional mean regression function of φ(Y) given covariates, for some known function φ. We assume that this regression function has the varying coefficient structure, that is,

E(φ(Y) | X = x, Z = z) = z_1 α_1(x_1) + · · · + z_d α_d(x_d).

Note that we are estimating the conditional mean of φ(Y) rather than that of Y. Depending on one's interest, various choices are possible for φ. For example, the choice φ(y) = I(y ≤ t) (for fixed t) corresponds to the estimation of the conditional distribution function, and letting φ be the identity function leads to the estimation of the conditional mean of Y. For the estimation of our model, we employ a smooth backfitting (SBF) technique, which is known to be an effective estimation method for structured nonparametric models. Note here that model (1) has an additive structure similar to that of the additive model. The SBF method was originally introduced by [21] for the additive model, and [13] studied it for the varying coefficient model. Unlike marginal integration methods, see for example [26], the SBF method is free of the curse of dimensionality that usually arises when multivariate smoothing is required, since it involves only one- and two-dimensional smoothing. It is worth mentioning that model (1) has some advantages over the additive model. As pointed out before, nonlinear interaction effects can be handled in the former but not in the latter: the additive model assumes that each covariate affects the response separately. Another advantage is that the former model allows discrete Z_j variables, whereas all covariates of the additive model have to be continuous. The major hurdle of model (1) is that covariates need to pair up, which sometimes appears artificial. Nevertheless, even if this model does not hold exactly, it may still be used to approximate the true regression function. Recently, [14] introduced a more flexible varying coefficient model that can contain all possible interaction effects between the Z_j and X_j variables. In this paper we restrict our attention to model (1) in the censored data context; we believe that our results can be extended to the model of [14].
As mentioned before, regression models with censored data have been studied extensively, but most attention has been given to the case of a univariate covariate. Recently, several papers have dealt with more than one covariate. Among others, [17] and [19] considered single index modelling, which is known to be a useful dimension reduction technique; they proposed estimators of the parameter vector in this model under random right censoring and derived their asymptotic properties. [2] studied the partially linear varying coefficient model with random right censoring, which extends [8] to the censored data context. In fact, the model studied in [2] becomes a particular case of our model by letting X_1 = · · · = X_d in model (1), if we ignore the parametric part. Its estimation is substantially simpler than ours, since each coefficient function depends on the same univariate covariate, so that only univariate smoothing techniques are required. Additive regression modelling with censored data was studied in [4], based on the backfitting algorithm proposed by [22]; however, theoretical properties were not established. The purpose of this paper is to offer a very flexible model and to study its estimation with censored data when there are several covariates.
The rest of the paper is organized as follows. In Section 2 we introduce a well-known unbiased data transformation technique. Section 3 presents our main theoretical results: the proposed method based on local linear fitting is described and its asymptotic properties are established. In Section 4 we briefly show the extension of the results to local polynomial fitting. Section 5 is devoted to numerical studies; in Section 6 we discuss how to choose the bandwidth parameter, and in Section 7 the estimation procedure is applied to data on drug abuse. We conclude with some discussion in Section 8. The proofs of the theorems and lemmas are given in the Appendix at the end of the paper.

Transformation of data
Let U = (X^⊤, Z^⊤)^⊤ be the vector of covariates and let Y and C be the response and censoring variable, respectively. For randomly right censored data, we observe (T_i, δ_i, U_i), i = 1, . . . , n, a random sample of (T, δ, U), where T = Y ∧ C and δ = I(Y ≤ C), and where a ∧ b denotes the minimum of a and b. The problem is that the Y_i's are not fully observed due to censoring, so that E(φ(Y) | X = x, Z = z) cannot be estimated in a direct way. For the unbiased transformation of the data, assumptions (A1) and (A2) are needed; these are common assumptions made when one uses the Kaplan-Meier estimator for the censoring distribution. We consider the transformation given by [11],

Y_G = δ φ(T) / {1 − G(T)},   (2)

where G is the distribution function of the censoring variable C. Under the above assumptions we have E(Y_G | X = x, Z = z) = E(φ(Y) | X = x, Z = z), so that the conditional mean is preserved under this transformation. The variable Y_G is observable as long as G is known. So, with Y_{G,i} instead of φ(Y_i), one can apply existing regression techniques for uncensored data.
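As a heuristic, the following calculation shows why the conditional mean is preserved; it assumes that C is independent of (Y, X, Z) and that G is continuous, which should be read as a simplified stand-in for assumptions (A1) and (A2):

E(Y_G | X, Z) = E[ δ φ(T) / {1 − G(T)} | X, Z ]
             = E[ I(Y ≤ C) φ(Y) / {1 − G(Y)} | X, Z ]              (T = Y when δ = 1)
             = E[ φ(Y) / {1 − G(Y)} · E{ I(Y ≤ C) | Y, X, Z } | X, Z ]
             = E[ φ(Y) / {1 − G(Y)} · {1 − G(Y)} | X, Z ]           (P(C ≥ y) = 1 − G(y) for continuous G)
             = E{ φ(Y) | X, Z }.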
We impose another assumption on the function φ:

(A3) Let τ be the right endpoint of the support of T, and let I = (−∞, τ_0] for some τ_0 < τ. We assume that φ is bounded on I and equals 0 outside the interval I.

This kind of truncation is common in the context of censored regression. It is necessary to deal with the lack of information in the right tail of the distribution of Y; see, for example, [19] and [5]. The choice of the truncation point τ_0 should be made carefully. If τ_0 is too large, it may produce very large synthetic responses in (2), possibly resulting in an estimator with large variance; in any case, τ_0 should not be larger than the largest observed time. On the other hand, a relatively small τ_0 may truncate the data too much, which means losing more information. See Remark 3 below for more on this subject and for a discussion of how to choose τ_0 from the data. We assume that (A1)-(A3) hold throughout the paper.
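As a quick sanity check of the transformation and of the role of τ_0, one may run a small Monte Carlo experiment such as the following sketch; the exponential response, the normal censoring distribution and the value τ_0 = 3 are arbitrary choices made only for illustration, and the comparison is of unconditional means, which is implied by the preservation of the conditional mean.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 100_000

Y = rng.exponential(scale=1.0, size=n)            # latent responses (assumed distribution)
C = rng.normal(loc=2.0, scale=1.0, size=n)        # censoring times, independent of Y (assumed)
T = np.minimum(Y, C)
delta = (Y <= C).astype(float)

G = lambda t: norm.cdf(t, loc=2.0, scale=1.0)     # known censoring distribution G

tau0 = 3.0                                        # truncation point, well below the largest observed times
phi = lambda y: y * (y <= tau0)                   # phi(y) = y I(y <= tau0), bounded on (-inf, tau0]

# synthetic responses as in (2), with G known
Y_G = delta * phi(T) / (1.0 - G(T))

print(np.mean(Y_G), np.mean(phi(Y)))              # the two means should be close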

Estimation method with local linear fitting
We start with the (unrealistic) case where the distribution G is known. In a second step we examine what changes when G needs to be estimated.

Smooth backfitting with censored data when G is known
In kernel regression, it is well known that procedures based on local linear fitting have better theoretical properties than those based on local constant fitting, which suffer from boundary problems; the local linear method corrects the boundary problem. Moreover, the local constant SBF estimator does not have the oracle property, whereas the local linear SBF estimator does. Here, the oracle property means that the estimator of each component function has the same asymptotic distribution as if all the other coefficient functions were known. This is demonstrated in [21] for the additive model and in [13] for the varying coefficient model. In this section we introduce the local linear SBF method based on the unbiased transformation of the previous section, for the case where G is known. We present the estimation method along the lines of [13] and explain how the existing method can be applied to our case. The local linear technique estimates the coefficient functions via the approximation α_j(u_j) ≈ α_j(x_j) + α_j'(x_j)(u_j − x_j) for u_j near x_j. Next, we consider the least squares criterion, weighted by a kernel function,

∫_{[0,1]^d} n^{-1} Σ_{i=1}^n [ Y_{G,i} − Σ_{j=1}^d Z_{i,j} { f_{j0}(x_j) + f_{j1}(x_j)(X_{i,j} − x_j) } ]² ∏_{j=1}^d K_{h_j}(x_j, X_{i,j}) dx,   (3)

where Z_{i,j} and X_{i,j} denote the jth components of Z_i and X_i, x = (x_1, . . . , x_d)^⊤, and h = (h_1, . . . , h_d)^⊤ is a bandwidth vector. Observe that the criterion is a smoothed version of the kernel weighted local least squares criterion, obtained by integration over x; this is why the method is called "smooth" backfitting. A boundary corrected kernel is used for this estimation, as in [21] and [13]. It is given by

K_{h_j}(x_j, u) = K_{h_j}(x_j − u) / ∫_0^1 K_{h_j}(v − u) dv   for x_j, u ∈ [0, 1],

where K_h(v) = h^{-1} K(v/h) for some base kernel function K. If we let

then (3) can be rewritten in the form (5).
When G is known, our estimator, say α̂^G, is defined as the minimizer of (5) over f, whenever this least squares criterion has finite values. The SBF method is best understood through projection theory, so we represent our estimator in that framework. We define spaces of tuples of functions, L_2(M) and H(M), equipped with a norm ‖·‖_M, as in [13]. Note that (5) has finite values if and only if ‖f‖_M² < ∞; therefore, our minimization problem takes place in the space H(M). Note further that (5) can be decomposed into two parts (see [13]) involving α̃^G, the minimizer of (5) in the space L_2(M). Equipped with the norm ‖·‖_M, the spaces defined above are Hilbert spaces, and our estimator can be expressed as α̂^G = Π(α̃^G | H(M)), where the operator Π(·|S) stands for the projection onto S. Note that α̂^G is unique since it is defined as a projection onto the Hilbert space H(M). Moreover, by considering Gâteaux derivatives, one can show that α̂^G = (α̂^G_1, . . . , α̂^G_d)^⊤ satisfies the SBF equation (6). Note that, in general, α̂^G differs from α̃^G. The solution of (6) is given by the SBF algorithm (7)-(8): one iterates the algorithm for r = 1, 2, . . ., with some initial value α̂^{G,[0]}, until convergence, and the limit of the algorithm is the estimate of the coefficient functions. Here ⊗ denotes the Kronecker product and p is the density function of X. [13] introduced the SBF algorithm to solve the SBF equation for non-censored data. Using the same arguments as therein, one can show that, under Assumption (B), the limit of the algorithm admits the explicit representation (9), where Û = P̂_d · · · P̂_1, P̂_j = Π(·|H_j(M)^⊥), S^⊥ stands for the orthogonal complement of S, and H_j(M), j = 1, . . . , d, are subspaces of H(M). Formula (9) is very useful for deriving asymptotic results, since it gives an explicit formula for the limit of the SBF algorithm. In the following, we collect the assumptions needed for the convergence of the SBF algorithm and for the asymptotic results.
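For readers who want a concrete picture of the backfitting idea, the following sketch implements an ordinary backfitting cycle with componentwise local linear smoothers. This is a simplified stand-in, not the smooth backfitting projection of [13] described above: it omits the integration of the criterion over x and the boundary-corrected kernel, and the function names are hypothetical. The response passed to it would be the synthetic response of Section 2.

import numpy as np

def local_linear_vc(x_eval, X_j, R, Z_j, h, K=lambda u: np.maximum(0.75 * (1.0 - u ** 2), 0.0)):
    """Componentwise local linear fit: at each point x in x_eval, minimize
       sum_i K((X_ij - x)/h) * (R_i - Z_ij * {a + b (X_ij - x)})**2
    over (a, b) and return a(x) as the estimate of alpha_j(x).  K defaults to the
    Epanechnikov kernel; no boundary correction is applied in this sketch."""
    est = np.empty(len(x_eval))
    for k, x in enumerate(x_eval):
        w = K((X_j - x) / h)
        D = np.column_stack([Z_j, Z_j * (X_j - x)])          # local design matrix
        WD = D * w[:, None]
        beta = np.linalg.solve(D.T @ WD + 1e-10 * np.eye(2), WD.T @ R)
        est[k] = beta[0]
    return est

def backfit_vc(X, Z, Y_syn, h, x_grid, n_iter=20):
    """Ordinary backfitting for m(X, Z) = sum_j Z_j alpha_j(X_j), applied to the
    synthetic responses Y_syn; returns the alpha_j evaluated on x_grid."""
    n, d = X.shape
    alpha_grid = np.zeros((d, len(x_grid)))
    alpha_at_X = np.zeros((n, d))                             # current fits alpha_j(X_ij)
    for _ in range(n_iter):
        for j in range(d):
            # partial residuals: remove the fitted contributions of all other components
            R = Y_syn - np.sum(Z * alpha_at_X, axis=1) + Z[:, j] * alpha_at_X[:, j]
            alpha_grid[j] = local_linear_vc(x_grid, X[:, j], R, Z[:, j], h[j])
            alpha_at_X[:, j] = local_linear_vc(X[:, j], X[:, j], R, Z[:, j], h[j])
    return alpha_grid

In each cycle, component j is refitted against the partial residuals of the other components, which mirrors how the SBF equation (6) couples the d components.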
Assumption B.
The density p of X is bounded away from zero and is continuous on [0, 1]^d.
(B4) K is a bounded and symmetric density function, supported on [−1, 1], and is Lipschitz continuous.
(B5) The bandwidth h_j satisfies h_j → 0 and log n/(nh_j) → 0 as n → ∞, for j = 1, . . . , d.
Assumption C.
Under Assumption (B), one can show that ‖Û‖_op < 1 and ‖r̂^G‖_M < ∞ with probability tending to one, where ‖·‖_op denotes the operator norm on the space H(M). If we choose a starting point satisfying ‖α̂^{G,[0]}‖_M < ∞, then it can be shown that (9) is indeed the unique solution of the SBF equation (6). The following lemma is a direct application of Theorems 3 and 4 in [13]: the SBF iterate α̂^{G,[r]} converges to the unique solution α̂^G of (6) with probability tending to one, provided that the initial point satisfies ‖α̂^{G,[0]}‖_M < ∞.

Lemma 1. Under Assumptions (B) and (C), if h_j and n^{-1/5} are of the same order, then for any x ∈ (0, 1)^d and for j = 1, . . . , d, the α̂^G_j(x_j) are asymptotically independent, and

√(n h_j) { α̂^G_j(x_j) − α_j(x_j) − h_j² β_j(x_j) } →_d N(0, V_j(x_j)),

where β_j(x_j) and V_j(x_j) denote the asymptotic bias and variance. The asymptotic variance V_j(x_j) is larger than the corresponding asymptotic variance for the uncensored case. This is a common situation with censored data, since the synthetic data method inflates the uncensored observations.

Smooth backfitting with censored data when G is unknown
We defined our estimator and derived its asymptotic distribution in the previous section as if the censoring distribution G were known. In practice, however, G is unknown, but it can be estimated consistently by the Kaplan-Meier estimator Ĝ given by

1 − Ĝ(t) = ∏_{i: T_(i) ≤ t} ( (n − i)/(n − i + 1) )^{1 − δ_(i)},

where T_(1) ≤ · · · ≤ T_(n) are the ordered observed times and δ_(i) is the censoring indicator attached to T_(i).
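A minimal sketch of how Ĝ and the estimated synthetic responses could be computed is given below; the handling of ties and the evaluation of Ĝ at T_i rather than at T_i− are glossed over, so this is an illustration rather than a careful implementation.

import numpy as np

def km_censoring_survival(T, delta):
    """Kaplan-Meier estimate of the censoring survival function 1 - G(t), evaluated at
    each observed time T_i; the censoring events are those with delta_i = 0."""
    n = len(T)
    order = np.argsort(T, kind="stable")
    T_s, d_s = T[order], delta[order]
    at_risk = n - np.arange(n)                       # risk set sizes n, n-1, ..., 1 at the ordered times
    factors = 1.0 - (1.0 - d_s) / at_risk            # a factor < 1 only where a censoring event occurs
    surv_sorted = np.cumprod(factors)                # 1 - G_hat at the ordered times
    surv = np.empty(n)
    surv[order] = surv_sorted                        # map back to the original ordering
    return surv

def synthetic_response(T, delta, phi, eps=1e-10):
    """Estimated synthetic responses delta_i * phi(T_i) / {1 - G_hat(T_i)}."""
    surv = km_censoring_survival(T, delta)
    return delta * phi(T) / np.maximum(surv, eps)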
Replacing G by Ĝ in (5) gives the redefined loss function SL^Ĝ(f), in which the synthetic responses are replaced by their estimated versions Y^Ĝ_i = δ_i φ(T_i)/{1 − Ĝ(T_i)}. We then define our estimator α̂^Ĝ, based on the estimated transformed data Y^Ĝ_i, as the minimizer of SL^Ĝ(f) over f. The estimator α̂^Ĝ satisfies the SBF equation (6) with G replaced by Ĝ. Unlike the case where G is known, a direct application of the theorems in [13] is not valid when G is estimated by the Kaplan-Meier estimator, since Y^Ĝ_1, . . . , Y^Ĝ_n are no longer independent. Below is a useful lemma for investigating the properties of the SBF algorithm. Recall that the definition of α̃^G_j(x_j) is given in (7); the estimator α̃^Ĝ_j(x_j) is defined by replacing G with Ĝ.
This lemma tells us that the difference between α̃^Ĝ_j(x_j) and α̃^G_j(x_j) is uniformly bounded by the approximation error of the censoring distribution.

Convergence of the smooth backfitting algorithm
As in Section 3.1, one can find the solution of the SBF equation by applying the SBF algorithm with G replaced by Ĝ. Since ‖Û‖_op < 1 is already established in [13], to show the convergence of the SBF algorithm it suffices to show that ‖r̂^Ĝ‖_M < ∞, where r̂^Ĝ = (I − Û)α̃^Ĝ, for an initial value satisfying ‖α̂^{Ĝ,[0]}‖_M < ∞. There are many possible choices for the initial point; among them, α̂^{Ĝ,[0]} = (α̃^Ĝ_1(x_1)^⊤, . . . , α̃^Ĝ_d(x_d)^⊤)^⊤ is a good suggestion.

Remark 2. Theorem 1 holds for any Ĝ satisfying sup_{t ≤ τ_0} |Ĝ(t) − G(t)| = o_p(1). This condition holds for the Kaplan-Meier estimator; see e.g. [24]. On the other hand, the uniform consistency of Ĝ is not necessary for the convergence of the SBF algorithm: for that, one only needs some finite moment condition on the estimated transformed response Y^Ĝ. However, the limit of the algorithm may not estimate the true coefficient functions consistently unless Ĝ is consistent.

Asymptotic distribution of the smooth backfitting estimator
In this subsection we present the asymptotic distribution of the SBF estimator α̂^Ĝ(x). Recall that the asymptotic distribution of α̂^G(x) was given in Lemma 1. If we show that the difference between α̂^Ĝ(x) and α̂^G(x) is negligible at a suitable rate, the desired result follows. This can be done by using the fact that α̂^Ĝ and α̂^G are the further projections of α̃^Ĝ and α̃^G onto H(M). We demonstrated in Lemma 2 that the difference between α̃^Ĝ_j(x_j) and α̃^G_j(x_j) is bounded in probability by the approximation error of G. Note here that α̃^G_j(x_j) is in general different from α̂^G_j(x_j); the same is true when G is replaced by Ĝ. However, since α̂^G_j(x_j) (i.e., the jth component function of the projection of α̃^G) has the same asymptotic variance as α̃^G_j(x_j), we can expect the same to hold for α̂^Ĝ_j(x_j) and α̃^Ĝ_j(x_j). We prove the next lemma using this idea.
From Lemma 3 we conclude that α̂^Ĝ(x) − α̂^G(x) = o_p(n^{-2/5}) for any x ∈ [0, 1]^d if G is continuous, since G is approximated at the rate O_p((log n/n)^{1/2}) by the Kaplan-Meier estimator (see e.g. [16]). We already know the asymptotic distribution of α̂^G(x) and its rate of convergence, so a direct application of Lemma 3 together with Lemma 1 gives the following theorem.

Theorem 2. Under Assumptions (B) and (C), if h_j and n^{-1/5} are of the same order and if G is continuous, then for any x ∈ (0, 1)^d and for j = 1, . . . , d, the α̂^Ĝ_j(x_j) are asymptotically independent, and

√(n h_j) { α̂^Ĝ_j(x_j) − α_j(x_j) − h_j² β_j(x_j) } →_d N(0, V_j(x_j)),

where β_j(x_j) and V_j(x_j) are defined in the statement of Lemma 1.
Remark 3. The results obtained so far are based on truncating the observed times beyond τ_0. Choosing τ_0 from the available information in the data is more natural than a deterministic choice. One possibility is to set τ_0 = T_{n−[nκ],n}, where κ ∈ (0, 1) is fixed, T_{k,n} denotes the kth order statistic of the observed times and [r] the integer part of r. In this case, the uniform rate of convergence of the Kaplan-Meier estimator is the same as for a fixed truncation point at the (1 − κ)-quantile of the distribution of T; both choices delete 100κ% of the data corresponding to the largest observations. The theoretical choice of κ depends on the censoring mechanism; see [3] for more details. If censoring is "light", i.e., if condition (2.8) in [3] is satisfied, then one can choose τ_0 as large as T_{n,n}. In practice, the most used truncation value is the largest uncensored observation.
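In code, the two data-driven choices of τ_0 discussed in this remark could look as follows; the function name and interface are hypothetical.

import numpy as np

def choose_tau0(T, delta, kappa=None):
    """Two data-driven choices of the truncation point tau_0 discussed in Remark 3.
    If kappa is given, return the order statistic T_{n-[n*kappa], n}, which deletes roughly
    100*kappa% of the largest observed times; otherwise return the largest uncensored
    observation, the choice most used in practice."""
    T = np.asarray(T, dtype=float)
    if kappa is not None:
        T_sorted = np.sort(T)
        return T_sorted[len(T) - int(len(T) * kappa) - 1]   # T_{n - [n*kappa], n} (1-based index)
    return T[np.asarray(delta, dtype=bool)].max()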

Extension to local polynomial fitting
In this section we extend the results of the previous section to local polynomial fitting. We focus on odd orders p, since these are known to be preferable to even orders; see Section 3.3.2 in [7]. We briefly state the results without proofs. The loss function SL^Ĝ_p(f) to be minimized for the estimation of the coefficient functions is redefined by approximating each coefficient function by a pth order Taylor expansion, with Ĝ the Kaplan-Meier estimator of G. Let α̂^Ĝ_p be the minimizer of SL^Ĝ_p(f) over f when SL^Ĝ_p(f) < ∞. Then α̂^Ĝ_p(x) = (α̂^Ĝ_{p,1}(x_1)^⊤, . . . , α̂^Ĝ_{p,d}(x_d)^⊤)^⊤ is the local polynomial SBF estimator; it satisfies an SBF equation analogous to (6), which we do not present in detail. Moreover, the SBF algorithm for finding the minimizer is given in the same way as in (8), and its convergence is guaranteed with probability tending to one under Assumption (B). Note that the estimator of the kth derivative of α_j(x_j) is given by k! α̂^Ĝ_{p,jk}(x_j)/h_j^k, where α̂^Ĝ_{p,j}(x_j) = (α̂^Ĝ_{p,j0}(x_j), . . . , α̂^Ĝ_{p,jp}(x_j))^⊤, since α̂^Ĝ_{p,jk}(x_j) is an estimator of α_j^{(k)}(x_j) h_j^k / k!. We need an additional smoothness condition for the asymptotic distribution of the local polynomial SBF estimator:

(C2′) The function α_j, j = 1, . . . , d, is p + 1 times continuously differentiable on (0, 1), and E(Z_j Z_k | X = x) is continuously partially differentiable in x ∈ (0, 1)^d for j, k = 1, . . . , d.
The next lemma is analogous to Lemma 3 and gives the approximation error between α̂^Ĝ_p(x) and α̂^G_p(x). The proof is omitted.
for odd p.
Now, the following theorem follows from Lemma 4.

Simulation study
In this section we examine the finite sample performance of the proposed estimator. We generate random samples from the model

Y = m(X, Z) + σ(X, Z) ε,

where m(X, Z) = Z_1 α_1(X_1) + Z_2 α_2(X_2) + Z_3 α_3(X_3). The variables X_1, X_2 and X_3 are generated from U[0, 1], and the vector (Z_2, Z_3)^⊤ from a bivariate normal distribution with mean (0, 0)^⊤, unit variances and correlation 0.5, independently of X = (X_1, X_2, X_3)^⊤. We take Z_1 ≡ 1, α_1(x) = 1 + exp(2x − 1), α_2(x) = 0.5 cos(2πx) and α_3(x) = x². The standard deviation function is set to σ(x, z) = 0.5 + · · · . The error ε is generated from a normal distribution with mean 0 and standard deviation γ. A similar model was considered in [26] and [13]. We also generate a normal censoring variable with mean µ and variance 1.5, where µ is selected to control the percentage of censoring (PC). We set φ(t) = t I(t ≤ τ_0), so that the objective of this study is to estimate the truncated conditional mean of Y given X = x and Z = z. For the truncation, τ_0 = 5 is used, which means that only a small proportion of the observed T_i's are truncated.
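The data generating process can be sketched as follows. Since the display for σ(x, z) is not reproduced above, a constant 0.5 is used as a stand-in, and the value of γ is arbitrary, so the censoring percentages will only roughly match those reported.

import numpy as np

def generate_sample(n, mu_c, gamma, rng):
    """One sample from (a stand-in for) the simulation design of Section 5."""
    X = rng.uniform(0.0, 1.0, size=(n, 3))
    Z23 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
    Z = np.column_stack([np.ones(n), Z23])
    alpha = [lambda x: 1.0 + np.exp(2.0 * x - 1.0),
             lambda x: 0.5 * np.cos(2.0 * np.pi * x),
             lambda x: x ** 2]
    m = sum(Z[:, j] * alpha[j](X[:, j]) for j in range(3))
    eps = rng.normal(0.0, gamma, size=n)
    Y = m + 0.5 * eps                              # 0.5 is a stand-in for sigma(X, Z)
    C = rng.normal(mu_c, np.sqrt(1.5), size=n)     # censoring variable with variance 1.5
    T = np.minimum(Y, C)
    delta = (Y <= C).astype(float)
    return X, Z, T, delta, m

rng = np.random.default_rng(0)
X, Z, T, delta, m = generate_sample(400, mu_c=3.1083, gamma=0.5, rng=rng)
# the paper reports roughly 30% censoring for this mu; the exact value depends on sigma(x, z)
print("censoring percentage:", 100 * (1 - delta.mean()))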
We examine the performance of our estimator for several censoring levels. We consider µ = 4.4197, 3.1083 and 2.2, which yield approximately 10%, 30% and 50% censoring, respectively. The uncensored case is also considered, to see how random right censoring affects the estimation in our model. The coefficient functions are estimated by the local linear SBF method; the trapezoidal rule with 51 equally spaced grid points on [0, 1] is used for the numerical integration. We compute the estimated mean integrated squared error (MISE) of the regression function,

MISE = (1/N) Σ_{j=1}^{N} (1/T) Σ_{t=1}^{T} { m̂^{[j]}(X_t, Z_t) − m(X_t, Z_t) }²,

where N stands for the number of replications, T for the size of a test sample {(X_t, Z_t)}, and m̂^{[j]}, j = 1, . . . , N, is the local linear SBF estimator in the jth replication. We choose N = 500 and T = 500. We try 8³ bandwidth vectors (h_1, h_2, h_3) ∈ {0.05, 0.15, . . . , 0.75}³, and the Epanechnikov kernel is used for K. We run simulations for different sample sizes, noise levels and censoring percentages. Tables 1 and 2 report the results for sample sizes n = 200 and 400 and for different values of γ and PC; each time we report the result for the bandwidth vector which minimizes the MISE. With this optimal bandwidth, the MISE values for each coefficient function are also computed and presented in those tables. As expected, we find an overall increase of the MISE as PC and γ increase. In the censored cases, there is a tendency for the ratio IV/IB² to decline when PC = 50%, which may seem counterintuitive. One possible reason is that, with high PC, the optimal bandwidths are selected very large to control the explosion of the variance, which results in relatively large biases. We also find that the MISE decreases as n doubles, and that the rate of decrease is close to 2^{−4/5} ≈ 0.57; note that the asymptotic MISE of our estimator is proportional to n^{−4/5} with the optimal bandwidth rate h_j ∼ n^{−1/5}. These results confirm that the proposed estimator works rather well.

Table 1. Optimal results when estimating the regression function m for n = 200. Here, γ is the standard deviation of the error, PC is the percentage of censoring, FUN is the function of interest, MISE is the mean integrated squared error, IV is the integrated variance, and IB² is the integrated squared bias.

Table 2. Optimal results when estimating the regression function m for n = 400. Here, γ is the standard deviation of the error, PC is the percentage of censoring, FUN is the function of interest, MISE is the mean integrated squared error, IV is the integrated variance, and IB² is the integrated squared bias.
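Schematically, the Monte Carlo estimate of the MISE could be computed as below; generate_sample and fit_and_predict are placeholders (for instance, the data generator of the previous sketch with µ and γ fixed via functools.partial, and the SBF fit), and one would call this routine for every bandwidth vector on the grid and retain the minimizer.

import numpy as np

def estimate_mise(generate_sample, fit_and_predict, n, N=500, T_test=500, seed=0):
    """Monte Carlo estimate of the MISE of the regression function over N replications,
    each time evaluating the fitted surface on an independent test sample of size T_test.
    generate_sample(n, rng) should return (X, Z, T, delta, m_true); fit_and_predict fits
    the estimator on the training sample and evaluates it at the test covariates."""
    rng = np.random.default_rng(seed)
    errs = np.empty(N)
    for r in range(N):
        X, Z, T, delta, _ = generate_sample(n, rng)              # training sample
        Xt, Zt, _, _, m_test = generate_sample(T_test, rng)      # independent test sample
        m_hat = fit_and_predict((X, Z, T, delta), (Xt, Zt))
        errs[r] = np.mean((m_hat - m_test) ** 2)
    return errs.mean()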

Bandwidth parameter selection
In this section we introduce a data-driven bandwidth selector for local linear fitting, based on the method given in [13]. There it is proposed to estimate the unknown quantities that appear in the bandwidth minimizing the asymptotic mean integrated squared error by fitting polynomial regression models; we simply adapt this to the censored data context. From Theorem 2, the optimal bandwidth when local linear fitting is applied can be written explicitly in terms of unknown population quantities. These are estimated by fitting polynomial regression models, with coefficients c_{j,k} chosen as the minimizers of Σ_i ρ(Y^Ĝ_i − m_s(X_i, Z_i; c)), where ρ is a given loss function, m_s(·; c) is a polynomial approximation of m(X_i, Z_i) and s is the degree of the polynomial used for this approximation. Note that, to deal with censoring, the estimated synthetic response is used instead of the response itself. The other unknown quantities can be estimated in a similar manner. A natural choice for ρ would be the squared loss ρ(u) = u²; however, with this loss function the selected bandwidths produced unsatisfactory results. Note that, in formulae (11) and (12), only E(Z_j² σ_G²(X, Z) | X_j = x_j) is affected by censoring, which means that, theoretically, the other quantities are invariant under censoring. Nevertheless, some large values of Y^Ĝ_i, inflated by the unbiased transformation, may cause a significant increase of the estimates of ∫ α_j''(x_j)² p_j(x_j) dx_j as the percentage of censoring increases. To address this problem, we use, instead of the squared loss, the Huber loss function

ρ_k(u) = u²/2 if |u| ≤ k,   ρ_k(u) = k(|u| − k/2) if |u| > k,

for the estimation of ∫ α_j''(x_j)² p_j(x_j) dx_j. This function is typically used in robust estimation. By employing it, we expect that large values of Y^Ĝ_i are prevented from having too much influence on the estimation of ∫ α_j''(x_j)² p_j(x_j) dx_j.

Tables 3 and 4 show the performance of this bandwidth selection procedure. We generate 500 random samples from the same model as in Section 5. The Gaussian kernel is used for the multivariate local linear kernel estimator, since the Epanechnikov kernel gives very poor estimates due to its compact support. To estimate the unknown quantities, we use a cubic polynomial for α_j(x_j) and a linear polynomial for the other functions; the tuning parameter k of the Huber loss is set to a prescribed value. Table 3 shows how the automatic bandwidth selector works: we compute the ratio of the MISEs obtained with the bandwidths ĥ_opt and ĥ_a, that is, MISE(ĥ_opt)/MISE(ĥ_a), where ĥ_opt is the optimal bandwidth described in Section 5 and ĥ_a is the data-driven bandwidth proposed in this section. It follows from Table 3 that our bandwidth selector works reasonably well, since the ratios are not far from 1. The selection procedure is influenced by censoring, whereas the noise level γ has no or only a very limited effect: the ratios under censoring are relatively large compared to the noncensored case. An interesting finding is that the ratios do not show an increasing trend in PC, which means that our selection procedure works well and does not break down even with high PC. In Table 3, there is a ratio smaller than 1.
Indeed, this can happen in finite sample studies, since our automatic bandwidth selector gives data-adaptive bandwidths for each sample, whereas the optimal bandwidth is selected as the best one among a set of bandwidths that are the same for all samples.
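The following sketch illustrates the idea of the robust pilot fit used above: cubic polynomials for the α_j are fitted to the synthetic responses under the Huber loss, and the curvature integrals ∫ α_j''(x)² p_j(x) dx are approximated by sample averages. The value k = 1.345 is a conventional default from robust statistics, not necessarily the value used in the paper, and the choice of optimizer is arbitrary.

import numpy as np
from scipy.optimize import minimize

def huber(u, k=1.345):
    """Huber loss: u**2/2 for |u| <= k, k*(|u| - k/2) otherwise."""
    a = np.abs(u)
    return np.where(a <= k, 0.5 * u ** 2, k * (a - 0.5 * k))

def pilot_curvature(X, Z, Y_syn, degree=3, k=1.345):
    """Pilot estimates of the integrals alpha_j''(x)^2 p_j(x) dx, j = 1, ..., d, obtained by
    fitting cubic polynomials alpha_j(x) = c_j0 + c_j1 x + c_j2 x^2 + c_j3 x^3 to the
    synthetic responses under the Huber loss, then averaging alpha_j''(X_ij)^2 over the data."""
    n, d = X.shape
    q = degree + 1

    def model(c):
        C = c.reshape(d, q)
        # np.polyval expects coefficients from the highest degree down, hence the reversal
        return sum(Z[:, j] * np.polyval(C[j, ::-1], X[:, j]) for j in range(d))

    obj = lambda c: np.sum(huber(Y_syn - model(c), k))
    c_hat = minimize(obj, np.zeros(d * q), method="BFGS").x.reshape(d, q)

    # alpha_j''(x) = 2 c_j2 + 6 c_j3 x; averaging over X_ij approximates the integral w.r.t. p_j
    return [np.mean((2.0 * c_hat[j, 2] + 6.0 * c_hat[j, 3] * X[:, j]) ** 2) for j in range(d)]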
We also compare our SBF estimator, based on the above automatic bandwidth selector, to the multivariate local linear kernel (MK) estimator implemented in the np package in R, which offers a bandwidth selector based on the cross-validation principle. In Table 4 we see that our SBF estimator outperforms the MK estimator; in particular, the integrated variance of the MK estimator increases much more rapidly than its integrated bias as PC becomes high.

Real data example
In this section we analyze a dataset from the University of Massachusetts AIDS Research Unit (UMARU) IMPACT Study, a 5-year collaborative research project on drug abuse; a detailed description of the study can be found in [10]. Two different treatment programs were carried out on two different sites (A and B); here we focus on the study done on site A. Our objective is to study how a subject's characteristics and the length of treatment affect the time to return to drug use, without making strong and restrictive assumptions on the underlying regression model. There are 398 observations, excluding two extreme points and subjects with missing values for some covariates. The covariates that we consider are AGE (age in years), BECK (Beck depression score), IVHX (drug use history; 0 = never, 1 = present), NDT (number of prior drug treatments) and LOT (length of treatment in days). We apply a logarithmic transformation to the variable NDT to get rid of a sparse region. The observed time is TTRD (time to return to drug use in days), which is right-censored with a censoring percentage of about 20%. In our model, the response variable is Y = log(TTRD/365.25). From the data we calculate the synthetic response Y^Ĝ, see (2), using the τ_0 which corresponds to the 98% empirical quantile. We also tried other values of τ_0, and the results were quite similar.
Since LOT is of great importance for the study and since IVHX is binary, we consider a family of varying coefficient models of the form (1) with Z_1 ≡ 1 and X_1 = LOT, Z_2 = IVHX, and with the smoothing variables X_2, X_3 and the linear variable Z_3 chosen among the remaining covariates AGE, BECK and log(NDT). Depending on the choice of X_2, X_3 and Z_3, there are 6 possible models of this form. To select one of them, we divide the sample into two parts: the first part is used to estimate the models and the second part to assess the performance of the fitted models. For the latter, a test sample of size 80 is randomly drawn from the whole sample in order to compute the estimated prediction errors (EPR). The final model, which gives the smallest EPR, uses LOT as the smoothing variable of the intercept function, BECK as the smoothing variable of the coefficient of IVHX, and AGE as the smoothing variable of the coefficient of log(NDT). The bandwidths obtained by our automatic selection procedure (see Section 6) are (ĥ_1, ĥ_2, ĥ_3) = (0.148, 0.341, 0.603). Figure 1 depicts the estimated coefficient functions. As can be seen, the time to return to drug use increases as LOT increases, and the increase is sharper at lower levels of LOT than at higher levels; increasing the length of treatment would thus be of great benefit. The second panel shows that the coefficient of IVHX is negative for all values of BECK: if a patient has a drug use history, he or she tends to return to drug use earlier. It also tells us that the time to return to drug use decreases with the Beck depression score. Lastly, the third estimated coefficient function seems to be nearly linear, which indicates that AGE and NDT have a linear interaction effect. Interestingly, this function passes through 0 around AGE = 46. This means that for younger patients (less than 46 years old) NDT has a negative effect on the time to return to drug use, i.e. they tend to return to drug use earlier if they have experienced many drug treatments, while the opposite trend is observed for older patients. This seems reasonable, since large values of NDT (number of prior drug treatments) for young people mean that they are strongly addicted to drugs.
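One plausible reading of the EPR criterion, namely the average squared distance between the estimated synthetic responses and the fitted conditional means on a held-out test sample, is sketched below; fit_and_predict is a placeholder for fitting the chosen varying coefficient model by SBF on the training part and evaluating it at the test covariates.

import numpy as np

def epr_on_holdout(X, Z, Y_syn, fit_and_predict, test_size=80, seed=0):
    """Estimated prediction error (EPR) of a candidate varying coefficient model,
    computed on a randomly held-out test sample as in Section 7."""
    rng = np.random.default_rng(seed)
    n = len(Y_syn)
    test = rng.choice(n, size=test_size, replace=False)
    train = np.setdiff1d(np.arange(n), test)
    m_hat = fit_and_predict((X[train], Z[train], Y_syn[train]), (X[test], Z[test]))
    # EPR: average squared distance between estimated synthetic responses and fitted means
    return np.mean((Y_syn[test] - m_hat) ** 2)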

Discussion
In this paper we propose a smooth backfitting (SBF) estimator of the coefficient functions in a varying coefficient model having different covariates as smoothing variables, when the response is subject to random right censoring. We focus on the case where the censoring does not depend on the covariates, which holds, for example, when censoring occurs at the end of the study. However, if one believes that censoring is affected by the characteristics of the subjects, then accounting for the dependence between the censoring variable and the covariates in the estimation procedure could be appealing. In this case, the synthetic response is given by Y_{G_U} = δ φ(T) / {1 − G_U(T)}, where G_U denotes the conditional distribution of C given U = (X^⊤, Z^⊤)^⊤, that is, G_U(·) = P(C ≤ · | U). Note that U rather than its value u is used here, since the SBF method minimizes a global criterion induced by integration. This approach also preserves the conditional mean of Y given the covariates if we replace assumptions (A1) and (A2) by a conditional independence assumption between Y and C given U. Similar ideas have been used in the literature; see [25] for an example. In the dependent censoring case, the Beran estimator [1] can be used as an estimator of G_U. Nevertheless, this may cause the well-known "curse of dimensionality" problem, because the dimension of the covariates in our model is in general large; recall that the motivation for employing the SBF method is precisely to avoid the curse of dimensionality when fitting the coefficient functions. One possibility is then to restrict attention to a proper subset of the covariates when estimating G_U; another alternative is to use parametric or semiparametric models for G_U to avoid high dimensional smoothing.
Proof. This follows from a slight modification of Proposition 4 in [20], if we substitute the kernel function (1/h)K((u − v)/h) therein by the boundary corrected kernel K h (u, v).
Proof of Theorem 1. First note that, by arguments similar to those in [13],

‖r̂^Ĝ‖_M ≤ C Σ_{j=1}^d [ ∫ α̃^Ĝ_{j0}(x_j)² q_j(x_j) dx_j + µ_2(K) ∫ α̃^Ĝ_{j1}(x_j)² q_j(x_j) dx_j ]^{1/2},

for some constant C > 0, with probability tending to one, where α̃^Ĝ_{jk}(x_j), k = 0, 1, is the estimated version of α̃^G_{jk}(x_j) with G replaced by Ĝ. We only prove that ∫ α̃^Ĝ_{j0}(x_j)² q_j(x_j) dx_j < ∞ with probability tending to one; the proof that ∫ α̃^Ĝ_{j1}(x_j)² q_j(x_j) dx_j < ∞ is similar. For all j = 1, . . . , d, we use the decomposition (16). The fact that the first term of (16) is bounded with probability tending to one was established in [13]. The second term is bounded by Lemma 2. Therefore ‖r̂^Ĝ‖_M < ∞ with probability tending to one, which completes the proof.