Additive models in censored regression
Introduction
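Let $Y$ be a lifetime which is observed under censoring from the right. Let $\mathbf{X} = (X_1, \ldots, X_p)$ be a vector of covariates. Put $m(\mathbf{X}) = E[\phi(Y) \mid \mathbf{X}]$ for the regression function of $\phi(Y)$ on $\mathbf{X}$, so the model becomes

$$\phi(Y) = m(\mathbf{X}) + \varepsilon, \qquad (1)$$

where the error term satisfies $E[\varepsilon \mid \mathbf{X}] = 0$. Here $\phi$ denotes a time transformation such as the logarithm. Taking $\phi = \log$ is useful in regression analysis because $\phi(Y)$ is no longer restricted to $[0, \infty)$. In particular, under (1) with $\phi = \log$ we have, provided that $\varepsilon$ and $\mathbf{X}$ are independent,

$$F(t \mid \mathbf{x}) = F_0\left(t\, e^{-m(\mathbf{x})}\right), \qquad (2)$$

where $F(\cdot \mid \mathbf{x})$ and $F_0$ are the cumulative distribution functions of $Y$ given $\mathbf{X} = \mathbf{x}$ and of the transformed error $e^{\varepsilon}$, respectively. This is the so-called accelerated failure time model, widely used to analyze survival data in the regression framework. Note that an increasing value of $m(\mathbf{x})$ results in a decreasing value of the time acceleration factor $e^{-m(\mathbf{x})}$, thus leading to a better survival prognosis.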
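For readers who want the intermediate step, the accelerated failure time relation follows from (1) with $\phi = \log$ and the independence of $\varepsilon$ and $\mathbf{X}$. The short derivation below is our own expansion of that sentence, not taken from the original text:

$$
\begin{aligned}
F(t \mid \mathbf{x}) &= P(Y \le t \mid \mathbf{X} = \mathbf{x})
  = P\big(\log Y \le \log t \mid \mathbf{X} = \mathbf{x}\big) \\
 &= P\big(\varepsilon \le \log t - m(\mathbf{x})\big)
  = P\big(e^{\varepsilon} \le t\, e^{-m(\mathbf{x})}\big)
  = F_0\big(t\, e^{-m(\mathbf{x})}\big), \qquad t > 0,
\end{aligned}
$$

where the third equality uses the independence of $\varepsilon$ and $\mathbf{X}$. Since $F_0$ is nondecreasing, the survival function $S(t \mid \mathbf{x}) = 1 - F_0\big(t\, e^{-m(\mathbf{x})}\big)$ is nondecreasing in $m(\mathbf{x})$ for every fixed $t$, which is the "better survival prognosis" statement in formal terms.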
In the censored setup, we observe $n$ independent observations $(Z_i, \delta_i, \mathbf{X}_i)$, $i = 1, \ldots, n$, with the same distribution as $(Z, \delta, \mathbf{X})$, where $Z = \min(Y, C)$, $C$ is the right-censoring variable, assumed to be independent of $(Y, \mathbf{X})$, and $\delta = I(Y \le C)$. Unlike in the "iid" scenario (in which each observation receives mass or weight $1/n$), under censoring the weight associated with the $i$th observation will typically be the jump of the Kaplan–Meier estimator at $Z_i$ ($1 \le i \le n$), namely

$$W_i = \frac{\delta_i}{n - R_i + 1} \prod_{j=1}^{R_i - 1} \left[ \frac{n - j}{n - j + 1} \right]^{\delta_{[j]}},$$

where $R_i$ is the rank of $Z_i$ among the ordered $Z$'s, $\delta_{[j]}$ is the censoring indicator attached to the $j$th ordered $Z$, and (in the case of ties) uncensored observations are assumed to precede the censored ones. When the error distribution is unknown, an approach that leads to consistent estimators is to choose $m \in \mathcal{M}$ so as to minimize

$$\sum_{i=1}^{n} W_i \left[ \phi(Z_i) - m(\mathbf{X}_i) \right]^2,$$

where the family $\mathcal{M}$ represents the a priori knowledge on the true regression. See Stute (1993, 1996, 1999) for the parametric linear and nonlinear cases, in which it is assumed that $\mathcal{M} = \{ m(\cdot\,; \theta) : \theta \in \Theta \}$ for a finite-dimensional parameter $\theta$, and see Orbe et al. (2003) for the partly linear case $m(\mathbf{x}) = \mathbf{x}_1^\top \boldsymbol{\beta} + g(x_2)$, with $\mathbf{x} = (\mathbf{x}_1, x_2)$ and $g$ an unknown smooth function. A different estimator based on ranks was proposed by Chen et al. (2005) for the partial linear model. Gannoun et al. (2005) used the Kaplan–Meier weights in the context of quantile regression. Another possible approach (which we will not follow here) is based on so-called synthetic data; see, for example, Leurgans (1987) and Qin and Jing (2000), who considered the parametric linear case and the partial linear model, respectively (see also Liang and Zhou (1998) for the latter setup).
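The Kaplan–Meier weights $W_i$ and the weighted least-squares criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `km_weights` and `weighted_linear_fit` are hypothetical, ties are broken with uncensored observations first as stated above, and the fit shown is only the simplest case, in which $\mathcal{M}$ is the family of straight lines in a single covariate.

```python
from typing import List, Sequence, Tuple


def km_weights(z: Sequence[float], delta: Sequence[int]) -> List[float]:
    """Jump W_i of the Kaplan-Meier estimator at each observed Z_i.

    Ties in z are ordered with uncensored cases (delta = 1) first.
    """
    n = len(z)
    # Ranks R_i: sort by time, then put delta = 1 before delta = 0.
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    w = [0.0] * n
    prod = 1.0  # running product of [(n - j) / (n - j + 1)] ** delta_[j] over j < R_i
    for rank, i in enumerate(order, start=1):
        if delta[i]:
            w[i] = prod / (n - rank + 1)
            prod *= (n - rank) / (n - rank + 1)
        # censored cases get W_i = 0 and leave the product unchanged
    return w


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float]:
    """Minimise sum_i w_i * (y_i - a - b * x_i) ** 2 over (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b
```

With $\phi = \log$, one would fit `weighted_linear_fit(x, [math.log(zi) for zi in z], km_weights(z, delta))`. Without censoring every $W_i$ equals $1/n$, so the criterion reduces to ordinary least squares; with censoring, censored cases get zero weight and their mass is redistributed to later uncensored cases.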
In some applications the linear model can be very restrictive. This constraint can be avoided by replacing the linear structure with a nonparametric one. Here we consider a flexible approach to estimating the regression function through a semiparametric model under which the effect of each covariate on the response enters additively, the functional form of this effect being otherwise unknown. We assume the additive model (Hastie and Tibshirani, 1990)

$$m(\mathbf{X}) = \alpha + \sum_{j=1}^{p} f_j(X_j), \qquad (3)$$

where $\alpha$ is a constant and $f_1, \ldots, f_p$ are one-dimensional functions. If the influence of a covariate is linear, the corresponding partial function can be expressed parametrically as $f_j(X_j) = \beta_j X_j$. Therefore, model (3) nests the linear model. Moreover, since effects are additive, these models retain the interpretability of linear models. At the same time, they incorporate the flexibility of nonparametric smoothing methods because, rather than following a fixed parametric form, the effect of each covariate $X_j$ is given by an unknown function $f_j$, which is only required to possess a certain degree of smoothness so that it can be estimated. This compromise between flexibility, dimensionality and interpretability places additive models among the most versatile statistical tools for data analysis across many fields of research.
Additive models have been used to relax the linearity assumption within the Cox proportional hazards model, the most popular regression model for analyzing censored survival data. For example, Huang (1999) introduced efficient estimation for a partly additive Cox model. Huang and Liu (2006) considered a nonparametric link function which controls for the effect of the parametric predictor under proportional hazards. See also Ganguli and Wand (2006) and the references therein for extensions of the Cox proportional hazards model via additive regression. However, the proportional hazards assumption may not hold in some applications, and hence there is a need for additive models that can serve as a valid alternative to Cox regression. To the best of our knowledge, additive models within the censored accelerated failure time model have not been explored so far. The present work fills this gap.
The layout of this paper is as follows. In Section 2 we describe the weighted kernel-smoothing backfitting algorithm we use for fitting additive models with a censored response; Section 2.1 discusses the bandwidth selection problem and some related practical issues. To assess the validity of this estimation procedure, a simulation study is performed in Section 3. In Section 4 we apply the proposed methodology to real data. Finally, we conclude with a discussion in Section 5.
Fitting censored additive models
This section describes an algorithm for fitting the model effects in (3) for a censored response. The algorithm discussed below is a modification of the backfitting algorithm (Opsomer, 2000) used for fitting additive models. The backfitting algorithm cycles through the covariates $X_j$ ($j = 1, \ldots, p$) and estimates each $f_j$ by applying local linear kernel smoothers to the partial residuals. These residuals are obtained by removing the estimated effects of the other covariates. Although our focus
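A minimal sketch of a Kaplan–Meier-weighted backfitting loop is given below. The following choices are our assumptions, not the authors' stated procedure: the case weights $W_i$ simply multiply a Gaussian kernel inside each local linear fit, $\alpha$ is estimated by the weighted mean of $\phi(Z_i)$, and each $\hat f_j$ is centred to have weighted mean zero for identifiability. The names `local_linear` and `weighted_backfit` are hypothetical; this illustrates the general technique rather than reproducing the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def local_linear(x0: float, x: Sequence[float], r: Sequence[float],
                 w: Sequence[float], h: float) -> float:
    """Local linear estimate at x0; kernel weights are multiplied by case weights w."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, ri, wi in zip(x, r, w):
        u = xi - x0
        k = wi * math.exp(-0.5 * (u / h) ** 2)  # Gaussian kernel (assumed choice)
        s0 += k
        s1 += k * u
        s2 += k * u * u
        t0 += k * ri
        t1 += k * u * ri
    det = s0 * s2 - s1 * s1
    if det <= 1e-10 * s0 * s2:  # (near-)singular design: fall back to local constant
        return t0 / s0 if s0 > 0 else 0.0
    return (s2 * t0 - s1 * t1) / det  # intercept of the weighted local line at x0


def weighted_backfit(X: Sequence[Sequence[float]], y: Sequence[float],
                     w: Sequence[float], h: Sequence[float],
                     max_iter: int = 50, tol: float = 1e-8
                     ) -> Tuple[float, List[List[float]]]:
    """Backfitting for y_i ~ alpha + sum_j f_j(X_ij) with case weights w_i.

    X holds n rows of p covariates, y holds phi(Z_i), h holds one bandwidth
    per covariate. Returns alpha and the fitted values f_j(X_ij) for each j.
    """
    n, p = len(y), len(X[0])
    sw = sum(w)
    alpha = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cols = [[row[j] for row in X] for j in range(p)]
    f = [[0.0] * n for _ in range(p)]
    for _ in range(max_iter):
        change = 0.0
        for j in range(p):
            # Partial residuals: remove alpha and the current estimates of the other effects.
            r = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                 for i in range(n)]
            new = [local_linear(cols[j][i], cols[j], r, w, h[j]) for i in range(n)]
            # Centre so that sum_i w_i * f_j(X_ij) = 0 (identifiability, assumed).
            centre = sum(wi * v for wi, v in zip(w, new)) / sw
            new = [v - centre for v in new]
            change = max(change, max(abs(a - b) for a, b in zip(new, f[j])))
            f[j] = new
        if change < tol:
            break
    return alpha, f
```

In the censored setting, `w` would hold the Kaplan–Meier weights $W_i$ and `y` the transformed times $\phi(Z_i)$; censored cases then carry zero weight and do not enter the smoothers. Because a local linear smoother reproduces straight lines exactly, a purely linear additive truth is a fixed point of this loop, which is a convenient sanity check.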
Simulation study
A simulation study was conducted to assess the finite-sample behavior of the proposed algorithm. Given the vector of covariates in , the response variable was generated according to the model with . The censoring variable was drawn independently from a Uniform. Note that the constant determines the expected proportion of censored responses. We have chosen several values for in order to get censoring percentages of about 0%, 15%, 33%, 50% and
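The role of the uniform constant can be made concrete. Assuming, for illustration only, that the censoring variable is Uniform$(0, c)$ and independent of a positive lifetime $T$, we have $P(C < T \mid T = t) = \min(t, c)/c$, which is nonincreasing in $c$; so, for a sample of lifetimes, a target censoring rate can be reached by bisection on $c$. The helper names below are hypothetical and the simulated regression model itself is not reproduced; a 0% target corresponds to no censoring at all ($c = \infty$), so it is excluded.

```python
from typing import Sequence


def expected_censoring(t: Sequence[float], c: float) -> float:
    """Average of P(C < T_i) over the sample when C ~ Uniform(0, c) (assumed)."""
    # For C uniform on (0, c): P(C < t) = min(t, c) / c.
    return sum(min(ti, c) for ti in t) / (c * len(t))


def calibrate_c(t: Sequence[float], target: float, iters: int = 200) -> float:
    """Bisection for c such that expected_censoring(t, c) is close to target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if min(t) <= 0:
        raise ValueError("lifetimes must be positive")
    lo = 0.5 * min(t)  # every T_i exceeds c here, so the censoring rate is 1
    hi = max(t)
    while expected_censoring(t, hi) > target:  # the rate is nonincreasing in c
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_censoring(t, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In practice one would apply `calibrate_c` to a large simulated sample of lifetimes from the chosen model, once per target rate (for instance 0.15, 0.33, 0.50); larger targets give smaller values of $c$.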
Application to real data
Between January 1974 and May 1984, the Mayo Clinic conducted a double-blinded randomized trial in Primary Biliary Cirrhosis (PBC) of the liver. A total of patients agreed to participate in this clinical trial. The data were analyzed in 1986 for presentation in the clinical literature (see Fleming and Harrington, 1991). The main variable of interest (the ) was the number of days between registration and death, possibly censored because of the end of the study or liver transplantation. By July 1986, 125
Discussion
In this paper we introduce a new approach to the estimation of additive models in censored regression. Specifically, we propose an extension of the accelerated failure time regression model through additive regression, which constitutes a novelty in the context of censored regression. Weighted backfitting based on kernel smoothers has been used to estimate the model, and the smoothing windows (bandwidths) were selected by cross-validation. Using cross-validation bandwidths implies
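The cross-validation step can be illustrated for a single covariate. As an assumption (the exact criterion is given in Section 2.1, not here), we take a Kaplan–Meier-weighted leave-one-out score $CV(h) = \sum_i W_i\,[\phi(Z_i) - \hat m_h^{(-i)}(X_i)]^2$, where $\hat m_h^{(-i)}$ is a weighted local linear fit computed without observation $i$, and pick $h$ from a grid. The function names are hypothetical, and the Gaussian kernel is our choice; for the additive model one could apply the same idea to the bandwidth of each component.

```python
import math
from typing import Sequence


def local_linear_loo(i: int, x: Sequence[float], y: Sequence[float],
                     w: Sequence[float], h: float) -> float:
    """Weighted local linear fit at x[i] that leaves observation i out."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for k, (xk, yk, wk) in enumerate(zip(x, y, w)):
        if k == i:
            continue
        u = xk - x[i]
        a = wk * math.exp(-0.5 * (u / h) ** 2)  # Gaussian kernel (assumed)
        s0 += a
        s1 += a * u
        s2 += a * u * u
        t0 += a * yk
        t1 += a * u * yk
    det = s0 * s2 - s1 * s1
    if det <= 1e-10 * s0 * s2:  # (near-)singular: fall back to local constant
        return t0 / s0 if s0 > 0 else 0.0
    return (s2 * t0 - s1 * t1) / det


def weighted_cv(x: Sequence[float], y: Sequence[float],
                w: Sequence[float], h: float) -> float:
    """KM-weighted leave-one-out prediction error; cases with w_i = 0 add nothing."""
    return sum(w[i] * (y[i] - local_linear_loo(i, x, y, w, h)) ** 2
               for i in range(len(y)) if w[i] > 0)


def select_bandwidth(x: Sequence[float], y: Sequence[float],
                     w: Sequence[float], grid: Sequence[float]) -> float:
    """Return the bandwidth in grid with the smallest weighted CV score."""
    return min(grid, key=lambda h: weighted_cv(x, y, w, h))
```

Each score costs $O(n^2)$ operations, so a full grid search is quadratic in $n$ per grid point; binned or updating approximations are the usual remedy for large samples.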
Acknowledgements
We thank two anonymous referees for their careful reading of the paper and for suggestions that have improved its presentation. Javier Roca Pardiñas was supported by grants MTM2005-00818 (European FEDER funding included) and SEJ2004-04583/ECON. Jacobo de Uña Álvarez was supported by grants MTM2005-01274 (European FEDER funding included) and MTM2008-03129. Both authors also acknowledge support by grant PGIDIT07PXIB300191PR.
References (36)
- … et al. (2008). Simultaneous selection of variables and smoothing parameters in structured additive regression models. Computational Statistics & Data Analysis.
- … et al. (2006). Generalized structured additive regression based on Bayesian P-splines. Computational Statistics & Data Analysis.
- … (2000). Asymptotic properties of backfitting estimators. Journal of Multivariate Analysis.
- … et al. (2000). Asymptotic properties for estimation of partial linear models with censored data. Journal of Statistical Planning and Inference.
- … et al. (2007). Testing the link when the index is semiparametric—A comparative study. Computational Statistics & Data Analysis.
- … (1993). Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis.
- … (2008). Selection of components and degrees of smoothing via lasso in high dimensional nonparametric additive models. Computational Statistics & Data Analysis.
- … et al. (2006). Effect measures in nonparametric regression with interactions between continuous exposures. Statistics in Medicine.
- … et al. (2005). Rank estimation in partial linear model with censored data. Statistica Sinica.
- … et al. (2001). Simple incorporation of interactions into additive models. Biometrics.
- Flexible smoothing with B-splines and penalties. Statistical Science.
- Fast implementation of nonparametric curve estimators. Journal of Computational and Graphical Statistics.
- Counting Processes and Survival Analysis.
- Additive models for geo-referenced failure time data. Statistics in Medicine.
- Non-parametric quantile regression with censored data. Scandinavian Journal of Statistics.
- Inference in smoothing spline analysis of variance. Journal of the Royal Statistical Society, Series B.
- Bootstrap inference in semiparametric generalized additive models. Econometric Theory.
- Generalized Additive Models.