
Sparse regression techniques in low-dimensional survival data settings

Statistics and Computing

Abstract

In high-dimensional data settings, sparse model fits are desired, and these can be obtained through shrinkage or boosting techniques. We investigate classical shrinkage techniques such as the lasso, which is theoretically known to be biased; newer techniques that address this problem, such as the elastic net and SCAD; and the boosting technique CoxBoost together with extensions of it that allow additional structure to be incorporated. To examine whether these methods, which are designed for or frequently used in high-dimensional survival data analysis, also provide sensible results in low-dimensional data settings, we consider the well-known GBSG breast cancer data. Specifically, we study the bias, stability, and sparseness of these model fitting techniques by comparing them to the maximum likelihood estimate and by resampling, and we assess their prediction performance via prediction error curve estimates.
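To illustrate the kind of model fits compared in the article, the sketch below shows how a lasso-penalized Cox model and a CoxBoost fit could be obtained for the GBSG data in R. It is a minimal sketch, not the analysis reported in the article: it assumes the GBSG2 data set from the TH.data package together with the penalized and CoxBoost packages, and the cross-validation settings are illustrative placeholders.

    ## Minimal sketch: lasso-penalized Cox regression and CoxBoost on the GBSG data.
    ## Assumption: GBSG2 data from the TH.data package; tuning settings are illustrative.
    library(survival)
    library(penalized)   # L1 (lasso) penalized Cox regression
    library(CoxBoost)    # likelihood-based boosting for the Cox model

    data("GBSG2", package = "TH.data")

    ## Covariate matrix of the clinical predictors and the survival response
    x <- model.matrix(~ horTh + age + menostat + tsize + tgrade +
                        pnodes + progrec + estrec, data = GBSG2)[, -1]
    y <- with(GBSG2, Surv(time, cens))

    ## Lasso: choose the penalty by cross-validated partial log-likelihood,
    ## then refit with the selected value
    opt <- optL1(y, penalized = x, model = "cox", fold = 10)
    fit.lasso <- penalized(y, penalized = x, lambda1 = opt$lambda, model = "cox")
    coefficients(fit.lasso)   # non-zero coefficients of the sparse fit

    ## CoxBoost: choose the number of boosting steps by cross-validation
    cv <- cv.CoxBoost(time = GBSG2$time, status = GBSG2$cens, x = x,
                      maxstepno = 100, K = 10)
    fit.boost <- CoxBoost(time = GBSG2$time, status = GBSG2$cens, x = x,
                          stepno = cv$optimal.step)
    coef(fit.boost)           # coefficient vector at the final boosting step

Comparing the resulting coefficient vectors with an unpenalized coxph() fit, and repeating the fits on bootstrap samples, yields the kind of bias and stability assessment described in the abstract.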



Author information


Correspondence to Christine Porzelius.


About this article

Cite this article

Porzelius, C., Schumacher, M. & Binder, H. Sparse regression techniques in low-dimensional survival data settings. Stat Comput 20, 151–163 (2010). https://doi.org/10.1007/s11222-009-9155-6
