Complexity Selection with Cross-validation for Lasso and Sparse Partial Least Squares Using High-Dimensional Data

  • Conference paper

Abstract

Sparse regression and classification methods are commonly applied to high-dimensional data to build a prediction rule and select relevant predictors simultaneously. The well-known lasso regression and the more recent sparse partial least squares (SPLS) approach are important examples. In such procedures, the number of predictors identified as relevant typically depends on a complexity parameter that has to be tuned adequately. Most often, this tuning is performed via cross-validation (CV). In the context of lasso-penalized logistic regression and SPLS classification, this paper addresses three important questions related to complexity selection: (1) Does the number of folds in CV affect the results of the tuning procedure? (2) Should CV be repeated several times to yield less variable tuning results? (3) Is complexity selection robust against resampling?
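The tuning loop described in the abstract can be illustrated with a small sketch. This is not the authors' code and makes several assumptions: lasso-penalized logistic regression is fitted here by plain proximal gradient descent (ISTA), the CV criterion is the misclassification rate, and the data are a hypothetical simulated "n much smaller than p" set. All function names (`fit_lasso_logistic`, `cv_error`, `select_lambda`) are invented for this example. The parameter `k` corresponds to question (1), `n_repeats` to question (2), and re-running the selection on a subsample of the observations gives a simple view of question (3).

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_lasso_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||beta||_1 with an unpenalized intercept.
    Assumes y is coded 0/1. Returns (intercept, coefficient vector).
    """
    n, p = X.shape
    beta = np.zeros(p)
    b0 = 0.0
    # Step size <= 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the Lipschitz
    # constant of the log-loss gradient (||[1, X]||_2^2 <= ||X||_2^2 + n).
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        b0 -= step * resid.mean()
        # Gradient step on the coefficients, then soft-thresholding (L1 prox).
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta


def cv_error(X, y, lambdas, k, rng):
    """K-fold CV misclassification rate for each lambda, on one random fold split."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    errors = np.zeros(len(lambdas))
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        for j, lam in enumerate(lambdas):
            b0, beta = fit_lasso_logistic(X[train], y[train], lam)
            pred = (b0 + X[test_idx] @ beta) > 0
            errors[j] += np.sum(pred != y[test_idx])
    return errors / n


def select_lambda(X, y, lambdas, k=5, n_repeats=1, seed=0):
    """Pick the lambda minimizing CV error averaged over n_repeats random splits."""
    rng = np.random.default_rng(seed)
    mean_err = np.mean(
        [cv_error(X, y, lambdas, k, rng) for _ in range(n_repeats)], axis=0
    )
    return lambdas[int(np.argmin(mean_err))], mean_err


if __name__ == "__main__":
    # Hypothetical toy data: 5 informative predictors out of 200, n = 60.
    rng = np.random.default_rng(1)
    n, p = 60, 200
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:5] = 1.5
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(int)
    lambdas = np.array([0.3, 0.2, 0.1, 0.05])

    for k in (3, 5, 10):          # question (1): number of folds
        for seed in range(2):     # variability due to the random fold split
            lam, _ = select_lambda(X, y, lambdas, k=k, seed=seed)
            _, beta = fit_lasso_logistic(X, y, lam)
            print(f"k={k} seed={seed} lambda={lam} selected={np.count_nonzero(beta)}")

    # Question (2): average the CV error over several random splits.
    lam, _ = select_lambda(X, y, lambdas, k=5, n_repeats=5)
    print("repeated 5-fold CV, lambda =", lam)

    # Question (3): repeat the selection on a random 80% subsample of the data.
    sub = rng.choice(n, size=int(0.8 * n), replace=False)
    lam, _ = select_lambda(X[sub], y[sub], lambdas, k=5)
    print("80% subsample, lambda =", lam)
```

Because the fold assignment is random, the selected penalty (and therefore the number of nonzero coefficients) may change from one seed to the next; averaging the CV error over several splits via `n_repeats` is one way to reduce that variability. A production analysis would more likely use an established solver such as glmnet or scikit-learn rather than this hand-written ISTA loop.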

Author information

Correspondence to Anne-Laure Boulesteix.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Boulesteix, AL., Richter, A., Bernau, C. (2013). Complexity Selection with Cross-validation for Lasso and Sparse Partial Least Squares Using High-Dimensional Data. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_26
