Complexity Selection with Cross-validation for Lasso and Sparse Partial Least Squares Using High-Dimensional Data

Boulesteix, Anne-Laure; Richter, Adrian; Bernau, Christoph

doi:10.1007/978-3-319-00035-0_26

Conference paper
First Online: 01 January 2013

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Sparse regression and classification methods are commonly applied to high-dimensional data to simultaneously build a prediction rule and select relevant predictors. The well-known lasso regression and the more recent sparse partial least squares (SPLS) approach are important examples. In such procedures, the number of predictors identified as relevant typically depends on a complexity parameter that has to be adequately tuned. Most often, parameter tuning is performed via cross-validation (CV). In the context of lasso-penalized logistic regression and SPLS classification, this paper addresses three important questions related to complexity selection: (1) Does the number of folds in CV affect the results of the tuning procedure? (2) Should CV be repeated several times to yield less variable tuning results? (3) Is complexity selection robust against resampling?
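
To make the tuning procedure in the abstract concrete, here is a minimal sketch of repeated K-fold CV choosing the lasso penalty for a logistic model. It is an illustration under stated assumptions, not the authors' implementation: the data are simulated, the model is fitted by plain proximal gradient descent without an intercept, and every function name, the penalty grid, and the step size are hypothetical choices made for this example.

```python
import math
import random

def soft_threshold(z, t):
    """Proximal map of the L1 penalty: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def fit_lasso_logistic(X, y, lam, n_iter=50, step=0.5):
    """L1-penalized logistic regression (no intercept) by proximal gradient descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            resid = 1.0 / (1.0 + math.exp(-eta)) - yi
            for j in range(p):
                grad[j] += resid * xi[j] / n
        # Gradient step on the logistic loss, then shrinkage for the L1 penalty.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta

def cv_error(X, y, lam, folds):
    """Misclassification rate for one penalty value over the given folds."""
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        beta = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
        for i in test_idx:
            pred = 1 if sum(b * v for b, v in zip(beta, X[i])) > 0 else 0
            errors += pred != y[i]
    return errors / len(y)

def select_lambda(X, y, lambdas, k, rng):
    """One K-fold CV run: return the penalty with the lowest CV error."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    # Ties go to the first value in `lambdas`.
    return min(lambdas, key=lambda lam: cv_error(X, y, lam, folds))

def repeated_cv(X, y, lambdas, k, n_rep, seed=0):
    """Repeat K-fold CV with fresh random partitions; return each run's choice."""
    rng = random.Random(seed)
    return [select_lambda(X, y, lambdas, k, rng) for _ in range(n_rep)]

def make_toy_data(n, p, seed=0):
    """Simulated data (an assumption): only the first two predictors carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] + row[1] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]
    return X, y

X, y = make_toy_data(n=30, p=10)
lambdas = [0.01, 0.05, 0.1, 0.2]
selected = repeated_cv(X, y, lambdas, k=5, n_rep=3)
print(selected)  # one selected penalty per CV repetition
```

Changing `k` or `n_rep` and comparing the spread of the returned values gives a rough feel for the first two questions. A resampling check along the lines of the third would rerun `repeated_cv` on bootstrap samples or subsamples of the rows.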

Author information

Authors and Affiliations

Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie, Universität München (LMU), Munich, Germany
Anne-Laure Boulesteix, Adrian Richter & Christoph Bernau

Corresponding author

Correspondence to Anne-Laure Boulesteix.

Editor information

Editors and Affiliations

University of Essex, Department of Mathematical Sciences, Colchester, United Kingdom
Berthold Lausen
Ghent University, Department of Marketing, Ghent, Belgium
Dirk Van den Poel
University of Marburg, Databionics, FB 12, Marburg, Germany
Alfred Ultsch

About this paper

Cite this paper

Boulesteix, AL., Richter, A., Bernau, C. (2013). Complexity Selection with Cross-validation for Lasso and Sparse Partial Least Squares Using High-Dimensional Data. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_26

DOI: https://doi.org/10.1007/978-3-319-00035-0_26
Published: 16 July 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)