Assessment Methods

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 207)

Abstract

This chapter aims to provide the reader with the tools required for a statistically sound assessment of feature relevance and of the outcome of feature selection. The methods presented in this chapter can be integrated into feature selection wrappers and can serve to select the number of features for filters or feature ranking methods. They can also serve for hyper-parameter selection or model selection. Finally, they can be helpful for assessing the confidence of predictions made by learning machines on fresh data. The concept of model complexity is ubiquitous in this chapter. Readers with little or rusty knowledge of basic statistics should first delve into Appendix A before starting the chapter; for others, it may serve as a quick reference guide for useful definitions and properties. The first section of the present chapter is devoted to the basic statistical tools for feature selection: it puts the task of feature selection into the appropriate statistical perspective, and describes important tools such as hypothesis tests, which are of general use, and random probes, which are more specifically dedicated to feature selection. The use of hypothesis tests is exemplified, and caveats about the reliability of the results of multiple tests are given, leading to the Bonferroni correction and to the definition of the false discovery rate. The use of random probes is also exemplified, in conjunction with forward selection. The second section of the chapter is devoted to validation and cross-validation, which are general tools for assessing the ability of models to generalize; we show how they can be used specifically in the context of feature selection, and we draw attention to the limitations of these methods.
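To make the multiple-testing caveats concrete, here is a minimal sketch in Python of the two corrections named above: the Bonferroni correction and the Benjamini-Hochberg procedure for controlling the false discovery rate. The function names and the synthetic p-values are illustrative assumptions, not material from the chapter.

import numpy as np

def bonferroni_reject(p_values, alpha=0.05):
    # Reject each null hypothesis whose p-value is below alpha / m, which
    # controls the probability of making any false rejection at level alpha.
    m = len(p_values)
    return np.asarray(p_values) < alpha / m

def benjamini_hochberg_reject(p_values, alpha=0.05):
    # Step-up procedure: find the largest k such that the k-th smallest
    # p-value is at most alpha * k / m, then reject the k smallest.
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject

# Illustrative data: 3 strong effects hidden among 20 tests.
rng = np.random.default_rng(0)
p_vals = np.concatenate([rng.uniform(0, 1e-3, 3), rng.uniform(0, 1, 17)])
print("Bonferroni rejections:", int(bonferroni_reject(p_vals).sum()))
print("BH (FDR) rejections:  ", int(benjamini_hochberg_reject(p_vals).sum()))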
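The random-probe idea admits an equally short sketch: append artificial features that are irrelevant by construction (here, row-permuted copies of real features), rank all candidates by a relevance score, and stop a forward pass at the first probe encountered. The scoring criterion (absolute Pearson correlation) and the stopping rule are assumptions chosen for brevity, not the chapter's exact procedure.

import numpy as np

def forward_select_with_probes(X, y, n_probes=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Probes: permuted copies of randomly chosen real features, so they
    # share the marginal statistics of the data but cannot be relevant.
    probes = np.column_stack(
        [rng.permutation(X[:, rng.integers(d)]) for _ in range(n_probes)]
    )
    Z = np.hstack([X, probes])
    scores = np.abs(
        [np.corrcoef(Z[:, j], y)[0, 1] for j in range(Z.shape[1])]
    )
    selected = []
    for j in np.argsort(scores)[::-1]:   # forward pass, best score first
        if j >= d:                       # first probe reached: stop
            break
        selected.append(j)
    return selected

# Illustrative data: 5 informative features out of 50.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=200)
print("Selected before first probe:", forward_select_with_probes(X, y))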
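Finally, a minimal sketch of cross-validation used to choose the number of top-ranked features. Note that the ranking is redone inside each training fold: ranking once on the full data set before cross-validating is a well-known source of optimistic bias. The correlation ranking, the linear model, and the availability of scikit-learn are assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def cv_choose_n_features(X, y, candidate_ns, n_splits=5, seed=0):
    mse = {n: [] for n in candidate_ns}
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in kf.split(X):
        # Rank features on the training fold only, to avoid selection bias.
        scores = np.abs(
            [np.corrcoef(X[train, j], y[train])[0, 1] for j in range(X.shape[1])]
        )
        ranking = np.argsort(scores)[::-1]
        for n in candidate_ns:
            cols = ranking[:n]
            model = LinearRegression().fit(X[train][:, cols], y[train])
            pred = model.predict(X[test][:, cols])
            mse[n].append(np.mean((pred - y[test]) ** 2))
    # Return the candidate size with the lowest average held-out error.
    return min(candidate_ns, key=lambda n: np.mean(mse[n]))

# Illustrative data: 5 informative features out of 50.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=200)
print("Chosen number of features:", cv_choose_n_features(X, y, [1, 2, 5, 10, 25]))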





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dreyfus, G., Guyon, I. (2006). Assessment Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_3

  • DOI: https://doi.org/10.1007/978-3-540-35488-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35487-1

  • Online ISBN: 978-3-540-35488-8

• eBook Packages: Engineering, Engineering (R0)
