The correctness problem: evaluating the ordering of binary features in rankings

  • Regular Paper
Knowledge and Information Systems

Abstract

In machine learning, feature ranking (FR) algorithms rank features by their relevance to the class variable. FR algorithms have mostly been investigated for the feature selection problem and less for the problem of ranking itself; this paper focuses on the latter. Stated in FR terminology, the question is: since different FR criteria estimate the relationship between a feature and the class variable differently on a given data set, can we determine which criterion better captures the “true” feature-to-class relationship and thus generates the most “correct” order of individual features? We term this the “correctness” problem. Answering it requires a reference ordering against which the ranks assigned to features by an FR algorithm can be directly compared, and such a reference ranking is generally unknown for real-life data. In this paper, we show through theoretical and empirical analysis that, for two-class classification tasks represented with binary data, the ordering of binary features based on their individual predictive powers can be used as a benchmark, which allows us to test how correct the ordering produced by an FR algorithm is. Based on these ideas, we propose an evaluation method termed the FR evaluation strategy (FRES). Rankings of three FR criteria (relief, mutual information, and the diff-criterion) are investigated on five artificially generated and four real-life binary data sets. The results indicate that FRES works equally well for synthetic and real-life data and that the diff-criterion generates the most correct orderings for binary data.
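
The idea of checking an FR criterion against a reference ordering can be illustrated with a minimal sketch. It is not the FRES procedure itself, whose details are not given in the abstract. It assumes that the "individual predictive power" of a binary feature is the accuracy of the best classifier that sees only that feature, that mutual information is the criterion under test, and that agreement between the two orderings is measured with Kendall's tau. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def predictive_power(feature, labels):
    """Assumed reference score: accuracy of the best classifier that
    sees only this feature (predict the majority class per value)."""
    joint = Counter(zip(feature, labels))
    classes = set(labels)
    correct = sum(max(joint[(v, c)] for c in classes) for v in set(feature))
    return correct / len(labels)


def mutual_information(feature, labels):
    """Empirical mutual information (in bits) between feature and class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    return sum(
        (nxy / n) * log2(nxy * n / (px[x] * py[y]))
        for (x, y), nxy in joint.items()
    )


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a: +1 for identical orderings, -1 for reversed ones.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical toy data: two classes, three binary features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "perfect": [0, 0, 0, 0, 1, 1, 1, 1],
    "good": [0, 0, 0, 1, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}

reference = [predictive_power(f, labels) for f in features.values()]
criterion = [mutual_information(f, labels) for f in features.values()]
print("agreement with reference ordering:", kendall_tau(reference, criterion))
```

A tau close to 1 would mean the criterion orders the features almost exactly as the reference does; comparing several criteria this way is one plausible reading of "most correct ordering".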

Notes

  1. From here onwards, the discussion is from a machine learning perspective unless stated otherwise.

  2. Also known as variables or attributes.

  3. Also known as examples, observations or samples.

  4. We focus only on FR algorithms in this paper, although related work exists in other domains, such as [7, 36], which employ non-FR algorithms for ranking.

References

  1. Agarwal S, Dugar D, Sengupta S (2010) Ranking chemical structures for drug discovery: a new machine learning approach. J Chem Inf Model 50(5):716–731

  2. Agarwal S, Sengupta S (2009) Ranking genes by relevance to a disease. In: Proceedings of the 8th annual international conference on computational systems bioinformatics

  3. AIMS (2010) The mathematics of ranking. http://www.aimath.org/ARCC/workshops/mathofranking.html

  4. Arauzo-Azofra A, Aznarte J, Benitez J (2011) Empirical study of feature selection methods based on individual feature evaluation for classification problems. Expert Syst Appl 38(7):8170–8177

  5. Bhamidipati N, Pal S (2009) Comparing scores intended for ranking. IEEE Trans Knowl Data Eng 21(1):21–34

  6. Bishop C (2006) Pattern recognition and machine learning. Springer, Berlin

  7. Boldi P (2005) TotalRank: ranking without damping. In: Special interest tracks and posters of the 14th international conference on world wide web, WWW ’05, pp 898–899

  8. Bolon-Canedo V, Sanchez-Marono N, Alonso-Betanzos A (2013) A review of feature selection methods on synthetic data. Knowl Inf Syst 34(3):483–519

  9. Clemencon S, Lugosi G, Vayatis N (2008) Ranking and empirical minimization of U-statistics. Ann Stat 36:844–874

  10. Cohen W, Schapire R, Singer Y (1999) Learning to order things. J Artif Intell Res 10:240–270

  11. Conover W (1999) Practical nonparametric statistics, 3rd edn. Wiley, New York

  12. Cover T, Thomas J (1991) Elements of information theory. Wiley, New York

  13. Duch W (2006) Feature extraction: foundations and applications. In: Guyon I, Nikravesh M, Gunn S, Zadeh L (eds) Foundations and applications. Springer, Berlin, pp 89–117

  14. Duda R, Hart P, Stork D (2000) Pattern classification, 2nd edn. Wiley, New York

  15. Dwork C, Kumar R, Naor M et al (2001) Rank aggregation methods for the web. In: Proceedings of the tenth international conference on World wide web (WWW10), pp 613–622

  16. Fagin R, Kumar R, Sivakumar D (2003) Comparing top k lists. In: ACM SIAM symposium on discrete algorithms, pp 28–36

  17. Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml

  18. Freund Y, Iyer R, Schapire R et al (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969

  19. Gleich D, Langville A (2010) Suggested problems for discussion. http://www.stat.uchicago.edu/lekheng/meetings/mathofranking/problems/david-amy.txt

  20. Golub T, Slonim D, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537

  21. Gustafson A, Snitkin E, Parker S et al (2006) Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Bioinform 7. http://www.biomedcentral.com/1471-2164/7/265/

  22. Guyon I, Aliferis C, Cooper G et al (2008) Design and analysis of the causation and prediction challenge. In: JMLR workshop and conference proceedings: causation and prediction challenge (WCCI 2008), vol. 3, pp 1–33

  23. Guyon I, Cawley G, Dror G et al (eds) (2011) Hands-on pattern recognition: challenges in machine learning, vol. 1. Microtome Publishing, Brookline. http://www.mtome.com/Publications/CiML/ciml.html

  24. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  25. Guyon I, Saffari A, Dror G et al (2007) Agnostic learning vs. prior knowledge challenge. In: Proceedings of international joint conference on neural networks (IJCNN), pp 829–834

  26. Guyon I, Weston J, Barnhill S et al (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422

  27. Hall M, Holmes G (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15(6):1437–1447

  28. Javed K (2012) Development of feature selection algorithms for high-dimensional binary data. Ph.D. thesis, Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan

  29. Javed K, Babri H, Saeed M (2012a) Evaluating rankings of mutual information and diff-criterion for high-dimensional binary data. In: Proceedings of the first Taibah University international conference on computing and information technology, pp 18–23

  30. Javed K, Babri H, Saeed M (2012b) Feature selection based on class-dependent densities for high-dimensional binary data. IEEE Trans Knowl Data Eng 24(3):465–477

  31. John G, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: Proceedings of the 11th international conference on machine learning (ICML), pp 121–129

  32. Jr EH, Ebecken N (2007) Towards efficient variables ordering for Bayesian networks classifier. Data Knowl Eng 63(2):258–269

  33. Kalousis A, Prados J, Hilario M (2007) Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl Inf Syst 12(1):95–116

  34. Kira K, Rendell L (1992) A practical approach to feature selection. In: Proceedings of the 9th international conference on machine learning (ICML), pp 249–256

  35. Langville A, Meyer C (2004) Deeper inside pagerank. Internet Math 1(3):335–380

  36. Lapata M (2006) Automatic evaluation of information ordering: Kendall’s Tau. Comput Linguist 32(4):471–484

  37. Lazar C, Taminau J, Meganck S et al (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinf 9(4):1106–1119

  38. Li H (2011) A short introduction to learning to rank. IEICE Trans 94-D(10):1854–1862

  39. Minka T (2003) A comparison of numerical optimizers for logistic regression. http://research.microsoft.com/minka/papers/

  40. Rosa KD, Metsis V, Athitsos V (2012) Boosted ranking models: a unifying framework for ranking predictions. Knowl Inf Syst 30(3):543–568

  41. Ruiz R, Aguilar-Ruiz J, Riquelme J et al (2005) Analysis of feature rankings for classification. In: Proceedings of the 6th international symposium on intelligent data analysis, pp 362–372

  42. Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  43. Saffari A, Guyon I (2006) Quick start guide for challenge learning object package (CLOP), Technical report, Graz University of Technology and Clopinet. http://clopinet.com/clop/

  44. Slavkov I, Zenko B, Dzeroski S (2010) Evaluation method for feature rankings and their aggregations for biomarker discovery. In: JMLR workshop and conference proceedings: machine learning in systems biology, vol. 8. pp 122–135

  45. Su Y, Murali T, Pavlovic V et al (2003) RankGene: identification of diagnostic genes based on expression data. Bioinformatics 19(12):1578–1579

  46. Wang B, Tang J, Fan W et al (2013) Query-dependent cross-domain ranking in heterogeneous network. Knowl Inf Syst 34(1):109–145

  47. Xia F, Liu T-Y, Wang J et al (2008) Listwise approach to learning to rank: theory and algorithm. In: Proceedings of the 25th international conference on machine learning (ICML), pp 1192–1199

  48. Yang YH, Xiao Y, Segal MR (2005) Identifying differentially expressed genes from microarray experiments via statistic synthesis. Bioinformatics 21(7):1084–1093

Author information

Corresponding author

Correspondence to Kashif Javed.

About this article

Cite this article

Javed, K., Saeed, M. & Babri, H.A. The correctness problem: evaluating the ordering of binary features in rankings. Knowl Inf Syst 39, 543–563 (2014). https://doi.org/10.1007/s10115-013-0631-0

  • DOI: https://doi.org/10.1007/s10115-013-0631-0
