
A filter attribute selection method based on local reliable information


Abstract

In this article, a filter feature weighting technique for attribute selection in classification problems, called LIA, is proposed. It has two main characteristics. First, unlike typical feature weighting methods, it is able to consider attribute interactions in the weighting process, rather than evaluating single features in isolation. Attribute subsets are evaluated by projecting instances onto a grid defined by the attributes in the subset. Then, the joint relevance of the subset is computed by measuring the information present in the cells of the grid. The final weight for each attribute is computed by taking into account its performance in each of the grids in which it participates. Second, many real problems have low signal-to-noise ratios due, for instance, to high noise levels, class overlap, class imbalance, or small training samples. LIA computes reliable local information for each cell by estimating the number of target-class instances not due to chance, given a confidence value. To study its properties, LIA has been evaluated on a collection of 18 real datasets and compared to two feature weighting methods (Chi-Squared and ReliefF) and a subset feature selection algorithm (CFS). Results show that the method is significantly better in many cases, and never significantly worse. LIA has also been tested with different grid dimensions (1, 2, and 3). The method works best when evaluating attribute subsets larger than 1, which shows the usefulness of considering attribute interactions.
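The abstract outlines the weighting scheme only at a high level. The following minimal Python sketch illustrates the general flow it describes (project instances onto grids over attribute subsets, score each grid, and average each attribute's score over the grids it participates in). The grid resolution, the per-cell purity measure, and the averaging rule are illustrative assumptions, not the authors' exact definitions; integer class labels are assumed.

```python
from itertools import combinations
import numpy as np

def cell_index(X_sub, bins):
    """Map each instance to a cell of a regular grid over the attribute subset."""
    idx_per_dim = []
    for j in range(X_sub.shape[1]):
        edges = np.linspace(X_sub[:, j].min(), X_sub[:, j].max(), bins + 1)
        # use inner edges only, so values fall into bins 0..bins-1
        idx_per_dim.append(np.digitize(X_sub[:, j], edges[1:-1]))
    return list(zip(*idx_per_dim))

def subset_relevance(X_sub, y, bins):
    """Score a subset's grid: size-weighted majority-class purity of its cells
    (a stand-in for the paper's reliable-information measure)."""
    cells = {}
    for cell, label in zip(cell_index(X_sub, bins), y):
        cells.setdefault(cell, []).append(label)
    n = len(y)
    return sum(len(ls) / n * np.bincount(ls).max() / len(ls)
               for ls in cells.values())

def lia_like_weights(X, y, dim=2, bins=5):
    """Weight each attribute by its average score over all dim-sized
    attribute subsets (grids) in which it participates."""
    n_attr = X.shape[1]
    scores = np.zeros(n_attr)
    counts = np.zeros(n_attr)
    for subset in combinations(range(n_attr), dim):
        r = subset_relevance(X[:, list(subset)], y, bins)
        for a in subset:
            scores[a] += r
            counts[a] += 1
    return scores / counts
```

With dim=1 this degenerates to scoring each attribute alone; the paper's finding is that subsets of size 2 or 3 work better, which is what the per-subset loop captures.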


Notes

  1. Specifically, prob_B(X_cell.C_1) = confidence. This will be explained in more detail in the next paragraphs.

  2. The GSL library has been used for computing CDFBINOM: https://www.gnu.org/software/gsl (an illustrative stand-in is sketched after these notes).

  3. Technically, this is done not only for all cells, but also for all cell groupings, or hyper-cells, with a parallelepiped (rectangular-like) shape; see the enumeration sketch after these notes. The reason is that in some cases the region that contains the information is larger than a single cell.

  4. http://keel.es/

  5. http://archive.ics.uci.edu/ml/

  6. IR is the ratio between the number of instances of the majority class and the number of instances of the minority class.
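Notes 1 and 2 describe estimating, via the binomial CDF, how many target-class instances in a cell are unlikely to be due to chance at a given confidence. A hedged reconstruction in Python follows: the exact formula is the paper's, and the thresholding rule below (subtracting the binomial quantile under the class prior) is an assumption; scipy.stats.binom stands in for GSL's CDFBINOM.

```python
from scipy.stats import binom

def reliable_count(k, n, prior, confidence=0.95):
    """k: target-class instances observed in the cell,
    n: total instances in the cell,
    prior: global fraction of the target class.
    Returns the portion of k unlikely to be due to chance
    (an assumed rule, not the paper's exact definition)."""
    # Smallest count t with P(Binomial(n, prior) <= t) >= confidence:
    # counts up to t are plausible under a purely random assignment.
    t = binom.ppf(confidence, n, prior)
    return max(0.0, k - t)

# Example: 14 of 20 instances in a cell belong to a class with prior 0.3.
# Chance alone explains up to 9 at 95% confidence, leaving 5 "reliable".
print(reliable_count(14, 20, 0.3))  # -> 5.0
```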
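Note 3 says that, besides single cells, all rectangular (parallelepiped-shaped) groupings of cells are evaluated. A tiny illustrative enumeration of such hyper-cells on a 2-D grid, assuming inclusive index ranges:

```python
from itertools import combinations_with_replacement as cwr

def hyper_cells(bins):
    """Yield all axis-aligned rectangles of a bins x bins grid, as
    ((row_lo, row_hi), (col_lo, col_hi)) inclusive index ranges."""
    intervals = list(cwr(range(bins), 2))  # all (lo, hi) with lo <= hi
    for rows in intervals:
        for cols in intervals:
            yield rows, cols

# A 3x3 grid has 6 row intervals x 6 column intervals = 36 hyper-cells.
print(sum(1 for _ in hyper_cells(3)))  # -> 36
```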

References

  1. Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic & Soft Computing 17

  2. Asuncion A, Newman D (2007) UCI machine learning repository

  3. Ben-Bassat M (1982) Pattern recognition and reduction of dimensionality. Handb Stat 2:773–910

  4. Bennasar M, Hicks Y, Setchi R (2015) Feature selection using joint mutual information maximisation. Expert Syst Appl 42(22):8520–8532

  5. Das S (2001) Filters, wrappers and a boosting-based hybrid for feature selection. In: ICML, vol 1. Citeseer, pp 74–81

  6. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  7. Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305

  8. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  9. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explorations Newsl 11(1):10–18

  10. Hall MA (1999) Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato

  11. Hall MA, Smith LA (1997) Feature subset selection: a correlation based filter approach. In: International conference on neural information processing and intelligent information systems, pp 855–858

  12. Inza I, Larrañaga P, Blanco R, Cerrolaza AJ (2004) Filter versus wrapper gene selection approaches in DNA microarray domains. Artif Intell Med 31(2):91–103

  13. Karegowda AG, Manjunath AS, Jayaram MA (2010) Comparative study of attribute selection using gain ratio and correlation based feature selection. Int J Inform Technol Knowl Manag 2(2):271–277

  14. Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: AAAI, pp 129–134

  15. Kohavi R, John GH (1998) The wrapper approach. In: Feature extraction, construction and selection. Springer, pp 33–50

  16. Kononenko I, Šimec E, Robnik-Šikonja M (1997) Overcoming the myopia of inductive learning algorithms with ReliefF. Appl Intell 7(1):39–55

  17. Liu H, Setiono R (1995) Chi2: feature selection and discretization of numeric attributes. In: Proceedings of the seventh IEEE international conference on tools with artificial intelligence, p 388

  18. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502

  19. Liu H, Sun J, Liu L, Zhang H (2009) Feature selection with dynamic mutual information. Pattern Recogn 42(7):1330–1339

  20. Liu H, Wu X, Zhang S (2014) A new supervised feature selection method for pattern classification. Comput Intell 30(2):342–361

  21. Loughrey J, Cunningham P (2005) Overfitting in wrapper-based feature subset selection: the harder you try the worse it gets. In: Research and development in intelligent systems XXI. Springer, pp 33–43

  22. Michalski RS, Carbonell JG, Mitchell TM (2013) Machine learning: an artificial intelligence approach. Springer Science & Business Media

  23. Reunanen J (2003) Overfitting in making comparisons between variable selection methods. J Mach Learn Res 3:1371–1382

  24. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  25. Somol P, Baesens B, Pudil P, Vanthienen J (2005) Filter- versus wrapper-based feature selection for credit scoring. Int J Intell Syst 20(10):985–999

  26. Uysal AK (2016) An improved global feature selection scheme for text classification. Expert Syst Appl 43:82–92

  27. Xing EP, Jordan MI, Karp RM et al (2001) Feature selection for high-dimensional genomic microarray data. In: ICML, vol 1, pp 601–608

  28. Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: ICML, vol 3, pp 856–863

  29. Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224


Acknowledgments

The authors acknowledge financial support granted by the Spanish Ministry of Science under contract ENE2014-56126-C2-2-R.

Author information

Correspondence to Ricardo Aler.


Cite this article

Martín, R., Aler, R. & Galván, I.M. A filter attribute selection method based on local reliable information. Appl Intell 48, 35–45 (2018). https://doi.org/10.1007/s10489-017-0959-3
